Skip to content

Performance bottleneck in TeX-like math parsing #2343

@Omikhleia

Description

@Omikhleia

With heavily nested math constructs (e.g. nested mroots), a very (very) long time is spent in convertTexlike() (that just calls the LPEG/EPNF parser).

The slowdown is likely due to the heavy recursion logic in the LPEG/EPNG grammar overall. LPeg’s backtracking makes this cost grow exponentially with nesting depth --- the real bottleneck is likely the general deep recursion in "mathlist" constructs.

See discussion and examples in the description for #2342.

Image

The 7-level "nested roots" target scenario, where I commented external levels iteratively -- actually I stopped measuring a the 6th level, waiting 11mn or so for one single formula is enough for illustrating the issue... (and I run the tests 3 times for averaging these values...)

\math{
  \msqrt{1 +
    \mroot{2 +
      \mroot{3 +
        \mroot{4 +
          \mroot{5 +
            \mroot{6 +
              \mroot{7 +
                \mroot{A}{19}}{17}}{13}}{11}}{7}}{5}}{3}}
}

Of course, such scenarios belong to "torture tests" -- but I also had noticed earlier a performance impacts of heavily nested \left-\right constructs (more usual in "real life").... and this is hinting toward a bad parsing strategy.
SILE should reconsider its approach to TeX-like math parsing, replacing our LPEG-based parser with something more efficient, or at least use a two-stage process (basic tokenization with LPEG but without recursion, and intermediate tree reconstruction).

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSoftware bug issuerefactorCode quality improvements

    Type

    Projects

    Status

    To do

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions