WIP: Initial change to move bidi to core, get bidi working in hboxes #2278

mimbrown · 2025-05-03T12:59:22Z

The PR has a couple changes:

Moves most of the bidi functionality to the core folder (for lack of maybe a better place). Minimal tweaks to some bidi functions to take, for example, a direction instead of a typesetter, to reduce direct dependency on the typesetter.
Move the splitIntoRuns and reorder calls directly into typesetters/base.lua#boxUpNodes
Add the splitIntoRuns and reorder calls into typesetters/base.lua#makeHbox
Strip the bidi package down to be mostly just deprecated exports.

typesetters/base.lua

mimbrown · 2025-05-03T13:12:10Z

@alerque this does not yet include any test updates or anything like that, just the basic code change. The thing that's controversial is, I've also updated the output logic for hboxes, since I've found them to be wrong by default in RTL mode.

Given the content:

Testing, testing, \hbox{Test \font[family = "Al Tarikh", direction = "RTL"]{مفتاح معايير الويب!} another}

When the document is LTR

Current (RTL text is laid out backwards)

New

When the document is RTL

Current (RTL text is laid out backwards, content in the hbox starts from the left edge of the box)

New

Omikhleia · 2025-05-25T13:11:51Z

This does break a lot of tests, including unexpected ones.

mimbrown · 2025-06-13T23:08:08Z

Coming back to this after a while. @Omikhleia I've tried to visually inspect the difference between the current typesetter and the changes I've made, and can't make anything out. If I'm reading the diffs correct, it's looking like a lot of these tests are differences of fractions of points, but maybe I'm missing something. Do you have any idea what's going on?

Omikhleia · 2025-06-24T20:30:20Z

Do you have any idea what's going on?

Sorry for the delay @mimbrown I failed to see earlier that I was pinged here ^^

I retraced your steps, it turns out that with in the original code, self:shapeAllNodes() was invoked twice (once in the bidi logic, then in the regular typesetter) at different times (on different node lists).

I daresay it's really dubious... It might also explain that when I once tried to remove the whole bidi logic (e.g. in the old code, just commenting out the content of the enabling/disabling methods), the debug output also had some subtle differences of the same sort, for a document that had just French text (so I wouldn't have expected bidi to do something at all). My feeling is that the old code was wrong, despite being "active" all the time by default... Doh.

Omikhleia · 2025-06-24T21:40:51Z

it turns out that with in the original code, self:shapeAllNodes() was invoked twice (once in the bidi logic, then in the regular typesetter) at different times (on different node lists).

That is -- the old bidi code was:

invoking splitNodelistIntoBidiRuns much earlier
Then invoking shapeAllNodes (!)
Then call calling the original typesetter - which removed some "discardable" glues, did other fancy things, and invoked shapeAllNodes (thus again?)...

Your proposed code moved all that mess to the base typesetter, a nice move IMHO, but invokes splitNodelistIntoBidiRuns at a later stage (not illogically, IMHO), and there's only one shapeAllNodes then. To me, you might be in the right track here... but don't take this for granted, I don't understand a lot of what SILE does the way it does it, crippled with side-effects and unclear intents, even after a 4-year involvement as a user and casual contributor.

mimbrown · 2025-06-25T16:37:54Z

Thanks for looking at this! I see your point, yes I remember thinking that removing the double shapeAllNodes call was a plus for this PR, but I just put it back to see if that fixes a lot of the random test failures. I agree, it's troubling that the behavior would be different, it doesn't seem like it should be, but I'm not prepared to break all tests and stand by the new behavior as better!

Omikhleia · 2025-06-25T17:34:41Z

(...) but I just put it back to see if that fixes a lot of the random test failures.

Darn, all tests failed but for some font fetching issue in the CI. Let's invoke our great @alerque here :)

I agree, it's troubling that the behavior would be different, it doesn't seem like it should be, but I'm not prepared to break all tests and stand by the new behavior as better!

I'll add a dedicated comment on the code, but still, this indeed has heavy code smell... Well let's see if it does the trick and confirms the root cause when the CI is repaired.
Yet, personally I'd prefer SILE to address its issues even at the cost of breaking many tests.

If the decision is to keep these moved lines, my opinion is that it would need an extra in-code comment to note the code small and refer to this issue (or a new one), so people who maintain or investigate the code in the future are aware of the potential issue... But let's wait for a result, discussion and decision.

Omikhleia · 2025-06-25T17:43:53Z

typesetters/base.lua

      return {}
   end
+   nodelist = bidi.splitNodelistIntoBidiRuns(nodelist, self.frame:writingDirection())
+   self:shapeAllNodes(nodelist)


Re. the main discussion in the PR: yes 👍 having those two lines here is also what I was discussing. I arrived at the same conclusion and all other things equal (I haven't checked our code was identical as mine; when just starting from the old code and following the same steps as you), the test tests/image.sil gave the same result as before. That's the only one I checked though.

But the double shapeAllNodes (here and further below), or the reason why this splitting/shaping has to occur here (rather than after some discardable nodes or penalties are removed) are very unclear. I'd like to grasp why, and have proper in-code comments here... ideally ^^

I agree, I'll add the comment (if this works). Just want to validate that it does before going to all the trouble 🙃

Wait... from boxUpNodes(), we invoke breakIntoLines() which also invokes shapeAllNodes(). How many times does SILE ends up reshaping the same content (with a few glues and penalties added or removed)?
This is not even cost-less, with the dubious "inplace" boolean there, we browse, re-copy this list multiple times. Inefficient and weird.

🤷‍♂️ it definitely does look weird. If we figure out why this changes subtly when calling shapeAllNodes either twice or before/after discarding nodes (not sure which is the culprit yet), that would probably help us think through how it should be fixed.

Looking at the function, it seems like calling it multiple times shouldn't have any effect, unless calling shape() on a node can somehow return more unshaped nodes, which would be problematic. So I would guess that calling shapeAllNodes() after removing the discardable nodes is what is causing the difference?

My best guess for the difference in the logs is that it is introduced by some glues, and that the line ratio (affecting their shrinking/stretching) occurs to be slightly different -- but the amount is small and distributed on all glues, so in the end its just a few decimal points (but this would affect "Mx" lines in the logs in the way we observe).

Hmm... I wonder if the issue isn't what we put pack into the state.nodes or not, during all that chain of calls...

See:

function typesetter:breakIntoLines (nodelist, breakWidth) self:shapeAllNodes(nodelist) local breakpoints = SILE.linebreak:doBreak(nodelist, breakWidth) return self:breakpointsToLines(breakpoints) end

Then breakPointsToLines works on self.states.nodes, not the nodelist. Ouch. That's the big assumption. They might be the same or not and shapeAllNodes works "in place" "by default", so it all depends on what the nodelist originally was. It might have been self.states.nodes initially... but nodelist = bidi.splitNodelistIntoBidiRun returns a new nodelist, and do we put it back to self.states.nodes? Otherwise later at some point this will be inconsistent.

Frankly these user of the internal states and the multiple side-affecting methods are a mess to follow. As as wrote in another comment, the constant hesitation on what we work (putting things back and forth in the states) is a likely source of error. The code is quite a mess there.

When I see such things, I am frightened:

sile/core/break.lua

Line 713 in b3b17ad

SILE.typesetter.state.nodes = self.nodes -- Horrible breaking of separation of concerns here. :-(

Another cross-boundary side-effect here... Ideally linebreak:doBreak() should do that, but should return the nodelist it worked on...

I totally agree, there's some major refactoring that would be helpful. But my goal with this PR has been to be minimal, not expansive, with changes. Once @alerque can let us know what's causing the 404 errors in the test runs, we should be able to see if what I'm doing now is actually consistent with what was there before. Based on this, I think we should be able to keep the new shapeAllNodes call before the discarding of other nodes, remove the second (redundant) one, and most tests should pass fine. More expansive changes should be a different PR I think.

…idi_to_core

mimbrown · 2025-06-25T18:13:04Z

Yes I just pulled in the upstream, possibly in between me making this PR and now something in the test setup changed.

Omikhleia · 2025-06-25T19:51:43Z

typesetters/base.lua

   self.state.hmodeOnly = true
   SILE.process(content)

+   self.state.nodes = bidi.splitNodelistIntoBidiRuns(self.state.nodes, self.frame:writingDirection())


by the way, do we need to put that back into self.state.nodes? There's a lot of back-and-forth stuff of that kind, I know, in the code base, but it would be best to avoid it, and just use locals.

I'm mostly following what was there before, because (as discussed) I can't track through if this is necessary so I'm afraid to do anything different!

Omikhleia · 2025-09-06T18:56:22Z

typesetters/base.lua

+   for i = 1, #vboxes do
+      local v = vboxes[i]
+      if v.is_vbox then
+         v.nodes = bidi.reorder(v.nodes, self.frame:writingDirection())


Hello @mimbrown

As I am working on my own rewrite of a typesetter (here), I tried to port your proposed changes to it.

It made me realize there's an issue here, when the document uses "liners" (which I introduced earlier to SILE), i.e. for underlining, striking out, etc. portions of text that can be line broken.¹ The liner logic eventually wraps part of lines if liner boxes (responsible for applying the wanted effect), and these can be deeply nested...

So here, bidi re-ordering -- which as far as I can tell, expects a flat list of shaped nodes, comes far too late, and will not handle these liner boxes.

This raises a few questions:

Why is it done here so late on the final line vboxes (via another repeated loop on all lines), while it could perhaps have been done on each line individually before it's finalized (i.e. just before the liner reboxing logic applies).

But why is it done, actually, after line-breaking has occurred? Is that due to some interpretation with respect to the Knuth-Plass algorithm?

And what exactly is it doing? The bidi code is totally undocumented (and even has some code commented out, never a good sign), with opaque variables (what's rv !? why to we have to SU.compress it for holes²); and weird sequences³...

I'd totally be in favor of having bidi also made "standard" in my re-implementation, but it's a tricky piece of code, and I don't even know what to test, without knowing the languages :)

Footnotes

E.g. \underline{....} in SIL. ↩

That's the only use for this compress utility in all the SILE code base, if not mistaken... ↩

E.g. sequences of SU.flip_in_place / reverse_each_node, where the later browses the node list again to SU.flip_in_place internal hboxes, knowing their inner structure. Whether such code is legit is hard to say, for one not knowing the details. ↩

Linked to a 10-year old issue: #76

@Omikhleia I definitely don't have the answers to all the "why" questions. My goal with this PR was to change things as minimally as possible, and only remove/change things that I was logically positive shouldn't have an effect. BiDi is really painful... but, I am pretty sure, we can't reverse before the line-breaking happens. Here's an example of why:

Imagine a string of glyphs (' means the character is RTL):

ABC DEF A'B'C' D'E'F'

If we want to apply the BiDi algorithm, it becomes:

ABC DEF F'E'D' C'B'A'

Easy. But let's say we need a line break in the middle of the RTL text. In that case, we want the output to look like this:

ABC DEF C'B'A' F'E'D'

But if we reorder before breaking into lines, we'll end up with:

ABC DEF F'E'D' C'B'A'

Off the top of my head, I don't know why we couldn't do the reordering before the reboxing. Maybe we should! But here's why we can't do it before line-breaking.

Omikhleia · 2025-10-16T21:52:39Z

Linked #2268

Initial change to move bidi to core, get bidi working in hboxes

7494e7f

mimbrown requested a review from alerque as a code owner May 3, 2025 12:59

mimbrown changed the title ~~Initial change to move bidi to core, get bidi working in hboxes~~ WIP: Initial change to move bidi to core, get bidi working in hboxes May 3, 2025

Omikhleia reviewed May 3, 2025

View reviewed changes

typesetters/base.lua Show resolved Hide resolved

alerque marked this pull request as draft May 5, 2025 21:05

Merge remote-tracking branch 'upstream/master' into chore/bidi_to_core

5acee7b

Put back dubious double shapeAllNodes call

8e6096b

Omikhleia reviewed Jun 25, 2025

View reviewed changes

Merge remote-tracking branch 'origin/chore/bidi_to_core' into chore/b…

4d0777c

…idi_to_core

Omikhleia reviewed Jun 25, 2025

View reviewed changes

Omikhleia mentioned this pull request Jul 5, 2025

BIDI algorithm not working properly? #76

Open

Omikhleia added this to Language settings overhaul Jul 5, 2025

This was referenced Jul 5, 2025

\color not working with bidi #466

Open

Bad interaction between bidi and color fonts #501

Open

Omikhleia mentioned this pull request Jul 15, 2025

CI/tests: fonts SBL_Hbrw and AmiriQuran's addresses changed and now CI breaks #2297

Open

Omikhleia reviewed Sep 6, 2025

View reviewed changes

This was referenced Sep 12, 2025

Refactor the bidi logic in resilient Omikhleia/resilient.sile#150

Closed

MWE: framebox and RTL issue Omikhleia/ptable.sile#10

Open

Uh oh!

WIP: Initial change to move bidi to core, get bidi working in hboxes #2278

Are you sure you want to change the base?

WIP: Initial change to move bidi to core, get bidi working in hboxes #2278

Uh oh!

Conversation

mimbrown commented May 3, 2025

Uh oh!

Uh oh!

mimbrown commented May 3, 2025

When the document is LTR

Current (RTL text is laid out backwards)

New

When the document is RTL

Current (RTL text is laid out backwards, content in the hbox starts from the left edge of the box)

New

Uh oh!

Omikhleia commented May 25, 2025

Uh oh!

mimbrown commented Jun 13, 2025

Uh oh!

Omikhleia commented Jun 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Omikhleia commented Jun 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mimbrown commented Jun 25, 2025

Uh oh!

Omikhleia commented Jun 25, 2025

Uh oh!

Omikhleia Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mimbrown Jun 25, 2025

Choose a reason for hiding this comment

Uh oh!

Omikhleia Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mimbrown Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Omikhleia Jun 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Omikhleia Jun 26, 2025

Choose a reason for hiding this comment

Uh oh!

mimbrown Jun 26, 2025

Choose a reason for hiding this comment

Uh oh!

mimbrown commented Jun 25, 2025

Uh oh!

Omikhleia Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mimbrown Jun 26, 2025

Choose a reason for hiding this comment

Uh oh!

Omikhleia Sep 6, 2025

Choose a reason for hiding this comment

Footnotes

Uh oh!

Omikhleia Sep 6, 2025

Choose a reason for hiding this comment

Uh oh!

mimbrown Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

Omikhleia commented Oct 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Omikhleia commented Jun 24, 2025 •

edited

Loading

Omikhleia commented Jun 24, 2025 •

edited

Loading

Omikhleia Jun 25, 2025 •

edited

Loading

Omikhleia Jun 25, 2025 •

edited

Loading

mimbrown Jun 25, 2025 •

edited

Loading

Omikhleia Jun 26, 2025 •

edited

Loading

Omikhleia Jun 25, 2025 •

edited

Loading