-
-
Notifications
You must be signed in to change notification settings - Fork 102
WIP: Initial change to move bidi to core, get bidi working in hboxes #2278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@alerque this does not yet include any test updates or anything like that, just the basic code change. The thing that's controversial is, I've also updated the output logic for hboxes, since I've found them to be wrong by default in RTL mode. Given the content: When the document is LTRCurrent (RTL text is laid out backwards)NewWhen the document is RTLCurrent (RTL text is laid out backwards, content in the hbox starts from the left edge of the box)New |
|
This does break a lot of tests, including unexpected ones. |
|
Coming back to this after a while. @Omikhleia I've tried to visually inspect the difference between the current typesetter and the changes I've made, and can't make anything out. If I'm reading the diffs correct, it's looking like a lot of these tests are differences of fractions of points, but maybe I'm missing something. Do you have any idea what's going on? |
Sorry for the delay @mimbrown I failed to see earlier that I was pinged here ^^ I retraced your steps, it turns out that with in the original code, I daresay it's really dubious... It might also explain that when I once tried to remove the whole bidi logic (e.g. in the old code, just commenting out the content of the enabling/disabling methods), the debug output also had some subtle differences of the same sort, for a document that had just French text (so I wouldn't have expected bidi to do something at all). My feeling is that the old code was wrong, despite being "active" all the time by default... Doh. |
That is -- the old bidi code was:
Your proposed code moved all that mess to the base typesetter, a nice move IMHO, but invokes splitNodelistIntoBidiRuns at a later stage (not illogically, IMHO), and there's only one shapeAllNodes then. To me, you might be in the right track here... but don't take this for granted, I don't understand a lot of what SILE does the way it does it, crippled with side-effects and unclear intents, even after a 4-year involvement as a user and casual contributor. |
|
Thanks for looking at this! I see your point, yes I remember thinking that removing the double |
Darn, all tests failed but for some font fetching issue in the CI. Let's invoke our great @alerque here :)
I'll add a dedicated comment on the code, but still, this indeed has heavy code smell... Well let's see if it does the trick and confirms the root cause when the CI is repaired. If the decision is to keep these moved lines, my opinion is that it would need an extra in-code comment to note the code small and refer to this issue (or a new one), so people who maintain or investigate the code in the future are aware of the potential issue... But let's wait for a result, discussion and decision. |
| return {} | ||
| end | ||
| nodelist = bidi.splitNodelistIntoBidiRuns(nodelist, self.frame:writingDirection()) | ||
| self:shapeAllNodes(nodelist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re. the main discussion in the PR: yes 👍 having those two lines here is also what I was discussing. I arrived at the same conclusion and all other things equal (I haven't checked our code was identical as mine; when just starting from the old code and following the same steps as you), the test tests/image.sil gave the same result as before. That's the only one I checked though.
But the double shapeAllNodes (here and further below), or the reason why this splitting/shaping has to occur here (rather than after some discardable nodes or penalties are removed) are very unclear. I'd like to grasp why, and have proper in-code comments here... ideally ^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I'll add the comment (if this works). Just want to validate that it does before going to all the trouble 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait... from boxUpNodes(), we invoke breakIntoLines() which also invokes shapeAllNodes(). How many times does SILE ends up reshaping the same content (with a few glues and penalties added or removed)?
This is not even cost-less, with the dubious "inplace" boolean there, we browse, re-copy this list multiple times. Inefficient and weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷♂️ it definitely does look weird. If we figure out why this changes subtly when calling shapeAllNodes either twice or before/after discarding nodes (not sure which is the culprit yet), that would probably help us think through how it should be fixed.
Looking at the function, it seems like calling it multiple times shouldn't have any effect, unless calling shape() on a node can somehow return more unshaped nodes, which would be problematic. So I would guess that calling shapeAllNodes() after removing the discardable nodes is what is causing the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My best guess for the difference in the logs is that it is introduced by some glues, and that the line ratio (affecting their shrinking/stretching) occurs to be slightly different -- but the amount is small and distributed on all glues, so in the end its just a few decimal points (but this would affect "Mx" lines in the logs in the way we observe).
Hmm... I wonder if the issue isn't what we put pack into the state.nodes or not, during all that chain of calls...
See:
function typesetter:breakIntoLines (nodelist, breakWidth)
self:shapeAllNodes(nodelist)
local breakpoints = SILE.linebreak:doBreak(nodelist, breakWidth)
return self:breakpointsToLines(breakpoints)
end
Then breakPointsToLines works on self.states.nodes, not the nodelist. Ouch. That's the big assumption. They might be the same or not and shapeAllNodes works "in place" "by default", so it all depends on what the nodelist originally was. It might have been self.states.nodes initially... but nodelist = bidi.splitNodelistIntoBidiRun returns a new nodelist, and do we put it back to self.states.nodes? Otherwise later at some point this will be inconsistent.
Frankly these user of the internal states and the multiple side-affecting methods are a mess to follow. As as wrote in another comment, the constant hesitation on what we work (putting things back and forth in the states) is a likely source of error. The code is quite a mess there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I see such things, I am frightened:
Line 713 in b3b17ad
| SILE.typesetter.state.nodes = self.nodes -- Horrible breaking of separation of concerns here. :-( |
Another cross-boundary side-effect here... Ideally linebreak:doBreak() should do that, but should return the nodelist it worked on...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree, there's some major refactoring that would be helpful. But my goal with this PR has been to be minimal, not expansive, with changes. Once @alerque can let us know what's causing the 404 errors in the test runs, we should be able to see if what I'm doing now is actually consistent with what was there before. Based on this, I think we should be able to keep the new shapeAllNodes call before the discarding of other nodes, remove the second (redundant) one, and most tests should pass fine. More expansive changes should be a different PR I think.
|
Yes I just pulled in the upstream, possibly in between me making this PR and now something in the test setup changed. |
| self.state.hmodeOnly = true | ||
| SILE.process(content) | ||
|
|
||
| self.state.nodes = bidi.splitNodelistIntoBidiRuns(self.state.nodes, self.frame:writingDirection()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by the way, do we need to put that back into self.state.nodes? There's a lot of back-and-forth stuff of that kind, I know, in the code base, but it would be best to avoid it, and just use locals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mostly following what was there before, because (as discussed) I can't track through if this is necessary so I'm afraid to do anything different!
| for i = 1, #vboxes do | ||
| local v = vboxes[i] | ||
| if v.is_vbox then | ||
| v.nodes = bidi.reorder(v.nodes, self.frame:writingDirection()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @mimbrown
As I am working on my own rewrite of a typesetter (here), I tried to port your proposed changes to it.
It made me realize there's an issue here, when the document uses "liners" (which I introduced earlier to SILE), i.e. for underlining, striking out, etc. portions of text that can be line broken.1 The liner logic eventually wraps part of lines if liner boxes (responsible for applying the wanted effect), and these can be deeply nested...
So here, bidi re-ordering -- which as far as I can tell, expects a flat list of shaped nodes, comes far too late, and will not handle these liner boxes.
This raises a few questions:
- Why is it done here so late on the final line vboxes (via another repeated loop on all lines), while it could perhaps have been done on each line individually before it's finalized (i.e. just before the liner reboxing logic applies).
- But why is it done, actually, after line-breaking has occurred? Is that due to some interpretation with respect to the Knuth-Plass algorithm?
- And what exactly is it doing? The bidi code is totally undocumented (and even has some code commented out, never a good sign), with opaque variables (what's
rv!? why to we have toSU.compressit for holes2); and weird sequences3...
I'd totally be in favor of having bidi also made "standard" in my re-implementation, but it's a tricky piece of code, and I don't even know what to test, without knowing the languages :)
Footnotes
-
E.g.
\underline{....}in SIL. ↩ -
That's the only use for this compress utility in all the SILE code base, if not mistaken... ↩
-
E.g. sequences of
SU.flip_in_place/reverse_each_node, where the later browses the node list again toSU.flip_in_placeinternal hboxes, knowing their inner structure. Whether such code is legit is hard to say, for one not knowing the details. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linked to a 10-year old issue: #76
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Omikhleia I definitely don't have the answers to all the "why" questions. My goal with this PR was to change things as minimally as possible, and only remove/change things that I was logically positive shouldn't have an effect. BiDi is really painful... but, I am pretty sure, we can't reverse before the line-breaking happens. Here's an example of why:
Imagine a string of glyphs (' means the character is RTL):
ABC DEF A'B'C' D'E'F'
If we want to apply the BiDi algorithm, it becomes:
ABC DEF F'E'D' C'B'A'
Easy. But let's say we need a line break in the middle of the RTL text. In that case, we want the output to look like this:
ABC DEF C'B'A'
F'E'D'
But if we reorder before breaking into lines, we'll end up with:
ABC DEF F'E'D'
C'B'A'
Off the top of my head, I don't know why we couldn't do the reordering before the reboxing. Maybe we should! But here's why we can't do it before line-breaking.
|
Linked #2268 |




The PR has a couple changes:
corefolder (for lack of maybe a better place). Minimal tweaks to some bidi functions to take, for example, adirectioninstead of atypesetter, to reduce direct dependency on the typesetter.splitIntoRunsandreordercalls directly intotypesetters/base.lua#boxUpNodessplitIntoRunsandreordercalls intotypesetters/base.lua#makeHboxbidipackage down to be mostly just deprecated exports.