Improve --space-as-offset: determine spaces by unicode #446
duanyao wants to merge 1 commit into coolwanglu:incoming
It seems that old PDF generators/converters were not able to handle this well; after all, this has nothing to do with printing. And ToUnicode is indeed optional in the standard. I'm not sure if this is a good solution. Or possibly we can take into consideration the
If ToUnicode is missing, can we just ignore
Fix #445.
Now --space-as-offset works on "unicode space" instead of ASCII SPACE before decoding the text. This change should also increase the opportunities for converting spaces to offsets.
However, for PDFs with bad Unicode support, this may still drop chars, though I haven't found an example yet.
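To make the difference concrete, here is a minimal Python sketch of the two ways of detecting a space. It is not the pdf2htmlEX code: the function names, the dict-based `to_unicode` map, and the choice to treat only U+0020 as a "unicode space" are all assumptions made for illustration.

```python
# Illustrative sketch only; none of these names come from pdf2htmlEX.

ASCII_SPACE_CODE = 0x20


def is_space_by_code(code):
    """Old-style check (assumed): compare the raw character code
    from the content stream with ASCII SPACE, before decoding."""
    return code == ASCII_SPACE_CODE


def is_space_by_unicode(code, to_unicode):
    """New-style check (assumed): decode the code first, then test
    whether the result is a space. Only U+0020 counts here, which is
    an assumption; a code with no mapping is never a space."""
    return to_unicode.get(code) == " "


# Hypothetical font with a custom encoding: code 0x03 is the real
# space glyph, while code 0x20 is a visible letter.
SAMPLE_TO_UNICODE = {0x03: " ", 0x20: "A", 0x41: "B"}

# Hypothetical font with a bad ToUnicode map: the visible glyph at
# code 0x41 is wrongly mapped to U+0020.
BAD_TO_UNICODE = {0x41: " "}


def spaces_found(codes, to_unicode):
    """Return the codes that each check would turn into offsets."""
    by_code = [c for c in codes if is_space_by_code(c)]
    by_unicode = [c for c in codes if is_space_by_unicode(c, to_unicode)]
    return by_code, by_unicode
```

With `SAMPLE_TO_UNICODE`, the raw-code check misses the real space at 0x03 and would turn the visible glyph at 0x20 into an offset, while the Unicode-based check gets both right. With `BAD_TO_UNICODE`, the Unicode-based check treats the visible glyph at 0x41 as a space, which is the kind of dropped char the caveat above refers to.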