Remove or correct non-NFD confusables #1238

roozbehp · 2025-11-13T04:50:37Z

See https://github.com/unicode-org/properties/issues/486

roozbehp · 2025-11-13T04:53:23Z

Sorry, this is a little hard to review, but rest assured that I triple-checked it very thoroughly myself. confusablesSummary.txt is easier to check than the other files, but it's still pretty long.

Anyway, please take a look.

roozbehp · 2025-11-13T05:45:49Z

Converting to draft to fix Java style issues.

See unicode-org/properties#486

roozbehp · 2025-11-13T08:29:00Z

Java style fixed. Ready for review.

josh-hadley

@roozbehp, apologies for taking so long to review this. It was as tricky as advertised, but everything looks good. The process did make me wonder whether a one-time revamp of confusables-source.txt, converting most or all entries to our new preferred format, would be worthwhile; it would presumably make reviews and maintenance like this a little less taxing.

macchiati · 2025-11-25T01:10:39Z

Note that we need to add confusable changes to the Migration section of each release. That is, if skeleton(X) changes from Y to Z, implementations that store mappings to skeletons need to update them.

(We had some index breakages in production software with the U17 integrations; luckily, they were caught by unit tests.)
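Below is a sketch in Java of one way to collect per-character skeleton changes for such a Migration note. It assumes each release's confusable data has already been loaded into a map from a source code point to its prototype string. The names `SkeletonDiff` and `changedSkeletons` are hypothetical. The skeleton is simplified to three UTS #39 steps: NFD, prototype substitution, and NFD again. The U+0430 tables in `main` are made up for illustration.

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.TreeMap;

final class SkeletonDiff {
    // Simplified skeleton: NFD, replace each code point by its prototype
    // (if the table has one), then NFD again. Other UTS #39 details are omitted.
    static String skeleton(String input, Map<Integer, String> table) {
        String nfd = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder();
        nfd.codePoints().forEach(cp -> {
            String prototype = table.get(cp);
            if (prototype != null) {
                out.append(prototype);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return Normalizer.normalize(out, Normalizer.Form.NFD);
    }

    // Compares skeleton(X) under both releases for every code point X in
    // [first, last] and returns {oldSkeleton, newSkeleton} for those that differ.
    // Scanning the range (not just table keys) matters: a precomposed character
    // that is not a key itself can still change through its decomposition.
    static Map<Integer, String[]> changedSkeletons(
            Map<Integer, String> oldTable, Map<Integer, String> newTable, int first, int last) {
        Map<Integer, String[]> changes = new TreeMap<>();
        for (int cp = first; cp <= last; cp++) {
            String x = new String(Character.toChars(cp));
            String before = skeleton(x, oldTable);
            String after = skeleton(x, newTable);
            if (!before.equals(after)) {
                changes.put(cp, new String[] {before, after});
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Hypothetical tables: U+0430 CYRILLIC SMALL LETTER A moves from "a" to "o".
        Map<Integer, String> oldTable = Map.of(0x0430, "a");
        Map<Integer, String> newTable = Map.of(0x0430, "o");
        changedSkeletons(oldTable, newTable, 0x0400, 0x04FF).forEach((cp, pair) ->
                System.out.printf("U+%04X: %s -> %s%n", cp, pair[0], pair[1]));
    }
}
```

Passing 0 and 0x10FFFF scans every code point. With the made-up tables in `main`, the report lists U+0430 itself. It also lists precomposed characters such as U+04D1 CYRILLIC SMALL LETTER A WITH BREVE, whose NFD form starts with U+0430. This is why diffing the table entries alone would miss some skeleton changes.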

roozbehp · 2025-11-25T02:33:16Z

> Note that we need to add confusable changes to the Migration section of each release. That is, if the skeleton(X) changes from Y to Z, that requires implementations that have mappings to skeletons need to update.

We're going to have a lot of confusable changes in Unicode 18.0, but this specific pull request should not affect the skeleton of any string. A conformant implementation would never use any of the data removed here: those characters simply cannot occur in the NFD form that the algorithm converts its input to before looking up prototypes.
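The Java sketch below illustrates this reasoning with a toy table. The class `SkeletonSketch` and its two entries are hypothetical, and the skeleton is reduced to its NFD and prototype-lookup steps. The input is converted to NFD before lookup, so a key such as U+212A KELVIN SIGN, whose NFD is U+004B, is never matched. `java.text.Normalizer.isNormalized` is one way to find such keys.

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SkeletonSketch {
    // Toy table, source code point -> prototype. Illustrative entries only.
    static final Map<Integer, String> PROTOTYPES = Map.of(
            0x0430, "a",  // CYRILLIC SMALL LETTER A: already NFD, so reachable
            0x212A, "K"); // KELVIN SIGN: not NFD (decomposes to U+004B), unreachable

    // Simplified skeleton: NFD, per-code-point prototype lookup, NFD again.
    static String skeleton(String input) {
        String nfd = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder();
        nfd.codePoints().forEach(cp -> {
            String prototype = PROTOTYPES.get(cp);
            if (prototype != null) {
                out.append(prototype);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return Normalizer.normalize(out, Normalizer.Form.NFD);
    }

    // A key that is not in NFD on its own can never appear in NFD text,
    // so skeleton() never looks it up.
    static Set<Integer> unreachableKeys() {
        Set<Integer> result = new TreeSet<>();
        for (int cp : PROTOTYPES.keySet()) {
            String key = new String(Character.toChars(cp));
            if (!Normalizer.isNormalized(key, Normalizer.Form.NFD)) {
                result.add(cp);
            }
        }
        return result;
    }
}
```

Deleting the U+212A entry from this toy table leaves every skeleton unchanged, because `skeleton()` never reaches that key.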

macchiati · 2025-11-25T16:16:50Z

I'm not worried about dropping all the non-NFD forms. I just want to make sure that we alert people to changes, and that whenever we had X -> Y before, we have nfd(X) -> nfd(Y) after (when nfd(X) is a single code point). I'd also like you to include a one-time test that verifies this.

(Sent via email in reply to Roozbeh Pournader's comment of Mon, Nov 24, 2025, 18:33.)
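A one-time check along the lines macchiati asks for could look like the Java sketch below. It assumes both versions of the data are loaded as maps from source string to target string. `NfdMigrationCheck` and `violations` are hypothetical names. A source with no entry is treated as mapping to itself, which is how unmapped characters behave in the skeleton. The Kelvin sign entry in the usage note is an illustrative example.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class NfdMigrationCheck {
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // For each old mapping X -> Y where nfd(X) is a single code point, the new
    // data must map nfd(X) to nfd(Y). A missing entry counts as an identity mapping.
    static List<String> violations(Map<String, String> oldData, Map<String, String> newData) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> entry : oldData.entrySet()) {
            String source = nfd(entry.getKey());
            if (source.codePointCount(0, source.length()) != 1) {
                continue; // the condition only covers single-code-point NFD sources
            }
            String expected = nfd(entry.getValue());
            String actual = newData.getOrDefault(source, source);
            if (!expected.equals(actual)) {
                problems.add(String.format("U+%04X: expected %s, found %s",
                        source.codePointAt(0), expected, actual));
            }
        }
        return problems;
    }
}
```

Suppose the old data maps U+212A to "K" and the new data drops that entry. The check still passes, because nfd(U+212A) is U+004B and an unmapped U+004B already maps to itself.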

roozbehp requested review from josh-hadley and markusicu November 13, 2025 04:50

roozbehp force-pushed the roozbehp-non-nfd branch from ce6b7d3 to a931024 November 13, 2025 05:16

roozbehp marked this pull request as draft November 13, 2025 05:45

Remove or correct non-NFD confusables

22058ca

See unicode-org/properties#486

roozbehp force-pushed the roozbehp-non-nfd branch from a931024 to 22058ca November 13, 2025 06:21

roozbehp marked this pull request as ready for review November 13, 2025 06:21

josh-hadley approved these changes Nov 25, 2025


roozbehp merged commit bcd1dec into main Nov 25, 2025
24 of 27 checks passed

roozbehp deleted the roozbehp-non-nfd branch November 25, 2025 02:34

4 participants