Commit a931024

Remove or correct non-NFD confusables
See unicode-org/properties#486
1 parent 5879e56 commit a931024

11 files changed

+517
-8019
lines changed

unicodetools/data/security/dev/confusables.txt

Lines changed: 203 additions & 2081 deletions
Large diffs are not rendered by default.

unicodetools/data/security/dev/confusablesSummary.txt

Lines changed: 242 additions & 3831 deletions
Large diffs are not rendered by default.

unicodetools/data/security/dev/data/confusablesSummaryIdentifier.txt

Lines changed: 3 additions & 8 deletions
@@ -1,5 +1,5 @@
 # confusablesSummaryIdentifier.txt
-# Date: 2025-10-25, 07:52:31 GMT
+# Date: 2025-11-13, 04:42:22 GMT
 # © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -164,11 +164,10 @@
 (‎ J ‎) 004A LATIN CAPITAL LETTER J
 ← (‎ Ј ‎) 0408 CYRILLIC CAPITAL LETTER JE
 
-# K Κ К
+# K Κ К
 (‎ K ‎) 004B LATIN CAPITAL LETTER K
 ← (‎ Κ ‎) 039A GREEK CAPITAL LETTER KAPPA
 ← (‎ К ‎) 041A CYRILLIC CAPITAL LETTER KA
-← (‎ K ‎) 212A KELVIN SIGN
 
 # M Μ М
 (‎ M ‎) 004D LATIN CAPITAL LETTER M
@@ -1012,10 +1011,6 @@
 ← (‎ ஜ ‎) 0B9C TAMIL LETTER JA
 ← (‎ ജ ‎) 0D1C MALAYALAM LETTER JA # →ஜ→
 
-# ஒள ஔ
-(‎ ஒள ‎) 0B92 0BB3 TAMIL LETTER O, TAMIL LETTER LLA
-← (‎ ஔ ‎) 0B94 TAMIL LETTER AU
-
 # ண ണ
 (‎ ண ‎) 0BA3 TAMIL LETTER NNA
 ← (‎ ണ ‎) 0D23 MALAYALAM LETTER NNA
@@ -1461,5 +1456,5 @@
 (‎ へ ‎) 3078 HIRAGANA LETTER HE
 ← (‎ ヘ ‎) 30D8 KATAKANA LETTER HE
 
-# total : 524
+# total : 522
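
Reviewer note: the two entries dropped from this summary (U+212A KELVIN SIGN and U+0B94 TAMIL LETTER AU) both change under canonical decomposition, which appears to be the criterion named in the commit title. The following is a minimal sketch of such a check using Python's standard `unicodedata` module. The helper names `is_nfd_stable` and `hex_points` are illustrative and are not part of unicodetools.

```python
import unicodedata


def is_nfd_stable(text: str) -> bool:
    """Return True if `text` is unchanged by canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", text) == text


def hex_points(text: str) -> str:
    """Format a string as space-separated 4+ digit hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# Code points taken from the hunks above; 004B is included as a stable control.
for cp in (0x212A, 0x0B94, 0x004B):
    ch = chr(cp)
    nfd = unicodedata.normalize("NFD", ch)
    status = "stable" if is_nfd_stable(ch) else "not NFD"
    print(f"{cp:04X} -> {hex_points(nfd)} ({status})")
```

Note that U+0B94 decomposes to 0B92 0BD7 (TAMIL LETTER O plus TAMIL AU LENGTH MARK), which differs from the 0B92 0BB3 sequence the removed entry mapped it to.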

unicodetools/data/security/dev/data/source/confusables-macFonts.txt

Lines changed: 0 additions & 761 deletions
Large diffs are not rendered by default.

unicodetools/data/security/dev/data/source/confusables-source.txt

Lines changed: 26 additions & 20 deletions
@@ -3967,8 +3967,6 @@ FFE8 ; | # ( │ → │) HALFWIDTH FORMS LIGHT VERTICAL → pipe
 ‰ ; ⁰/₀₀
 ‱ ; ⁰/₀₀₀
 þ ; ƿ
-Å ; Ȧ
-å ; ȧ
 
 0629 ; 006F 0308 # Teh marbuta
 06C3 ; 0629 # ...goal
@@ -4595,7 +4593,7 @@ A793 ; 0454 # ( ꞓ ) LATIN SMALL LETTER C WITH BAR =>U+0454 ( є ) CYRILLIC SMA
 ∔ ; +̇
 ∸ ; −̇
 ≐ ; =̇
-; ≐̣
+2251 ; 003D 0323 0307
 ≗ ; =̊
 ≙ ; =̂
 ≚ ; =̌
@@ -5534,22 +5532,24 @@ A7F1 ; 02E2 # ( ꟱ → ˢ ) MODIFIER LETTER CAPITAL S → MODIFIER LETTER SMAL
 # Unicode 17.0 (post-UTC #183)
 20C1 ; FDFC # ( ⃁ → ﷼ ) SAUDI RIYAL SIGN → RIAL SIGN
 2B73F ; 6138 # ( 愸 → 𫜿 ) CJK UNIFIED IDEOGRAPH 2B73F → CJK UNIFIED IDEOGRAPH 6138
-2F81D ; 20674 # ( 𠙴 → 凵 ) CJK UNIFIED IDEOGRAPH 2F81D → CJK UNIFIED IDEOGRAPH 20674
-2F82D ; 2D161 # ( 𭅡 → 卑 ) CJK UNIFIED IDEOGRAPH 2F82D → CJK UNIFIED IDEOGRAPH 2D161
-2F85B ; 21533 # ( 𡔳 → 壷 ) CJK UNIFIED IDEOGRAPH 2F85B → CJK UNIFIED IDEOGRAPH 21533
-2F85D ; 21587 # ( 𡖇 → 多 ) CJK UNIFIED IDEOGRAPH 2F85D → CJK UNIFIED IDEOGRAPH 21587
-2F860 ; 216A7 # ( 𡚧 → 𡚨 ) CJK UNIFIED IDEOGRAPH 2F860 → CJK UNIFIED IDEOGRAPH 216A7
-2F89C ; 22505 # ( 𢔅 → 徚 ) CJK UNIFIED IDEOGRAPH 2F89C → CJK UNIFIED IDEOGRAPH 22505
-2F905 ; 23D40 # ( 𣵀 → 涅 ) CJK UNIFIED IDEOGRAPH 2F905 → CJK UNIFIED IDEOGRAPH 23D40
-2F927 ; 2AEC5 # ( 𪻅 → 𤠔 ) CJK UNIFIED IDEOGRAPH 2F927 → CJK UNIFIED IDEOGRAPH 2AEC5
-2F92B ; 248FD # ( 𤣽 → 玥 ) CJK UNIFIED IDEOGRAPH 2F92B → CJK UNIFIED IDEOGRAPH 248FD
-2F935 ; 24C36 # ( 𤰶 → 𤰶 ) CJK UNIFIED IDEOGRAPH 2F935 → CJK UNIFIED IDEOGRAPH 24C36
-2F943 ; 2511A # ( 𥄚 → 𥄙 ) CJK UNIFIED IDEOGRAPH 2F943 → CJK UNIFIED IDEOGRAPH 2511A
-2F96E ; 31E7C # ( 𱹼 → 緇 ) CJK UNIFIED IDEOGRAPH 2F96E → CJK UNIFIED IDEOGRAPH 31E7C
-2F97E ; 2659D # ( 𦖝 → 𦖨 ) CJK UNIFIED IDEOGRAPH 2F97E → CJK UNIFIED IDEOGRAPH 2659D
-2F9A4 ; 26D06 # ( 𦴆 → 𦰶 ) CJK UNIFIED IDEOGRAPH 2F9A4 → CJK UNIFIED IDEOGRAPH 26D06
-2F9CB ; 4695 # ( 䚕 → 𧢮 ) CJK UNIFIED IDEOGRAPH 2F9CB → CJK UNIFIED IDEOGRAPH 4695
-2F9D6 ; 25AD4 # ( 𥫔 → 贛 ) CJK UNIFIED IDEOGRAPH 2F9D6 → CJK UNIFIED IDEOGRAPH 25AD4
+# Following lines originally refereed to CJK compatibility characters,
+# which were not OK in confusables (PAG ref #486). The codepoints have been
+# replaced with the unified ideograph they decompose to.
+51F5 ; 20674
+5351 ; 2D161
+58F7 ; 21533
+591A ; 21587
+216A8 ; 216A7
+5F9A ; 22505
+6D85 ; 23D40
+24814 ; 2AEC5
+73A5 ; 248FD
+25119 ; 2511A
+7DC7 ; 31E7C
+265A8 ; 2659D
+26C36 ; 26D06
+278AE ; 4695
+8D1B ; 25AD4
 
 # High priority confusables for Tibetan (PAG ref #402)
 0F7B ; 0F7A 0F7A # ( ཻ → ེེ ) TIBETAN VOWEL SIGN EE → TIBETAN VOWEL SIGN E, TIBETAN VOWEL SIGN E
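
Reviewer note: the comment added in the hunk above says each CJK compatibility code point was replaced with the unified ideograph it decomposes to. As a rough illustration, that mapping can be reproduced for a few of the removed sources with Python's `unicodedata`. The function name `unified_for` and the choice of sample code points are assumptions made for this sketch. It is not the tooling used to produce the commit.

```python
import unicodedata

# A few source code points taken from the removed lines of the hunk above.
OLD_SOURCES = [0x2F81D, 0x2F82D, 0x2F85B, 0x2F860, 0x2F9D6]


def unified_for(cp: int) -> list[int]:
    """Return the code points of the canonical decomposition (NFD) of `cp`."""
    return [ord(ch) for ch in unicodedata.normalize("NFD", chr(cp))]


for cp in OLD_SOURCES:
    targets = " ".join(f"{t:04X}" for t in unified_for(cp))
    print(f"{cp:04X} decomposes to {targets}")
```

Under this mapping a removed line whose source decomposes to its own target (2F935 ; 24C36 looks like such a case in the rendered diff) would become an identity pair, which would explain why 16 lines were removed but only 15 replacement pairs were added.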
@@ -5617,7 +5617,6 @@ A7F1 ; 02E2 # ( ꟱ → ˢ ) MODIFIER LETTER CAPITAL S → MODIFIER LETTER SMAL
 0A47 ; 0947
 09F0 ; 09B0
 1031 ; 0B47
-0B94 ; 0B92 0BB3
 0D25 ; 0BAE
 0D16 ; 0BB5
 0D46 ; 0BC6
@@ -5818,3 +5817,10 @@ A8CF ; 007C 007C # SAURASHTRA DOUBLE DANDA
 514C ; 5151
 980B ; 2EA07
 2EDB5 ; 32A8F
+
+# Pairs add to compensate for removal of non-NFD characters and sequences.
+# These were previously confusable due to transitivity, but now need to be
+# listed explicitly.
+2EBF ; 2EBE
+2EC0 ; 2EBE
+9FC3 ; 4039
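
Reviewer note: the comment in the last hunk relies on confusability being transitive, so removing an intermediate character can split a class unless the remaining members are linked directly. The sketch below models that with a small union-find over source/target pairs. The labels "A", "B" and "X" are hypothetical placeholders, since the diff does not show which removed characters originally linked the new pairs, and the function names are illustrative only.

```python
def confusable_classes(pairs):
    """Group items into equivalence classes, treating each pair as an undirected link."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for item in list(parent):
        classes.setdefault(find(item), set()).add(item)
    return {frozenset(members) for members in classes.values()}


def same_class(pairs, a, b):
    """Return True if `a` and `b` fall in the same confusable class."""
    return any(a in c and b in c for c in confusable_classes(pairs))


# Hypothetical data: "X" stands for a removed non-NFD character.
before = [("A", "X"), ("B", "X")]   # A and B linked only through X
after_removal = []                  # both pairs dropped together with X
after_fix = [("A", "B")]            # explicit pair restores the link

print(same_class(before, "A", "B"))
print(same_class(after_removal, "A", "B"))
print(same_class(after_fix, "A", "B"))
```

The three cases correspond to the data before the removal, after removing the intermediate without compensation, and after adding an explicit pair as this hunk does.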
