Ship fast tries in icu_normalizer_data by default

Changing from the small trie type to the fast trie type doubles the throughput of confirming that UTF-16 input that is already in NFC is indeed in NFC, for Japanese and Chinese. Korean speeds up even more. It seems reasonable to assume that other languages whose characters mostly fall between U+1000 and U+FFFF would get doubled throughput, too.
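To make the affected range concrete, here is a minimal sketch that estimates what fraction of a string falls in the U+1000 to U+FFFF range. The two limit constants are assumptions based on my understanding of the code point trie design (small tries having a fast path only up to U+0FFF, fast tries up to U+FFFF); `affected_fraction` is a hypothetical helper and not part of `icu_normalizer`.

```rust
// Assumed fast-path limits of the two trie types (inclusive upper bounds).
// These are assumptions for illustration, not values read from the crate.
const SMALL_FAST_MAX: u32 = 0x0FFF;
const FAST_FAST_MAX: u32 = 0xFFFF;

/// Hypothetical helper: fraction of scalar values in `text` that lie above
/// the small trie's assumed fast path but within the fast trie's.
fn affected_fraction(text: &str) -> f64 {
    let total = text.chars().count();
    if total == 0 {
        return 0.0;
    }
    let affected = text
        .chars()
        .filter(|c| {
            let cp = *c as u32;
            cp > SMALL_FAST_MAX && cp <= FAST_FAST_MAX
        })
        .count();
    affected as f64 / total as f64
}

fn main() {
    // Latin and Greek sit below U+1000; CJK ideographs and Hangul
    // syllables sit inside the U+1000..=U+FFFF range.
    for sample in ["hello", "Ελληνικά", "日本語", "한국어"] {
        println!("{sample}: {:.2}", affected_fraction(sample));
    }
}
```

Text where this fraction is close to 1 is the kind of input that the throughput claim above is about.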

It seems bad to offer worse performance by default for this part of the BMP.

For nfd, databake claims 27948B vs. 34748B. (Fast is 6.6 KB larger.)

For nfkd, databake claims 43132B vs. 51324B. (Fast is 8.0 KB larger.)

For uts46d, databake claims 56200B vs. 69488B. It seems bad to pessimize widely-used languages to save 13 KB in binary size, but people already complain about having to carry _any_ IDNA data as a side effect of depending on `url`, and it seems unlikely that IDNA processing is a perf bottleneck, so perhaps we should keep defaulting to the small trie type for this one.

For the Harfbuzz supplement, databake claims 4486B vs. 6382B, but it is not clear that there are characters in the U+1000 to U+FFFF range that result in a query to this trie, so perhaps it makes sense to keep the small trie type for this one. (But we should check whether there are characters of interest in the relevant range.)
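For reference, the size differences can be recomputed from the databake numbers quoted above. This is only an arithmetic sketch; the `Sizes` struct and `delta_bytes` helper are hypothetical names introduced for illustration.

```rust
/// Baked data sizes in bytes as quoted in this issue (small vs. fast trie).
struct Sizes {
    name: &'static str,
    small: u32,
    fast: u32,
}

const SIZES: [Sizes; 4] = [
    Sizes { name: "nfd", small: 27948, fast: 34748 },
    Sizes { name: "nfkd", small: 43132, fast: 51324 },
    Sizes { name: "uts46d", small: 56200, fast: 69488 },
    Sizes { name: "harfbuzz supplement", small: 4486, fast: 6382 },
];

/// Extra bytes paid for choosing the fast trie type over the small one.
fn delta_bytes(s: &Sizes) -> u32 {
    s.fast - s.small
}

fn main() {
    for s in &SIZES {
        let d = delta_bytes(s);
        // KiB here means 1024 bytes; the percentage is relative to the small size.
        println!(
            "{}: +{} B ({:.1} KiB, {:.0}% larger)",
            s.name,
            d,
            d as f64 / 1024.0,
            100.0 * d as f64 / s.small as f64
        );
    }
}
```

In absolute terms the Harfbuzz supplement has the smallest difference and uts46d the largest.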

Ship fast tries in icu_normalizer_data by default #6836
