Description
Changing from the small trie type to the fast trie type doubles the throughput of checking that already-NFC UTF-16 is indeed in NFC for Japanese and Chinese. Korean becomes even faster. It seems reasonable to assume that other languages whose characters fall mostly between U+1000 and U+FFFF would get doubled throughput, too.
It seems bad to offer worse performance by default for this part of the BMP.
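To illustrate why this range is affected, here is a minimal sketch. It assumes the usual ICU `CodePointTrie` design, in which the small type serves code points up to U+0FFF through the direct-index path and the fast type extends that path to U+FFFF. The names `TrieKind`, `fast_path_max`, `uses_fast_path`, and `fast_path_share` are hypothetical and are not part of the ICU4X API.

```rust
/// Trie flavors discussed in this issue (illustrative names, not the ICU4X API).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TrieKind {
    Small,
    Fast,
}

/// Highest code point served by the direct-index lookup path.
/// Assumption: U+0FFF for the small type, U+FFFF for the fast type.
fn fast_path_max(kind: TrieKind) -> u32 {
    match kind {
        TrieKind::Small => 0x0FFF,
        TrieKind::Fast => 0xFFFF,
    }
}

/// Returns true if looking up `c` would stay on the direct-index path.
fn uses_fast_path(kind: TrieKind, c: char) -> bool {
    (c as u32) <= fast_path_max(kind)
}

/// Fraction of the characters in `text` that stay on the direct-index path.
/// Empty input is reported as 1.0 because nothing leaves the fast path.
fn fast_path_share(kind: TrieKind, text: &str) -> f64 {
    let total = text.chars().count();
    if total == 0 {
        return 1.0;
    }
    let fast = text.chars().filter(|&c| uses_fast_path(kind, c)).count();
    fast as f64 / total as f64
}
```

Under these assumptions, `fast_path_share(TrieKind::Small, "日本語")` is 0.0 while the same call with `TrieKind::Fast` is 1.0, which is consistent with CJK text being the case that benefits most.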
For nfd, databake claims 27948B vs. 34748B. (Fast is 6.6 KB larger.)
For nfkd, databake claims 43132B vs. 51324B. (Fast is 8.0 KB larger.)
For uts46d, databake claims 56200B vs. 69488B. (Fast is 13 KB larger.) It seems bad to pessimize widely-used languages to save 13 KB in binary size. However, people already complain about having to carry any IDNA data as a side effect of depending on url, and it seems unlikely that IDNA processing is a perf bottleneck, so perhaps we should keep defaulting to the small trie type for this one.
For the Harfbuzz supplement, databake claims 4486B vs. 6382B, but presumably few characters in the U+1000 to U+FFFF range result in a query to this trie, so perhaps it makes sense to keep the small trie type for this one. (But we should check whether there are characters of interest in the relevant range.)
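The size overheads quoted above can be re-derived from the databake figures. The following is only an arithmetic sketch: the byte counts are the ones reported in this issue (small first, fast second), while the names `SIZES`, `fast_overhead_bytes`, and `bytes_to_kib` are hypothetical helpers, and "KB" is taken to mean 1024 bytes.

```rust
/// (data set, small-trie bytes, fast-trie bytes) as reported by databake in this issue.
const SIZES: [(&str, u32, u32); 4] = [
    ("nfd", 27_948, 34_748),
    ("nfkd", 43_132, 51_324),
    ("uts46d", 56_200, 69_488),
    ("Harfbuzz supplement", 4_486, 6_382),
];

/// Extra bytes the fast trie costs over the small trie.
fn fast_overhead_bytes(small: u32, fast: u32) -> u32 {
    fast - small
}

/// Converts bytes to KB, assuming 1 KB = 1024 bytes.
fn bytes_to_kib(bytes: u32) -> f64 {
    f64::from(bytes) / 1024.0
}
```

Iterating over `SIZES` and calling `fast_overhead_bytes` gives 6800, 8192, 13288, and 1896 bytes respectively, so the Harfbuzz supplement overhead is a little under 2 KB.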