Add SliceUtf8.toTitleCase()#177
Conversation
|
@martint PTAL |
3e1949e to
d8d6530
Compare
|
@dain I've pushed new implementation. PTAL |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis pull request adds UTF-8 title-casing support to SliceUtf8. It introduces a TITLE_CODE_POINTS lookup table in static initialization, adds public toTitleCase methods for Slice and byte-array ranges, and implements a code-point-based transformation that title-cases the first cased code point of each whitespace-delimited word and lowercases subsequent code points. Invalid UTF-8 sequences are copied verbatim and do not reset word-start state. Output allocation is lazy; unchanged input is returned as a wrapped view. Tests validate behavior, invalid-UTF-8 handling, and no-op wrapping. 🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/test/java/io/airlift/slice/TestSliceUtf8.java (1)
756-775:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winConsider adding test coverage for
toTitleCasewith invalid bytes at start and middle positions.The assertions for
toLowerCase/toUpperCasetest invalid sequences at three positions: end only, start + end ("FOO"), and surrounding ("FOO" + invalid + "BAR"). However,toTitleCaseonly covers the end position. This matters for title-casing because the start-of-word state should persist across invalid sequences.🧪 Suggested additional assertions
+ // invalid sequence at start followed by text should not start a new word + assertThat(toTitleCase(wrappedBuffer(concat(invalidSequence, new byte[] {'F', 'O', 'O'})))) + .isEqualTo(wrappedBuffer(concat(invalidSequence, new byte[] {'F', 'o', 'o'}))); + + // invalid sequence in middle should not start a new word + assertThat(toTitleCase(wrappedBuffer(concat(new byte[] {'F', 'O', 'O'}, invalidSequence, new byte[] {'B', 'A', 'R'})))) + .isEqualTo(wrappedBuffer(concat(new byte[] {'F', 'o', 'o'}, invalidSequence, new byte[] {'b', 'a', 'r'})));
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 227aa4ea-f292-45cf-a25b-5bbce119c3a5
📒 Files selected for processing (2)
src/main/java/io/airlift/slice/SliceUtf8.javasrc/test/java/io/airlift/slice/TestSliceUtf8.java
d8d6530 to
a0818bd
Compare
dain
left a comment
There was a problem hiding this comment.
Looks good to me, but maybe squash to one commit
a0818bd to
53d58d9
Compare
this is based on the logic of the toLower and toUpper case