🔧 Fix: Improve Unicode Handling in String Diff Functions #962

olegmingaleev · 2025-10-23T16:38:27Z

📋 Summary

This PR fixes Unicode handling issues in the string diff functions (pfx and sfx) by implementing proper grapheme cluster segmentation using the modern Intl.Segmenter API.

🐛 Problem

The previous implementation had issues with complex Unicode characters, particularly:
- Multi-byte Unicode characters (emojis, accented characters)
- Zero Width Joiner (ZWJ) sequences in emojis (e.g., 👨‍🍳)
- Surrogate pairs and combining characters
- Incorrect prefix/suffix calculations leading to malformed diffs
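
To make the failure mode concrete, here is a minimal sketch of how comparing strings one UTF-16 code unit at a time can cut through an emoji. The `naivePfx` helper is hypothetical and written only for illustration; it is not the library's previous implementation.

```typescript
// Chef emoji: U+1F468 (man) + U+200D (ZWJ) + U+1F373 (cooking).
const chef = '\u{1F468}\u200D\u{1F373}';
// Pilot emoji: U+1F468 (man) + U+200D (ZWJ) + U+2708 (airplane) + U+FE0F.
const pilot = '\u{1F468}\u200D\u2708\uFE0F';
const man = '\u{1F468}';
const woman = '\u{1F469}';

console.log(chef.length); // 5 (UTF-16 code units)
console.log(Array.from(chef).length); // 3 (code points)

// Hypothetical code-unit-based common prefix length (illustration only).
function naivePfx(a: string, b: string): number {
  const max = Math.min(a.length, b.length);
  let i = 0;
  while (i < max && a.charCodeAt(i) === b.charCodeAt(i)) i++;
  return i;
}

// "man" and "woman" share only the high surrogate 0xD83D, so the
// reported prefix is one code unit long and ends inside a surrogate pair.
console.log(naivePfx(man, woman)); // 1

// The two ZWJ sequences share "man + ZWJ" (3 code units), so the
// reported prefix ends in the middle of a single visible emoji.
console.log(naivePfx(chef, pilot)); // 3
```

A diff built on such a prefix would carry a lone surrogate or a dangling ZWJ in one of its parts, which is the kind of malformed output described above.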

✅ Solution

- Replaced character-based logic with Intl.Segmenter for proper grapheme cluster handling
- Updated pfx() function to work with grapheme clusters instead of individual characters
- Updated sfx() function to work with grapheme clusters instead of individual characters
- Added comprehensive test case for chef emoji with ZWJ sequences
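
The approach in the list above could be sketched roughly as follows. The names `pfxSketch`, `sfxSketch`, and `graphemes` are hypothetical, and the sketch assumes that both functions return the length of the common part in UTF-16 code units; the actual signatures and code in this PR may differ.

```typescript
// One shared segmenter; 'grapheme' granularity yields user-perceived characters.
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });

// Split a string into its grapheme clusters.
const graphemes = (s: string): string[] =>
  Array.from(segmenter.segment(s), (part) => part.segment);

// Length (in UTF-16 code units) of the longest common prefix that ends
// on a grapheme cluster boundary in both strings.
function pfxSketch(a: string, b: string): number {
  const ga = graphemes(a);
  const gb = graphemes(b);
  const max = Math.min(ga.length, gb.length);
  let len = 0;
  for (let i = 0; i < max && ga[i] === gb[i]; i++) len += ga[i].length;
  return len;
}

// Same idea for the common suffix, comparing clusters from the end.
function sfxSketch(a: string, b: string): number {
  const ga = graphemes(a);
  const gb = graphemes(b);
  const max = Math.min(ga.length, gb.length);
  let len = 0;
  for (let i = 1; i <= max && ga[ga.length - i] === gb[gb.length - i]; i++) {
    len += ga[ga.length - i].length;
  }
  return len;
}

const chefEmoji = '\u{1F468}\u200D\u{1F373}'; // man + ZWJ + cooking
const pilotEmoji = '\u{1F468}\u200D\u2708\uFE0F'; // man + ZWJ + airplane + VS16

console.log(pfxSketch('x' + chefEmoji, 'x' + pilotEmoji)); // 1: only "x" is shared
console.log(sfxSketch('a' + chefEmoji, 'b' + chefEmoji)); // 5: the whole chef emoji
```

Comparing whole clusters means the shared "man + ZWJ" code units of the two emojis are never reported as a common prefix, so neither side of the diff ends up with a partial emoji.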

🧪 Testing

- Added test case for complex emoji: 👨‍🍳 (chef emoji with ZWJ)
- All existing tests continue to pass
- Verified correct diff behavior with multi-byte Unicode characters

🔍 Technical Details

The fix uses Intl.Segmenter with granularity: 'grapheme' to segment strings into grapheme clusters, so that a user-perceived character made up of several code points (for example, a ZWJ emoji sequence or a base letter followed by combining marks) is treated as a single unit rather than being split across UTF-16 code units.
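
As a small illustration of what grapheme segmentation produces, the sketch below segments a string containing a combining accent and the chef emoji. The sample string and variable names are chosen here for demonstration and do not come from the PR.

```typescript
const seg = new Intl.Segmenter('en', { granularity: 'grapheme' });

const chefCluster = '\u{1F468}\u200D\u{1F373}'; // man + ZWJ + cooking
// "e" + combining acute accent, then the chef emoji, then "z".
const text = 'e\u0301' + chefCluster + 'z';

// Each item carries the cluster text and its starting code-unit index.
for (const { segment, index } of seg.segment(text)) {
  console.log(index, segment.length, JSON.stringify(segment));
}

const clusters = Array.from(seg.segment(text), (part) => part.segment);
console.log(text.length); // 8 UTF-16 code units
console.log(clusters.length); // 3 grapheme clusters
```

The eight code units collapse into three clusters, starting at indices 0, 2, and 7, which are the only positions where a prefix or suffix boundary can safely fall.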

🎯 Impact

✅ Correct diff behavior with all Unicode characters
✅ Better handling of emoji sequences

- Use Intl.Segmenter for proper grapheme cluster handling
- Fix prefix/suffix calculation for complex emoji sequences
- Add test case for chef emoji with ZWJ sequences
- Ensures correct diff behavior with multi-byte Unicode characters

streamich · 2025-10-28T00:27:14Z

Fixed in #964.

olegmingaleev mentioned this pull request Oct 23, 2025

Incorrect Unicode Handling in String Diff Functions #963

Closed

streamich closed this Oct 28, 2025
