Bug 23341 – [std.uni] ZWJ not handled properly

Status: NEW
Severity: enhancement
Priority: P4
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2022-09-17T16:56:19Z
Last change time: 2024-12-01T16:40:24Z
Assigned to: No Owner
Creator: Garrett D'Amore

Comments

Comment #0 by garrett — 2022-09-17T16:56:19Z

For example, when iterating over the string "\U0001f9db\u200d\u2640" with byGrapheme, there should be exactly one grapheme (representing a female vampire). Instead, it is treated as two graphemes. This form of composition is becoming increasingly important in modern Unicode, as it is used to build rich representations of characters, for example by adding or modifying gender.

An example program demonstrating this problem is here: https://gist.github.com/gdamore/13cc3b50aa3dbffca291f76b87849645

Note that some systems may fall back to rendering these "graphemes" as multiple glyphs (Unicode TR51 leaves this as an implementation detail), so we might want a way to specify whether they are displayed together or separately. Modern implementations can generally display quite a high level of richness without needing fallbacks.
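A minimal reproduction of the report can be sketched as follows. This is not the program from the linked gist; `graphemeCount` is a hypothetical helper introduced here for illustration, and the count actually printed depends on the Phobos version in use.

```d
import std.range : walkLength;
import std.stdio : writefln;
import std.uni : byGrapheme;

/// Hypothetical helper: number of grapheme clusters std.uni finds in `s`.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // VAMPIRE + ZERO WIDTH JOINER + FEMALE SIGN, the string from the report.
    string vampire = "\U0001f9db\u200d\u2640";

    // walkLength on a string decodes UTF-8, so this counts code points.
    writefln("code points: %d", vampire.walkLength);

    // The report expects 1 here and observes 2.
    writefln("graphemes:   %d (expect 1)", graphemeCount(vampire));
}
```

Keeping the count in a small function makes it easy to run the same check against the other strings mentioned in this issue.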

Comment #1 by garrett — 2022-09-17T17:09:26Z

ZWJ probably requires a level of sophistication to handle properly: https://en.wikipedia.org/wiki/Zero-width_joiner

For example, the handling in Devanagari is a little different, since ZWJ modifies the characters placed before it:

    s2 = "\u0915\u094d\u200d";
    auto wr = s2.byGrapheme; // assumed: `wr` is not defined in the original snippet
    writefln("s2 is %s\n", s2);
    writefln("graphemes %d (expect 1)\n", wr.walkLength); // this should be "1"

This looks like: क्‍

Comment #2 by garrett — 2022-09-17T18:34:41Z

This problem is not limited to ZWJ. For example:

    s2 = "\U0001F44D\U0001F3fD";
    auto wr = s2.byGrapheme; // assumed: `wr` is not defined in the original snippet
    writefln("s2 is %s\n", s2);
    writefln("graphemes %d (expect 1)\n", wr.walkLength); // this should be "1"

That is a thumbs up with a skin tone modifier, which should be one grapheme.
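To see where std.uni actually places the boundaries, it can help to list the code points of each cluster rather than only counting them. The sketch below assumes nothing beyond `byGrapheme` and the `length` and indexing members of `Grapheme`; `clusterDump` is a hypothetical helper name, and no particular output is claimed for the strings from this issue.

```d
import std.array : join;
import std.format : format;
import std.stdio : writeln;
import std.uni : byGrapheme;

/// Hypothetical helper: one "U+XXXX U+XXXX ..." entry per grapheme cluster.
string[] clusterDump(string s)
{
    string[] clusters;
    foreach (g; s.byGrapheme)
    {
        string[] codePoints;
        // Grapheme exposes its code points through length and indexing.
        foreach (i; 0 .. g.length)
            codePoints ~= format("U+%04X", cast(uint) g[i]);
        clusters ~= codePoints.join(" ");
    }
    return clusters;
}

void main()
{
    // Each array element is one cluster as segmented by std.uni.
    writeln(clusterDump("\U0001f9db\u200d\u2640")); // vampire + ZWJ + female sign
    writeln(clusterDump("\u0915\u094d\u200d"));     // Devanagari KA + virama + ZWJ
    writeln(clusterDump("\U0001F44D\U0001F3FD"));   // thumbs up + skin tone modifier
}
```

If the segmentation matched the expectations stated in this issue, each call would print an array with a single element.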

Comment #3 by robert.schadek — 2024-12-01T16:40:24Z

THIS ISSUE HAS BEEN MOVED TO GITHUB: https://github.com/dlang/phobos/issues/10500

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT.