Bug 7084 – Missing writeln Unicode normalization

Status: NEW
Severity: enhancement
Priority: P4
Component: phobos
Product: D
Version: D2
Platform: x86
OS: Windows
Creation time: 2011-12-09T01:12:59Z
Last change time: 2024-12-01T16:14:45Z
Keywords: bootcamp
Assigned to: No Owner
Creator: bearophile_hugs
See also: https://issues.dlang.org/show_bug.cgi?id=2742
Moved to GitHub: phobos#9920

Comments

Comment #0 by bearophile_hugs — 2011-12-09T01:12:59Z
In this program the string 'txt1' contains two codepoints: LATIN CAPITAL LETTER A, and COMBINING DIAERESIS. I think a good printing function has to perform Unicode normalization and show a single \U000000C4 (LATIN CAPITAL LETTER A WITH DIAERESIS) glyph. But with DMD 2.057beta it shows two glyphs (on Windows), an 'A' followed by a diaeresis. writeln(txt2) shows what I think is the correct output for writeln(txt1) too:

import std.stdio;
void main() {
    dstring txt1 = "\U00000041\U00000308"d;
    writeln(txt1);
    dstring txt2 = "\U000000C4"d;
    writeln(txt2);
}
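As a rough illustration of the behavior requested here, a caller can compose the string explicitly before printing. This is only a sketch: it assumes a Phobos release whose std.uni provides normalize (the module did not offer it at the time of DMD 2.057), and composeForOutput is a hypothetical helper name, not part of Phobos. What the console finally renders still depends on the terminal and font.

```d
import std.stdio : writeln;
import std.uni; // assumed to provide normalize and the NFC constant

// Hypothetical helper (not in Phobos): compose a string to NFC before output.
dstring composeForOutput(dstring s)
{
    return normalize!NFC(s);
}

void main()
{
    dstring txt1 = "\U00000041\U00000308"d; // 'A' followed by COMBINING DIAERESIS
    dstring txt2 = "\U000000C4"d;           // precomposed LATIN CAPITAL LETTER A WITH DIAERESIS

    // After composition both strings hold the same single code point.
    assert(composeForOutput(txt1) == txt2);
    assert(composeForOutput(txt1).length == 1);

    writeln(composeForOutput(txt1));
    writeln(txt2);
}
```

Here the normalization is done by the caller rather than inside writeln, so programs that want the decomposed form are unaffected.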
Comment #1 by hsteoh — 2012-02-25T17:57:14Z
IMO this should be an enhancement request. As I understand, Unicode normalization is non-trivial, so we probably should think over how we want to do it.
Comment #2 by bearophile_hugs — 2012-02-26T14:59:46Z
(In reply to comment #1)
> IMO this should be an enhancement request. As I understand, Unicode
> normalization is non-trivial, so we probably should think over how we want to
> do it.

OK, now it's an enhancement.
Comment #3 by hsteoh — 2012-02-26T22:22:24Z
Here's a link to the relevant part of the Unicode standard for whoever wants to implement normalization: http://unicode.org/reports/tr15/

Note that there are several different normalizations, with NFC probably being the closest to what this bug requires.

After scanning through the standard, it seems to me that rather than putting this in std.stdio (or the prospective std.io), we really should put it in std.uni or std.utf, and have different algorithms available for programs to choose the normalization form. The algorithms involved are not trivial, and some people may not want std.stdio to automatically normalize to a particular form when they want specifically to use a different form or a non-normalized output for whatever reason.
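To make the difference between the forms concrete, the sketch below runs the same inputs through several of them. It assumes a later Phobos in which std.uni exposes normalize with the NFC, NFD and NFKC constants, which matches the design suggested in this comment of letting the program choose the form.

```d
import std.uni; // assumed to provide normalize, NFC, NFD, NFKC

void main()
{
    dstring decomposed = "A\u0308"d; // 'A' + COMBINING DIAERESIS
    dstring composed   = "\u00C4"d;  // LATIN CAPITAL LETTER A WITH DIAERESIS

    // NFC composes, NFD decomposes; the two forms round-trip for this pair.
    assert(normalize!NFC(decomposed) == composed);
    assert(normalize!NFD(composed) == decomposed);

    // The compatibility forms also fold presentation variants, which a
    // program may not want applied silently: U+FB01 (the "fi" ligature)
    // becomes the two letters "fi" under NFKC but is left alone by NFC.
    dstring ligature = "\uFB01"d;
    assert(normalize!NFKC(ligature) == "fi"d);
    assert(normalize!NFC(ligature) == ligature);
}
```

Because each form gives a different result for some input, picking one inside std.stdio would change the output of programs that rely on another.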
Comment #4 by hsteoh — 2016-10-15T05:12:28Z
@andralex: Are you sure this bug qualifies for 'bootcamp'? Unicode normalization is highly nontrivial, requires significant effort to support correctly, and will probably involve multiple modules (at least std.uni and std.stdio, perhaps also std.utf). Plus, deciding which normalization scheme(s) to default to is a decision that can only be made with more experience with the language and community.
Comment #5 by dfj1esp02 — 2016-10-17T17:01:13Z
This can have the same problem as issue 2742: always normalizing may not be what one wants, and detecting a console is problematic. Also, AFAIK not all characters have precomposed variants.
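The last point can be checked with a small sketch, again assuming a Phobos whose std.uni offers normalize and byGrapheme. The pair 'q' plus COMBINING DIAERESIS is used as the example because Unicode defines no precomposed code point for it, so NFC cannot reduce it to one code point even though it is a single user-perceived character.

```d
import std.range : walkLength;
import std.uni; // assumed to provide normalize, NFC, byGrapheme

void main()
{
    dstring qDiaeresis = "q\u0308"d; // no precomposed form exists for this pair

    // NFC leaves the sequence as two code points.
    auto nfc = normalize!NFC(qDiaeresis);
    assert(nfc == qDiaeresis);
    assert(nfc.length == 2);

    // It is still one grapheme cluster, so rendering it as one glyph
    // is up to the console and font, not to normalization.
    assert(qDiaeresis.byGrapheme.walkLength == 1);
}
```

So even with normalization in place, output would still contain combining sequences for such characters.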
Comment #6 by robert.schadek — 2024-12-01T16:14:45Z
THIS ISSUE HAS BEEN MOVED TO GITHUB
https://github.com/dlang/phobos/issues/9920
DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT