Bug 18532 – Hex literals produce invalid strings

Status: NEW
Severity: enhancement
Priority: P4
Component: dmd
Product: D
Version: D2
Platform: x86_64
OS: Linux
Creation time: 2018-02-27T14:13:13Z
Last change time: 2024-12-13T18:57:30Z
Assigned to: No Owner
Creator: FeepingCreature
Moved to GitHub: dmd#17847

Comments

Comment #0 by default_357-line — 2018-02-27T14:13:13Z
Hex literals let you declare strings that are not valid UTF-8. This violates both the documentation and the type system. "\xff" is an expression of type string, and string is defined ( https://dlang.org/spec/arrays.html#strings ) to be in UTF-8 format. Furthermore, string is an array of char, and char is defined as a UTF-8 code unit; the byte 0xFF can never occur in well-formed UTF-8. The docs do state that hex strings are not checked for UTF-8 validity, so they accurately describe what the compiler does; the behavior itself is wrong, because it produces values that break the invariant of the type. Either the behavior of hex literals must change, or the definition of char must. As it stands, the documentation and the behavior are self-contradictory. Perhaps hex literals could have type ubyte[] instead?
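A minimal D sketch of the behavior described above, assuming a recent DMD with Phobos: the `\xff` escape compiles into a value of type `string`, yet `std.utf.validate` rejects it. The helper `isValidUtf8` is hypothetical and exists only for this illustration.

```d
import std.exception : collectException;
import std.utf : UTFException, validate;

// Hypothetical helper (not part of Phobos): true if `s` is well-formed UTF-8.
bool isValidUtf8(string s)
{
    // std.utf.validate throws UTFException on malformed input.
    return collectException!UTFException(validate(s)) is null;
}

void main()
{
    // Compiles: the \xff escape is stored as a raw byte with no UTF-8 check.
    string s = "\xff";
    assert(s.length == 1 && s[0] == 0xFF);

    // Yet the value is not valid UTF-8, despite having type `string`.
    assert(!isValidUtf8(s));

    // For comparison: the correct two-byte UTF-8 encoding of U+00E9.
    assert(isValidUtf8("\xc3\xa9"));
}
```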
Comment #1 by default_357-line — 2018-02-27T14:18:52Z
Update: std.conv.hexString does not validate its return value either.
Comment #2 by b2.temp — 2018-02-27T16:33:13Z
It doesn't have to. hexString isn't even designed to represent string literals; it can just as well hold a memory dump that is then cast to ubyte[].
Comment #3 by default_357-line — 2018-02-27T17:01:48Z
It has to, because it returns string and string is defined to be UTF-8. If it wants to return something that is not UTF-8, it should return ubyte[], and you should have to cast it to string explicitly.
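A hedged D sketch of the approach proposed here: treat hex data as `immutable(ubyte)[]` and only turn it into a `string` through an explicit, validated step. `toCheckedString` is a hypothetical helper, not a Phobos API; `std.conv.hexString`, `std.string.representation` and `std.utf.validate` are existing Phobos functions.

```d
import std.conv : hexString;
import std.exception : assertThrown;
import std.string : representation;
import std.utf : UTFException, validate;

// Hypothetical helper: reinterpret raw bytes as a `string` only after
// verifying that they are well-formed UTF-8.
string toCheckedString(immutable(ubyte)[] bytes)
{
    auto s = cast(string) bytes; // reinterpret without copying
    validate(s);                 // throws UTFException on malformed input
    return s;
}

void main()
{
    // std.conv.hexString yields a `string` but performs no UTF-8 validation.
    string raw = hexString!"ff";
    assert(raw.length == 1);

    // Viewing the same data as bytes makes its untyped nature explicit.
    immutable(ubyte)[] bytes = raw.representation;
    assert(bytes.length == 1 && bytes[0] == 0xFF);

    // Converting back to text goes through an explicit, checked step.
    assertThrown!UTFException(toCheckedString(bytes));

    string utf8 = hexString!"c3a9";
    assert(toCheckedString(utf8.representation) == "\u00e9");
}
```

The cast in `toCheckedString` is the explicit conversion the comment asks for; the `validate` call is what makes the resulting `string` trustworthy.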
Comment #4 by dfj1esp02 — 2018-02-28T08:54:04Z
I'd say the spec only specifies the encoding of strings, meaning they can't be in something else like EBCDIC or cp1252. There was a debate about whether invalid UTF violates the type system, and a proposal that invalid UTF could produce an exception, produce a replacement character, or be ignored.
Comment #5 by dfj1esp02 — 2018-02-28T09:01:01Z
https://forum.dlang.org/post/[email protected] - I believe that discussion took place there.
Comment #6 by robert.schadek — 2024-12-13T18:57:30Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/17847 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB