Bug 15882 – writeln on a bad dstring triggering assert(0) in std.utf.toUTF8

Status
RESOLVED
Resolution
WORKSFORME
Severity
enhancement
Priority
P1
Component
phobos
Product
D
Version
D2
Platform
x86_64
OS
Windows
Creation time
2016-04-06T01:48:00Z
Last change time
2016-04-06T19:26:04Z
Assigned to
nobody
Creator
erikas.aubade

Comments

Comment #0 by erikas.aubade — 2016-04-06T01:48:47Z
While trying to pass a UTF-32 string from C++ to D, I think I've stumbled upon an issue in Phobos. If the bad dstring is placed in a struct and the struct is converted to a string, it returns the value x"8DA0000 51B5A8 67 69 74 65 63 68 20 57 69 6E 67 4D 61 6E 20 53 74 72 69 6B 65 20 46 6F 72 63 65 20 33 44 20 55 53 42"d. But if I try to pass the bad dstring to writeln, it triggers the assert(0) at the end of std.utf.toUTF8(). (It's tricky to make a test snippet to send with this bug, because dmd will reject the string as bad Unicode, so hopefully the literal is enough.)
Comment #1 by ag0aep6g — 2016-04-06T16:04:34Z
What do you suggest to be fixed/improved here? As you say, the dstring is invalid, so it seems correct that the assert fails.
Comment #2 by erikas.aubade — 2016-04-06T17:08:26Z
An assert failure is not considered a recoverable error; as I understand it, it should indicate a critical flaw in logic, not bad input. It's also inconsistent with the way D treats bad Unicode elsewhere. For instance, if you put that bad string in a struct, it will writeln with no error and simply display as a hex literal. In the functions that decode UTF-8 to UTF-32, bad Unicode raises a recoverable exception.
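A minimal sketch of the recoverable path described here, assuming std.utf.validate rejects out-of-range UTF-32 values by throwing UTFException. The helper name isWellFormed is hypothetical and not part of Phobos; the sample values are the first two from the report plus a few valid ones.

```d
import std.utf : UTFException, validate;

/// Hypothetical helper: reports whether a UTF-32 buffer is well formed.
/// Invalid input surfaces as a catchable UTFException, not an assert.
bool isWellFormed(const(dchar)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // The first two values lie above U+10FFFF, so they are not valid code points.
    uint[] raw = [0x8DA0000, 0x051B5A8, 0x67, 0x69];
    assert(!isWellFormed(cast(const(dchar)[]) raw));
    assert(isWellFormed("gitech"d));
}
```

A caller receiving data from C++ could run such a check at the boundary and decide how to handle bad input before it reaches writeln.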
Comment #3 by erikas.aubade — 2016-04-06T17:12:54Z
(And just to clarify, my personal preference would be to have it print the Unicode replacement char (U+FFFD), similar to other functions in std.utf)
Comment #4 by ag0aep6g — 2016-04-06T19:26:04Z
Seems like Phobos is ahead of us here.
----
void main()
{
    uint[] a = [0x8DA0000, 0x051B5A8, 0x67, 0x69, 0x74, 0x65, 0x63, 0x68];
    import std.stdio: writeln;
    writeln(cast(dchar[]) a);
}
----
Compiled with the freshly released dmd 2.071.0, this prints "��gitech", i.e. it uses replacement characters now.

I think this is part of a change in the stance regarding invalid char/wchar/dchar values. There used to be a guarantee/requirement that they do not occur. Apparently, that has been given up in favor of a more forgiving approach.

Since it's working as you suggest in dmd 2.071.0, I'm closing this as RESOLVED:WORKSFORME. Feel free to reopen if I missed anything.

By the way, the generated hex string is broken. There should be lots of zero bytes. I've filed a separate issue for that: issue 15888.
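For code that cannot rely on the library substituting replacement characters, one possible workaround is to sanitize the buffer before printing. The sketch below assumes std.utf.isValidDchar is the right validity test; the helper name sanitizeUtf32 is hypothetical and not a Phobos function.

```d
import std.stdio : writeln;
import std.utf : isValidDchar;

/// Hypothetical helper: returns a copy of `raw` in which every value that is
/// not a valid Unicode code point is replaced by U+FFFD.
dstring sanitizeUtf32(const(dchar)[] raw)
{
    auto result = new dchar[raw.length];
    foreach (i, c; raw)
        result[i] = isValidDchar(c) ? c : '\uFFFD';
    return result.idup;
}

void main()
{
    uint[] a = [0x8DA0000, 0x051B5A8, 0x67, 0x69, 0x74, 0x65, 0x63, 0x68];
    auto clean = sanitizeUtf32(cast(const(dchar)[]) a);
    // Both leading values exceed U+10FFFF and are replaced.
    assert(clean == "\uFFFD\uFFFDgitech"d);
    writeln(clean);
}
```

This copies the input instead of modifying it in place, so a buffer still owned by the C++ side is left untouched.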