Bug 6125 – to!string doesn't throw on invalid UTF sequence

Status
NEW
Severity
normal
Priority
P3
Component
phobos
Product
D
Version
D2
Platform
Other
OS
All
Creation time
2011-06-08T11:41:02Z
Last change time
2024-12-01T16:14:11Z
Keywords
bootcamp
Assigned to
No Owner
Creator
Andrej Mitrovic
Moved to GitHub: phobos#9906

Comments

Comment #0 by andrej.mitrovich — 2011-06-08T11:41:02Z
I'm not sure if this is a bug or wanted behavior:
-----
auto x = to!string(cast(char)255);
-----
That won't throw. But this will:
-----
auto x = to!string(cast(char)255); // or try 128
auto z = toUTF8(x); // throws
-----
I've had this example code translated from C:
-----
foreach (y; 0 .. 16)
    foreach (x; 0 .. 16)
    {
        auto buffer = to!string(cast(char)(16 * x + y));
        auto result = buffer.toUTF16z; // call to utf16z for the winapi
    }
-----
Essentially the code builds a table of characters that it prints out. But it doesn't seem to take into account invalid UTF8 code points.

This leads me to another question, how does one iterate through valid UTF code points, starting from 0? Is there a Phobos function that does that?
Comment #1 by andrej.mitrovich — 2016-08-27T21:55:57Z
-----
import std.conv;
import std.stdio;

void main()
{
    auto x = to!string(cast(char)255);
    writeln(x);
}
-----
Outputs: [Decode error - output not utf-8]

I think the to!() routines should be UTF safe so the call to to!string above should throw an exception. Is this right Andrei?
Comment #2 by andrei — 2016-10-14T16:55:25Z
Well since it doesn't throw we may as well make it nothrow :o) and use the replacement char, or add an overload. I'll bootcamp this.
Comment #3 by lucia.mcojocaru — 2016-11-21T13:36:26Z
Is this a Windows specific bug? I tested the following on Linux 64:
-----
import std.conv;
import std.stdio;
import std.utf;

void main()
{
    auto x = to!string(cast(char)191);
    auto z = toUTF8(x);
    writeln(x);

    foreach (y; 0 .. 16)
        foreach (r; 0 .. 16)
        {
            auto buffer = to!string(cast(char)(16 * r + y));
            auto b = toUTF8(buffer);
            writeln(b);
            // auto result = buffer.toUTF16z; // call to utf16z for the winapi
        }
}
-----
Only the commented line throws:
core.exception.UnicodeException@src/rt/util/utf.d(292): invalid UTF-8 sequence
Comment #4 by bugzilla — 2019-12-11T14:18:40Z
The original bug isn't Windows specific. I don't know if the example from Lucia Cojocaru can be considered the same bug...
Comment #5 by robert.schadek — 2024-12-01T16:14:11Z
THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/phobos/issues/9906

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB