Comment #0 by matti.niemenmaa+dbugzilla — 2008-07-05T05:33:21Z
As it stands, it is impossible to use an API that is strict about keeping char as UTF-8 only. For instance, given a function that turns a C string into a D string:
// accepts anything, not only UTF-8, hence ubyte and not char
ubyte[] fromStringz(ubyte*);
One can't call it with a char*, even though the function itself would work fine. Casting works, of course, but the end result is that code starts to look something like the following:
auto x = getSomeUTF8();
auto y = &x[5];
x = cast(char[])foo(cast(ubyte*)y);
bar(cast(ubyte[])x);
return cast(ubyte)x[0];
This is far too verbose and unreadable. The only "real" cast there is the cast(char[]), which asserts that foo, given UTF-8, returns valid UTF-8. The rest essentially just say "yes, UTF-8 bytes are the same size as ubytes!", which should not be necessary.
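One way to contain the cast noise without language support would be a pair of trivial helpers that centralize the representation-change casts. This is only a sketch; the names asBytes and asChars are illustrative, not part of any standard library:

```d
// Hypothetical helpers: each performs the size-preserving
// reinterpretation exactly once, so call sites stay readable.
ubyte[] asBytes(char[] s) { return cast(ubyte[]) s; }
char[] asChars(ubyte[] b) { return cast(char[]) b; }
```

With these, only the semantically meaningful cast (the UTF-8 validity assertion) would remain visible at the call site; the implicit conversions requested in this issue would make even the helpers unnecessary.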
The wchar->ushort and dchar->uint conversions should be included for completeness' sake, but I suspect they are less necessary.
Comment #1 by bearophile_hugs — 2013-02-22T09:37:18Z
Now the signatures of std.string.toStringz are:
pure nothrow immutable(char)* toStringz(const(char)[] s);
pure nothrow immutable(char)* toStringz(string s);
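For context, a minimal use of std.string.toStringz, which hands a D string to a C function without any explicit casts at the call site:

```d
import std.string : toStringz;
import core.stdc.stdio : printf;

void main()
{
    string s = "hello";
    // toStringz returns a NUL-terminated immutable(char)*
    // suitable for passing to C APIs.
    immutable(char)* p = toStringz(s);
    printf("%s\n", p);
}
```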
Consider whether it's worth keeping this issue open.
Comment #2 by robert.schadek — 2024-12-13T17:48:37Z