Found by Namal in D.learn:
http://forum.dlang.org/post/[email protected]
Original code:
----
void main()
{
    import std.algorithm;
    import std.string;
    char[] line;
    auto bytes = line.representation.dup;
    bytes.sort;
    string result = bytes.assumeUTF; /* should be rejected */
}
----
Reduced to show it's a compiler bug:
----
char[] assumeUTF(ubyte[] str) pure { return cast(char[]) str; }
void main()
{
    ubyte[] b = ['a', 'b', 'c'];
    string s = assumeUTF(b); /* should be rejected */
    assert(s == "abc"); /* passes */
    b[0] = '!';
    assert(s == "abc"); /* fails */
}
----
Another variant to show it's not about arrays or the char type:
----
ubyte* toBytePointer(uint* p) pure { return cast(ubyte*) p; }
void main()
{
    uint* i = new uint;
    immutable ubyte* b = toBytePointer(i); /* should be rejected */
    *i = 0xFF_FF_FF_FF;
    assert(*b != 0xFF); /* fails */
}
----
Comment #1 by schveiguy — 2017-07-15T11:15:10Z
I actually think it's a design problem. assumeUTF is marked pure. The input is ubyte and the output is char. This means the compiler can reasonably assume the output is unrelated to the input and therefore unique.
This is quite a pickle. We can't very well unmark it pure, and I think the compiler logic is sound.
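For contrast, here is a minimal sketch of a case where the uniqueness assumption described above does hold. The helper name `makeArray` is hypothetical and not part of the report. Its only parameter is a plain integer, so the returned array cannot alias anything the caller passed in, and the implicit conversion to immutable is safe:

```d
// Hypothetical helper, not from the report: a pure factory whose
// result is freshly allocated and cannot alias its argument.
int[] makeArray(size_t n) pure
{
    auto a = new int[](n);
    foreach (i, ref e; a)
        e = cast(int) i; // fill with 0, 1, 2, ...
    return a;
}

void main()
{
    // Accepted: no mutable reference to the returned data can exist
    // outside the function, so the result is unique.
    immutable int[] a = makeArray(3);
    assert(a == [0, 1, 2]);
}
```

The examples in the report differ in that the parameter is itself a mutable reference that a cast can turn into the return type.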
Comment #2 by ag0aep6g — 2017-07-15T12:58:51Z
(In reply to Steven Schveighoffer from comment #1)
> I actually think it's a design problem. assumeUTF is marked pure. The input
> is ubyte and the output is char. This means the compiler can reasonably
> assume the output is unrelated to the input and therefore unique.
>
> This is quite a pickle. We can't very well unmark it pure, and I think the
> compiler logic is sound.
I don't agree that the compiler logic is sound. The casts are valid. The compiler cannot assume that they don't occur.
It even happens with classes (no cast needed):
----
class B { int x; }
class C : B {}
B toB(C c) pure { return c; }
void main()
{
    C c = new C;
    c.x = 1;
    immutable B b = toB(c); /* should be rejected */
    assert(b.x == 1); /* passes */
    c.x = 2;
    assert(b.x == 1); /* fails */
}
----
Comment #3 by schveiguy — 2017-07-15T14:21:25Z
I'm not sure about the UB rules for D and aliasing. In C you definitely can run into things like the array cast being considered unrelated.
The class case is definitely a bug.
Comment #4 by ag0aep6g — 2017-07-15T14:48:36Z
(In reply to Steven Schveighoffer from comment #3)
> I'm not sure the UB rules for D and aliasing. In C you definitely can run
> into things like the array cast being considered unrelated.
As far as I know, C's strict aliasing rule isn't exactly uncontroversial. Personally, I think it's an abomination.
> The class case is definitely a bug.
Even with the strict aliasing rule, there is a type that is allowed to alias others. In C it's char. That would be ubyte in D, I guess. The non-class examples all involve ubyte. So even with C-like strict aliasing, they should be rejected.
Comment #5 by schveiguy — 2017-07-15T15:27:09Z
Then the example could be changed to wchar and ushort.
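A rough sketch of what that variant might look like, assuming a hypothetical helper named `toWchars` that is not in the report. The runnable part only shows that the returned slice aliases the input. The implicit conversion in question is left commented out, since whether it compiles depends on the compiler version:

```d
// Hypothetical helper: same shape as the reduced assumeUTF above,
// but with ushort and wchar, so no byte-sized type is involved.
wchar[] toWchars(ushort[] data) pure { return cast(wchar[]) data; }

void main()
{
    ushort[] u = [0x61, 0x62, 0x63]; // code units for 'a', 'b', 'c'
    wchar[] w = toWchars(u);         // mutable result: no conversion issue
    assert(w == "abc"w);

    u[0] = 0x21;                     // code unit for '!'
    assert(w[0] == '!');             // the result aliases the input

    // The conversion under discussion:
    // wstring s = toWchars(u); /* should be rejected */
}
```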
Comment #6 by robert.schadek — 2024-12-13T18:53:22Z