The crash is caused by wrong codegen for the toString() method. Here's a comparison of a toString() method for a struct vs. for a union:
Struct code:
--------
struct Jack
{
float f;
int i;
string toString()
{
string s;
return s;
}
}
--------
Generated assembly for Jack.toString:
--------
447dc: 55 push %rbp
447dd: 48 8b ec mov %rsp,%rbp
447e0: 31 c0 xor %eax,%eax
447e2: 31 d2 xor %edx,%edx
447e4: 5d pop %rbp
447e5: c3 retq
--------
As can be easily seen, this basically initializes the string s to have .ptr=null, .length=0, returned by value in %eax and %edx. This is the correct behaviour.
Union code (basically the same as the struct code, just change 'struct' to 'union'):
--------
union Jack
{
float f;
int i;
string toString()
{
string s;
return s;
}
}
--------
Generated assembly for Jack.toString:
--------
4471c: 55 push %rbp
4471d: 48 8b ec mov %rsp,%rbp
44720: 48 83 ec 10 sub $0x10,%rsp
44724: 48 8b 55 f8 mov -0x8(%rbp),%rdx
44728: 48 8b 45 f0 mov -0x10(%rbp),%rax
4472c: c9 leaveq
4472d: c3 retq
--------
Here, the 'sub' line appears to be allocating space on the stack for a local variable, presumeably s. But then it fails to initialize s before loading it into the return registers %rdx and %rax. I'm assuming that this is probably why the allocation of the local variable never got elided, as it was in the struct case -- the optimizer doesn't see the missing initialization of s, so it cannot elide the loads from the stack.
So it looks like the bug is caused by failure to initialize s when generating code for a union vs. a struct. Why, though, is a mystery to me, since toString never actually references `this`, so in theory whether `this` is a struct or union shouldn't matter.
Comment #2 by hsteoh — 2018-01-12T18:25:07Z
Note also, the name `toString` is irrelevant to this bug. The same codegen bug appears if you rename the method to something else, like 'member'.
Furthermore, the local variable doesn't have to be a string; any local variable without an initializer would also exhibit the same missing initialization, it doesn't initialize the local to .init, as it should.
Comment #3 by hsteoh — 2018-01-12T18:41:05Z
Minimized code:
------
union U {
int method() {
int x;
return x;
}
}
------
The disassembly shows that x is never initialized to 0, and a garbage value is returned.
Comment #4 by hsteoh — 2018-01-12T19:44:19Z
More interesting clues: running a union method inside CTFE containing a local variable without an explicit initializer causes a CTFE error "cannot modify variable at compile time", whereas explicitly initializing the local variable works.
------
union U {
int method() {
int x; // causes CTFE failure unless explicitly initialized
return x;
}
}
enum y = U.init.method();
------