This is a spin-off from issue 19968.
This program can exhibit undefined behavior even though `main` is @safe and `f` is correctly @trusted:
----
void main() @safe
{
    bool b = void;
    f(b);
}
void f(bool cond) @trusted
{
    import core.stdc.stdlib: free, malloc;
    byte b;
    void* p = cond ? &b : malloc(1);
    if(!cond) free(p);
}
----
Typical output:
----
munmap_chunk(): invalid pointer
Error: program killed by signal 6
----
That means `free` is being called on `&b`. That operation has undefined behavior. But that can only happen if `cond` is both true and false at the same time.
Surely, an @trusted function should be allowed to assume that a bool is either true or false, and not both.
Comment #1 by simen.kjaras — 2019-08-21T07:58:51Z
So instead of closing the obvious hole of @safe functions using void initialization we're just poking at symptoms here and there?
Comment #2 by hsteoh — 2020-06-05T23:31:09Z
There's more to it than a hole in @safe. Look at the disassembly below, there seems to be a codegen bug as well:
-------------------
000000000003f698 <_Dmain>:
3f698: 55 push %rbp
3f699: 48 8b ec mov %rsp,%rbp
3f69c: 48 83 ec 10 sub $0x10,%rsp
3f6a0: 40 8a 7d f8 mov -0x8(%rbp),%dil
3f6a4: e8 07 00 00 00 callq 3f6b0 <@trusted void test.f(bool)>
3f6a9: 31 c0 xor %eax,%eax
3f6ab: c9 leaveq
3f6ac: c3 retq
3f6ad: 00 00 add %al,(%rax)
...
000000000003f6b0 <@trusted void test.f(bool)>:
3f6b0: 55 push %rbp
3f6b1: 48 8b ec mov %rsp,%rbp
3f6b4: 48 83 ec 20 sub $0x20,%rsp
3f6b8: 89 7d f8 mov %edi,-0x8(%rbp)
3f6bb: c6 45 e8 00 movb $0x0,-0x18(%rbp)
3f6bf: 40 80 7d f8 00 rex cmpb $0x0,-0x8(%rbp)
3f6c4: 74 06 je 3f6cc <@trusted void test.f(bool)+0x1c>
3f6c6: 48 8d 45 e8 lea -0x18(%rbp),%rax
3f6ca: eb 0a jmp 3f6d6 <@trusted void test.f(bool)+0x26>
3f6cc: bf 01 00 00 00 mov $0x1,%edi
3f6d1: e8 9a fc ff ff callq 3f370 <malloc@plt>
3f6d6: 48 89 45 f0 mov %rax,-0x10(%rbp)
3f6da: 8a 4d f8 mov -0x8(%rbp),%cl
3f6dd: 80 f1 01 xor $0x1,%cl
3f6e0: 74 09 je 3f6eb <@trusted void test.f(bool)+0x3b>
3f6e2: 48 8b 7d f0 mov -0x10(%rbp),%rdi
3f6e6: e8 65 f9 ff ff callq 3f050 <free@plt>
3f6eb: c9 leaveq
3f6ec: c3 retq
---------------
In main(), the value of -0x8(%rbp), apparently where main.b is stored, is loaded into the lower register %dil. But in f(), the value of the entire register %edi is stored in a local variable (coincidentally -0x8(%rbp), but points to a different place because this is now the local scope of the callee). Then a few instructions down this local variable is tested for having all 0's in its value: even though only the lower part of the register was actually loaded in main!
Then after the if-statement, the (lower byte of the) local variable -0x8(%rbp) is loaded into %cl and compared against a literal 1.
Even though technically this codegen works if b is either 0 or 1, it seems inconsistent at best (why compare the entire 32-bit value to 0 when checking for false, but only the lower byte when checking for true?), and in this case outright wrong when b is uninitialized and therefore can have any random garbage value other than 0 or 1.
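The two tests in the listing can be modeled on a raw byte. The following is a minimal sketch, not compiler output: `testCond` and `testNotCond` are hypothetical names that mirror the `cmpb $0x0` / `je` and `xor $0x1` / `je` sequences above, assuming only the low byte of the argument is inspected.

```d
import std.stdio : writefln;

// Hypothetical model of `cond`: cmpb $0x0 ; je  => taken as true when the byte is nonzero.
bool testCond(ubyte raw) @safe pure nothrow @nogc
{
    return raw != 0;
}

// Hypothetical model of `!cond`: xor $0x1 ; je  => taken as true when the byte is not 1.
bool testNotCond(ubyte raw) @safe pure nothrow @nogc
{
    return (raw ^ 1) != 0;
}

void main() @safe
{
    // 0 and 1 are the only valid bool representations; 2 and 0xFF stand in for garbage.
    foreach (ubyte raw; [ubyte(0), ubyte(1), ubyte(2), ubyte(0xFF)])
        writefln("raw=%s cond=%s !cond=%s", raw, testCond(raw), testNotCond(raw));
}
```

For a raw byte of 0 or 1 exactly one of the two tests holds. For any other value, such as 2, both hold, which is the path that reaches `free(&b)`.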
Comment #3 by hsteoh — 2020-06-05T23:44:42Z
Actually, as far as this bug is concerned, @safe is a red herring, and so is void initialization.
Proof:
---------
bool schrodingersCat() @safe {
    union U { bool b; int i; }
    U u;
    u.i = 2;
    return u.b;
}
void main() @safe {
    import std.stdio;
    bool b = schrodingersCat();
    if (b) writeln("alive");
    if (!b) writeln("dead");
}
---------
Output:
---------
alive
dead
---------
Apparently, D semantics exhibit quantum mechanical effects!
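For comparison, here is a minimal sketch of how code that receives such a byte could rebuild a canonical bool before branching on it. `Raw` and `normalize` are hypothetical names used only for illustration; this is a defensive workaround under those assumptions, not the fix adopted for this issue.

```d
import std.stdio : writeln;

// Hypothetical overlay giving access to the byte behind a bool.
union Raw { bool b; ubyte u; }

// Hypothetical helper: the comparison yields a proper 0/1 bool,
// whatever bit pattern was stored.
bool normalize(Raw r)
{
    return r.u != 0;
}

void main()
{
    Raw r;
    r.u = 2;                    // not a valid bool representation
    bool clean = normalize(r);  // canonical true
    if (clean) writeln("alive");
    if (!clean) writeln("dead");
}
```

With the normalized value only one of the two branches runs, because `clean` is produced by a comparison rather than read directly from the overlapped storage.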
Comment #4 by Patrick.Schluter — 2020-06-06T09:21:34Z
(In reply to hsteoh from comment #2)
> There's more to it than a hole in @safe. Look at the disassembly below,
> there seems to be a codegen bug as well:
>
> -------------------
> 000000000003f698 <_Dmain>:
> 3f698: 55 push %rbp
> 3f699: 48 8b ec mov %rsp,%rbp
> 3f69c: 48 83 ec 10 sub $0x10,%rsp
> 3f6a0: 40 8a 7d f8 mov -0x8(%rbp),%dil
The bug is here and only in dmd!
gdc and ldc use movzx to load the EDI register, not mov. When b is initialized the error doesn't manifest, because dmd reuses the EAX register, which it had used to zero the byte, to load EDI.
That said, the example doesn't compile with the -O option. It then reports:
<source>(4): Error: variable b used before set
>
> Even though technically this codegen works if b is either 0 or 1, it seems
> inconsistent at best (why compare the entire 32-bit value to 0 when checking
> for false, but only the lower byte when checking for true?), and in this
> case outright wrong when b is uninitialized and therefore can have any
> random garbage value other than 0 or 1.
This is the C integer promotion rule: bool is really just an integral type with two values rather than a genuinely special type (see Java for the drawbacks of that).
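A minimal sketch of that promotion behavior in D, where `asInt` is a hypothetical helper introduced only for this example:

```d
// Hypothetical helper: bool converts implicitly to int, like any small integral type.
int asInt(bool v) @safe pure nothrow @nogc
{
    return v;
}

void main() @safe
{
    bool t = true;
    assert(asInt(t) == 1);
    assert(asInt(false) == 0);

    // Arithmetic on bools is carried out in int after integer promotion.
    static assert(is(typeof(t + t) == int));
    assert(t + t == 2);
}
```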
Comment #5 by dlang-bot — 2023-06-28T10:10:41Z
@dkorpel created dlang/dmd pull request #15362 "Fix 20148 - void initializated bool can be both true and false" fixing this issue:
- Fix 20148 - void initializated bool can be both true and false
https://github.com/dlang/dmd/pull/15362
Comment #6 by bugzilla — 2023-07-21T00:44:48Z
(In reply to hsteoh from comment #2)
> There's more to it than a hole in @safe. Look at the disassembly below,
> there seems to be a codegen bug as well:
>
> -------------------
> 000000000003f698 <_Dmain>:
> 3f698: 55 push %rbp
> 3f699: 48 8b ec mov %rsp,%rbp
> 3f69c: 48 83 ec 10 sub $0x10,%rsp
> 3f6a0: 40 8a 7d f8 mov -0x8(%rbp),%dil
> 3f6a4: e8 07 00 00 00 callq 3f6b0 <@trusted void
> test.f(bool)>
> 3f6a9: 31 c0 xor %eax,%eax
> 3f6ab: c9 leaveq
> 3f6ac: c3 retq
> 3f6ad: 00 00 add %al,(%rax)
> ...
>
> 000000000003f6b0 <@trusted void test.f(bool)>:
> 3f6b0: 55 push %rbp
> 3f6b1: 48 8b ec mov %rsp,%rbp
> 3f6b4: 48 83 ec 20 sub $0x20,%rsp
> 3f6b8: 89 7d f8 mov %edi,-0x8(%rbp)
> 3f6bb: c6 45 e8 00 movb $0x0,-0x18(%rbp)
> 3f6bf: 40 80 7d f8 00 rex cmpb $0x0,-0x8(%rbp)
> 3f6c4: 74 06 je 3f6cc <@trusted void
> test.f(bool)+0x1c>
> 3f6c6: 48 8d 45 e8 lea -0x18(%rbp),%rax
> 3f6ca: eb 0a jmp 3f6d6 <@trusted void
> test.f(bool)+0x26>
> 3f6cc: bf 01 00 00 00 mov $0x1,%edi
> 3f6d1: e8 9a fc ff ff callq 3f370 <malloc@plt>
> 3f6d6: 48 89 45 f0 mov %rax,-0x10(%rbp)
> 3f6da: 8a 4d f8 mov -0x8(%rbp),%cl
> 3f6dd: 80 f1 01 xor $0x1,%cl
> 3f6e0: 74 09 je 3f6eb <@trusted void
> test.f(bool)+0x3b>
> 3f6e2: 48 8b 7d f0 mov -0x10(%rbp),%rdi
> 3f6e6: e8 65 f9 ff ff callq 3f050 <free@plt>
> 3f6eb: c9 leaveq
> 3f6ec: c3 retq
> ---------------
>
> In main(), the value of -0x8(%rbp), apparently where main.b is stored, is
> loaded into the lower register %dil. But in f(), the value of the entire
> register %edi is stored in a local variable (coincidentally -0x8(%rbp), but
> points to a different place because this is now the local scope of the
> callee). Then a few instructions down this local variable is tested for
> having all 0's in its value: even though only the lower part of the register
> was actually loaded in main!
>
> Then after the if-statement, the (lower byte of the) local variable
> -0x8(%rbp) is loaded into %cl and compared against a literal 1.
>
> Even though technically this codegen works if b is either 0 or 1, it seems
> inconsistent at best (why compare the entire 32-bit value to 0 when checking
> for false, but only the lower byte when checking for true?), and in this
> case outright wrong when b is uninitialized and therefore can have any
> random garbage value other than 0 or 1.
The code gen looks correct to me. The cmp is a byte compare instruction which only looks at the least significant byte, where the bool was stored.