Bug 18794 – Compiling with -O causes runtime segfault

Status
RESOLVED
Resolution
FIXED
Severity
major
Priority
P1
Component
dmd
Product
D
Version
D2
Platform
x86_64
OS
Linux
Creation time
2018-04-24T18:25:19Z
Last change time
2019-04-20T17:45:37Z
Keywords
pull, wrong-code
Assigned to
No Owner
Creator
hsteoh
Blocks
18750

Comments

Comment #0 by hsteoh — 2018-04-24T18:25:19Z
Reduced code:
------------
void func() {}

struct S
{
    size_t[] impl;

    this(int v)
    {
        impl = [0];
    }

    bool method(int v)
    {
        int wordIdx = v >> 6;
        int bitIdx = v & 0b00111111;

        func();

        if (wordIdx >= impl.length)
            return false;

        return (impl[0] & (1UL << bitIdx)) != 0;
    }
}

void main()
{
    auto s = S(0);
    s.method(0);
}
------------

Compile command: dmd -O -run test.d

Output:
------------
Error: program killed by signal 11
------------

Compiling without -O fixes the problem. Commenting out the call to func() also makes the problem go away.

Also, the details of method() seem quite important; changing the return statement in various ways seems to make the problem go away, though I'm not 100% certain the current form is minimal. Eliding the ctor call also makes the problem go away, though I didn't explore all possible combinations.

In the original call, wordIdx is used to index the impl array, but the problem seems to persist even when impl[0] is hardcoded. However, removing the test `wordIdx >= impl.length` seems to mask the problem. So there's something about it that's triggering the wrong code.
Comment #1 by ag0aep6g — 2018-04-24T20:42:18Z
Reduced:
----
bool method(size_t* p)
{
    int bitIdx = 0;
    func();
    return (*p & (1UL << bitIdx)) != 0;
}

void func() {}

void prep()
{
    asm {}
    ulong[2] x = -1;
}

void main()
{
    prep();
    size_t s;
    method(&s);
}
----

Generated code for `method`:
----
   0:   55                      push   rbp
   1:   48 8b ec                mov    rbp,rsp
   4:   48 83 ec 10             sub    rsp,0x10
   8:   48 89 7d f8             mov    QWORD PTR [rbp-0x8],rdi
   c:   c7 45 f0 00 00 00 00    mov    DWORD PTR [rbp-0x10],0x0
  13:   e8 00 00 00 00          call   18 <_D4test6methodFPmZb+0x18>
                        14: R_X86_64_PLT32      _D4test4funcFZv-0x4
  18:   48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
  1c:   48 8b 4d f0             mov    rcx,QWORD PTR [rbp-0x10]
  20:   48 0f a3 08             bt     QWORD PTR [rax],rcx
  24:   48 0f 92 c0             rex.W setb al
  28:   48 8b e5                mov    rsp,rbp
  2b:   5d                      pop    rbp
  2c:   c3                      ret
----

bitIdx is a DWORD at rbp-0x10. But later a QWORD is read from there and used in the bt instruction. So that reads garbage from the stack. The garbage can be controlled by prep.

Looks like this is directly related to the generation of the bt instruction, which is horribly broken. But it doesn't seem to be a duplicate of the known issues. Adding to the tracker.
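For illustration, here is a rough sketch of why stale upper bytes in that slot can turn into a segfault rather than just a wrong result. It relies on the documented x86 behaviour that `bt` with a memory operand and a register bit offset treats the offset as signed and addresses memory outside the named QWORD. The helper names are hypothetical, and the stale value 0xFFFF_FFFF is an assumption about what prep's fill leaves in the upper half of the slot; the actual stack layout may differ.

```d
import std.stdio;

// Byte displacement that a 64-bit `bt mem, reg` adds to the base address
// for a given signed bit offset: 8 * floor(bitOffset / 64).
// `>>` on a long is an arithmetic shift in D, so the sign is preserved.
long btByteDisplacement(long bitOffset)
{
    return (bitOffset >> 6) * 8;
}

// Hypothetical helper: the QWORD seen when only the low DWORD of a stack
// slot was written and the high DWORD still holds `stale`.
long slotAsQword(int written, uint stale)
{
    return cast(long) ((cast(ulong) stale << 32) | cast(uint) written);
}

void main()
{
    // Upper half happens to be zero: bt tests bit 0 of *p as intended.
    writeln(btByteDisplacement(slotAsQword(0, 0)));           // 0

    // Upper half still all-ones (assumed): bt reads 512 MiB below p.
    writeln(btByteDisplacement(slotAsQword(0, 0xFFFF_FFFF))); // -536870912
}
```

Under these assumptions the access lands far outside the object that p points to, which would be consistent with the signal 11.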
Comment #2 by hsteoh — 2018-04-24T20:47:18Z
Just a side-note that although in my own environment, the problem can be reproduced with -O alone, apparently in some other environments specifying both -O and -profile is necessary to trigger the bug.
Comment #3 by issues.dlang — 2018-04-24T22:27:32Z
I can reproduce this on FreeBSD x86_64 with master, but I have to use -profile with -O. -O by itself doesn't trigger it for me. And adding -inline seems to get rid of the problem.
Comment #4 by hsteoh — 2018-04-24T22:37:31Z
Just tested in my environment, -inline does indeed make the problem go away. (Mask it, probably.) However, I can still reproduce the problem with just -O, even though the original problem was discovered when I compiled with -O -profile. For reference, I'm running dmd git commit b7f9af8766af90f221227946ba52f546e3188f9c.
Comment #5 by ag0aep6g — 2018-04-25T05:21:34Z
(In reply to ag0aep6g from comment #1)
> bitIdx is a DWORD at rbp-0x10. But later a QWORD is read from there and used
> in the bt instruction. So that reads garbage from the stack. The garbage can
> be controlled by prep.

It's probably worth pointing out that the result is still wrong even when prep zeroes the high bits. An int can't just be used as the low half of a long.

(In reply to hsteoh from comment #2)
> Just a side-note that although in my own environment, the problem can be
> reproduced with -O alone, apparently in some other environments specifying
> both -O and -profile is necessary to trigger the bug.

(In reply to Jonathan M Davis from comment #3)
> I can reproduce this on FreeBSD x86_64 with master, but I have to use
> -profile with -O. -O by itself doesn't trigger it for me.

You guys are talking about the original code, right? The behavior relies on stack garbage, so it makes sense that it isn't reproducible everywhere. If you happen to have zeroes at the particular stack address, you won't see a segfault.

The modified code in comment #1 should segfault consistently with just -O (and without -inline).
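A minimal sketch of one reading of the "low half of a long" remark, assuming the concern is sign extension: widening an int to 64 bits copies the sign bit into the upper half, while reading the int's four bytes as the low half of a zeroed QWORD does not. Both function names are hypothetical and only serve the illustration.

```d
// Hypothetical helper: the 64-bit value obtained by reading an int's four
// bytes as the low half of a QWORD whose high half is zero.
ulong lowHalfZeroHigh(int v)
{
    return cast(uint) v;
}

// What a proper int-to-64-bit widening yields: sign extension.
ulong widened(int v)
{
    return cast(ulong) cast(long) v;
}

void main()
{
    // Non-negative values agree, which can hide the problem.
    assert(lowHalfZeroHigh(5) == widened(5));

    // Negative values do not: the upper 32 bits differ.
    assert(lowHalfZeroHigh(-1) == 0xFFFF_FFFF);
    assert(widened(-1) == ulong.max);
    assert(lowHalfZeroHigh(-1) != widened(-1));
}
```

So even with a clean stack, the two interpretations only coincide for non-negative values.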
Comment #6 by dlang-bot — 2019-04-20T15:47:32Z
@aG0aep6G created dlang/dmd pull request #9658 "fix issue 18794 - Compiling with -O causes runtime segfault" fixing this issue:

- fix issue 18794 - Compiling with -O causes runtime segfault

  Just adding a test. The issue has apparently been fixed by d80e14ba6037373f08c6dba274368408932d9e48.

https://github.com/dlang/dmd/pull/9658
Comment #7 by dlang-bot — 2019-04-20T17:45:37Z
dlang/dmd pull request #9658 "fix issue 18794 - Compiling with -O causes runtime segfault" was merged into master:

- 067645189191f9739e7151f2b02275ab3ea65557 by aG0aep6G:
  fix issue 18794 - Compiling with -O causes runtime segfault

  Just adding a test. The issue has apparently been fixed by d80e14ba6037373f08c6dba274368408932d9e48.

https://github.com/dlang/dmd/pull/9658