Bug 6505 – Wrong code for expression involving 8 floats, only with -O

Status
RESOLVED
Resolution
FIXED
Severity
blocker
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
Other
OS
Windows
Creation time
2011-08-16T04:33:00Z
Last change time
2015-06-09T05:11:58Z
Assigned to
nobody
Creator
clugdbug

Comments

Comment #0 by clugdbug — 2011-08-16T04:33:20Z
Found this when I removed a reference to std.random from the DMD test suite. This bug is in test22.d. It was originally described in: http://www.digitalmars.com/d/archives/digitalmars/D/bugs/4766.html but wasn't reduced correctly -- it has never been fixed. It only applies when compiling with -O: the result is -nan. ---- real randx() { return 1.2; } void test1() { float x10=randx(); float x11=randx(); float x20=randx(); float x21=randx(); float y10=randx(); float y11=randx(); float y20=randx(); float y21=randx(); float tmp=( x20*x21 + y10*y10 + y10*y11 + y11*y11 + y11*y20 + y20*y20 + y10*y21 + y11*y21 + y21*y21); assert(tmp > 0); }
Comment #1 by bearophile_hugs — 2011-08-16T06:37:50Z
Simplified a little: double foo() { return 1.0; } void main() { double a = foo(); double b = foo(); double x = a*a + a*a + a*a + a*a + a*a + a*a + a*a + a*b + a*b; assert(x > 0); } --------------------------- Asm normal compilation: _D4test3fooFZd comdat fld1 ret __Dmain comdat L0: enter 024h,0 call near ptr _D4test3fooFZd fstp qword ptr -018h[EBP] call near ptr _D4test3fooFZd fstp qword ptr -010h[EBP] fld qword ptr -018h[EBP] fmul qword ptr -010h[EBP] fld qword ptr -018h[EBP] fmul qword ptr -010h[EBP] fld qword ptr -018h[EBP] fmul qword ptr -018h[EBP] fld qword ptr -018h[EBP] fmul qword ptr -018h[EBP] fld qword ptr -018h[EBP] fmul qword ptr -018h[EBP] fld qword ptr -018h[EBP] fmul qword ptr -018h[EBP] fld qword ptr -018h[EBP] fmul qword ptr -018h[EBP] fld qword ptr -018h[EBP] fmul qword ptr -018h[EBP] fdecstp fstp qword ptr -024h[EBP] fld qword ptr -018h[EBP] fmul qword ptr -018h[EBP] faddp ST(1),ST faddp ST(1),ST faddp ST(1),ST faddp ST(1),ST faddp ST(1),ST faddp ST(1),ST faddp ST(1),ST fld qword ptr -024h[EBP] faddp ST(1),ST fstp qword ptr -8[EBP] fld qword ptr -8[EBP] ftst fstsw AX sahf fstp ST ja L7B mov EAX,9 call near ptr _D4test8__assertFiZv L7B: xor EAX,EAX leave ret ---------------------------- Asm compilation with -O: _D4test3fooFZd comdat fld qword ptr FLAT:_DATA[00h] ret __Dmain comdat L0: sub ESP,034h call near ptr _D4test3fooFZd fstp qword ptr 0Ch[ESP] call near ptr _D4test3fooFZd fld qword ptr 0Ch[ESP] fld qword ptr 0Ch[ESP] fxch ST2 fstp qword ptr 014h[ESP] fmul qword ptr 014h[ESP] fxch ST1 fld qword ptr 0Ch[ESP] fxch ST1 fmul qword ptr 014h[ESP] fxch ST1 fmul ST,ST(0) fld qword ptr 0Ch[ESP] fmul ST,ST(0) fld qword ptr 0Ch[ESP] fmul ST,ST(0) fld qword ptr 0Ch[ESP] fmul ST,ST(0) fld qword ptr 0Ch[ESP] fmul ST,ST(0) fld qword ptr 0Ch[ESP] fmul ST,ST(0) fdecstp fld qword ptr 0Ch[ESP] fxch ST1 fstp qword ptr [ESP] fmul ST,ST(0) faddp ST(1),ST faddp ST(1),ST faddp ST(1),ST faddp ST(1),ST faddp ST(1),ST faddp ST(1),ST faddp ST(1),ST fld qword ptr [ESP] faddp ST(1),ST ftst fstsw AX fstp ST sahf ja L84 mov EAX,9 call near ptr _D4test8__assertFiZv L84 add ESP,034h xor EAX,EAX ret ------------------------
Comment #2 by clugdbug — 2011-08-23T01:52:36Z
Thanks bearophile, that reduction is very helpful. The code loads 8 values on the x87 stack. The eighth one would cause an overflow, so in cg87.c, push87(), it inserts: fdecstp; fstp [EBP - 24]; The next line is: fld [EBP - 18]; The code passes to the scheduler, where there's a bug: it converts the code to: fdecstp; fld [EBP - 18]; fxch ST(1); fstp [EBP - 24]; This is an x87 stack overflow, since the load happens before space has been made in the save. The x87 returns a NaN.
Comment #3 by clugdbug — 2011-08-23T12:19:44Z
Here's the problem. The scheduler keeps track of the number of used x87 registers via the "fpustackused" variable. Each instruction has a "fpuadjust" value which says how many x87 registers it uses. The problem is CALL instructions. They have fpuadjust = 0. But, a function returning float returns it in ST(0) --- so fpuadjust should be +1. And if it's a complex result, fpuadjust should be +2. I think it should be possible to put an assert in cgsched.c, line 2491: if (tbl[j]) { fpu += tbl[j]->fpuadjust; + assert(fpu >= 0); if (fpu >= 8) // if FPU stack overflow but this would fail at the moment, because the total goes negative after a call followed by an FSTP. So it needs to distinguish between each type of call, depending on how it affects the FPU stack. I think the bug lies in funccall() in cod1.c; it should be setting something in 'code' to record how many x87 registers are returned. Then, there'll be another minor problem: the number of FPU values CAN reach 8. This is because of the code in push87() in cg87.c, which does fdecstp; fstp; when the stack is full. So the condition above will need to change to: if (fpu > 8) // if FPU stack overflow This is too invasive a change for me to make and test.
Comment #4 by bugzilla — 2011-08-23T15:23:43Z