Bug 6505 – Wrong code for expression involving 8 floats, only with -O

Status: RESOLVED
Resolution: FIXED
Severity: blocker
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: Other
OS: Windows
Creation time: 2011-08-16T04:33:00Z
Last change time: 2015-06-09T05:11:58Z
Assigned to: nobody
Creator: clugdbug

Comments

Comment #0 by clugdbug — 2011-08-16T04:33:20Z

Found this when I removed a reference to std.random from the DMD test suite. This bug is in test22.d. It was originally described in: http://www.digitalmars.com/d/archives/digitalmars/D/bugs/4766.html but wasn't reduced correctly -- it has never been fixed. It only applies when compiling with -O: the result is -nan.
----
real randx() { return 1.2; }

void test1()
{
    float x10=randx();
    float x11=randx();
    float x20=randx();
    float x21=randx();
    float y10=randx();
    float y11=randx();
    float y20=randx();
    float y21=randx();
    float tmp=( x20*x21 + y10*y10 + y10*y11 + y11*y11 + y11*y20 + y20*y20 + y10*y21 + y11*y21 + y21*y21);
    assert(tmp > 0);
}

Comment #1 by bearophile_hugs — 2011-08-16T06:37:50Z

Simplified a little:

double foo() { return 1.0; }

void main()
{
    double a = foo();
    double b = foo();
    double x = a*a + a*a + a*a + a*a + a*a + a*a + a*a + a*b + a*b;
    assert(x > 0);
}

---------------------------
Asm normal compilation:

_D4test3fooFZd comdat
        fld1
        ret

__Dmain comdat
L0:     enter 024h,0
        call near ptr _D4test3fooFZd
        fstp qword ptr -018h[EBP]
        call near ptr _D4test3fooFZd
        fstp qword ptr -010h[EBP]
        fld qword ptr -018h[EBP]
        fmul qword ptr -010h[EBP]
        fld qword ptr -018h[EBP]
        fmul qword ptr -010h[EBP]
        fld qword ptr -018h[EBP]
        fmul qword ptr -018h[EBP]
        fld qword ptr -018h[EBP]
        fmul qword ptr -018h[EBP]
        fld qword ptr -018h[EBP]
        fmul qword ptr -018h[EBP]
        fld qword ptr -018h[EBP]
        fmul qword ptr -018h[EBP]
        fld qword ptr -018h[EBP]
        fmul qword ptr -018h[EBP]
        fld qword ptr -018h[EBP]
        fmul qword ptr -018h[EBP]
        fdecstp
        fstp qword ptr -024h[EBP]
        fld qword ptr -018h[EBP]
        fmul qword ptr -018h[EBP]
        faddp ST(1),ST
        faddp ST(1),ST
        faddp ST(1),ST
        faddp ST(1),ST
        faddp ST(1),ST
        faddp ST(1),ST
        faddp ST(1),ST
        fld qword ptr -024h[EBP]
        faddp ST(1),ST
        fstp qword ptr -8[EBP]
        fld qword ptr -8[EBP]
        ftst
        fstsw AX
        sahf
        fstp ST
        ja L7B
        mov EAX,9
        call near ptr _D4test8__assertFiZv
L7B:    xor EAX,EAX
        leave
        ret

----------------------------
Asm compilation with -O:

_D4test3fooFZd comdat
        fld qword ptr FLAT:_DATA[00h]
        ret

__Dmain comdat
L0:     sub ESP,034h
        call near ptr _D4test3fooFZd
        fstp qword ptr 0Ch[ESP]
        call near ptr _D4test3fooFZd
        fld qword ptr 0Ch[ESP]
        fld qword ptr 0Ch[ESP]
        fxch ST2
        fstp qword ptr 014h[ESP]
        fmul qword ptr 014h[ESP]
        fxch ST1
        fld qword ptr 0Ch[ESP]
        fxch ST1
        fmul qword ptr 014h[ESP]
        fxch ST1
        fmul ST,ST(0)
        fld qword ptr 0Ch[ESP]
        fmul ST,ST(0)
        fld qword ptr 0Ch[ESP]
        fmul ST,ST(0)
        fld qword ptr 0Ch[ESP]
        fmul ST,ST(0)
        fld qword ptr 0Ch[ESP]
        fmul ST,ST(0)
        fld qword ptr 0Ch[ESP]
        fmul ST,ST(0)
        fdecstp
        fld qword ptr 0Ch[ESP]
        fxch ST1
        fstp qword ptr [ESP]
        fmul ST,ST(0)
        faddp ST(1),ST
        faddp ST(1),ST
        faddp ST(1),ST
        faddp ST(1),ST
        faddp ST(1),ST
        faddp ST(1),ST
        faddp ST(1),ST
        fld qword ptr [ESP]
        faddp ST(1),ST
        ftst
        fstsw AX
        fstp ST
        sahf
        ja L84
        mov EAX,9
        call near ptr _D4test8__assertFiZv
L84:    add ESP,034h
        xor EAX,EAX
        ret

------------------------

Comment #2 by clugdbug — 2011-08-23T01:52:36Z

Thanks bearophile, that reduction is very helpful. The code loads 8 values onto the x87 stack, which fills it. The next load would cause an overflow, so push87() in cg87.c inserts:

    fdecstp;
    fstp [EBP - 24];

The next line is:

    fld [EBP - 18];

The code then passes to the scheduler, where there's a bug: it converts the code to:

    fdecstp;
    fld [EBP - 18];
    fxch ST(1);
    fstp [EBP - 24];

This is an x87 stack overflow, since the load now happens before the store has freed a register. The x87 returns a NaN.
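The reordering described above can be reproduced with a toy model of the x87 register stack. This is only a sketch: the X87 struct, the x87_* helpers and run_spill() are hypothetical names made up for illustration, not anything from dmd, and the model assumes the masked-exception behaviour where a load into an occupied register yields a NaN.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Toy model of the x87 register stack (an assumption-laden sketch):
   8 physical registers, a TOP index, and an "in use" tag per register. */
typedef struct {
    double reg[8];
    bool   used[8];
    int    top;         /* physical index of ST(0) */
    bool   overflowed;  /* set when a load hits a register still in use */
} X87;

static void x87_init(X87 *f) {
    for (int i = 0; i < 8; i++) { f->reg[i] = 0.0; f->used[i] = false; }
    f->top = 0;
    f->overflowed = false;
}

/* fld: decrement TOP, then write. Loading into an occupied register is a
   stack overflow; modelled here as storing NaN instead of the value. */
static void x87_fld(X87 *f, double v) {
    f->top = (f->top + 7) & 7;
    if (f->used[f->top]) { f->overflowed = true; v = NAN; }
    f->reg[f->top] = v;
    f->used[f->top] = true;
}

/* fstp: read ST(0), mark it free, increment TOP. */
static double x87_fstp(X87 *f) {
    double v = f->reg[f->top];
    f->used[f->top] = false;
    f->top = (f->top + 1) & 7;
    return v;
}

/* fdecstp: rotate TOP only; tags and contents are untouched. */
static void x87_fdecstp(X87 *f) { f->top = (f->top + 7) & 7; }

/* fxch ST(i): swap ST(0) with ST(i). */
static void x87_fxch(X87 *f, int i) {
    assert(i >= 0 && i < 8);
    int a = f->top, b = (f->top + i) & 7;
    double tv = f->reg[a];  f->reg[a]  = f->reg[b];  f->reg[b]  = tv;
    bool   tu = f->used[a]; f->used[a] = f->used[b]; f->used[b] = tu;
}

/* Fill the stack with 8 values, spill the oldest, load a ninth.
   scheduled == false: fdecstp; fstp; fld         (order from push87)
   scheduled == true:  fdecstp; fld; fxch; fstp   (order after scheduling)
   Returns ST(0) afterwards and reports whether an overflow occurred. */
static double run_spill(bool scheduled, bool *overflowed) {
    X87 f;
    x87_init(&f);
    for (int i = 1; i <= 8; i++) x87_fld(&f, (double)i);
    x87_fdecstp(&f);
    if (!scheduled) {
        (void)x87_fstp(&f);
        x87_fld(&f, 9.0);
    } else {
        x87_fld(&f, 9.0);
        x87_fxch(&f, 1);
        (void)x87_fstp(&f);
    }
    *overflowed = f.overflowed;
    return f.reg[f.top];
}
```

Calling run_spill(false, &ov) from a main() leaves the ninth value in ST(0) with no overflow, while run_spill(true, &ov) leaves a NaN there, which then propagates through the following faddp chain.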

Comment #3 by clugdbug — 2011-08-23T12:19:44Z

Here's the problem. The scheduler keeps track of the number of used x87 registers via the "fpustackused" variable. Each instruction has a "fpuadjust" value which says how many x87 registers it uses. The problem is CALL instructions. They have fpuadjust = 0. But, a function returning float returns it in ST(0) --- so fpuadjust should be +1. And if it's a complex result, fpuadjust should be +2.

I think it should be possible to put an assert in cgsched.c, line 2491:

    if (tbl[j])
    {
        fpu += tbl[j]->fpuadjust;
+       assert(fpu >= 0);
        if (fpu >= 8) // if FPU stack overflow

but this would fail at the moment, because the total goes negative after a call followed by an FSTP. So it needs to distinguish between each type of call, depending on how it affects the FPU stack. I think the bug lies in funccall() in cod1.c; it should be setting something in 'code' to record how many x87 registers are returned.

Then, there'll be another minor problem: the number of FPU values CAN reach 8. This is because of the code in push87() in cg87.c, which does fdecstp; fstp; when the stack is full. So the condition above will need to change to:

    if (fpu > 8) // if FPU stack overflow

This is too invasive a change for me to make and test.
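The accounting problem can be sketched with a small depth counter run over the instruction stream of the reduced test case. This is an illustrative model only: Insn, track_depth() and reduction_depth() are hypothetical names and do not correspond to the real data structures in cgsched.c. The only thing modelled is each instruction's net effect on the stack depth.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduled instruction: just its net effect
   on the x87 stack depth (+1 push, -1 pop, 0 no change). */
typedef struct {
    const char *name;
    int fpuadjust;
} Insn;

/* Sum fpuadjust over the stream. Returns the lowest running total and
   stores the highest running total in *peak. */
static int track_depth(const Insn *code, size_t n, int *peak) {
    assert(peak != NULL);
    int fpu = 0, lowest = 0;
    *peak = 0;
    for (size_t i = 0; i < n; i++) {
        fpu += code[i].fpuadjust;
        if (fpu < lowest) lowest = fpu;
        if (fpu > *peak)  *peak = fpu;
    }
    return lowest;
}

/* Instruction stream of the reduced test: two "call foo; fstp" pairs,
   then eight "fld; fmul" pairs. call_adjust is the fpuadjust assigned to
   a CALL that returns a floating-point value in ST(0). */
static int reduction_depth(int call_adjust, int *peak) {
    Insn code[4 + 16];
    size_t n = 0;
    for (int k = 0; k < 2; k++) {
        code[n++] = (Insn){ "call foo", call_adjust };
        code[n++] = (Insn){ "fstp", -1 };
    }
    for (int k = 0; k < 8; k++) {
        code[n++] = (Insn){ "fld", +1 };
        code[n++] = (Insn){ "fmul", 0 };  /* fmul with a memory operand */
    }
    return track_depth(code, n, peak);
}
```

With call_adjust = 0 the running total drops to -2 and peaks at 6, so the counter undercounts a stack that actually holds 8 values. With call_adjust = 1 the total never goes negative and peaks at exactly 8, which is the case where a ">= 8" overflow check would fire even though the stack is merely full.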

Comment #4 by bugzilla — 2011-08-23T15:23:43Z

https://github.com/D-Programming-Language/dmd/commit/628640f45bbc2bed3e84e835e19cf03ee2c4eafe
https://github.com/D-Programming-Language/dmd/commit/a92f6bf42e547c6978e4e317b4feb2c6610cf518