Bug 17965 – Usage of the FPU while function result already in right XMM registers
Status: RESOLVED
Resolution: FIXED
Severity: normal
Priority: P1
Component: dmd
Product: D
Version: D2
Platform: x86_64
OS: All
Creation time: 2017-11-03T18:27:27Z
Last change time: 2021-02-28T07:27:35Z
Keywords: backend, performance, pull, wrong-code
Assigned to: No Owner
Creator: Basile-z
Comments
Comment #0 by b2.temp — 2017-11-03T18:27:27Z
For the following trivial function:
---
struct Point{double x,y;}
Point foo()
{
Point result;
return result;
}
---
dmd64 with -O generates:
;------- SUB 000000000044E1C0h -------
000000000044E1C0h push rbp
000000000044E1C1h mov rbp, rsp
000000000044E1C4h sub rsp, 20h
000000000044E1C8h lea rax, qword ptr [00000000004C92F0h]
000000000044E1CFh movsd xmm0, qword ptr [rax] // result.x = Point.init.x (NaN); // default init OK
000000000044E1D3h movsd qword ptr [rbp-10h], xmm0 // spill result.x to a stack temp because ?
000000000044E1D9h movsd xmm1, qword ptr [rax+08h] // result.y = Point.init.y (NaN); // default init OK
000000000044E1DEh movsd qword ptr [rbp-08h], xmm1 // spill result.y to a stack temp because ?
000000000044E1E4h fld qword ptr [rbp-10h] // push the whole result onto the FPU stack because ?
000000000044E1E7h fld qword ptr [rbp-08h] // ...
000000000044E1EAh fstp qword ptr [rbp-20h] // ...
000000000044E1EDh movsd xmm1, qword ptr [rbp-20h] // reload the result into XMM1 and XMM0 because ?
000000000044E1F2h fstp qword ptr [rbp-20h] // ...
000000000044E1F5h movsd xmm0, qword ptr [rbp-20h] // ...
000000000044E1FAh mov rsp, rbp
000000000044E1FDh pop rbp
000000000044E1FEh ret
;-------------------------------------
Point.x is returned in the low half of XMM0 and Point.y in the low half of XMM1.
From 000000000044E1E4h to 000000000044E1F5h, the result is pushed onto the FPU stack and then reloaded into XMM0 and XMM1 for no reason. In addition, 32 bytes of stack are allocated for this useless transfer, which forces the prologue and epilogue to be emitted.
The expected backend output is something like:
---
lea rax, qword ptr [<address of init>]
movsd xmm0, qword ptr [rax]
movsd xmm1, qword ptr [rax+08h]
ret
---
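Since the issue is also tagged wrong-code, a small runtime check can help confirm that both fields survive the return path whatever instructions the backend picks. The following is only a sketch: `makePoint` is a hypothetical helper that does not appear in this report or in the dmd test suite, and it uses non-default values so that a swapped or clobbered register would show up in the result.

```d
import std.math : isNaN;

struct Point { double x, y; }

// Hypothetical helper (not from the report): same shape as foo(),
// but with distinct values so a wrong register shuffle is visible.
Point makePoint(double x, double y)
{
    Point result;
    result.x = x;
    result.y = y;
    return result;
}

// The function from the report, returning a default-initialized Point.
Point foo()
{
    Point result;
    return result;
}

void main()
{
    auto p = makePoint(1.5, -2.25);
    assert(p.x == 1.5);
    assert(p.y == -2.25);

    // In D, double fields default to NaN, so both halves of the
    // returned value should be NaN.
    auto q = foo();
    assert(isNaN(q.x));
    assert(isNaN(q.y));
}
```

Building this with and without `-O` and comparing the behavior is one way to spot a miscompiled return; it says nothing about the quality of the generated code, which still has to be checked in the disassembly.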
Comment #1 by aliloko — 2017-11-03T20:01:26Z
You can avoid such FPU <=> XMM round trips by using the LDC compiler. The DMD backend tends to generate those.
Comment #2 by b2.temp — 2017-11-03T20:51:35Z
(In reply to ponce from comment #1)
> You can avoid such FPU <=> XMM round trips by using the LDC compiler. The
> DMD backend tend to generate those.
I know p0nce, but this is really a "pathological case" to me. The "trip", as you say, that happens at the end is a bug, not just something that could be better.
Comment #3 by aliloko — 2017-11-04T00:13:00Z
@Basile: I also find that unsettling.
Comment #4 by bitter.taste — 2018-03-29T10:40:14Z
This is a side-effect of a silly optimization that turns Point into a complex number instead of a double[2] as one would expect.
You can see for yourself how the codegen improves if `TYcdouble` is replaced by `TYdouble` in `elstruct` (in the `if (I64 && targ1 && targ2)` branch):
---
33090: 55 push rbp
33091: 48 8b ec mov rbp,rsp
33094: 48 8d 05 b5 bf 02 00 lea rax,[rip+0x2bfb5] # 5f050 <_D3foo5Point6__initZ>
3309b: 66 0f 28 00 movapd xmm0,XMMWORD PTR [rax]
3309f: 5d pop rbp
330a0: c3 ret
---
But I'm pretty sure this change may wreak havoc due to various concerns regarding the alignment and scalar broadcasting.
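One plausible reading of the alignment concern, offered here as an assumption rather than as the commenter's stated reasoning: `movapd` with a memory operand requires a 16-byte-aligned address, while `Point` only carries the alignment of `double`. The sketch below checks the layout facts involved; it does not touch the backend, and nothing in it comes from the dmd sources.

```d
import std.stdio : writeln;

struct Point { double x, y; }

// Size and field offsets match those of a double[2].
static assert(Point.sizeof == (double[2]).sizeof);
static assert(Point.x.offsetof == 0);
static assert(Point.y.offsetof == double.sizeof);

// The struct only inherits the alignment of double, so the type
// alone does not promise the 16-byte alignment movapd needs.
static assert(Point.alignof == double.alignof);

void main()
{
    writeln("Point.sizeof  = ", Point.sizeof);
    writeln("Point.alignof = ", Point.alignof);
}
```

If the type's alignment is smaller than 16, an aligned 16-byte load is only safe when the backend separately guarantees the alignment of the symbol being read, such as the `Point` initializer in the listing above.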
The bottom line here is:
> You can avoid such FPU <=> XMM round trips by using the LDC compiler.
Or GDC; everything but DMD is fine.
Comment #5 by dlang-bot — 2020-09-05T04:40:02Z
@WalterBright created dlang/dmd pull request #11693 "fix Issue 17965 - Usage of the FPU while function result already in r…" fixing this issue:
- fix Issue 17965 - Usage of the FPU while function result already in right XMM registers
https://github.com/dlang/dmd/pull/11693
Comment #6 by dlang-bot — 2020-09-08T00:05:35Z
dlang/dmd pull request #11693 "fix Issue 17965 - Usage of the FPU while function result already in r…" was merged into master:
- 8f620dab0d9de5dea50161bda7470dbad8705293 by Walter Bright:
fix Issue 17965 - Usage of the FPU while function result already in right XMM registers
https://github.com/dlang/dmd/pull/11693
Comment #7 by dlang-bot — 2021-02-28T07:27:35Z
dlang/dmd pull request #12219 "test: Move runnable tests for complex and imaginary types into runnable/complex.d" was merged into master:
- e8f605f6650b916eb70ba58ae3cee385dfc549f3 by Iain Buclaw:
test: Move test for issue 17965 to own source file
https://github.com/dlang/dmd/pull/12219