Bug 23049 – [SIMD][CODEGEN] Wrong code for XMM.RCPSS after inlining

Status
RESOLVED
Resolution
INVALID
Severity
major
Priority
P1
Component
dmd
Product
D
Version
D2
Platform
x86_64
OS
All
Creation time
2022-04-23T14:36:06Z
Last change time
2022-04-24T08:54:53Z
Keywords
backend, SIMD, wrong-code
Assigned to
No Owner
Creator
ponce

Attachments

ID    Filename  Summary      Content-Type  Size
1849  main.d    main source  text/plain    502

Comments

Comment #0 by aliloko — 2022-04-23T14:36:06Z
Created attachment 1849
main source

With DMD 2.100-beta.1, consider the following program:

------------ main.d ------------

import core.simd;
import core.stdc.stdio;

float4 _mm_rcp_ss (float4 a) pure @trusted
{
    return cast(float4) __simd(XMM.RCPSS, a);
}

void main()
{
    float4 A = [2.34f, -70000.0f, 0.00001f, 345.5f];
    float4 correct = [1 / 2.34f, -70000.0f, 0.00001f, 345.5f];
    float4 R = _mm_rcp_ss(A);

    // sometimes DMD clears to zero the high values.
    assert(R.array[1] == correct.array[1]);
    assert(R.array[2] == correct.array[2]);
    assert(R.array[3] == correct.array[3]);
}

--------------------------------

The first assertion fails when built with:

    $ dmd -inline -m64 main.d

RCPSS is used, but the top of the register/variable is cleared to zero when XMM.RCPSS is inlined into the unittest.
Comment #1 by bugzilla — 2022-04-24T07:38:15Z
I finally figured out what was going on here. The code generated is:

    float4 A = [2.34f, -70000.0f, 0.00001f, 345.5f];
        movaps  XMM0,FLAT:.rodata[00h][RIP]
        movaps  -020h[RBP],XMM0
    float4 R = cast(float4) __simd(XMM.RCPSS, A);
        rcpss   XMM1,-020h[RBP]   (*)
        movaps  -010h[RBP],XMM1
    assert(R.array[1] == -70000.0f)
        movss   XMM2,-0Ch[RBP]
        ...

(*) rcpss stores a value into the lower 4 bytes of XMM1, leaving the rest of XMM1 unchanged. But, according to the compiler, the entirety of XMM1 was changed by the assignment, even though it wasn't. Hence, the upper 12 bytes of XMM1 are garbage.

You can make it work by explicitly passing the implicit argument:

    float4 R = A;
    R = cast(float4) __simd(XMM.RCPSS, R, A);
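For reference, the workaround above could be folded into the wrapper itself, roughly as in the following sketch. The function name rcpSsKeepUpper is hypothetical (it is not part of core.simd or of the original attachment), and the sketch assumes DMD on x86_64, where core.simd's __simd is available. The destination register is seeded with a copy of the input, so the three upper lanes that RCPSS leaves untouched hold defined values.

```d
import core.simd;

// Hypothetical wrapper name, used only for illustration.
// Seeds the destination with `a`, then lets RCPSS overwrite the low lane only.
float4 rcpSsKeepUpper(float4 a) pure @trusted
{
    float4 r = a;                              // upper three lanes: copy of a
    r = cast(float4) __simd(XMM.RCPSS, r, a);  // low lane: approximate 1 / a[0]
    return r;
}

void main()
{
    float4 A = [2.34f, -70000.0f, 0.00001f, 345.5f];
    float4 R = rcpSsKeepUpper(A);

    // The upper lanes must come through bit-for-bit from the input.
    assert(R.array[1] == A.array[1]);
    assert(R.array[2] == A.array[2]);
    assert(R.array[3] == A.array[3]);

    // RCPSS is only an approximation of the reciprocal, so the low lane
    // is compared with a tolerance rather than with ==.
    float err = R.array[0] * 2.34f - 1.0f;
    assert(err > -0.001f && err < 0.001f);
}
```

Building this with the same command line as the report (dmd -inline -m64) would be the natural way to check it; whether it also behaves under other flag combinations has not been verified here.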
Comment #2 by bugzilla — 2022-04-24T07:55:20Z
Comment #3 by aliloko — 2022-04-24T08:54:53Z
Thanks for the __simd explanations!