Created attachment 1849
main source
With DMD 2.100-beta.1, consider the following program:
------------ main.d ------------
import core.simd;
import core.stdc.stdio;
float4 _mm_rcp_ss (float4 a) pure @trusted
{
    return cast(float4) __simd(XMM.RCPSS, a);
}

void main()
{
    float4 A = [2.34f, -70000.0f, 0.00001f, 345.5f];
    float4 correct = [1 / 2.34f, -70000.0f, 0.00001f, 345.5f];
    float4 R = _mm_rcp_ss(A);

    // RCPSS must leave lanes 1..3 untouched, but DMD sometimes zeroes them.
    assert(R.array[1] == correct.array[1]);
    assert(R.array[2] == correct.array[2]);
    assert(R.array[3] == correct.array[3]);
}
--------------------------------
The first assertion fails when built with:
$ dmd -inline -m64 main.d
RCPSS is emitted, but the upper three lanes of the register/variable are cleared to zero when the XMM.RCPSS call is inlined into its caller.
Comment #1 by bugzilla — 2022-04-24T07:38:15Z
I finally figured out what was going on here. The code generated is:
float4 A = [2.34f, -70000.0f, 0.00001f, 345.5f];
    movaps  XMM0,FLAT:.rodata[00h][RIP]
    movaps  -020h[RBP],XMM0
float4 R = cast(float4) __simd(XMM.RCPSS, A);
    rcpss   XMM1,-020h[RBP]    (*)
    movaps  -010h[RBP],XMM1
assert(R.array[1] == -70000.0f)
    movss   XMM2,-0Ch[RBP]
    ...
(*) rcpss stores a value into the lower 4 bytes of XMM1, leaving the rest of XMM1 unchanged. But, according to the compiler, the entirety of XMM1 was changed by the assignment, even though it wasn't. Hence, the upper 12 bytes of XMM1 are garbage.
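To make the point concrete, here is a rough scalar model of `rcpss dest, src` in plain D. The name `rcpssModel` and the use of `float[4]` in place of an XMM register are illustrative assumptions, not anything DMD does internally, and the model divides exactly whereas the real instruction returns an approximation. It only shows that lanes 1..3 of the result are whatever the destination held beforehand.

```d
// Hypothetical scalar model of `rcpss dest, src` (illustration only).
// Real RCPSS yields an approximate reciprocal; exact division is used here.
float[4] rcpssModel(float[4] dest, float[4] src) pure nothrow @safe @nogc
{
    dest[0] = 1.0f / src[0]; // only lane 0 is written
    return dest;             // lanes 1..3 keep the previous contents of dest
}

void main()
{
    float[4] a = [2.34f, -70000.0f, 0.00001f, 345.5f];

    // Stand-in for an XMM1 that was never loaded from A (zeros chosen arbitrarily).
    float[4] stale = [0.0f, 0.0f, 0.0f, 0.0f];

    // Destination not initialized from A: the upper lanes are not A's.
    float[4] bad = rcpssModel(stale, a);
    assert(bad[1] != a[1]);

    // Destination initialized from A: the upper lanes are preserved.
    float[4] good = rcpssModel(a, a);
    assert(good[1] == a[1] && good[2] == a[2] && good[3] == a[3]);
}
```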
You can make it work by explicitly passing the implicit argument:
float4 R = A;
R = cast(float4) __simd(XMM.RCPSS, R, A);
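Applied to the original test case, the workaround might look like the sketch below. It assumes an x86-64 target and a DMD build where `core.simd` is available; the lane-0 tolerance of 1e-3 is my own choice, picked to sit comfortably above the approximation error of RCPSS, and is not from the report.

```d
import core.simd;

// Workaround sketch: pass the destination operand explicitly so the
// compiler knows lanes 1..3 of the result come from `a`.
float4 _mm_rcp_ss (float4 a) pure @trusted
{
    float4 r = a;                             // destination starts as a copy of a
    r = cast(float4) __simd(XMM.RCPSS, r, a); // lane 0 = approx. 1 / a[0]
    return r;
}

void main()
{
    float4 A = [2.34f, -70000.0f, 0.00001f, 345.5f];
    float4 R = _mm_rcp_ss(A);

    // The upper lanes must be bit-for-bit copies of the input.
    assert(R.array[1] == A.array[1]);
    assert(R.array[2] == A.array[2]);
    assert(R.array[3] == A.array[3]);

    // RCPSS is only an approximation, so lane 0 is compared with a tolerance
    // (1e-3 is an assumed bound for this test, not a documented constant).
    float err = R.array[0] - 1 / 2.34f;
    assert(err < 1e-3f && err > -1e-3f);
}
```

Building this with the same `dmd -inline -m64 main.d` command as the report is the interesting case, since the failure only showed up once the call was inlined.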