Created attachment 1848
main source
Using DMD v2.100.0-beta.1-dirty,
consider the following program:
----------- main.d -------------
import core.stdc.stdio;
import core.simd;
double2 _mm_loadr_pd (const(double)* mem_addr)
{
    double2 a = *cast(double2*)(mem_addr);
    double2 r;
    r.ptr[0] = a.array[1];
    r.ptr[1] = a.array[0];
    return r;
}
unittest
{
    align(16) double[2] A = [56.0, -74.0];
    double2 R = _mm_loadr_pd(A.ptr);
}
double2 _mm_loadu_pd (const(double)* mem_addr)
{
    return cast(double2) __simd(XMM.LODUPD, *mem_addr);
}
unittest
{
    double[2] A = [56.0, -75.0];
    double2 R = _mm_loadu_pd(A.ptr);
    printf("%f %f\n", R[0], R[1]);
    double[2] correct = [56.0, -75.0];
    assert(R.array == correct);
}
void main()
{
}
--------------------------------
To reproduce:
$ dmd -m64 -inline -O main.d -unittest
$ main.exe
This outputs:
56.000000 -74.000000
main.d(29): [unittest] unittest failure
1/1 modules FAILED unittests
instead of the normal:
56.000000 -75.000000
1 modules passed unittests
Notes:
- -O, -inline, and -unittest are necessary.
- _mm_loadu_pd is inlined into the unittest.
- The 1st unittest is necessary. What seems to happen is that a former variable or register is reused.
Comment #1 by bugzilla — 2022-04-24T04:54:30Z
A smaller test with -O -unittest:
import core.simd;
unittest
{
    align(16) double[2] A = [56.0, -74.0];
}
unittest
{
    double[2] A = [56.0, -75.0];
    double2 R = cast(double2) __simd(XMM.LODUPD, *A.ptr);
    assert(R.array == A);
}
void main()
{
}
Comment #2 by bugzilla — 2022-04-24T06:28:08Z
The problem is with the lines:
double[2] A = [56.0, -75.0];
double2 R = cast(double2) __simd(XMM.LODUPD, *A.ptr);
LODUPD (actually MOVUPD) reads two doubles, i.e. 16 bytes. The code passes it a single double lvalue, which is only 8 bytes. The optimizer replaces that double with a reference to the constant 56.0, so the second double that LODUPD reads is whatever happens to follow the 56.0 in memory.
This problem can be fixed with a cast to double2 so the optimizer knows it's a 16-byte operation:
double2 R = cast(double2) __simd(XMM.LODUPD, *cast(double2*)A.ptr);
I'm not really sure what to do about this as __simd does not do type checking on its arguments, which is why it's @system code.
I'll leave it open for now.
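For reference, the workaround from comment #2 can be folded into the load helper from the original report, so that callers cannot pass an 8-byte operand by accident. This is only a sketch under assumptions: the name loadUnalignedPd is hypothetical and not part of core.simd, and it relies on DMD's __simd intrinsic on x86_64 behaving as described in comment #2. It casts away const the same way _mm_loadr_pd does in the report.

```d
import core.simd;

// Hypothetical wrapper (the name is not from the report or from core.simd):
// unaligned load of two consecutive doubles.
double2 loadUnalignedPd(const(double)* memAddr)
{
    // Cast the pointer to double2* before dereferencing, so the operand
    // handed to __simd is a 16-byte lvalue rather than a single double.
    return cast(double2) __simd(XMM.LODUPD, *cast(double2*) memAddr);
}

unittest
{
    // Mirrors the first unittest of the reduced test case.
    align(16) double[2] A = [56.0, -74.0];
}

unittest
{
    double[2] A = [56.0, -75.0];
    double2 R = loadUnalignedPd(A.ptr);
    assert(R.array == A);
}
```

This should be buildable with something like `dmd -m64 -O -inline -unittest -main`, where `-main` supplies the empty main function. It does not address the underlying issue that __simd performs no type checking on its arguments.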