Created attachment 1847
main source
Hello,
Here is a new shipment of DMD codegen bugs.
With latest DMD v2.100.0-beta.1-dirty
Consider the following program, built with:
$ dmd -m64 main.d
------------------ main.d ------------------------
import core.simd;
import core.stdc.stdio;

alias __m128 = float4;
alias __m128i = int4;
alias __m64 = long1;

struct long1
{
    long[1] array;
}

__m128 _mm_loadl_pi (__m128 a, const(__m64)* mem_addr) pure @trusted
{
    return cast(__m128) __simd(XMM.LODLPS, a, *cast(const(__m128)*)mem_addr);
}

__m64 to_m64(__m128i a) pure @trusted
{
    long2 la = cast(long2)a;
    long1 r;
    r.array[0] = la.array[0];
    return r;
}

void _mm_print_ps(__m128 v) @trusted
{
    float[4] C = (cast(float4)v).array;
    printf("%f %f %f %f\n", C[0], C[1], C[2], C[3]);
}

void main()
{
    float4 A = [1.0f, 2.0f, 3.0f, 4.0f];
    float4 B = [5.0f, 6.0f, 7.0f, 8.0f];
    __m64 M = to_m64(cast(__m128i)B);
    __m128 R = _mm_loadl_pi(A, &M);
    _mm_print_ps(R);
    float[4] correct = [5.0f, 6.0f, 3.0f, 4.0f];
    assert(R.array == correct);
}
-------------------------------------------------
Output with DMD 2.095.1:
5.000000 6.000000 3.000000 4.000000
Output with DMD 2.097.0 all the way to DMD v2.100.0-beta:
0.000000 0.000000 3.000000 4.000000
Notes:
- superficially looks related to https://issues.dlang.org/show_bug.cgi?id=21673 ?
Comment #1 by bugzilla — 2022-04-25T21:55:20Z
Here's the problem. You've specified the LODLPS instruction (actually MOVLPS):
https://www.felixcloutier.com/x86/movlps
But what was generated was the MOVHLPS instruction:
https://www.felixcloutier.com/x86/movhlps
They both have the same opcode: 0F 12. The two are distinguished by the second operand: a 64-bit second operand selects MOVLPS, a 128-bit operand selects MOVHLPS. The code:
__simd(XMM.LODLPS, a, *cast(const(__m128)*)mem_addr)
selects MOVHLPS. However, changing it to:
__simd_sto(XMM.LODLPS, a, *cast(const(long)*)mem_addr) [1]
doesn't work because core.simd doesn't have that overload. Hence the PR to add it to core.simd; with change [1] applied, the example works.
In general, when working with SIMD instructions that change only part of a register, it pays to check closely which instruction is actually generated.
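For comparison, the intended MOVLPS semantics (replace the low 64 bits of the register, keep the high 64 bits) can be written without __simd at all. This is only a minimal sketch, not the fix from the PR: the helper name loadLowHalf is hypothetical, and it assumes the compiler supports the .ptr property on vector types. It is not guaranteed to compile down to a single MOVLPS.

```d
import core.simd;

struct long1
{
    long[1] array;
}

// Hypothetical helper (not part of core.simd or of the PR):
// returns `a` with its low 64 bits replaced by the 64 bits at `mem`.
float4 loadLowHalf(float4 a, const(long1)* mem) pure @trusted
{
    long2 bits = cast(long2) a;   // reinterpret the 128 bits as two longs
    bits.ptr[0] = mem.array[0];   // overwrite the low lane only (assumes .ptr on vectors)
    return cast(float4) bits;     // reinterpret back; the high lane is untouched
}

void main()
{
    float4 A = [1.0f, 2.0f, 3.0f, 4.0f];
    float4 B = [5.0f, 6.0f, 7.0f, 8.0f];

    long1 M;
    M.array[0] = (cast(long2) B).array[0]; // low 64 bits of B, i.e. 5.0f and 6.0f

    float4 R = loadLowHalf(A, &M);
    float[4] expected = [5.0f, 6.0f, 3.0f, 4.0f];
    assert(R.array == expected);
}
```

Comparing the disassembly of a reference like this against the __simd version is one way to confirm which of the two 0F 12 encodings was emitted.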
Comment #2 by dlang-bot — 2022-04-25T21:56:53Z
@WalterBright created dlang/druntime pull request #3811 "fix Issue 23046 - [REG][CODEGEN] __simd(XMM.LODLPS) bad codegen" fixing this issue:
- fix Issue 23046 - [REG][CODEGEN] __simd(XMM.LODLPS) bad codegen
https://github.com/dlang/druntime/pull/3811
Comment #3 by dlang-bot — 2022-05-02T11:10:46Z
dlang/druntime pull request #3811 "fix Issue 23046 - [REG][CODEGEN] __simd(XMM.LODLPS) bad codegen" was merged into stable:
- daf4c10d7ee6c992ce0e056f2c1bddd875245fec by Walter Bright:
fix Issue 23046 - [REG][CODEGEN] __simd(XMM.LODLPS) bad codegen
https://github.com/dlang/druntime/pull/3811
Comment #4 by dlang-bot — 2022-05-03T15:37:35Z
dlang/druntime pull request #3817 "merge stable" was merged into master:
- 9c0d4f914e0817c9ee4eafc5a45c41130aa6b981 by Walter Bright:
fix Issue 23046 - [REG][CODEGEN] __simd(XMM.LODLPS) bad codegen
https://github.com/dlang/druntime/pull/3817