Bug 23046 – [REG][CODEGEN] __simd(XMM.LODLPS) bad codegen

Status
RESOLVED
Resolution
FIXED
Severity
regression
Priority
P1
Component
dmd
Product
D
Version
D2
Platform
x86_64
OS
All
Creation time
2022-04-22T13:09:36Z
Last change time
2022-05-03T15:37:35Z
Keywords
backend, pull, SIMD, wrong-code
Assigned to
No Owner
Creator
ponce

Attachments

ID    Filename  Summary      Content-Type  Size
1847  main.d    main source  text/plain    881

Comments

Comment #0 by aliloko — 2022-04-22T13:09:36Z
Created attachment 1847
main source

Hello,

Here is a new shipment of DMD codegen bugs.
With latest DMD v2.100.0-beta.1-dirty

Consider the following program, and build it with

$ dmd -m64 main.d

------------------ main.d ------------------------
import core.simd;
import core.stdc.stdio;

alias __m128 = float4;
alias __m128i = int4;
alias __m64 = long1;

struct long1
{
    long[1] array;
}

__m128 _mm_loadl_pi (__m128 a, const(__m64)* mem_addr) pure @trusted
{
    return cast(__m128) __simd(XMM.LODLPS, a, *cast(const(__m128)*)mem_addr);
}

__m64 to_m64(__m128i a) pure @trusted
{
    long2 la = cast(long2)a;
    long1 r;
    r.array[0] = la.array[0];
    return r;
}

void _mm_print_ps(__m128 v) @trusted
{
    float[4] C = (cast(float4)v).array;
    printf("%f %f %f %f\n", C[0], C[1], C[2], C[3]);
}

void main()
{
    float4 A = [1.0f, 2.0f, 3.0f, 4.0f];
    float4 B = [5.0f, 6.0f, 7.0f, 8.0f];
    __m64 M = to_m64(cast(__m128i)B);
    __m128 R = _mm_loadl_pi(A, &M);
    _mm_print_ps(R);
    float[4] correct = [5.0f, 6.0f, 3.0f, 4.0f];
    assert(R.array == correct);
}
-------------------------------------------------

Output with DMD 2.095.1:
5.000000 6.000000 3.000000 4.000000

Output with DMD 2.097.0 all the way to DMD v2.100.0-beta:
0.000000 0.000000 3.000000 4.000000

Notes:
- superficially looks related to https://issues.dlang.org/show_bug.cgi?id=21673 ?
Comment #1 by bugzilla — 2022-04-25T21:55:20Z
Here's the problem. You've specified the LODLPS instruction (actually MOVLPS):

https://www.felixcloutier.com/x86/movlps

But what was generated was the MOVHLPS instruction:

https://www.felixcloutier.com/x86/movhlps

They both have the same opcode: 0F 12. The two are distinguished by the second operand. A 64 bit second operand selects MOVLPS, a 128 bit operand selects MOVHLPS. The code:

    __simd(XMM.LODLPS, a, *cast(const(__m128)*)mem_addr)

selects MOVHLPS. However, changing it to:

    __simd_sto(XMM.LODLPS, a, *cast(const(long)*)mem_addr)  [1]

doesn't work because core.simd doesn't have that overload. Hence, the PR to add it to core.simd, and then with the change[1] the example works.

In general, when working with SIMD instructions that change only parts of a register, it merits close attention to the instruction that is actually generated.
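To make the difference between the two encodings concrete, here is a minimal sketch that models both instructions on plain float arrays rather than through core.simd. The helper names movlps and movhlps are hypothetical and exist only for this illustration. The zero upper half of the 128-bit source is an assumption chosen to match the output reported in comment #0; in a real run those lanes hold whatever the source register contains.

```d
// Plain-array model of the two instructions that share opcode 0F 12.
// Hypothetical helpers for illustration only; nothing here is core.simd API.

/// MOVLPS xmm, m64: the low two lanes are loaded from a 64-bit memory operand.
float[4] movlps(float[4] dst, float[2] mem) pure nothrow @nogc @safe
{
    dst[0] = mem[0];
    dst[1] = mem[1];
    return dst; // lanes 2 and 3 keep their original values
}

/// MOVHLPS xmm, xmm: the low two lanes are copied from the HIGH half of the source.
float[4] movhlps(float[4] dst, float[4] src) pure nothrow @nogc @safe
{
    dst[0] = src[2];
    dst[1] = src[3];
    return dst; // lanes 2 and 3 keep their original values
}

unittest
{
    float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    float[2] m = [5.0f, 6.0f];

    // Assumption: the upper 64 bits of the 128-bit source are zero.
    float[4] wide = [5.0f, 6.0f, 0.0f, 0.0f];

    float[4] expected = [5.0f, 6.0f, 3.0f, 4.0f]; // result with DMD 2.095.1
    float[4] observed = [0.0f, 0.0f, 3.0f, 4.0f]; // result with DMD 2.097.0 and later

    assert(movlps(a, m) == expected);
    assert(movhlps(a, wide) == observed);
}
```

Compiling with `dmd -unittest -main -run` executes the checks. Under the stated assumption, the MOVHLPS model reproduces the bad output from the report, while the MOVLPS model gives the value the test program asserts.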
Comment #2 by dlang-bot — 2022-04-25T21:56:53Z
@WalterBright created dlang/druntime pull request #3811 "fix Issue 23046 - [REG][CODEGEN] __simd(XMM.LODLPS) bad codegen" fixing this issue: - fix Issue 23046 - [REG][CODEGEN] __simd(XMM.LODLPS) bad codegen https://github.com/dlang/druntime/pull/3811
Comment #3 by dlang-bot — 2022-05-02T11:10:46Z
dlang/druntime pull request #3811 "fix Issue 23046 - [REG][CODEGEN] __simd(XMM.LODLPS) bad codegen" was merged into stable: - daf4c10d7ee6c992ce0e056f2c1bddd875245fec by Walter Bright: fix Issue 23046 - [REG][CODEGEN] __simd(XMM.LODLPS) bad codegen https://github.com/dlang/druntime/pull/3811
Comment #4 by dlang-bot — 2022-05-03T15:37:35Z
dlang/druntime pull request #3817 "merge stable" was merged into master: - 9c0d4f914e0817c9ee4eafc5a45c41130aa6b981 by Walter Bright: fix Issue 23046 - [REG][CODEGEN] __simd(XMM.LODLPS) bad codegen https://github.com/dlang/druntime/pull/3817