Bug 19443 – core.simd generates MOVLPS instead of MOVHLPS

Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P1
Component
dmd
Product
D
Version
D2
Platform
x86_64
OS
All
Creation time
2018-11-28T18:11:08Z
Last change time
2021-03-21T09:56:07Z
Keywords
backend, pull, SIMD
Assigned to
No Owner
Creator
j.kulaviir

Attachments

ID | Filename | Summary | Content-Type | Size
1718 | movhlps.d | test case. includes both core.simd and inline assembly tests. | text/x-csrc | 433

Comments

Comment #0 by j.kulaviir — 2018-11-28T18:11:08Z
Created attachment 1718
test case. includes both core.simd and inline assembly tests.

This one is sort of fun. The core.simd module seems to be producing unexpected results, but only with the movhlps instruction, at least insofar as I have seen. All other instructions that I have used, including movlhps, seem to be working as expected. For those not familiar with this instruction: https://www.felixcloutier.com/x86/MOVHLPS.html

To elaborate: given inputs [1, 2, 4, 8] and [2, 3, 5, 7], the core.simd module gives the result [2, 3, 4, 8]. When I dropped down to inline assembly, though, it produced the expected result of [5, 7, 4, 8].

Attached is the test case. It was compiled against dmd 2.83, the latest stable release at the time of this writing, and was tested on several systems with varying AMD and Intel CPUs. All results were consistent - even from the interactive widget on the dlang.org page. I did not try compiling it against ldc or gdc.

The documentation on the core.simd module is more or less nonexistent, so it's possible that I may have missed something important. It's also possible that I have no idea what I'm talking about, but either way it seems like a bug. I'm not sure how often people muck around with SIMD stuff, so I marked it as minor, but it'd be nice if someone would weigh in on this. Thanks, everyone.
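For readers unfamiliar with the two instructions, the difference between the expected and the observed result can be sketched with a scalar model in plain D. This is only an illustration: `movhlpsModel` and `movlpsModel` are hypothetical helper names, not part of core.simd, and they operate on ordinary `float[4]` arrays rather than XMM registers.

```d
import std.stdio;

// Hypothetical helper: scalar model of "MOVHLPS dst, src".
// The high two floats of src replace the low two floats of dst;
// the high two floats of dst are left unchanged.
float[4] movhlpsModel(float[4] dst, float[4] src)
{
    return [src[2], src[3], dst[2], dst[3]];
}

// Hypothetical helper: scalar model of "MOVLPS dst, m64" when the
// 64-bit memory operand is the start of src.
// The low two floats of src replace the low two floats of dst.
float[4] movlpsModel(float[4] dst, float[4] src)
{
    return [src[0], src[1], dst[2], dst[3]];
}

void main()
{
    float[4] a = [1, 2, 4, 8];
    float[4] b = [2, 3, 5, 7];

    writeln(movhlpsModel(a, b)); // [5, 7, 4, 8], the expected result
    writeln(movlpsModel(a, b));  // [2, 3, 4, 8], the result reported for core.simd
}
```

Under this model the reported core.simd output is exactly what a MOVLPS reading the low half of the second operand would produce.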
Comment #1 by n8sh.secondary — 2020-10-08T22:39:08Z
I ran into MOVHLPS not working today. I spent some time looking through DMD but couldn't find any difference between the way it treats MOVHLPS and MOVLHPS (the latter of which, as j.kulaviir said, works fine) aside from their having different opcodes.
Comment #2 by bugzilla — 2020-12-28T09:56:45Z
Here's the attachment:

    /*dmd -m64 movhlps.d*/
    import std.stdio;
    import core.simd;

    void main ()
    {
        float4 a = [1, 2, 4, 8];
        float4 b = [2, 3, 5, 7];
        writefln ("expected result: [5, 7, 4, 8]");

        //Does not produce the expected result
        writefln ("core.simd: %s", simd!(XMM.MOVHLPS) (a, b));

        //But this does. How mysterious!
        float4 res;
        asm
        {
            movaps XMM0, a;
            movaps XMM1, b;
            movhlps XMM0, XMM1;
            movaps res, XMM0;
        }
        writefln ("asm: %s", res);
    }
Comment #3 by bugzilla — 2020-12-28T10:07:39Z
I corrected the line:

    writefln ("core.simd: %s", simd!(XMM.MOVHLPS) (a, b));

to:

    writefln ("core.simd: %s", __simd(XMM.MOVHLPS, a, b));

Compiled with master and ran it, and the output is:

    expected result: [5, 7, 4, 8]
    core.simd: [0, 0, 0, 64, 0, 0, 64, 64, 0, 0, 128, 64, 0, 0, 0, 65]
    asm: [5, 7, 4, 8]

which appears to be the expected behavior. Marking as resolved.
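The core.simd line in that output prints 16 raw byte values rather than four floats, presumably because the result of `__simd` was printed without a cast to `float4`. A minimal sketch for decoding those bytes, assuming a little-endian host such as x86_64 (`bytesToFloats` is a hypothetical helper, not a library function):

```d
import std.stdio;

// Hypothetical helper: reinterpret 16 raw bytes as four 32-bit floats.
// Assumes a little-endian host, which holds for x86_64.
float[4] bytesToFloats(ubyte[16] raw)
{
    union Bits
    {
        ubyte[16] b;
        float[4] f;
    }

    Bits u;
    u.b = raw;   // store the bytes
    return u.f;  // read the same storage back as floats
}

void main()
{
    // The byte values printed on the core.simd line in comment #3.
    ubyte[16] raw = [0, 0, 0, 64, 0, 0, 64, 64, 0, 0, 128, 64, 0, 0, 0, 65];
    writeln(bytesToFloats(raw)); // [2, 3, 4, 8]
}
```

Decoded this way, the bytes correspond to [2, 3, 4, 8], which is the result originally reported in comment #0 rather than [5, 7, 4, 8].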
Comment #4 by aliloko — 2021-01-07T15:12:26Z
Unfortunately it isn't resolved. Consider this test case, built with DMD from 7 Jan 2021 at commit 4f18b2798ad8fa337b8b71e4d2dd0d983adf9868 (with digger):

    import core.simd;

    void main()
    {
        float4 a = [1.0f, 2.0f, 3.0f, 4.0f];
        float4 b = [5.0f, 6.0f, 7.0f, 8.0f];
        float4 r = cast(float4) __simd(XMM.MOVHLPS, a, b);
        float[4] correct = [7.0f, 8.0f, 3.0f, 4.0f];
        assert(r.array == correct); // FAIL, produces [5, 6, 3, 4] instead
    }

and indeed Godbolt shows that it generated MOVLPS instead of MOVHLPS: https://d.godbolt.org/z/43n5KP
Comment #5 by bugzilla — 2021-03-21T06:44:15Z
The MOVHLPS instruction is encoded:

    NP 0F 12 /r MOVHLPS xmm1, xmm2

"Moves two packed single-precision floating-point values from the high quadword of the second XMM argument (second operand) to the low quadword of the first XMM register (first argument). The quadword at bits 127:64 of the destination operand is left unchanged. Bits (MAXVL-1:128) of the corresponding destination register remain unchanged."

The MOVLPS instruction is encoded:

    NP 0F 12 /r MOVLPS xmm1, m64

"Moves two packed single-precision floating-point values from the source 64-bit memory operand and stores them in the low 64-bits of the destination XMM register. The upper 64bits of the XMM register are preserved. Bits (MAXVL-1:128) of the corresponding destination register are preserved."

https://www.felixcloutier.com/x86/movlps
https://www.felixcloutier.com/x86/movhlps

Looking at the code:

    float4 a = [1.0f, 2.0f, 3.0f, 4.0f];
    float4 b = [5.0f, 6.0f, 7.0f, 8.0f];
    float4 r = cast(float4) __simd(XMM.MOVHLPS, a, b);
    float[4] correct = [7.0f, 8.0f, 3.0f, 4.0f];
    assert(r.array == correct); // FAIL, produces [5, 6, 3, 4] instead

The problem appears to be that the second operand needs to be forced into an XMM register rather than remaining in memory.
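Since both instructions share the opcode bytes 0F 12, the only thing that tells them apart in the encoding is whether the ModRM byte names a register or a memory operand. A rough sketch of that distinction follows; `decode0F12` is a hypothetical helper for illustration and not DMD backend code.

```d
import std.stdio;

// Hypothetical helper: name the instruction selected by opcode 0F 12 /r,
// based on the mod field (top two bits) of the ModRM byte.
// mod == 3 means the r/m operand is a register, giving MOVHLPS;
// any other value means a memory operand, giving MOVLPS.
string decode0F12(ubyte modrm)
{
    immutable modField = modrm >> 6;
    return modField == 3 ? "MOVHLPS" : "MOVLPS";
}

void main()
{
    writeln(decode0F12(0xC1)); // mod=11, reg=xmm0, r/m=xmm1: MOVHLPS
    writeln(decode0F12(0x01)); // mod=00, reg=xmm0, r/m=[rcx]: MOVLPS
}
```

This is why leaving the second operand in memory silently turns the requested MOVHLPS into a MOVLPS: the emitted opcode is identical, and only the operand form changes.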
Comment #6 by bugzilla — 2021-03-21T06:50:34Z
Another problem is that MOVHLPS and LODLPS have the same opcode (!) in core.simd.
Comment #7 by dlang-bot — 2021-03-21T07:32:42Z
@WalterBright created dlang/dmd pull request #12293 "fix Issue 19443 - core.simd generates MOVLPS instead of MOVHLPS" fixing this issue:

- fix Issue 19443 - core.simd generates MOVLPS instead of MOVHLPS

https://github.com/dlang/dmd/pull/12293
Comment #8 by dlang-bot — 2021-03-21T09:56:07Z
dlang/dmd pull request #12293 "fix Issue 19443 - core.simd generates MOVLPS instead of MOVHLPS" was merged into master:

- f96662159182b611ba1eafd6b1ba050bddf672dd by Walter Bright: fix Issue 19443 - core.simd generates MOVLPS instead of MOVHLPS

https://github.com/dlang/dmd/pull/12293