Bug 10225 – core.simd wrong codegen for XMM.STOUPS with __simd_sto

Status
RESOLVED
Resolution
FIXED
Severity
critical
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2013-06-01T03:05:00Z
Last change time
2016-10-01T11:45:57Z
Keywords
SIMD, wrong-code
Assigned to
nobody
Creator
code

Comments

Comment #0 by code — 2013-06-01T03:05:09Z
For the follwing small test program: import core.simd; import std.stdio; void main(string[] args) { float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f]; float4 result = __simd(XMM.LODUPS, *cast(float4*)value1.ptr); result = __simd(XMM.ADDPS, result, result); __simd_sto(XMM.STOUPS, *cast(float4*)value1.ptr, result); writefln("%s", value1); } Dmd will generate the follwing assembly: mov rax,qword ptr [rbp-68h] movups xmm0,xmmword ptr [rax] movaps xmmword ptr [rbp-60h],xmm0 movdqa xmm0,xmmword ptr [rbp-60h] addps xmm0,xmmword ptr [rbp-60h] movaps xmmword ptr [rbp-60h],xmm0 As you can clearly see the last instruction is a movaps but it should be a movups because XMM.STOUPS was given (unaligned store)
Comment #1 by turkeyman — 2013-06-03T03:02:18Z
(In reply to comment #0) > For the follwing small test program: > > import core.simd; > import std.stdio; > > void main(string[] args) > { > float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f]; > float4 result = __simd(XMM.LODUPS, *cast(float4*)value1.ptr); > result = __simd(XMM.ADDPS, result, result); > __simd_sto(XMM.STOUPS, *cast(float4*)value1.ptr, result); > writefln("%s", value1); > } > > Dmd will generate the follwing assembly: > mov rax,qword ptr [rbp-68h] > movups xmm0,xmmword ptr [rax] > movaps xmmword ptr [rbp-60h],xmm0 > movdqa xmm0,xmmword ptr [rbp-60h] > addps xmm0,xmmword ptr [rbp-60h] > movaps xmmword ptr [rbp-60h],xmm0 > > As you can clearly see the last instruction is a movaps but it should be a > movups because XMM.STOUPS was given (unaligned store) Indeed. Wrong opcode >_< I never use unaligned vectors, so I missed it! Haven't widely tested the unaligned bit of std.simd yet.
Comment #2 by yebblies — 2013-11-21T05:33:18Z
I think it's worse than that actually. import core.simd; __gshared float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f]; void func(); void main(string[] args) { float4 result = __simd(XMM.LODUPS, *cast(float4*)value1.ptr); result = __simd(XMM.ADDPS, result, result); __simd_sto(XMM.STOUPS, *cast(float4*)value1.ptr, result); func(); } gives mov rax, qword ptr [_D5testx6value1Af+8H] ; 0008 _ 48: 8B. 05, 00000008(rel) movups xmm0, xmmword ptr [rax] ; 000F _ 0F 10. 00 movaps xmmword ptr [rbp-10H], xmm0 ; 0012 _ 0F 29. 45, F0 movdqa xmm1, xmmword ptr [rbp-10H] ; 0016 _ 66: 0F 6F. 4D, F0 addps xmm1, xmmword ptr [rbp-10H] ; 001B _ 0F 58. 4D, F0 movaps xmmword ptr [rbp-10H], xmm1 ; 001F _ 0F 29. 4D, F0 So the value is loaded from value1, then stored into a stack slot, then loaded, added, and stored back into the stack slot. The result is never written back into value1. With -O the whole thing is dropped, despite the call to func() which could easily access value1. Elevating to critical.
Comment #3 by code — 2013-11-21T06:59:06Z
I don't know anyone who is ussing core.simd because of its broken state. So its not really that critical, its just another feature that doesn't work as advertised.
Comment #4 by github-bugzilla — 2016-04-03T18:47:20Z
Commits pushed to master at https://github.com/D-Programming-Language/dmd https://github.com/D-Programming-Language/dmd/commit/087a4ca9b5c1ccda6b6963cb9d7e06074185a90a fix Issue 10225 - core.simd wrong codegen for XMM.STOUPS with __simd_sto - add missing STO* ops to codegen https://github.com/D-Programming-Language/dmd/commit/7133d4a17ce5767a3427ccc47ec01468115e6d6c Merge pull request #5625 from MartinNowak/fix10225 fix Issue 10225 - core.simd wrong codegen for XMM.STOUPS with __simd_sto
Comment #5 by github-bugzilla — 2016-10-01T11:45:57Z
Commits pushed to stable at https://github.com/dlang/dmd https://github.com/dlang/dmd/commit/087a4ca9b5c1ccda6b6963cb9d7e06074185a90a fix Issue 10225 - core.simd wrong codegen for XMM.STOUPS with __simd_sto https://github.com/dlang/dmd/commit/7133d4a17ce5767a3427ccc47ec01468115e6d6c Merge pull request #5625 from MartinNowak/fix10225