Bug 10225 – core.simd wrong codegen for XMM.STOUPS with __simd_sto
Status
RESOLVED
Resolution
FIXED
Severity
critical
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2013-06-01T03:05:00Z
Last change time
2016-10-01T11:45:57Z
Keywords
SIMD, wrong-code
Assigned to
nobody
Creator
code
Comments
Comment #0 by code — 2013-06-01T03:05:09Z
For the follwing small test program:
import core.simd;
import std.stdio;
void main(string[] args)
{
float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f];
float4 result = __simd(XMM.LODUPS, *cast(float4*)value1.ptr);
result = __simd(XMM.ADDPS, result, result);
__simd_sto(XMM.STOUPS, *cast(float4*)value1.ptr, result);
writefln("%s", value1);
}
Dmd will generate the follwing assembly:
mov rax,qword ptr [rbp-68h]
movups xmm0,xmmword ptr [rax]
movaps xmmword ptr [rbp-60h],xmm0
movdqa xmm0,xmmword ptr [rbp-60h]
addps xmm0,xmmword ptr [rbp-60h]
movaps xmmword ptr [rbp-60h],xmm0
As you can clearly see the last instruction is a movaps but it should be a movups because XMM.STOUPS was given (unaligned store)
Comment #1 by turkeyman — 2013-06-03T03:02:18Z
(In reply to comment #0)
> For the follwing small test program:
>
> import core.simd;
> import std.stdio;
>
> void main(string[] args)
> {
> float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f];
> float4 result = __simd(XMM.LODUPS, *cast(float4*)value1.ptr);
> result = __simd(XMM.ADDPS, result, result);
> __simd_sto(XMM.STOUPS, *cast(float4*)value1.ptr, result);
> writefln("%s", value1);
> }
>
> Dmd will generate the follwing assembly:
> mov rax,qword ptr [rbp-68h]
> movups xmm0,xmmword ptr [rax]
> movaps xmmword ptr [rbp-60h],xmm0
> movdqa xmm0,xmmword ptr [rbp-60h]
> addps xmm0,xmmword ptr [rbp-60h]
> movaps xmmword ptr [rbp-60h],xmm0
>
> As you can clearly see the last instruction is a movaps but it should be a
> movups because XMM.STOUPS was given (unaligned store)
Indeed. Wrong opcode >_<
I never use unaligned vectors, so I missed it!
Haven't widely tested the unaligned bit of std.simd yet.
Comment #2 by yebblies — 2013-11-21T05:33:18Z
I think it's worse than that actually.
import core.simd;
__gshared float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f];
void func();
void main(string[] args)
{
float4 result = __simd(XMM.LODUPS, *cast(float4*)value1.ptr);
result = __simd(XMM.ADDPS, result, result);
__simd_sto(XMM.STOUPS, *cast(float4*)value1.ptr, result);
func();
}
gives
mov rax, qword ptr [_D5testx6value1Af+8H] ; 0008 _ 48: 8B. 05, 00000008(rel)
movups xmm0, xmmword ptr [rax] ; 000F _ 0F 10. 00
movaps xmmword ptr [rbp-10H], xmm0 ; 0012 _ 0F 29. 45, F0
movdqa xmm1, xmmword ptr [rbp-10H] ; 0016 _ 66: 0F 6F. 4D, F0
addps xmm1, xmmword ptr [rbp-10H] ; 001B _ 0F 58. 4D, F0
movaps xmmword ptr [rbp-10H], xmm1 ; 001F _ 0F 29. 4D, F0
So the value is loaded from value1, then stored into a stack slot, then loaded, added, and stored back into the stack slot. The result is never written back into value1.
With -O the whole thing is dropped, despite the call to func() which could easily access value1. Elevating to critical.
Comment #3 by code — 2013-11-21T06:59:06Z
I don't know anyone who is ussing core.simd because of its broken state. So its not really that critical, its just another feature that doesn't work as advertised.
Comment #4 by github-bugzilla — 2016-04-03T18:47:20Z