When compiled with no flags, the following program gives wrong results:
import std.stdio;
import core.simd;

double2* v(double* a)
{
    return cast(double2*) a;
}

void main()
{
    double2 a;
    auto p = cast(double*) &a;
    p[0] = 1;
    p[1] = 2;
    double2 b = v(p)[0];
    v(p)[0] = b;
    writeln(p[0 .. 2]); // prints [1, 0]
}
Disassembly of the relevant part of the code:
call 426344 <_D3tmp1vFPdZPNhG2d>
movapd xmm0,XMMWORD PTR [rax]
movapd XMMWORD PTR [rbp-0x10],xmm0
movapd xmm1,XMMWORD PTR [rbp-0x10]
movsd QWORD PTR [rbp-0x40],xmm1 ; should be movapd
mov rdi,QWORD PTR [rbp-0x20]
call 426344 <_D3tmp1vFPdZPNhG2d>
movsd xmm1,QWORD PTR [rbp-0x40] ; should be movapd
movapd XMMWORD PTR [rax],xmm1
This happens with both DMD 2.060 and the latest 2.061 from GitHub. It doesn't happen if I pass either -O or -inline, and it doesn't happen with LDC or GDC.
I have only tested this on Linux.
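To make the symptom concrete, here is a minimal sketch that models the two marked instructions in plain D. The names `Xmm`, `spillLow` and `reloadLow` are hypothetical and exist only for this illustration. It relies on the fact that MOVSD with a memory source loads the low 64 bits and zeroes the upper 64 bits of the destination register, which would explain why the second element comes back as 0.

```d
import std.stdio;

// Hypothetical model of a 128-bit XMM register holding two doubles.
struct Xmm
{
    double[2] lanes;
}

// Models `movsd QWORD PTR [slot], xmm`: only the low lane is saved.
void spillLow(ref double slot, Xmm r)
{
    slot = r.lanes[0];
}

// Models `movsd xmm, QWORD PTR [slot]`: the low lane is loaded
// and the high lane is zeroed.
Xmm reloadLow(double slot)
{
    Xmm r;
    r.lanes[0] = slot;
    r.lanes[1] = 0.0;
    return r;
}

void main()
{
    Xmm reg;
    reg.lanes = [1.0, 2.0];

    double slot;           // 8-byte spill slot, too small for the register
    spillLow(slot, reg);   // save across the call
    reg = reloadLow(slot); // restore after the call

    writeln(reg.lanes);    // prints [1, 0]
}
```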
Comment #1 by jerro.public — 2012-12-23T10:49:42Z
I managed to reduce it a bit further:
import std.stdio;
import core.simd;

double2* v(double2* a)
{
    return a;
}

void main()
{
    double2 a = [1, 2];
    *v(&a) = a;
    writeln(a.array);
}
And the disassembly:
movsd QWORD PTR [rbp-0x20],xmm1
lea rdi,[rbp-0x10]
call 4263f4 <_D3tmp1vFPNhG2dZPNhG2d>
movsd xmm1,QWORD PTR [rbp-0x20]
movapd XMMWORD PTR [rax],xmm1
Comment #2 by bugzilla — 2012-12-23T21:10:51Z
This is happening in cod3.c REGSAVE::save() and REGSAVE::restore(). Unfortunately, just changing the opcodes doesn't work because MOVAPD requires 16-byte alignment of the operands. Fixing that exposes further problems.
Essentially, it'll have to wait a bit.
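As a rough illustration of the alignment constraint, the sketch below checks whether an address is a multiple of 16 and rounds one up to the next 16-byte boundary, which is what a save slot for a full XMM register would need before MOVAPD could be used on it. The helpers `isAligned16` and `alignUp16` and the example addresses are made up for this illustration; this is not how the DMD back end computes its save area.

```d
import std.stdio;

// Hypothetical helper: true if the address is a multiple of 16 bytes,
// which is what MOVAPD requires of its memory operand.
bool isAligned16(size_t address)
{
    return (address & 15) == 0;
}

// Hypothetical helper: round an address up to the next 16-byte boundary.
size_t alignUp16(size_t address)
{
    return (address + 15) & ~cast(size_t) 15;
}

void main()
{
    writeln(isAligned16(0x1000)); // true
    writeln(isAligned16(0x1008)); // false: only 8-byte aligned
    writeln(alignUp16(0x1008));   // 4112 (0x1010)
}
```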
Comment #3 by jerro.public — 2012-12-24T16:24:04Z
(In reply to comment #2)
> This is happening in cod3.c REGSAVE::save() and REGSAVE::restore().
> Unfortunately, just changing the opcodes doesn't work because MOVAPD requires
> 16-byte alignment of the operands. Fixing that exposes further problems.
>
> Essentially, it'll have to wait a bit.
I know nothing about the DMD back end, so this may be an obviously bad idea, but if alignment is the main problem, wouldn't using MOVUPD work in the meantime?
Comment #4 by bugzilla — 2012-12-24T19:00:01Z
MOVUPD is terribly slow.
Comment #5 by yebblies — 2013-01-14T03:08:27Z
(In reply to comment #4)
> MOVUPD is terribly slow.
Terribly slow is still much better than wrong-code.