See: https://github.com/dlang/dmd/pull/13977#issuecomment-1098199644
Consider:
```
import core.simd;

double2 set0(double2 x, double* a)
{
    x[0] = *a;
    return x;
}

double2 set1(double2 x, double* a)
{
    x[1] = *a;
    return x;
}
```
GDC generates this optimized code:
```
set0:
movlpd xmm0, QWORD PTR [rdi]
ret
set1:
movhpd xmm0, QWORD PTR [rdi]
ret
```
But DMD with -O still does a round trip through stack memory (shown here for set1):
```
assume CS:.text.set1
push RBP
mov RBP,RSP
sub RSP,010h
movapd -010h[RBP],XMM0
movsd XMM1,[RDI]
movsd -8[RBP],XMM1
movapd XMM0,-010h[RBP]
mov RSP,RBP
pop RBP
ret
```
In dmd.backend.cod1.getlvalue, vector variables are prevented from being placed in a register because the backend doesn't generate the correct assignment instructions yet. For example, it would use movsd for `x[0] = *a`, which clears the upper 64 bits of the XMM0 register and so accidentally sets `x[1] = 0` (see issue 21673 and issue 23009).
Once this is fixed, vector variables can be allowed in registers again, which would improve SIMD code generation.
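To make the difference between the instructions concrete, here is a minimal sketch that models an XMM register as two double lanes in plain D. It is not backend code: `Xmm`, `movsdLoad`, `movlpdLoad` and `movhpdLoad` are hypothetical names invented for this illustration, and they only mimic the documented behavior of the memory-operand forms of movsd, movlpd and movhpd.

```d
// Toy model of a 128-bit XMM register as two double lanes.
// All names here are hypothetical and exist only for illustration.
struct Xmm
{
    double[2] lane;
}

// movsd xmm, m64: writes lane 0 and zeroes lane 1 (the upper 64 bits).
Xmm movsdLoad(Xmm dst, const(double)* src)
{
    dst.lane[0] = *src;
    dst.lane[1] = 0.0;
    return dst;
}

// movlpd xmm, m64: writes lane 0 and leaves lane 1 untouched.
Xmm movlpdLoad(Xmm dst, const(double)* src)
{
    dst.lane[0] = *src;
    return dst;
}

// movhpd xmm, m64: writes lane 1 and leaves lane 0 untouched.
Xmm movhpdLoad(Xmm dst, const(double)* src)
{
    dst.lane[1] = *src;
    return dst;
}

unittest
{
    double a = 7.0;
    Xmm x;
    x.lane[0] = 1.0;
    x.lane[1] = 2.0;

    // movsd for `x[0] = *a` would also clobber x[1].
    auto viaMovsd = movsdLoad(x, &a);
    assert(viaMovsd.lane[0] == 7.0 && viaMovsd.lane[1] == 0.0);

    // movlpd gives the intended result for set0.
    auto viaMovlpd = movlpdLoad(x, &a);
    assert(viaMovlpd.lane[0] == 7.0 && viaMovlpd.lane[1] == 2.0);

    // movhpd gives the intended result for set1.
    auto viaMovhpd = movhpdLoad(x, &a);
    assert(viaMovhpd.lane[0] == 1.0 && viaMovhpd.lane[1] == 7.0);
}
```

The unittest block can be run with `dmd -unittest -main -run` on a file containing the sketch. It shows why an in-register `x[0] = *a` needs movlpd rather than movsd.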
Comment #1 by robert.schadek — 2024-12-13T19:22:16Z