Comment #0 by pro.mathias.lang — 2020-07-08T02:41:59Z
Take the following code:
```
alias Content = ulong[256];
void main ()
{
    Content v;
}
```
On Linux x86_64, DMD generates the following for this (obtained via `run.dlang.org`):
```
.text._Dmain segment
assume CS:.text._Dmain
_Dmain:
push RBP
mov RBP,RSP
sub RSP,0808h
mov ECX,0800h
mov qword ptr -8[RBP],0
lea RAX,-8[RBP]
mov AL,[RAX]
lea RDI,0FFFFF7F8h[RBP]
rep
stosb
xor EAX,EAX
leave
ret
add [RAX],AL
.text._Dmain ends
```
The best thing to do here would be to call `memset` or `memcpy`, which is what LDC does.
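The second best would be to use `rep stosq` with a count of 0x100 (the array is 256 × 8 = 0x800 bytes, so 0x100 QWORDs, or 0x200 DWORDs with `rep stosd`), as it is faster than `rep stosb` with a count of 0x800.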
Source:
- Agner Fog, "Optimizing subroutines in assembly language" (https://www.agner.org/optimize/optimizing_assembly.pdf), section 16.9, "String instructions (all processors)":
> `REP MOVSD` and `REP STOSD` are quite fast if the repeat count is not too small. The largest word size (DWORD in 32-bit mode, QWORD in 64-bit mode) is preferred. Both source and destination should be aligned by the word size or better. In many cases, however, it is faster to use vector registers. Moving data in the largest available registers is faster than `REP MOVSD` and `REP STOSD` in most cases, especially on older processors. See page 150 for details.
Related: https://issues.dlang.org/show_bug.cgi?id=14458
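
For illustration, here is a minimal source-level sketch of the `memset` approach suggested above. It assumes `= void` is used to suppress the compiler-generated default initialization and that `core.stdc.string.memset` does the zeroing instead. This is only a workaround sketch, not what the compiler should emit internally, and no timings were measured for it:

```
import core.stdc.string : memset;

alias Content = ulong[256];

void main ()
{
    // `= void` skips the default initialization (the `rep stosb` sequence above)
    Content v = void;
    // Zero all Content.sizeof (0x800) bytes through the C runtime instead
    memset(v.ptr, 0, Content.sizeof);
    // Sanity check: first and last elements are zero
    assert(v[0] == 0 && v[$ - 1] == 0);
}
```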