Bug 21027 – Backend: DMD uses 'rep stosb' even for ulong arrays

Status
RESOLVED
Resolution
INVALID
Severity
normal
Priority
P1
Component
dmd
Product
D
Version
D2
Platform
x86_64
OS
Linux
Creation time
2020-07-08T02:41:59Z
Last change time
2020-07-20T06:06:45Z
Keywords
performance
Assigned to
No Owner
Creator
Mathias LANG
See also
https://issues.dlang.org/show_bug.cgi?id=14458

Comments

Comment #0 by pro.mathias.lang — 2020-07-08T02:41:59Z
Take the following code:

```
alias Content = ulong[256];

void main ()
{
    Content v;
}
```

What DMD generates for this on Linux x86_64 (using `run.dlang.org`) is:

```
        .text._Dmain    segment
        assume  CS:.text._Dmain
_Dmain:
        push    RBP
        mov     RBP,RSP
        sub     RSP,0808h
        mov     ECX,0800h
        mov     qword ptr -8[RBP],0
        lea     RAX,-8[RBP]
        mov     AL,[RAX]
        lea     RDI,0FFFFF7F8h[RBP]
        rep
        stosb
        xor     EAX,EAX
        leave
        ret
        add     [RAX],AL
.text._Dmain    ends
```

The best thing to do here would be to call `memset` or `memcpy`, which is what LDC does. The second best would be to use `rep stosd` 0x100 times, as it is faster than `rep stosb` 0x800 times.

Source:
- Agner Fog, optimizing assembly (https://www.agner.org/optimize/optimizing_assembly.pdf), 16.9 String instructions (all processors):

> `REP MOVSD` and `REP STOSD` are quite fast if the repeat count is not too small. The largest word size (DWORD in 32-bit mode, QWORD in 64-bit mode) is preferred. Both source and destination should be aligned by the word size or better. In many cases, however, it is faster to use vector registers. Moving data in the largest available registers is faster than `REP MOVSD` and `REP STOSD` in most cases, especially on older processors. See page 150 for details.

Related: https://issues.dlang.org/show_bug.cgi?id=14458
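
For reference, the repeat counts discussed above can be checked with a small D sketch. It is illustrative only: the names `totalBytes`, `byteStores`, `dwordStores`, `qwordStores` and `clearWithMemset` are made up for this example and are not part of DMD or the original report. It computes how many stores each width needs to clear one `Content`, and shows a hand-written `memset` call standing in for the library-call approach attributed to LDC. By this arithmetic, a count of 0x100 corresponds to 8-byte (QWORD) stores, while 4-byte `rep stosd` stores would need 0x200 iterations.

```d
import core.stdc.string : memset;

alias Content = ulong[256];

// Total number of bytes that must be zeroed for one Content value.
enum size_t totalBytes = Content.sizeof;

// Hypothetical names: repeat counts needed for each store width.
enum size_t byteStores  = totalBytes / ubyte.sizeof;  // rep stosb
enum size_t dwordStores = totalBytes / uint.sizeof;   // rep stosd
enum size_t qwordStores = totalBytes / ulong.sizeof;  // rep stosq

static assert(totalBytes  == 0x800);
static assert(byteStores  == 0x800);
static assert(dwordStores == 0x200);
static assert(qwordStores == 0x100);

// Hypothetical helper: clears the array with one libc call
// instead of relying on the compiler-generated initialization.
void clearWithMemset(ref Content v)
{
    memset(v.ptr, 0, Content.sizeof);
}

void main()
{
    Content v = void;      // skip default initialization on purpose
    clearWithMemset(v);    // then zero the whole array explicitly
    foreach (x; v)
        assert(x == 0);
}
```

Whether an explicit `memset` is actually faster than the generated code depends on the target CPU and the C library, so this is a starting point for measurement rather than a recommendation.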
Comment #1 by bugzilla — 2020-07-20T05:48:29Z
This turns out to be a problem introduced by https://github.com/dlang/dmd/pull/9828
Comment #2 by bugzilla — 2020-07-20T06:06:45Z
Compile with -O and the problem goes away.