I am trying to figure out why win32 executables compiled from D source by dmd are usually somewhat slower than similar win32 programs compiled from C++ source by, for example, mingw-gcc.
I believe I found a relatively simple case where dmd puts a redundant instruction into the object code.
I have this simple D program:
-----
immutable int MAX_N = 1_000_000;
int main () {
    int [MAX_N] a;
    foreach (i; 0..MAX_N)
        a[i] = i;
    return a[7];
}
-----
The assembly (dmd -O -release -inline -noboundscheck, then obj2asm) contains the following fragment, which corresponds to the loop:
-----
L2C: mov -03D0900h[EDX*4][EBP],EDX
mov ECX,EDX
inc EDX
cmp EDX,0F4240h
jb L2C
-----
Here, the second line, "mov ECX,EDX", does not seem to serve any purpose at all: ECX is not read anywhere else in the loop. If this observation is correct, the instruction indicates a bug in code generation, and fixing that bug may improve performance in more general cases.
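For readers decoding the disassembly, here is a small sketch that checks the two hexadecimal constants against the source. It assumes, as is the case for D, that int is 4 bytes. The helper names loopBound and arrayBytes are hypothetical and only used for illustration.

```d
immutable int MAX_N = 1_000_000;

// 0F4240h in "cmp EDX,0F4240h" is the loop bound MAX_N.
static assert (0xF4240 == MAX_N);
// 03D0900h in the "mov" displacement is the array size in bytes,
// so the array starts MAX_N * int.sizeof bytes below EBP.
static assert (0x3D0900 == MAX_N * int.sizeof);

// Hypothetical helpers that expose the same values at run time.
int loopBound () { return MAX_N; }
size_t arrayBytes () { return MAX_N * int.sizeof; }

void main () {}
```

So the first instruction stores EDX into a[EDX], and the last two compare the counter against MAX_N. Neither needs ECX.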
The "return a[7]" part is there to ensure that the loop is not optimized away entirely. The ldmd2 compiler reportedly removes the loop when no return is present. DMD does not, but that is irrelevant to this issue.
Previous discussion:
http://forum.dlang.org/thread/[email protected]
Will attach source and disassembly in comments.
Ivan Kazmenko.
Comment #1 by gassa — 2013-12-26T02:13:47Z
Created attachment 1307
source code of the demonstrating example
Comment #2 by gassa — 2013-12-26T02:14:21Z
Created attachment 1308
disassembly of the demonstrating example
Comment #3 by gassa — 2013-12-26T02:27:09Z
I should note that the exact compile command should be something like:
dmd a0.d -O -release -inline -noboundscheck -L/STACK:268435456
Otherwise, the default stack limit makes the program crash at runtime.
The "-L/STACK:268435456" option does not affect the generated object file, since it is only used at the linking stage.
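For anyone who only wants to run the example without the linker option, here is a minimal sketch, assuming heap allocation is acceptable. The array takes about 4 MB (1_000_000 ints of 4 bytes each), which is why it does not fit in the default stack. The function name fillOnHeap is hypothetical. Note that this variant addresses the array through a pointer instead of an EBP-relative offset, so it may well compile to a different loop and is not a substitute for the attached test case.

```d
immutable int MAX_N = 1_000_000;

// Hypothetical variant of the test case: same loop, but the array
// lives on the GC heap instead of the stack.
int fillOnHeap () {
    auto a = new int [MAX_N];
    foreach (i; 0..MAX_N)
        a[i] = i;
    return a[7];
}

int main () {
    // Exit code is a[7], as in the original program.
    return fillOnHeap ();
}
```

This builds with the same optimization flags and no -L/STACK option.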
Comment #4 by maxim — 2013-12-26T08:08:29Z
These may be leftovers from internally created variables. The compiler often rewrites high-level constructs into lower-level ones, implicitly introducing new variables. What you see in the asm is their usage.
By the way, it is not a 'code generation bug', it is poor optimization.
Comment #5 by robert.schadek — 2024-12-13T18:15:26Z