Comment #0 by r9shackleford — 2015-04-25T01:25:32Z
DMD and GDC perform _very_ poorly on a benchmark featuring D, which I assume is due to poor codegen.
https://github.com/logicchains/LPATHBench/blob/master/d.d
Using the "slow" (i.e. ranges) version, DMD produces code that is about 4 times slower than the "fast" version. GDC shows a similar ratio, so I assume it's due to the frontend(?)
LDC seems to optimize the difference away, giving almost identical performance for the "fast" and "slow" versions. I'm unsure whether that is due to the LLVM optimizations or one of their patches to dmd.
dmd 2.067 GC summary for fast version:
Number of collections: 1
Total GC prep time: 0 milliseconds
Total mark time: 0 milliseconds
Total sweep time: 0 milliseconds
Total page recovery time: 0 milliseconds
Max Pause Time: 0 milliseconds
Grand total GC time: 0 milliseconds
GC summary: 1 MB, 1 GC 0 ms, Pauses 0 ms < 0 ms
dmd 2.067 GC summary for range version:
Number of collections: 2711
Total GC prep time: 13 milliseconds
Total mark time: 67 milliseconds
Total sweep time: 370 milliseconds
Total page recovery time: 75 milliseconds
Max Pause Time: 0 milliseconds
Grand total GC time: 526 milliseconds
GC summary: 1 MB, 2711 GC 526 ms, Pauses 80 ms < 0 ms
The second number on each line is the runtime in ms. The relevant optimization flags were enabled for each compiler.
./dmd_slow
8981 LANGUAGE D 7311
./dmd_fast
8981 LANGUAGE D 1835
./gdc_slow
8981 LANGUAGE D 4249
./gdc_fast
8981 LANGUAGE D 903
./ldc_slow
8981 LANGUAGE D 999
./ldc_fast
8981 LANGUAGE D 1078
I marked this as a DMD issue, even though it is related to Phobos, because LDC produces the expected result.
Comment #1 by r9shackleford — 2015-04-25T23:41:04Z
Upon closer inspection, I believe this is an inlining issue, possibly related to cross-module inlining. If I move the function to another file, LDC's performance drops to a level similar to GDC's, but the slowdown goes away with the singleobj flag.
This kills range performance.
Coincidentally, on Arch Linux LDC is the only compiler that doesn't use a statically linked Phobos. Maybe related?
Comment #2 by safety0ff.bugz — 2015-04-26T00:41:32Z
(In reply to weaselcat from comment #0)
> using the "slow"(aka ranges) version, DMD produces code that is 4 times
> slower. Almost same exact ratios for GDC, so I assume it's due to frontend(?)
>
> LDC seems to optimize it away to almost identical performance between "fast"
> and "slow" version, unsure if it's due to the LLVM optimizations or one of
> their patches to dmd.
>
AFAIK the slow version allocates a closure on the heap; perhaps LDC optimizes out unnecessary closures.
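To illustrate the closure point, here is a minimal sketch. It is not the benchmark code: bestLoop, bestRange and their parameters are hypothetical names made up for this example. The loop version captures nothing. The range version passes lambdas that refer to the enclosing function's parameters, and as far as I know that is the kind of code for which the compiler may allocate a closure on the GC heap on every call unless the optimizer can remove it.

```d
import std.algorithm : filter, map, max, reduce;
import std.range : iota;

// Hypothetical "fast" style: a plain loop, no lambdas, nothing to capture.
int bestLoop(const int[] costs, const bool[] visited, int base)
{
    int best = 0;
    foreach (i, c; costs)
    {
        if (!visited[i])
            best = max(best, c + base);
    }
    return best;
}

// Hypothetical "slow" style: the lambdas read costs, visited and base
// from the enclosing frame. Assumption: this capture is what can force
// a heap-allocated closure when the calls are not inlined away.
int bestRange(const int[] costs, const bool[] visited, int base)
{
    auto candidates = iota(costs.length)
        .filter!(i => !visited[i])
        .map!(i => costs[i] + base);
    return reduce!max(0, candidates); // the seed 0 covers the empty case
}

void main()
{
    const int[] costs = [3, 7, 5];
    const bool[] visited = [false, true, false];

    // Both styles compute the same value; only the generated code differs.
    assert(bestLoop(costs, visited, 10) == 15);
    assert(bestRange(costs, visited, 10) == 15);
}
```

One way to check whether a given compiler needs a closure here would be to mark bestRange as @nogc and see whether it still compiles, though I have not verified how each of the three compilers behaves on this exact snippet.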
Comment #3 by robert.schadek — 2024-12-13T18:42:28Z