Created attachment 1023
Test case
I am attempting to port some CPU emulators from C# to D. Each uses a large
central switch statement (2221 lines in this case) with a case per opcode; the
contents of each opcode handler are 'manually inlined' because in C# the
automatic inliner is not very aggressive and there is no explicit inline
modifier (same as in D).
Building the attached program takes under 1 second with only -inline, about 4-5
seconds with only -O, and around 15 minutes with -O and -inline. -release
changes outcomes slightly but not in a significant way.
Because of the nature of this issue, it was not possible to reduce this to a
smaller test case. It is not the number of case statements in the switch that
causes this issue; it is the size or complexity of the code inside the switch
statement.
It's very possible the issue is simply related to very large methods rather
than anything at all to do with switch.
The resulting EXE does work and -O -inline is about 17% faster than -O alone
when run in a benchmark. It just takes a very long time to build.
This isn't a total blocker. However, this is only one CPU, and I have about 6-7
I would like to port immediately and, long term, maybe around a dozen. The Z80
CPU has a switch statement which is about 4x bigger than this.
As I am very new to the D toolchain, I'm not currently using incremental
builds. I assume this is possible, and it should mitigate the problem
somewhat. It does make performance testing and optimization an issue, though. I
am using DMD 2.054.
Comment #1 by bearophile_hugs — 2011-09-10T10:37:30Z
Are you able to compile DMD and profile it during its compilation of (a reduced version of) this code? This profiling data will probably help to fix this performance bug.
Comment #2 by beirich2 — 2011-09-10T21:13:25Z
Created attachment 1024
trace.log and trace.def
First, I commented out half of the case statements, and this dropped the -O -inline compile time from 15 minutes to 2 minutes. So there is some definite non-linearity happening there.
Then, I compiled DMD 2.055 with its "trace" makefile setting. I used that version of DMD to compile the half-size test case with -O -inline. This took a very long time with instrumentation: over an hour.
In any case, I attached the resulting trace.log and trace.def. It doesn't really mean anything to me, so I hope it means something to someone here!
Comment #3 by bearophile_hugs — 2011-09-11T04:05:24Z
Comment #4 by bearophile_hugs — 2011-09-11T05:32:04Z
Maybe it's possible to speed up the vec_index function (in src\tk\vec.c) using smarter "bithacks".
Comment #5 by bearophile_hugs — 2011-09-11T06:10:47Z
With a test (removing @property, replacing immutable with const, and removing the imports) the LDC1 compiler shows normal compilation speeds with every combination of switches I have tried.
Note: probably LDC1 disables the inlining logic of the D front-end and uses the LLVM one.
Comment #6 by yebblies — 2011-09-11T08:26:07Z
This is (from what I can see) a problem with the optimizer, a similar kind of bug to issue 2396. It doesn't seem to have anything to do with the switch, but looks like a slow algorithm trying to reorder assignments, and choking on a 2000+ line function.
Doing some regex magic and placing each case block in a (){}(); delegate call gives normal compile times for me. Maybe this will be a suitable workaround for you.
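A minimal sketch of what that transformation might look like, assuming a toy opcode handler. The Cpu struct, its fields, and the opcodes here are hypothetical and are not taken from the attached test case:

```d
import std.stdio;

// Hypothetical miniature CPU state; all names are illustrative only.
struct Cpu
{
    ubyte a;       // accumulator
    ushort pc;     // program counter
    ulong cycles;  // elapsed cycle count

    void step(ubyte opcode)
    {
        switch (opcode)
        {
            case 0x00: // hypothetical NOP
                // The case body is wrapped in a function literal that is
                // called immediately, so it is no longer part of one
                // giant function body.
                (){
                    cycles += 2;
                }();
                break;

            case 0x01: // hypothetical INC A
                (){
                    a++;
                    cycles += 2;
                }();
                break;

            default:   // unknown opcode
                (){
                    cycles += 1;
                }();
                break;
        }
        pc++;
    }
}

void main()
{
    Cpu cpu;
    cpu.step(0x01);
    cpu.step(0x00);
    writeln(cpu.a, " ", cpu.pc, " ", cpu.cycles);
}
```

Each case body becomes its own small function literal, so the optimizer presumably works on many short functions instead of one 2000+ line function. Whether the extra calls cost anything at run time would need to be benchmarked.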
Unfortunately there are far fewer people working on the dmd backend than the frontend, so depending on the difficulty it may be a while before this is fixed.
Comment #7 by yebblies — 2012-02-01T06:26:44Z
*** This issue has been marked as a duplicate of issue 2396 ***