Compiling the OpenSSL module with the -inline and -O flags takes around half a second with DMD 2.065.0. With 2.066.0-b5 the same compilation takes over six minutes. The command used to compile:
dmd -c -I. -inline -O -of./objs/tango-net-util-cOpenSSL-O--inline--release.o tango/net/util/c/OpenSSL.d
The file that is compiled is:
https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/net/util/c/OpenSSL.d
Comment #1 by bugzilla — 2014-07-26T06:19:55Z
Any chance of a reduced test case?
Comment #2 by doob — 2014-07-26T14:17:36Z
(In reply to Walter Bright from comment #1)
> Any chance of a reduced test case?
I don't have access to my development computer right now, so I won't be able to do that in the near future.
Comment #3 by nilsbossung — 2014-07-26T21:27:50Z
(In reply to Walter Bright from comment #1)
> Any chance of a reduced test case?
Here you go.
2.065 finishes in less than 0.1 seconds on my system.
2.066.0-b5 takes over 7 seconds.
When the code is all in one module, 2.065 is just as slow.
---
cat > SharedLib.d << code
module SharedLib;
final class SharedLib {
void getSymbol() {return getSymbolImpl();}
void getSymbolImpl() {return getSymbol_();}
/* add more intermediate functions to go slower */
void getSymbol_() {}
}
code
cat > OpenSSL.d << code
module OpenSSL;
import SharedLib;
SharedLib ssllib;
void bindFunc() {ssllib.getSymbol();}
void bindCrypto()
{
bindFunc(); /* add more of these to go slower */
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc(); /* 10 */
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc(); /* 20 */
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc(); /* 30 */
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc(); /* 40 */
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc(); /* 50 */
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc(); /* 60 */
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc(); /* 70 */
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc(); /* 80 */
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc(); /* 90 */
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc();
bindFunc(); /* 100 */
}
code
time dmd -c -inline -O OpenSSL.d
---
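For reference, the single-module form mentioned above (equally slow on 2.065) is just the two files merged into one; a minimal sketch, with an illustrative module name:
---
module OpenSSLMerged; /* illustrative name, not part of the original report */

final class SharedLib {
    void getSymbol() { return getSymbolImpl(); }
    void getSymbolImpl() { return getSymbol_(); }
    void getSymbol_() {}
}

SharedLib ssllib;

void bindFunc() { ssllib.getSymbol(); }

void bindCrypto()
{
    bindFunc();
    bindFunc();
    /* ...repeat up to 100 calls, exactly as in OpenSSL.d above */
}
---
Compile the same way: dmd -c -inline -O OpenSSLMerged.d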
Comment #4 by k.hara.pg — 2014-07-27T14:37:58Z
(In reply to Nils from comment #3)
> (In reply to Walter Bright from comment #1)
> > Any chance of a reduced test case?
>
> Here you go.
>
> 2.065 finishes in less than 0.1 seconds on my system.
> 2.066.0-b5 takes over 7 seconds.
> When the code is all in one module, 2.065 is just as slow.
Introduced in: https://github.com/D-Programming-Language/dmd/pull/2561
Comment #6 by bugzilla
(In reply to Kenji Hara from comment #5)
> (In reply to Kenji Hara from comment #4)
> > Introduced in: https://github.com/D-Programming-Language/dmd/pull/2561
>
> This is a performance issue of the dmd backend codegen, because it does not
> occur with the -o- switch.
Thanks for tracking down the cause.
The optimizer is not designed to run fast; it is designed to build the fastest code. What #2561 did was heap a whole extra helping of code into the optimizer to optimize. Hence the slowdown. #2561 needs to be revisited.
#2561 was an attempt to fix:
https://issues.dlang.org/show_bug.cgi?id=10985
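For intuition, a hand-expanded sketch (not dmd's actual inliner output) of why the optimizer's workload grows: with -inline the chain bindFunc -> getSymbol -> getSymbolImpl -> getSymbol_ is flattened into every call site, so bindCrypto() turns into one large function body rather than 100 cheap calls:
---
module OpenSSLExpanded; /* illustration only; assumes SharedLib.d from comment #3 */

import SharedLib;

SharedLib ssllib;

void bindCryptoExpanded()
{
    /* Each of the 100 bindFunc() calls roughly becomes the flattened chain: */
    ssllib.getSymbol_();
    ssllib.getSymbol_();
    /* ...and so on, all inside a single function that the backend's
       data-flow passes then have to analyze as a whole. */
}
---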
Comment #7 by bugzilla — 2014-07-28T05:18:26Z
Thank you, Nils, for providing the test case. I'd argue that it is not really a regression, as the optimizer has been exhibiting this behavior for at least the last couple of versions. The PR Kenji identified did make the slowdown appear in Jacob's code, even though, as you found, a minor rearrangement exposes the problem regardless.
I'd also say it's not really a showstopper, since the compiler does (eventually) complete. Cutting down the number of function calls in any one function improves the optimizer's times; a hypothetical sketch of that workaround follows below.
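A hypothetical sketch of that workaround (illustrative names, not the actual Tango code): spread the call sites over several smaller functions so that each function the optimizer sees stays small:
---
module OpenSSLSplit; /* illustrative; reuses SharedLib.d from comment #3 */

import SharedLib;

SharedLib ssllib;

void bindFunc() { ssllib.getSymbol(); }

void bindCryptoPart1()
{
    bindFunc();
    bindFunc();
    /* ...about 25 calls here instead of 100 in one function */
}

void bindCryptoPart2()
{
    bindFunc();
    bindFunc();
    /* ...and so on for the remaining parts */
}

void bindCrypto()
{
    bindCryptoPart1();
    bindCryptoPart2();
    /* bindCryptoPart3(); bindCryptoPart4(); */
}
---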
But it is something I'd like to address. Turning on the Digital Mars C++ profiler yields the top time consumers (Per Call is Func Time divided by Num Calls):
 Num Calls  Tree Time  Func Time  Per Call   Function
        13   24172807   10679826    821525   void cdecl copyprop()
 215227791    9896739    9896739         0   unsigned cdecl vec_index(unsigned ,unsigned *)
       929    3606016    3181392      3424   void cdecl flowcp()
   8777409     375222     375222         0   int cdecl ERTOL(elem *)
      3650     212113     114724        31   LIST *cdecl listrds(unsigned *,elem *,unsigned *)
    543837      39448      39448         0   void cdecl vec_subass(unsigned *,unsigned *)
    541398      38649      38649         0   void cdecl vec_orass(unsigned *,unsigned *)
        25      41064      37402      1496   void cdecl flowae()
    543032      34477      33198         0   unsigned *cdecl vec_calloc(unsigned )
    681074      33198      31611         0   void cdecl vec_free(unsigned *)
         8      64767      23682      2960   void cdecl builddags()
        17      87772      22527      1325   void cdecl boolopt()
      7452      21745      21745         2   void cdecl updaterd(elem *,unsigned *,unsigned *)
    374313      16950      16950         0   int cdecl el_match(elem *,elem *)
    268038      11610      11610         0   int cdecl el_noreturn(elem *)
    135687      11740      10788         0   unsigned *cdecl vec_clone(unsigned *)
    132910       9746       9746         0   void cdecl vec_andass(unsigned *,unsigned *)
    176436       7532       7532         0   int cdecl el_appears(elem *,Symbol *)
     92377       4666       4645         0   LIST *cdecl list_prepend(LIST **,void *)
        13       4675       3940       303   void cdecl rmdeadass()
     16532       2959       2959         0   void *cdecl mem_fmalloc(unsigned )
      7419       5351       2887         0   elem *cdecl el_calloc()
     17299    9741091       2840         0   _codelem
Looks like I need to take a look at copyprop().
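For readers unfamiliar with the pass: copyprop() is the backend's copy-propagation pass. A generic illustration of what copy propagation does (not dmd's implementation):
---
/* Before the pass: b is only a copy of a. */
int before(int a)
{
    int b = a;
    int c = b + 1; /* copy propagation rewrites this use of b to a */
    return c;
}

/* Effectively after the pass (a later dead-assignment pass removes b): */
int after(int a)
{
    int c = a + 1;
    return c;
}
---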