Bug 8487 – Semantic analysis of templates is insanely slow

Status
RESOLVED
Resolution
WORKSFORME
Severity
major
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2012-07-31T19:57:58Z
Last change time
2019-10-10T11:29:56Z
Keywords
performance
Assigned to
No Owner
Creator
Jonathan M Davis

Attachments

IDFilenameSummaryContent-TypeSize
1129named_entities.dModule with named entities which takes 6+ minutes to compiletext/x-dsrc113708
1130profilingThe results of profiling dmd with googleperftools when compiling the previously attached filetext/plain22563
1131named_entities.dShorter and simpler module with named entities which takes 0.2+ seconds to compiletext/x-dsrc118363

Comments

Comment #0 by issues.dlang — 2012-07-31T19:57:58Z
Created attachment 1129 Module with named entities which takes 6+ minutes to compile The attached program is a slightly altered version of what I currenly have in the D lexer that I'm putting together. It uses some TypeTuples, a template, and some mixins to fill an AA with all of the named HTML entities and their values. It has a second template and associated mixins to test that the names and values match dmd. It takes over _6 minutes_ to build on my Phenom II. This is _insanely_ slow and is a serious impediment for the lexer that I'm working on. Based on the results of using googleperftools on the compiler, it appears that the vast majority of the time is spent on the semantic analysis of templates. I'll attach the full results shortly, but this is the top 10042 30.4% 30.4% 32889 99.6% TemplateInstance::semantic 8391 25.4% 55.8% 22394 67.8% arrayObjectMatch 8240 25.0% 80.8% 14045 42.5% match 3153 9.6% 90.4% 3153 9.6% StringExp::compare 1656 5.0% 95.4% 5075 15.4% StringExp::equals 654 2.0% 97.3% 654 2.0% Expression::dyncast 247 0.7% 98.1% 247 0.7% __GI_memcmp 76 0.2% 98.3% 81 0.2% _int_malloc 63 0.2% 98.5% 63 0.2% __GI_memcpy 52 0.2% 98.7% 52 0.2% Tuple::dyncast 35 0.1% 98.8% 107 0.3% __libc_malloc 31 0.1% 98.9% 31 0.1% touchfunc 25 0.1% 99.0% 56 0.2% ecom 12 0.0% 99.0% 80 0.2% VarDeclaration::syntaxCopy 10 0.0% 99.0% 15 0.0% _IO_vfprintf To quote the googleperftools' docs ( http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html ): ------------ Text mode has lines of output that look like this: 14 2.1% 17.2% 58 8.7% std::_Rb_tree::find Here is how to interpret the columns: 1. Number of profiling samples in this function 2. Percentage of profiling samples in this function 3. Percentage of profiling samples in the functions printed so far 4. Number of profiling samples in this function and its callees 5. Percentage of profiling samples in this function and its callees 6. Function name ------------ which should explain the format of the above results.
Comment #1 by issues.dlang — 2012-07-31T20:00:40Z
Created attachment 1130 The results of profiling dmd with googleperftools when compiling the previously attached file Here's the full results from googleperftools.
Comment #2 by timon.gehr — 2012-07-31T20:36:37Z
Well, the performance can probably be improved, but I'd advise rewriting that snippet to use CTFE instead. Templates aren't meant for computation. Also, std.metastrings should be rewritten to wrap a few CTFE-routines or be deprecated.
Comment #3 by issues.dlang — 2012-07-31T20:45:16Z
> Well, the performance can probably be improved, but I'd advise rewriting that snippet to use CTFE instead. I'm trying to figure out how to do that, and I'll probably manage it, but that means using an array instead of TypeTuple, and arrays don't have the benefit of forcing CTFE, making it much harder (and the need to have the literals treated as literals at the right time and not before also complicates things). It's quite clean with TypeTuples (aside from the fact that I'm forced to split it across multiple type tuples so that the template instantiations don't choke at 100, thinking that they're recursing infinitely), and I'd much prefer to use them, but with this poor performance, it's not really reasonable. > Templates aren't meant for computation. I don't buy that. D's templates are specifically designed in a way to make that sort of thing possible, and in this particular case, using templates is definitely easier than CTFE.
Comment #4 by timon.gehr — 2012-07-31T21:39:19Z
Created attachment 1131 Shorter and simpler module with named entities which takes 0.2+ seconds to compile
Comment #5 by timon.gehr — 2012-07-31T21:42:23Z
(In reply to comment #3) > > Well, the performance can probably be improved, but I'd advise rewriting that > snippet to use CTFE instead. > > I'm trying to figure out how to do that, and I'll probably manage it, but that > means using an array instead of TypeTuple, and arrays don't have the benefit of > forcing CTFE, making it much harder (and the need to have the literals treated > as literals at the right time and not before also complicates things). It's > quite clean with TypeTuples (aside from the fact that I'm forced to split it > across multiple type tuples so that the template instantiations don't choke at > 100, thinking that they're recursing infinitely), and I'd much prefer to use > them, but with this poor performance, it's not really reasonable. > See the attachment I created for a possible solution. > > Templates aren't meant for computation. > > I don't buy that. D's templates are specifically designed in a way to make that > sort of thing possible, and in this particular case, using templates is > definitely easier than CTFE. I disagree.
Comment #6 by issues.dlang — 2012-07-31T22:13:26Z
Thanks for your solution, though I did figure one out just a little bit ago. Regardless, I think that the original solution should compile in a reasonably short period of time (as in seconds) rather than 6+ minutes. And given how template-heavy D code generally is (especially Phobos), templates really need to be optimized, or compilation times will suffer.
Comment #7 by bugzilla — 2012-08-01T02:08:47Z
My initial impression is that the slowdown is caused by looking for a match with a previous instantiation by using a linear search. This can be fixed by using a hash table instead.
Comment #8 by doob — 2012-08-01T02:14:12Z
The problem is with "Format". I replaced it with manual string concatenations and it compiles in just above one second. With "Format" it takes almost four minutes on my computer.
Comment #9 by hsteoh — 2014-08-18T21:13:30Z
std.metastrings has since been deprecated and removed, and replaced with std.format. Does this issue still occur?
Comment #10 by issues.dlang — 2014-08-18T21:21:46Z
std.metastrings is no longer with us, but I'd be very surprised if the problem with template performance were fixed. Clearly, a new example is needed though ,since std.metastrings has finally been removed.
Comment #11 by razvan.nitu1305 — 2019-10-10T11:29:56Z
I'm going to close this as there is no test case. Feel free to submit a new issue if a new test case appears.