Bug 4721 – compilation slow when compiling unittests on dcollections
Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
Other
OS
Linux
Creation time
2010-08-24T06:44:00Z
Last change time
2010-09-20T05:21:19Z
Assigned to
nobody
Creator
schveiguy
Comments
Comment #0 by schveiguy — 2010-08-24T06:44:06Z
When compiling dcollections unit tests, it takes over 1 minute to compile the small library. I am unsure what the exact trigger is to make it slow. At the request of Walter, I profiled the compiler to try and see where the slowdown was. Here are the first few lines of the results using gprof:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
77.76 6.68 6.68 2952 2.26 2.26 elf_findstr(Outbuffer*, char const*, char const*)
2.10 6.86 0.18 4342 0.04 0.04 searchfixlist
1.28 6.97 0.11 663755 0.00 0.00 ScopeDsymbol::search(Loc, Identifier*, int)
1.05 7.06 0.09 2623497 0.00 0.00 isType(Object*)
0.76 7.12 0.07 911667 0.00 0.00 match(Object*, Object*, TemplateDeclaration*, Scope*)
0.76 7.19 0.07 656268 0.00 0.00 _aaGetRvalue(AA*, void*)
0.58 7.24 0.05 2507041 0.00 0.00 isTuple(Object*)
Walter said it looks like elf_findstr is a problem area and asked me to file this bug. The relevant discussion is here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=116007
All I can say to duplicate this is to download the latest svn of dcollections and compile unit tests. If you are running the profiled version, it takes a long long time to complete (I gave up after 5 minutes). I suggest compiling just one class, HashMap. The compile time for this takes 4 seconds. The appropriate commands to run are:
svn co http://svn.dsource.org/projects/dcollections/branches/d2 dcollections
cd dcollections
dmd -unittest dcollections/HashMap.d dcollections/Hash.d dcollections/Iterators.d dcollections/model/*
The compilation will fail, because there is no main function, but it will demonstrate the problem well enough.
If you want to compile the entire unit test suite, use the command:
./build-lib-linux.sh unittest
Which builds and runs the unit tests for the whole library. This takes over a minute to compile.
Comment #1 by andrej.mitrovich — 2010-08-24T07:31:19Z
I don't know about Linux (I might try it in a VM if you want me to), but on Windows I can compile the collection in 27 seconds:
C:\d2\dcollections>timeit build-lib-win32.bat unittest
unittest
C:\d2\dcollections>dmd -unittest unit_test.d dcollections\ArrayList.d dcollections\DefaultAllocator
.d dcollections\DefaultFunctions.d dcollections\Hash.d dcollections\HashMap.d dcollections\HashMulti
set.d dcollections\HashSet.d dcollections\Iterators.d dcollections\Link.d dcollections\LinkList.d dc
ollections\RBTree.d dcollections\TreeMap.d dcollections\TreeMultiset.d dcollections\TreeSet.d dcolle
ctions\model\Addable.d dcollections\model\Iterator.d dcollections\model\Keyed.d dcollections\model\L
ist.d dcollections\model\Map.d dcollections\model\Multiset.d dcollections\model\Set.d
running unit tests...
Version Number: Windows NT 5.1 (Build 2600)
Exit Time: 4:29 pm, Tuesday, August 24 2010
Elapsed Time: 0:00:27.328
Process Time: 0:00:00.015
System Calls: 161445
Context Switches: 36381
Page Faults: 165602
Bytes Read: 10062897
Bytes Written: 5792330
Bytes Other: 131570
This is with DMD 2.046 as stated in the Readme. Do you need my system specs?
With the newer 2.048 I get errors:
C:\d2>build-lib-win32.bat unittest
unittest
C:\d2>dmd -unittest unit_test.d dcollections\ArrayList.d dcollections\DefaultAllocator.d
ctions\Hash.d dcollections\HashMap.d dcollections\HashMultiset.d dcollections\HashSet.d d
nk.d dcollections\LinkList.d dcollections\RBTree.d dcollections\TreeMap.d dcollections\Tr
llections\model\Addable.d dcollections\model\Iterator.d dcollections\model\Keyed.d dcolle
Map.d dcollections\model\Multiset.d dcollections\model\Set.d
std.contracts has been scheduled for deprecation. Please use std.exception instead.
dcollections\HashSet.d(28): Error: template std.algorithm.find(alias pred = "a == b",R,E)
ryFun!(pred)(haystack.front,needle)) : bool)) does not match any function template declar
dcollections\HashSet.d(28): Error: template std.algorithm.find(alias pred = "a == b",R,E)
ryFun!(pred)(haystack.front,needle)) : bool)) cannot deduce template function from argume
),ubyte)
dcollections\HashSet.d(301): Error: template instance dcollections.HashSet.rangeEqual!(ub
dcollections\HashSet.d(996): instantiated from here: HashSet!(ubyte)
running unit tests...
'.\unit_test' is not recognized as an internal or external command,
operable program or batch file.
Comment #2 by schveiguy — 2010-08-24T07:49:15Z
It might be a linux-only problem, I'm not sure. I also have an older machine, maybe yours is twice as fast?
27 seconds still seems quite long for a small library.
Re the errors, I'm working on them now, I just downloaded 2.048. Apparently there is some issue with AssumeSorted not working as a range anymore... It's not critical to the library, but the unit tests use it. I'm just removing the assumeSorted for now (as unit test performance isn't important, it was more the "correct" thing to do), asked about it in d.learn.
Comment #3 by andrej.mitrovich — 2010-08-24T09:55:25Z
(In reply to comment #2)
> It might be a linux-only problem, I'm not sure. I also have an older machine,
> maybe yours is twice as fast?
>
> 27 seconds still seems quite long for a small library.
>
> Re the errors, I'm working on them now, I just downloaded 2.048. Apparently
> there is some issue with AssumeSorted not working as a range anymore... It's
> not critical to the library, but the unit tests use it. I'm just removing the
> assumeSorted for now (as unit test performance isn't important, it was more the
> "correct" thing to do), asked about it in d.learn.
Yeah I guess it could be rather long, there's not that many size and they're mostly small from what I can tell.
Btw your updated d2.0b library now compiles in 11 seconds on my system with DMD 2.048.
Comment #4 by andrej.mitrovich — 2010-08-24T09:55:55Z
* not that many files, not size.
Comment #5 by schveiguy — 2010-08-24T10:13:43Z
(In reply to comment #3)
y small from what I can tell.
>
> Btw your updated d2.0b library now compiles in 11 seconds on my system with DMD
> 2.048.
Still over a minute for me on Linux 2.048.
Comment #6 by schveiguy — 2010-08-24T14:10:26Z
More testing, I did some printf debugging.
According to the comments and the code of the function, it's a *linear* search through the symbol table for a match to a given symbol + suffix.
The symbol table is a null-separated single buffer. So not only is it linear through the strings, but it's linear on the *characters*. That is, when comparing the current symbol and it's a mismatch, the code must iterate through all the characters anyways to find the next null character.
I added some printouts to determine the eventual size of the symbol table, and the number of times the function is called. I also added printouts to show the number of matches found (those should theoretically not do a linear search through the entire table, but likely would search through most of it).
The eventual numbers for dcollections:
symbol table size: 4,253,953
number of calls: 12,677
number of matches: 648
doing the math, thats probably conservatively about 20 billion loop iterations -- way unacceptable for something that should be done via a hash lookup, or at least a tree/trie/binary search.
This helps, but only reduces it to 20 seconds (but a 66% reduction is pretty good!). I ran another round of profiling, and found we have a new bottleneck:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
75.79 6.51 6.51 8103 0.80 0.80 TemplateDeclaration::toJsonBuffer(OutBuffer*)
3.14 6.78 0.27 1668093 0.00 0.00 StructDeclaration::semantic(Scope*)
2.10 6.96 0.18 1 180.00 180.00 do32bit(FL, evc*, int)
1.98 7.13 0.17 15445 0.01 0.01 EnumDeclaration::toJsonBuffer(OutBuffer*)
0.70 7.19 0.06 656268 0.00 0.00 Port::isSignallingNan(long double)
0.47 7.23 0.04 915560 0.00 0.00 StructDeclaration::toCBuffer(OutBuffer*, HdrGenState*)
0.47 7.27 0.04 Dsymbol::searchX(Loc, Scope*, Identifier*)
This time, I question whether toJsonBuffer should be called at all, as I'm not outputting any JSON data?
Walter, reopen if you think this could still use improvement.
Comment #9 by schveiguy — 2010-08-26T11:42:02Z
(In reply to comment #8)
> Flat profile:
>
> Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 75.79 6.51 6.51 8103 0.80 0.80
> TemplateDeclaration::toJsonBuffer(OutBuffer*)
> 3.14 6.78 0.27 1668093 0.00 0.00
> StructDeclaration::semantic(Scope*)
That table of functions is invalid -- I somehow compiled dmd wrong when running that profile. I think I used a unit test build. However, that was only the profiled version, the non-profiled version still takes 20 seconds to compile dcollections.
In any case, here is the correctly profiled version's heavy hitters:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
80.31 11.99 11.99 19000 0.63 0.63 searchfixlist
0.67 12.09 0.10 203173 0.00 0.00 StringTable::search(char const*, unsigned int)
0.60 12.18 0.09 369389 0.00 0.00 Lexer::scan(Token*)
0.54 12.26 0.08 953613 0.00 0.00 ScopeDsymbol::search(Loc, Identifier*, int)
0.47 12.33 0.07 1449798 0.00 0.00 calccodsize
Note, this profile is different from the first in that I was compiling the *entire* dcollections library, not just HashMap (which is now bearable due to the improved performance!). This is why the # calls is much higher on this one compared to the original.
So the last significant performance problem here is searchfixlist. Looking back, it was 2% of the runtime before, now it's jumped to 80%.
Looking at the function in backend/cod3.c, it appears to be another linear search through a linked list. I understand much less about this function than the other, so I'm not sure how/if it should be solved. Perhaps another hash map?
Comment #10 by bearophile_hugs — 2010-08-26T13:19:39Z
Reopened, because you have added here a second different performance bug.
Comment #11 by schveiguy — 2010-09-20T05:21:19Z
Closing this, opened the new issue as bug 4900.
Fixed in 2.049