Test program:
import std.stdio, std.perf;

void main() {
    scope pc = new PerformanceCounter;
    pc.start;
    uint[] foo;
    foreach (i; 0 .. 1_000_000) {
        foo ~= i;  // the append path under test
    }
    pc.stop;
    writeln(pc.milliseconds);
}
Timings:
DMD 2.019 (Last release before druntime): 42 milliseconds.
DMD 2.020 (First release with druntime): ~1000 milliseconds.
DMD 2.029 (Current version): ~1000 milliseconds.
DMD 2.029 (Replacing ~= with the Appender struct): 18 milliseconds.
DMD 2.029 (Replacing builtin array with rangeextra.TNew): 19 milliseconds.
This looks to be related to the block size caching scheme used by gcx.d. When appending to two arrays simultaneously, the difference between 2.019 and 2.029 is much smaller, both in absolute and especially in relative terms:
Program:
import std.stdio, std.perf;

void main() {
    scope pc = new PerformanceCounter;
    pc.start;
    uint[] foo, bar;
    foreach (i; 0 .. 1_000_000) {
        // Alternating appends to two arrays thrash a one-entry size cache.
        foo ~= i;
        bar ~= i;
    }
    pc.stop;
    writeln(pc.milliseconds);
}
Timings:
DMD 2.019: ~1800 ms
DMD 2.029: ~2300 ms (note: still slower, but by a smaller margin, even in absolute terms)
DMD 2.029 (Using Appender instead of ~=): 49 ms
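For reference, the Appender-based variant timed above looks roughly like this (a sketch using std.array's appender helper; the exact Appender API in 2.029 may differ slightly from this):

import std.array, std.perf, std.stdio;

void main() {
    scope pc = new PerformanceCounter;
    pc.start;
    auto app = appender!(uint[])();
    foreach (i; 0 .. 1_000_000) {
        app.put(cast(uint) i);  // amortized growth, no per-append GC query
    }
    uint[] foo = app.data;
    pc.stop;
    writeln(pc.milliseconds);
}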
Comment #1 by dsimcha — 2009-04-25T23:09:35Z
Created attachment 340
Fix by adding BlkInfo caching to druntime GC.
I've figured out the problem. In lifetime.d, the druntime developers reworked the allocation model in a way that requires the full BlkInfo struct from the GC, so the existing size-info cache was no longer used. This patch addresses that by adding a BlkInfo cache to the GC, analogous to the existing size cache.
It also fixes a subtle bug I found in the size caching while reading the code: when free() is called on the block whose size is currently cached, the cached information is not cleared.
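In sketch form, the idea looks like this (only BlkInfo mirrors the real GC interface; every other name here is illustrative, not the actual gcx.d code):

struct BlkInfo { void* base; size_t size; uint attr; }  // mirrors the GC's block descriptor

struct Gcx {
    BlkInfo infoCache;  // one-entry cache of the last block queried

    // Stand-in for the real pool/page-table scan (hypothetical).
    BlkInfo lookupBlock(void* p) { return BlkInfo.init; }

    BlkInfo query(void* p) {
        // Fast path: repeated appends to the same array hit the cache
        // and skip the full lookup entirely.
        auto b = cast(ubyte*) infoCache.base;
        if (b !is null && cast(ubyte*) p >= b && cast(ubyte*) p < b + infoCache.size)
            return infoCache;
        auto info = lookupBlock(p);  // slow path
        infoCache = info;
        return info;
    }

    void free(void* p) {
        // The subtle bug noted above: without this, the cache can hand
        // out stale info for a block that has already been freed.
        if (infoCache.base is p)
            infoCache = BlkInfo.init;
        // ... actual deallocation ...
    }
}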
Comment #2 by dsimcha — 2009-04-25T23:35:54Z
Created attachment 341
Another possible fix: query just size first in lifetime.
Here's another possible fix. This patch queries only the size of the block in lifetime.d and, if the array does not need to be reallocated, skips querying the full block info.
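Roughly, the control flow is as follows (a hypothetical sketch of the append path; gc_sizeOf() and gc_query() are the real C-linkage entry points, the rest is illustrative):

struct BlkInfo { void* base; size_t size; uint attr; }  // as returned by the GC

extern (C) size_t  gc_sizeOf(void* p);  // cheap query, can use the size cache
extern (C) BlkInfo gc_query(void* p);   // full block info, slower

// Hypothetical fast path: when the requested size still fits in the
// current block, the size alone decides, and gc_query() is skipped.
void* extendInPlace(void* ptr, size_t newBytes) {
    if (newBytes <= gc_sizeOf(ptr))
        return ptr;                     // grows in place, no BlkInfo needed
    BlkInfo info = gc_query(ptr);       // only the reallocation path pays for this
    // ... allocate a larger block using info.attr, copy the data over ...
    return null;                        // placeholder for the realloc path
}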
Comment #3 by sean — 2009-04-29T11:42:18Z
Thanks for doing this research. The reason gc_query() is currently used is that it provides a means of determining explicitly whether the array is an interior slice. However, I've begun thinking that it would make more sense to simply require that gc_sizeOf() always return zero for interior pointers (I'd left the door open for it to return a positive value). That would bring performance in line with pre-druntime Phobos. Either way, the caching of BlkInfo is a valuable change. I'll see about at least getting these patches applied before the next release.
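Under that convention, the cheap query alone would classify a pointer, along these lines (illustrative):

extern (C) size_t gc_sizeOf(void* p);

// With the proposed convention, a zero result itself signals an interior
// pointer (or non-GC memory), so no gc_query() call is required.
bool isHeadOfBlock(void* p) {
    return gc_sizeOf(p) != 0;
}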
Comment #4 by dsimcha — 2009-04-30T18:20:09Z
Been thinking about this some more. Even with the full BlkInfo caching, it might be worthwhile to make array appending need only the size. In multithreaded programs, that could allow the cached size to be queried atomically rather than requiring a lock on every array append. Then again, this caching is something of a kludge; if you're doing a lot of appends, the better answer is probably to use an ArrayBuilder/Appender/TNew anyway.
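A rough illustration of that idea (purely hypothetical; none of these names are from druntime):

import core.atomic;

shared size_t cachedBase;  // address of the block whose size is cached
shared size_t cachedSize;

size_t sizeOfLocked(void* p) {
    // ... take the GC lock, do the full lookup, refresh the cache ...
    return 0;  // stand-in for the locked slow path
}

// Hypothetical lock-free fast path: a cache hit avoids taking the GC
// lock on every append. (A real implementation would have to read the
// two fields consistently, e.g. by packing or versioning them.)
size_t sizeOfFast(void* p) {
    if (cast(size_t) p == atomicLoad(cachedBase))
        return atomicLoad(cachedSize);
    return sizeOfLocked(p);
}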
Comment #5 by sean — 2009-05-06T13:12:37Z
I'm afraid the source locations where most of this code is inserted by the patches are wrong. Could you update them?
Comment #6 by dsimcha — 2009-05-06T14:53:54Z
Sure, but could you please specify what's wrong with it and/or give me hints as to how to fix it? It seems right to me, though I'm pretty new to submitting patches. Also, while looking at it, I found a small error in my BlkInfo caching patch that needs fixing anyhow. I initially misunderstood how setAttr works.
if (gcx.infoCache.base is p) {
    gcx.infoCache.attr = mask;
}

should be something like:

if (gcx.infoCache.base is p) {
    gcx.infoCache.attr = mask & oldb;
}
Comment #7 by dsimcha — 2009-05-06T15:20:05Z
Created attachment 353
New BlkInfo caching
On second thought, some weird things were indeed going on. For one, Cygwin diff's omit-whitespace option seems to mangle the output. Here's another attempt, which also fixes a subtle bug or two in the original patch. I think this is a much better solution than the other patch, but if you disagree, let me know and I'll fix the other one too.