Bug 4244 – Built-in static array hash function is 8-9 times slower than hash function of a POD struct with equivalent size
Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P2
Component
druntime
Product
D
Version
D2
Platform
x86
OS
All
Creation time
2010-05-27T17:08:06Z
Last change time
2019-05-16T23:33:50Z
Keywords
performance
Assigned to
No Owner
Creator
bearophile_hugs
Comments
Comment #0 by bearophile_hugs — 2010-05-27T17:08:06Z
This D2 program gets much slower if foo1() is used inside the main() instead of foo2() that apparently does something very similar (compiled with dmd v2.046, __gshared not used, compilation arguments -O -release -inline):
import std.c.stdio: printf;
int foo1(int x, int y) {
static int[int[2]] cache;
int[2] args = [x, y];
cache[args] = x;
return x;
}
int foo2(int x, int y) {
static struct Pair { int x, y; }
static int[Pair] cache;
Pair args = Pair(x, y);
cache[args] = x;
return x;
}
void main() {
enum int N = 600;
int tot;
foreach (x; 1 .. N)
foreach (y; 1 .. N)
tot += foo1(x, y);
printf("%d\n", tot);
}
An experiment shows that foo1 gets quite faster if the [x,y] is written inside the cache AA itself:
int foo1(int x, int y) {
static int[int[2]] cache;
int[2] args = [x, y];
cache[[x, y]] = x;
return x;
}
Comment #1 by hsteoh — 2015-06-30T18:28:45Z
Confirmed that this performance issue still happens on git HEAD (1ceb899e9ee430bcd223ddeeb907245ea5be3d47). Tested on Linux/64bit, AMD Phenom II X6 1055T (800 MHz).
The original code runs a little too fast on my system to have any measurable performance numbers (S/N ratio too low to be accurate), so I bumped N to 1800, and noted that the foo1() version takes on average about 1.4 to 1.5 seconds to complete, whereas the foo2() version takes only about 0.8 to 0.9 seconds on average. Both versions compiled with `dmd -O -release -inline`. (Surprisingly enough, not specifying -inline seems to produce faster code?! It's a small difference, though, so could just be background noise.)
Also surprisingly, gdc-5.1.1 produces much slower code than dmd in this case: the foo1() version takes on average about 3.9 seconds, whereas the foo2() version takes on average about 3.4 seconds. Both compiled with `gdc -O3 -finline -frelease`. This may possibly be caused by gdc using an older version of druntime, as recently the AA code in git HEAD has had some performance improvements.
Will take a look into the generated assembly code next.
Comment #2 by hsteoh — 2015-06-30T18:50:38Z
Looking at the generated assembly code, it seems that foo1 and foo2 are essentially identical. So it looks like the problem must come from the AA code, probably from the different treatments given to different typeinfo's for the two key types.
Comment #3 by hsteoh — 2015-06-30T20:27:22Z
The problem is caused by the hash computation of arrays (including static arrays). Here's the benchmark code I used:
------
void hashTest() {
import std.conv : to;
import std.datetime : Duration, benchmark;
import std.stdio : writefln;
static struct Pair { int x, y; }
int[2] staticArray;
Pair pair;
auto result = benchmark!(() => typeid(staticArray).toHash(),
() => typeid(pair).toHash())(30_000_000);
writefln("staticArrayHash: %s", to!Duration(result[0]));
writefln("structHash: %s", to!Duration(result[1]));
}
void main() {
hashTest();
}
------
Here are some trial runs showing the typical measurements:
------
$ ./test
staticArrayHash: 8 secs, 947 ms, 130 μs, and 7 hnsecs
structHash: 1 sec, 196 ms, 711 μs, and 1 hnsec
$ ./test
staticArrayHash: 9 secs, 130 ms, 423 μs, and 2 hnsecs
structHash: 1 sec, 196 ms, and 737 μs
$ ./test
staticArrayHash: 8 secs, 922 ms, 449 μs, and 9 hnsecs
structHash: 1 sec, 260 ms, 688 μs, and 8 hnsecs
$ ./test
staticArrayHash: 9 secs, 43 ms, 237 μs, and 6 hnsecs
structHash: 1 sec, 196 ms, and 983 μs
------
This shows that computing the hash of a static array is about 8-9 times slower than computing the hash of a struct of equivalent size.
Looking into the druntime code, it seems that a likely cause of the performance problem is the getArrayHash() function, which calls an inner function hasCustomHash() that in turn calls getElement(), which uses a loop to walk up the TypeInfo tree to ascertain whether the array elements have a custom hash function. I copied the druntime code into my test file, and modified hasCustomToHash() to just return false, and the benchmark shows that the array hash function is now approximately on par with the struct hash function.
So, it looks like the culprit is the call to getElement(). I'll see if I can cook up a PR to fix this.