Bug 9645 – std.algorithm.splitter on string with char as separator performs badly in certain situations
Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2013-03-04T10:55:00Z
Last change time
2013-11-10T09:26:41Z
Keywords
preapproved
Assigned to
nobody
Creator
schveiguy
Comments
Comment #0 by schveiguy — 2013-03-04T10:55:33Z
std.algorithm.splitter where the separator is a char can perform worse than the version where the separator is a substring. It should be at worse equivalent, since you can always degrade to the substring version given the single char separator. I would expect it to be faster.
In cases where the content to separator ratio is 1 to 1 up to a certain point (did not test for it), then the single-char version performs better than its substring counterpart. But in cases where the ratio of content to separator is large (arguably the more common case), it does horribly.
A simple test (ratio of 35:1 for content to separator):
import std.datetime;
import std.algorithm;
import std.stdio;
string line = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbba";
void main()
{
enum iterations = 1_000_000;
auto t1 = Clock.currTime();
foreach(i; 0..iterations)
{
foreach(s; splitter(line, 'a'))
{
if(i == 0)
writeln(s);
}
}
writefln("char splitter took %s", Clock.currTime()-t1);
t1 = Clock.currTime();
foreach(i; 0..iterations)
{
foreach(s; splitter(line, "a"))
{
if(i == 0)
writeln(s);
}
}
writefln("substring splitter took %s", Clock.currTime()-t1);
}
result (macbook pro 2.2GHz core i7):
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
char splitter took 2 secs, 702 ms, and 188 μs
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
substring splitter took 1 sec, 71 ms, and 736 μs
Comment #1 by bearophile_hugs — 2013-03-04T13:37:29Z
See also Issue 8013
Comment #2 by monarchdodra — 2013-08-22T07:36:21Z
(In reply to comment #0)
> char splitter took 2 secs, 702 ms, and 188 μs
>
> substring splitter took 1 sec, 71 ms, and 736 μs
Changes where made to improve the situation, However, due to performance bug 10848, it is still slower.
That said, I have a standing performance improvement pull for find:
https://github.com/D-Programming-Language/phobos/pull/1492
fix Issue 10403 - memchr optimization for std.algorithm.find
http://d.puremagic.com/issues/show_bug.cgi?id=10403
With it, I'm getting:
char splitter took 429 ms, 298 μs, and 8 hnsecs
substring splitter took 2 secs, 323 ms, 301 μs, and 7 hnsecs
Gone from twice as slow to five times as fast. Not bad :)
Comment #3 by monarchdodra — 2013-11-10T09:26:41Z
(In reply to comment #2)
> Gone from twice as slow to five times as fast. Not bad :)
Merged in: This is now fixed.
That said, this doesn't mean there can't be room for even more improvements in the splitter algorithm: see https://d.puremagic.com/issues/show_bug.cgi?id=9646) shows that splitter substring can also be improved.