Bug 9645 – std.algorithm.splitter on string with char as separator performs badly in certain situations

Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2013-03-04T10:55:00Z
Last change time
2013-11-10T09:26:41Z
Keywords
preapproved
Assigned to
nobody
Creator
schveiguy

Comments

Comment #0 by schveiguy — 2013-03-04T10:55:33Z
std.algorithm.splitter where the separator is a char can perform worse than the version where the separator is a substring. It should be at worse equivalent, since you can always degrade to the substring version given the single char separator. I would expect it to be faster. In cases where the content to separator ratio is 1 to 1 up to a certain point (did not test for it), then the single-char version performs better than its substring counterpart. But in cases where the ratio of content to separator is large (arguably the more common case), it does horribly. A simple test (ratio of 35:1 for content to separator): import std.datetime; import std.algorithm; import std.stdio; string line = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbba"; void main() { enum iterations = 1_000_000; auto t1 = Clock.currTime(); foreach(i; 0..iterations) { foreach(s; splitter(line, 'a')) { if(i == 0) writeln(s); } } writefln("char splitter took %s", Clock.currTime()-t1); t1 = Clock.currTime(); foreach(i; 0..iterations) { foreach(s; splitter(line, "a")) { if(i == 0) writeln(s); } } writefln("substring splitter took %s", Clock.currTime()-t1); } result (macbook pro 2.2GHz core i7): bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb char splitter took 2 secs, 702 ms, and 188 μs bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb substring splitter took 1 sec, 71 ms, and 736 μs
Comment #1 by bearophile_hugs — 2013-03-04T13:37:29Z
See also Issue 8013
Comment #2 by monarchdodra — 2013-08-22T07:36:21Z
(In reply to comment #0) > char splitter took 2 secs, 702 ms, and 188 μs > > substring splitter took 1 sec, 71 ms, and 736 μs Changes where made to improve the situation, However, due to performance bug 10848, it is still slower. That said, I have a standing performance improvement pull for find: https://github.com/D-Programming-Language/phobos/pull/1492 fix Issue 10403 - memchr optimization for std.algorithm.find http://d.puremagic.com/issues/show_bug.cgi?id=10403 With it, I'm getting: char splitter took 429 ms, 298 μs, and 8 hnsecs substring splitter took 2 secs, 323 ms, 301 μs, and 7 hnsecs Gone from twice as slow to five times as fast. Not bad :)
Comment #3 by monarchdodra — 2013-11-10T09:26:41Z
(In reply to comment #2) > Gone from twice as slow to five times as fast. Not bad :) Merged in: This is now fixed. That said, this doesn't mean there can't be room for even more improvements in the splitter algorithm: see https://d.puremagic.com/issues/show_bug.cgi?id=9646) shows that splitter substring can also be improved.