Bug 11531 – For a faster std.algorithm.group on strings

Status: NEW
Severity: enhancement
Priority: P4
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2013-11-16T13:28:06Z
Last change time: 2024-12-01T16:19:07Z
Keywords: performance
Assigned to: No Owner
Creator: bearophile_hugs
Moved to GitHub: phobos#9617

Attachments

ID    Filename  Summary  Content-Type  Size
1635  test.d    test.d   text/plain    836

Comments

Comment #0 by bearophile_hugs — 2013-11-16T13:28:06Z
This is a low-priority enhancement request. From my tests I've seen that you can make std.algorithm.group on strings more than twice as fast by applying it instead to an immutable(ubyte)[] obtained with std.string.representation.

Dmitry Olshansky has suggested some ideas that can improve the performance of std.algorithm.group applied to strings:

> As to group, it has to find runs of identical items. It can be sped up for Unicode if you take into account 2 simple tricks:
> - you don't need to decode: just identify the size of the current dchar (stride) and see how many repetitions of it follow;
> - special-case the situation where the current (w)char is ASCII (or BMP for UTF-16) so as to speed up counting (1 char vs. a variable-length slice of 1-4 chars, ditto with wchar).
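The two tricks suggested in comment #0 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Phobos implementation and not a proposed patch: `groupByStride` and `Run` are hypothetical names, the input is assumed to be valid UTF-8, and the result is built eagerly as an array rather than as a lazy range like the one std.algorithm.group returns. The only library calls used are std.utf.stride, std.string.representation, and std.algorithm.iteration.group and map.

```d
import std.utf : stride;

/// Hypothetical helper (not part of Phobos): finds runs of identical
/// code points in a UTF-8 string by comparing raw code units only.
/// Assumes `s` is valid UTF-8.
auto groupByStride(string s)
{
    static struct Run
    {
        string codePoint; // slice of the input holding one code point
        size_t count;     // how many times it repeats consecutively
    }

    Run[] runs;
    size_t i = 0;
    while (i < s.length)
    {
        // Trick 2: an ASCII code unit is always exactly one byte long.
        // Otherwise ask std.utf.stride for the length of the sequence
        // (trick 1: the length is needed, the decoded value is not).
        immutable size_t n = s[i] < 0x80 ? 1 : stride(s, i);
        string unit = s[i .. i + n];

        // Count how many identical n-byte slices follow.
        size_t count = 1;
        size_t j = i + n;
        while (j + n <= s.length && s[j .. j + n] == unit)
        {
            ++count;
            j += n;
        }

        runs ~= Run(unit, count);
        i = j;
    }
    return runs;
}

unittest
{
    // Mixed ASCII and two-byte code points.
    auto runs = groupByStride("aaaééb");
    assert(runs.length == 3);
    assert(runs[0].codePoint == "a" && runs[0].count == 3);
    assert(runs[1].codePoint == "é" && runs[1].count == 2);
    assert(runs[2].codePoint == "b" && runs[2].count == 1);

    // The workaround from the report: group over the raw bytes.
    import std.algorithm.iteration : group, map;
    import std.array : array;
    import std.string : representation;

    auto counts = "aaabbc".representation.group.map!(t => t[1]).array;
    assert(counts == [3u, 2u, 1u]);
}
```

The unittest can be run with `dmd -unittest -main`. Note that the byte-wise workaround only yields the same runs as the string version when the text is ASCII; on multi-byte code points it groups individual code units instead.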
Comment #1 by jack — 2017-01-27T20:42:21Z
Created attachment 1635 (test.d). Currently group does not auto-decode, and I have attached a test case which shows that using immutable(ubyte)[] rather than string is a large performance pessimization: it is almost 2x slower.
Comment #2 by uplink.coder — 2017-01-27T23:55:59Z
Jack, I do not see anything of the kind. The performance difference is within 2% and well within the fluctuations caused by the GC.
Comment #3 by jack — 2017-01-28T01:28:04Z
(In reply to Stefan Koch from comment #2)
> Jack, I do not see anything of the kind.
> The performance difference is within 2% and will within fluctuations being
> caused by the gc.

Hmm, dmd does not show this performance problem, but ldc does:

$ dmd -O -inline -release test.d && ./test
original 5 secs, 355 ms, 103 μs, and 7 hnsecs
new 5 secs, 70 ms, 858 μs, and 6 hnsecs

$ ldc2 -O5 -release test.d && ./test
original 576 ms, 524 μs, and 6 hnsecs
new 992 ms, 676 μs, and 6 hnsecs

Odd.
Comment #4 by robert.schadek — 2024-12-01T16:19:07Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/phobos/issues/9617 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT