Bug 17485 – bringToFront and/or upperBound slow

Status
NEW
Severity
enhancement
Priority
P4
Component
phobos
Product
D
Version
D2
Platform
x86
OS
Mac OS X
Creation time
2017-06-09T19:04:30Z
Last change time
2024-12-01T16:30:20Z
Keywords
performance
Assigned to
No Owner
Creator
Steven Schveighoffer
Moved to GitHub: phobos#9716 →

Comments

Comment #0 by schveiguy — 2017-06-09T19:04:30Z
See this thread: https://forum.dlang.org/post/[email protected] Not any idea why it doesn't go fast, or his other issue with -boundscheck=off on ldc, but tinkering with the code, I couldn't get much except: 1. It's not the interaction of SortedRange with bringToFront 2. It's not an issue with using functors instead of lambdas 3. It's not an issue with the initial setup, as that should take nowhere near the 10 seconds I found it takes (consistent with his testing). 4. It *may* be an issue with the comparison, but I switched away from using ==, and didn't get any better performance. Ali's message in the thread may give a clue as to why std::rotate is so good and something we should try and dissect to copy it's speedups: https://stackoverflow.com/questions/21160875/why-is-stdrotate-so-fast
Comment #1 by hsteoh — 2017-06-09T23:30:03Z
From an initial cursory glance: bringToFrontImpl relies only on forward range primitives to do its job. Also, it allows you to transcribe ranges with different element types across each other (as long as the element type is implicitly convertible to the target element type). While this is all great for generality, it sucks for performance when you're dealing with, e.g., built-in arrays. I think there should definitely be a specialization for built-in arrays at the very least, if not for slice-assignable random-access ranges in general, when the element types are identical. Definitely should consider using memmove() in these cases to take advantage of years' worth of work C library authors have put into memmove().
Comment #2 by hsteoh — 2017-06-09T23:48:53Z
More specifically, if (1) both ranges are built-in arrays, (2) they have the same element type, and (3) the back range is a single element, we can optimize this way: If front and back are disjoint: (front.ptr > back.ptr && front.ptr + front.length < back.ptr) ------- ElementType!InputRange e1 = back[0]; back[0] = front[$-1]; memmove(front.ptr+1, front.ptr, front.length-1); front[0] = e1; ------- If back is part of front (i.e., front.ptr < back.ptr < front.ptr + front.length): -------- ElementType!InputRange e1 = back[0]; memmove(front.ptr+1, front.ptr, back.ptr - front.ptr); front[0] = e1; --------
Comment #3 by schveiguy — 2017-06-10T11:08:50Z
Indeed, memmove fixes the specific use case: https://forums.dlang.org/post/[email protected] However, I think we should do a full examination of everything C++ does to optimize rotate and include it all at once, instead of waiting for people to find more cases where std::rotate outperforms D.
Comment #4 by robert.schadek — 2024-12-01T16:30:20Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/phobos/issues/9716 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB