Bug 13877 – Problem with map+join

Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P1
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2014-12-19T14:16:00Z
Last change time
2015-02-18T03:41:06Z
Assigned to
peter.alexander.au
Creator
bearophile_hugs

Comments

Comment #0 by bearophile_hugs — 2014-12-19T14:16:00Z
This shows a different computational complexity using apparently equivalent range-based code: import std.stdio: writeln; import std.algorithm: map, join; uint count1, count2; const(int)[] foo1(in int[] data, in int i, in int max) { count1++; if (i < max) { typeof(return) result; foreach (immutable n; data) result ~= foo1(data, i + 1, max); return result; } else { return data; } } const(int)[] foo2(in int[] data, in int i, in int max) { count2++; if (i < max) { return data.map!(n => foo2(data, i + 1, max)).join; } else { return data; } } void main() { const r1 = foo1([1, 2, 3, 4, 5], 1, 7); writeln(count1); // 19531 const r2 = foo2([1, 2, 3, 4, 5], 1, 7); writeln(count2); // 1111111 assert(r1 == r2); // Results are equally correct. } I think such performance traps should not be present using ranges. The right complexity is restored using: return data.map!(n => foo2(data, i + 1, max)).cache.joiner.array;
Comment #1 by bearophile_hugs — 2014-12-19T14:16:43Z
Comment #2 by ketmar — 2014-12-19T15:01:25Z
btw, with such modification both variants becomes equivalent: auto stripSave(RT) (auto ref RT rng) { static struct R { RT r; @disable void save (); alias r this; } return R(rng); } const(int)[] foo2 (in int[] data, in int i, in int max) { count2++; if (i < max) { return data.map!(n => foo2(data, i + 1, max)).stripSave.join; } else { return data; } } the code turns MapResult to non-forward range, and then join doesn't calculate everything twice. the question is: should MapResult has '.save' here in the first place? or maybe we need some API — as `stripSave` to explicitly remove features from ranges? it's very obvious to the programmer that recalculating range values is costly in this case, so he can just remove `.save` to stop Phobos using it.
Comment #3 by safety0ff.bugz — 2014-12-20T17:00:57Z
IMO the problem is that join assumes that traversal of the range of ranges is cheap. All the optimization is doing is avoiding reallocation costs while building the array of results (by computing length then allocating appropriately sized array for the results.) As shown by bearophile, this shouldn't be assumed in the general case. I ran into an issue where std.parallelism assumed that opIndex was as cheap as popFront, but for my specific range it was easier to compute the next front knowing the previous one than it was to find the element given by the index. If these type of optimizations are important, perhaps there should be some "performance hint tags" for range operations. e.g. popFront on an array range is "cheap" whereas popFront on a MapResult may vary arbitrarily.
Comment #4 by peter.alexander.au — 2015-01-02T17:32:43Z
https://github.com/D-Programming-Language/phobos/pull/2837 For now, I'm limiting the optimization to arrays only. Any other range is conservatively assumed to have arbitrarily expensive iteration, so the normal appender path is used for those.
Comment #5 by github-bugzilla — 2015-01-08T00:27:03Z
Commits pushed to master at https://github.com/D-Programming-Language/phobos https://github.com/D-Programming-Language/phobos/commit/f81de7d9b9a9d4d8af4615306b2b759f845bd792 Fix Issue 13877 - join assumes forward range can be cheaply iterated twice e.g. with `r.map!(someExpensiveFunction).join`, `join` would previously iterate twice: once to compute length, and again to build up the result. The extra iteration to compute length may be disproportionately expensive, so we now only precompute length for built-in arrays, since we know those are cheap to iterate. The heuristic could be improved over time, although I can't think of anything better right now. https://issues.dlang.org/show_bug.cgi?id=13877 https://github.com/D-Programming-Language/phobos/commit/ba100ff803f8d266f97e9c02503e19f961be1e7e Merge pull request #2837 from Poita/Issue13877 Fix Issue 13877 - join assumes forward range can be cheaply iterated twice
Comment #6 by github-bugzilla — 2015-02-18T03:41:06Z