Comment #0 by bearophile_hugs — 2014-12-19T14:16:00Z
This shows how apparently equivalent range-based code can have a different computational complexity:
import std.stdio : writeln;
import std.algorithm : map;
import std.array : join; // join lives in std.array, not std.algorithm

uint count1, count2;

const(int)[] foo1(in int[] data, in int i, in int max) {
    count1++;
    if (i < max) {
        typeof(return) result;
        foreach (immutable n; data)
            result ~= foo1(data, i + 1, max);
        return result;
    } else {
        return data;
    }
}

const(int)[] foo2(in int[] data, in int i, in int max) {
    count2++;
    if (i < max) {
        return data.map!(n => foo2(data, i + 1, max)).join;
    } else {
        return data;
    }
}

void main() {
    const r1 = foo1([1, 2, 3, 4, 5], 1, 7);
    writeln(count1); // 19531
    const r2 = foo2([1, 2, 3, 4, 5], 1, 7);
    writeln(count2); // 1111111
    assert(r1 == r2); // Both variants produce the same result.
}
I think such performance traps should not be present when using ranges. The difference comes from join traversing the mapped range twice (once to compute the total length for pre-allocation, once to copy the elements), so each call spawns 10 recursive calls instead of 5: the totals are (10^7 - 1) / 9 = 1111111 versus (5^7 - 1) / 4 = 19531.
The right complexity is restored using:
return data.map!(n => foo2(data, i + 1, max)).cache.joiner.array;
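For illustration, a minimal self-contained sketch of the fixed variant (foo3 and count3 are names invented here, not from the report): cache evaluates each element of the lazy MapResult exactly once, joiner flattens it lazily, and array builds the result in a single pass, so nothing is recomputed.

import std.algorithm : cache, joiner, map;
import std.array : array;
import std.stdio : writeln;

uint count3;

const(int)[] foo3(in int[] data, in int i, in int max) {
    count3++;
    if (i < max) {
        // cache evaluates each lambda result once; joiner is lazy;
        // array appends in a single pass, so no second traversal happens.
        return data.map!(n => foo3(data, i + 1, max)).cache.joiner.array;
    } else {
        return data;
    }
}

void main() {
    foo3([1, 2, 3, 4, 5], 1, 7);
    writeln(count3); // should match foo1's count of 19531
}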
Comment #1 by bearophile_hugs — 2014-12-19T14:16:43Z
By the way, with the following modification both variants become equivalent:
auto stripSave(RT)(auto ref RT rng) {
    static struct R {
        RT r;
        @disable void save();
        alias r this;
    }
    return R(rng);
}

const(int)[] foo2(in int[] data, in int i, in int max) {
    count2++;
    if (i < max) {
        return data.map!(n => foo2(data, i + 1, max)).stripSave.join;
    } else {
        return data;
    }
}
This turns the MapResult into a non-forward range, so join no longer computes everything twice.
The question is: should MapResult have .save here in the first place?
Or maybe we need an API such as stripSave to explicitly remove capabilities from ranges? It is obvious to the programmer that recalculating the range's values is costly in this case, so they could simply remove .save to stop Phobos from using it.
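A quick compile-time check (a sketch, assuming the stripSave definition above is in scope) of why this works: the wrapper still satisfies isInputRange through alias this, but the disabled save makes isForwardRange fail, so join falls back to a single pass.

import std.algorithm : map;
import std.range.primitives : isForwardRange, isInputRange;

unittest {
    auto m = [1, 2, 3].map!(n => n * 2);
    static assert(isForwardRange!(typeof(m)));  // MapResult exposes save
    auto s = m.stripSave;
    static assert(isInputRange!(typeof(s)));    // still iterable via alias this
    static assert(!isForwardRange!(typeof(s))); // save is @disable'd
}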
Comment #3 by safety0ff.bugz — 2014-12-20T17:00:57Z
IMO the problem is that join assumes that traversing the range of ranges is cheap.
All the optimization does is avoid reallocation costs while building the array of results (by computing the total length first, then allocating an appropriately sized array for the results).
As bearophile has shown, this cheapness cannot be assumed in the general case.
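A simplified model of that fast path (not Phobos's actual join code) makes the cost explicit: one pass over a saved copy of the forward range to count elements, a second pass to copy, so every lazily computed element runs twice.

import std.range.primitives : ElementType, isForwardRange, isInputRange;
import std.traits : Unqual;

auto joinModel(RoR)(RoR ror)
    if (isForwardRange!RoR && isInputRange!(ElementType!RoR))
{
    alias E = Unqual!(ElementType!(ElementType!RoR));
    // Pass 1: walk a saved copy just to count elements for pre-allocation.
    size_t total;
    foreach (r; ror.save)
        foreach (e; r)
            ++total;
    // Pass 2: walk again to copy. A lazy range such as MapResult recomputes
    // every element here, doubling the work of the first pass.
    auto result = new E[](total);
    size_t i;
    foreach (r; ror)
        foreach (e; r)
            result[i++] = e;
    return result;
}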
I ran into an issue where std.parallelism assumed that opIndex was as cheap as popFront, but for my specific range it was easier to compute the next front from the previous one than to find the element at an arbitrary index.
If these kinds of optimizations are important, perhaps there should be some "performance hint" tags for range operations, e.g. popFront on an array range is cheap, whereas popFront on a MapResult may cost arbitrarily much.
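A hypothetical range with exactly that shape (invented here for illustration, not from the report): each front is one step from the previous one, so popFront is O(1), but opIndex has to re-iterate from the seed, making random access O(index).

struct Iterated(alias fun, T) {
    T seed;
    enum bool empty = false;               // infinite range, for simplicity
    @property T front() { return seed; }
    void popFront() { seed = fun(seed); }  // one cheap step from the previous front
    // Random access recomputes from the seed: O(index) work per call.
    T opIndex(size_t index) {
        T v = seed;
        foreach (_; 0 .. index)
            v = fun(v);
        return v;
    }
}

unittest {
    auto r = Iterated!(x => x * 2 + 1, int)(1); // 1, 3, 7, 15, ...
    assert(r[3] == 15); // recomputed from the seed on every call
}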
Comment #4 by peter.alexander.au — 2015-01-02T17:32:43Z
https://github.com/D-Programming-Language/phobos/pull/2837
For now, I'm limiting the optimization to arrays only. Any other range is conservatively assumed to have arbitrarily expensive iteration, so the normal appender path is used for those.
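A sketch of that dispatch (not the actual pull-request code): take the length pre-pass only when both the outer and inner ranges are arrays, where a second traversal is known to be free, and otherwise visit each element exactly once through an appender.

import std.array : appender;
import std.range.primitives : ElementType, isInputRange;
import std.traits : isArray, Unqual;

auto joinSketch(RoR)(RoR ror)
    if (isInputRange!RoR && isInputRange!(ElementType!RoR))
{
    alias E = Unqual!(ElementType!(ElementType!RoR));
    static if (isArray!RoR && isArray!(ElementType!RoR)) {
        // Arrays are free to traverse twice: compute the exact length, then copy.
        size_t total;
        foreach (r; ror)
            total += r.length;
        auto result = new E[](total);
        size_t i;
        foreach (r; ror)
            foreach (e; r)
                result[i++] = e;
        return result;
    } else {
        // Arbitrary ranges: iteration may be expensive, so touch each
        // element once and let the appender handle reallocation.
        auto app = appender!(E[])();
        foreach (r; ror)
            foreach (e; r)
                app.put(e);
        return app.data;
    }
}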
Comment #5 by github-bugzilla — 2015-01-08T00:27:03Z