Bug 19705 – Static foreach slow for numeric ranges
Status
RESOLVED
Resolution
FIXED
Severity
enhancement
Priority
P1
Component
dmd
Product
D
Version
D2
Platform
x86
OS
Windows
Creation time
2019-02-27T09:09:32Z
Last change time
2020-06-28T14:20:24Z
Keywords
performance, pull
Assigned to
No Owner
Creator
Simen Kjaeraas
Comments
Comment #0 by simen.kjaras — 2019-02-27T09:09:32Z
For whatever reason, static foreach is a lot slower when iterating over a numeric range like 0..N than when iterating over a tuple. On my machine the following examples take around 11 seconds for the 0..N cases and 5-6 seconds when iterating over a tuple.
unittest {
enum N = 10000;
alias A = generate!N;
static foreach (i; 0..N) {} // A 11s
static foreach (i; 0..A.Length) {} // B 11s
static foreach (i; A) {} // C 5s
static foreach (i, _; A) {} // D 5s
}
template generate(int n) {
import std.meta : AliasSeq;
mixin("alias generate = AliasSeq!("~{
string result;
static foreach (i; 0..n) {
result ~= i.stringof~", ";
}
return result;
}()~");");
}
Clearly, D should be no faster than A, since it does more.
Comment #1 by uplink.coder — 2019-02-27T13:50:24Z
I can shed light on the reason, at least.
So static foreach is conceptually just syntactic sugar over tuple foreach.
As such it works in two steps
1. Create a type-tuple from the numeric range.
2. Iterate the type-tuple.
To iterate a tuple we need to expand tuples which may be elements of the tuple
such that: tuple(1,2,3,tuple(4,5,6)) becomes tuple(1,2,3,4,5,6);
that process has to be done lazily since a tuple could be generated on the fly to model infinite sequences.
therefore iteration over three elements would go like this
(ResultElems[] elems;
elem[0] = iterateTuple(t); resetTuple(t);
iterateTuple(t); elems[1] = iterateTuple(t); resetTuple(t);
iterateTuple(t); iterateTuple(t); elems[2] = iterateTuple(t); resetTuple(t);
)
as you can this O(n!) (factorial)
and that's why it's slow.
Templates suffer from the same problem, and the compiler does some work to memorize previous instances which effectively diminishes the factorial term (it's technically still there though and can dominate if N gets large enough!)
One way to slove this is to keep tuple_expansion state when iterating/expanding the tuple which would effectively diminish the factorial term rather well.
Comment #2 by boris2.9 — 2020-06-22T08:17:03Z
I made a change that improve these cases.
https://github.com/dlang/dmd/pull/11303
Now:
static foreach (i; 0..10000) {}
is faster than:
alias A = generate!10000;
static foreach (i; A) {}
as it should be.
Comment #3 by dlang-bot — 2020-06-25T11:07:13Z
@BorisCarvajal updated dlang/dmd pull request #11303 "static foreach over array and numeric range speed up by ~7x" fixing this issue:
- Fix Issue 19705 - Static foreach slow for numeric ranges
https://github.com/dlang/dmd/pull/11303
Comment #4 by dlang-bot — 2020-06-28T14:20:24Z
dlang/dmd pull request #11303 "static foreach over array and numeric range speed up by ~7x" was merged into master:
- ef66363709b9de8bef5f48ba56dc1c1a1c545770 by Boris Carvajal:
Fix Issue 19705 - Static foreach slow for numeric ranges
https://github.com/dlang/dmd/pull/11303