Bug 4725 – std.algorithm.sum()

Status
RESOLVED
Resolution
FIXED
Severity
enhancement
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2010-08-25T18:10:00Z
Last change time
2014-08-18T23:43:42Z
Assigned to
andrei
Creator
bearophile_hugs

Comments

Comment #0 by bearophile_hugs — 2010-08-25T18:10:40Z
Writing Python code shows that computing the sum of a sequence of int/FP values is a very common operation. In D you may write it as:

import std.stdio: writeln;
import std.algorithm: reduce;

void main() {
    auto arr = new double[10];
    arr[] = 1.0;
    auto t = reduce!q{a + b}(arr);
    writeln(t);
}

But I suggest creating std.algorithm.sum() as a shorthand, because reduce carries a bit of cognitive burden that is out of place for such a common and basic operation:

import std.stdio: writeln;
import std.algorithm: sum;

void main() {
    auto arr = new double[10];
    arr[] = 1.0;
    auto t = sum(arr);
    writeln(t);
}

As a second optional argument, sum() may accept the starting value:

auto total = sum(range, 0.0);
Comment #1 by bearophile_hugs — 2010-10-11T13:33:52Z
One test case for sum():

import std.algorithm: sum;

void main() {
    bool[] array = [true, false, true, true];
    assert(sum(array) == 3);
}

Currently this doesn't compile:

reduce!q{a + b}(array)

You need to add a 0 start:

reduce!q{a + b}(0, array)
Comment #2 by bearophile_hugs — 2010-12-12T23:41:50Z
sum() may be implemented better than just using reduce(), because a smarter sum() can keep two accumulator variables and add into them in parallel on each loop iteration, then add the two partial sums together at the end and return the result. Experiments (and theory) show that on modern CPUs this is more efficient than a normal loop with a single accumulation variable. I mean code like this (specialized for random-access ranges; that is a very common case worth specializing for, because sum() is a very common operation that needs to be fast):

switch (array.length) {
    case 0:
        break;
    case 1:
        total = array[0];
        break;
    default:
        total = array[0];
        auto total2 = cast(typeof(total))array[1];
        int stop = array.length & (~1);
        for (int i = 2; i < stop; i += 2) {
            total += array[i];
            total2 += array[i + 1];
        }
        total += (array.length % 2 ? (total2 + array[$ - 1]) : total2);
        break;
}
// return total here
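
A minimal, runnable sketch of this two-accumulator idea (the helper name pairSum is hypothetical; this is an illustration, not the proposed Phobos code):

import std.stdio;

// Hypothetical helper illustrating the two-accumulator summation above.
double pairSum(const double[] array) {
    double total = 0.0;
    switch (array.length) {
        case 0:
            break;
        case 1:
            total = array[0];
            break;
        default: {
            total = array[0];
            double total2 = array[1];
            // Process two elements per iteration into independent accumulators.
            immutable size_t stop = array.length & ~cast(size_t)1;
            for (size_t i = 2; i < stop; i += 2) {
                total  += array[i];
                total2 += array[i + 1];
            }
            // Fold in the second accumulator (and the leftover element, if any).
            total += (array.length % 2 ? total2 + array[$ - 1] : total2);
            break;
        }
    }
    return total;
}

void main() {
    assert(pairSum(new double[0]) == 0.0);
    assert(pairSum([1.5]) == 1.5);
    assert(pairSum([1.0, 2.0, 3.0, 4.0, 5.0]) == 15.0);
    writeln("ok");
}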
Comment #3 by denis.spir — 2010-12-24T11:18:19Z
Comment #4 by denis.spir — 2010-12-24T11:23:28Z
(In reply to comment #2)
> sum() may be implemented better than just using reduce(), because a smarter
> sum() can keep two accumulator variables and add into them in parallel on
> each loop iteration [...]

You should add a 'case 2:' to avoid complicating a simple and common case. Maybe also avoid the // (parallel) computation below a given number of elements (to be defined).

denis
Comment #5 by bearophile_hugs — 2010-12-24T22:55:05Z
(In reply to comment #4)
> You should add a 'case 2:' to avoid complicating a simple and common case.
> Maybe also avoid the // (parallel) computation below a given number of
> elements (to be defined).

The code I have shown is tuned with timing experiments.
Comment #6 by bearophile_hugs — 2011-05-15T07:53:34Z
A first, slow, and not much tested implementation:

import std.stdio, std.traits, std.range;

template IsSummable(T) {
    enum bool IsSummable = __traits(compiles, {return T.init + T.init;});
}

auto sum(R, T)(R items, T start) if (IsSummable!T) {
    foreach (item; items)
        start += item;
    return start;
}

auto sum(R)(R items) if (IsSummable!(ForeachType!R)) {
    alias ForeachType!R T;
    static if (is(T == cfloat) || is(T == cdouble) || is(T == creal))
        T result = 0+0i;
    else static if (is(T == bool))
        int result;
    else static if (isSomeChar!T)
        uint result;
    else
        T result = 0;
    foreach (item; items)
        result += item;
    return result;
}

void main() {
    //assert(sum([]) == 0);
    assert(sum(new int[0]) == 0);
    assert(sum([1, 2, 3]) == 6);
    assert(sum([1.5L, 2.5L, 3.5L]) == 7.5L);
    assert(sum(iota(0)) == 0);
    assert(sum(iota(10)) == 45);
    assert(sum([true, false, true]) == 2);
    assert(sum([1+0i, 2+1i]) == 3+1i);
    ubyte[] a1 = [100, 200];
    assert(sum(a1, 0) == 300);
    assert(sum(a1) == 44); // overflow!
    assert(sum("hello") == 532);
    assert(sum([1:10, 2:20]) == 30);
}
Comment #7 by andrei — 2011-05-15T08:13:20Z
Hm, I don't get why we should start from scratch instead of reusing reduce.

auto sum(R)(R range)
    if (is(typeof(ElementType!(R).init + ElementType!(R).init)))
{
    return reduce!"a + b"(0, range);
}
Comment #8 by bearophile_hugs — 2011-05-15T09:10:38Z
(In reply to comment #7)
> Hm, I don't get why we should start from scratch instead of reusing reduce.
>
> auto sum(R)(R range)
>     if (is(typeof(ElementType!(R).init + ElementType!(R).init)))
> {
>     return reduce!"a + b"(0, range);
> }

I suggest avoiding reduce here to allow a more efficient implementation of sum() (keeping two summing variables in parallel); see Comment #2.

Note 1: I have shown two versions of sum() here; one also accepts a starting value.

Note 2: I think your code doesn't pass the little unittest (main) I have added, even if you comment out the sum(a1, 0) line.
Comment #9 by dsimcha — 2011-05-15T09:33:10Z
I'm going to have to agree with Bearophile on this one. Using multiple summation variables is both faster (because it breaks some dependency chains and uses instruction-level parallelism better) and more accurate (because you effectively have more precision). Here's a test program:

import std.stdio, std.datetime, std.random, std.range, std.traits;

// Adapted from dstats.summary.sum().
auto ilpSum(T)(T data) if (isRandomAccessRange!T) {
    enum nILP = 8;  // Empirically optimal on Athlon 64 X2.
    Unqual!(ElementType!T)[nILP] sum = 0;
    size_t i = 0;

    if (data.length > 2 * nILP) {
        for (; i + nILP < data.length; i += nILP) {
            foreach (j; 0..nILP) {
                sum[j] += data[i + j];
            }
        }

        foreach (j; 1..nILP) {
            sum[0] += sum[j];
        }
    }

    for (; i < data.length; i++) {
        sum[0] += data[i];
    }

    return sum[0];
}

auto naiveSum(T)(T data) if (isRandomAccessRange!T) {
    Unqual!(ElementType!T) ret = 0;
    foreach (elem; data) {
        ret += elem;
    }
    return ret;
}

void main() {
    auto nums = new double[10_000_000];
    foreach (ref x; nums) x = uniform(0.0, 1.0);

    auto sw = StopWatch(AutoStart.yes);
    auto s = naiveSum(nums);
    immutable naiveTime = sw.peek.msecs;

    // Printing out s to make sure the compiler doesn't optimize away the
    // whole function.
    writefln("naive: Sum = %s, %s milliseconds.", s, naiveTime);

    sw.reset();
    s = ilpSum(nums);
    immutable ilpTime = sw.peek.msecs;
    writefln("ilp: Sum = %s, %s milliseconds.", s, ilpTime);
}

Results:

naive: Sum = 4.9988e+06, 51 milliseconds.
ilp: Sum = 4.9988e+06, 33 milliseconds.
Comment #10 by andrei — 2011-05-15T11:36:24Z
No need to teach or convince me of the virtues of ILP. We've long discussed in the newsgroup how associative operations can be significantly accelerated. I was simply replying to the implementation shown, which is a duplication of reduce. Ideally we'd capture more operations than just summation in e.g. assoc_reduce. Then sum would become a simple alias.
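
A minimal sketch of that idea, assuming the caller guarantees the operation is associative (the name assocReduce, the unroll factor, and the overall shape are hypothetical, not an actual Phobos API):

import std.functional;  // binaryFun
import std.range;       // ElementType, hasLength, isRandomAccessRange
import std.traits;      // Unqual

// Hypothetical assocReduce: because `op` is associative, elements may be
// regrouped freely, so several independent accumulators ("lanes") can be
// used to expose instruction-level parallelism.
auto assocReduce(alias op, R)(R r)
    if (isRandomAccessRange!R && hasLength!R)
{
    alias fun = binaryFun!op;
    enum lanes = 4;  // unroll factor, picked arbitrarily for this sketch
    alias T = Unqual!(ElementType!R);
    assert(r.length > 0, "assocReduce: empty range");

    if (r.length < lanes)
    {
        T result = r[0];
        foreach (i; 1 .. r.length)
            result = fun(result, r[i]);
        return result;
    }

    // Seed each lane with one of the first `lanes` elements.
    T[lanes] acc;
    foreach (j; 0 .. lanes)
        acc[j] = r[j];

    size_t i = lanes;
    for (; i + lanes <= r.length; i += lanes)
        foreach (j; 0 .. lanes)
            acc[j] = fun(acc[j], r[i + j]);

    for (; i < r.length; ++i)
        acc[0] = fun(acc[0], r[i]);

    // Fold the lanes together at the end.
    T result = acc[0];
    foreach (j; 1 .. lanes)
        result = fun(result, acc[j]);
    return result;
}

// As suggested in this comment, sum could then be little more than an alias:
auto mySum(R)(R r) { return assocReduce!"a + b"(r); }

unittest
{
    assert(mySum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) == 55);
    assert(assocReduce!"a > b ? a : b"([3, 7, 2, 9, 4]) == 9);
}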
Comment #11 by dsimcha — 2011-05-15T11:53:59Z
(In reply to comment #10)
> No need to teach or convince me of the virtues of ILP. We've long discussed
> in the newsgroup how associative operations can be significantly accelerated.
> I was simply replying to the implementation shown, which is a duplication of
> reduce. Ideally we'd capture more operations than just summation in e.g.
> assoc_reduce. Then sum would become a simple alias.

Great idea! When we figure out how we want to do this, I should probably incorporate it into std.parallelism, too, since std.parallelism already assumes associativity anyhow.
Comment #12 by andrei — 2011-05-15T12:05:38Z
Probably we need two more reduce primitives, both of which assume associativity. One uses ILP and simple loop unrolling whereas the other uses full-blown threads. There are definitely needs for each.
Comment #13 by dsimcha — 2011-05-15T12:15:45Z
(In reply to comment #12)
> Probably we need two more reduce primitives, both of which assume
> associativity. One uses ILP and simple loop unrolling whereas the other uses
> full-blown threads. There are definitely needs for each.

Right. We already have the full-blown threads one in std.parallelism. What I'm saying is that, if you're using threads, you're already assuming associativity. Therefore, you may as well use ILP and loop unrolling, too. std.parallelism currently doesn't do this, and once we work out the details and create an assocReduce in std.algorithm, these techniques should be ported to std.parallelism.reduce, too.
Comment #14 by bearophile_hugs — 2011-11-24T02:39:05Z
Fortran 90+ has a sum(): http://www.nsc.liu.se/~boein/f77to90/a5.html#section14 I think sum() needs a specialization for things like float[4], int[4], double[4], etc. If a float[4] is represented in an SSE register, then sum(float[4]) needs just 2 sums.
Comment #15 by bearophile_hugs — 2011-11-25T01:04:35Z
A good signature for sum():

sum(sequence[, dim=0[, start]])

In Fortran's sum(), 'dim' is an optional value that defaults to 0 (well, to the first index of an array, which is 0 in D); it specifies along what dimension to perform the sum. By default it acts like the Python sum(). In Fortran you often have arrays of vectors (or arrays of tuples in D), and Fortran supports vector operations similar to D's, so:

VA = [[1,2,3], [4,5,6]]
sum(VA)    ==> [5, 7, 9]
sum(VA, 0) ==> [5, 7, 9]
sum(VA, 1) ==> [6, 15]
sum(VA, x) ==> Error if x > 1

Example usage, where points is an array of double[3] that represent 3D points:

totalDistance = sqrt(sum(points[] ^^ 2, dim=0));
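
A minimal sketch of what a dim-aware sum over an array of fixed-size arrays could look like in D (the name dimSum and the restriction to two dimensions are assumptions of this sketch, not a proposed API):

import std.stdio;

// Hypothetical dim-aware sum for a dynamic array of static arrays
// (a 2-D structure): dim == 0 sums across rows, giving one value per
// column; dim == 1 sums across columns, giving one value per row.
auto dimSum(size_t dim, T, size_t N)(const T[N][] data)
{
    static if (dim == 0)
    {
        T[N] result = 0;
        foreach (ref row; data)
            foreach (j; 0 .. N)
                result[j] += row[j];
        return result;
    }
    else static if (dim == 1)
    {
        auto result = new T[data.length];
        foreach (i, ref row; data)
            foreach (x; row)
                result[i] += x;
        return result;
    }
    else
        static assert(0, "dim must be 0 or 1 for a 2-D input");
}

void main()
{
    int[3][] va = [[1, 2, 3], [4, 5, 6]];
    assert(dimSum!0(va) == [5, 7, 9]);
    assert(dimSum!1(va) == [6, 15]);
    writeln("ok");
}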
Comment #16 by bearophile_hugs — 2012-04-17T14:14:24Z
See also Issue 7934 for an extra improvement.
Comment #17 by andrei — 2013-03-12T22:14:45Z
https://github.com/D-Programming-Language/phobos/pull/1205
Comment #18 by bearophile_hugs — 2013-03-13T11:46:47Z
(In reply to comment #17)
> https://github.com/D-Programming-Language/phobos/pull/1205

In the patch, this is the code used to sum in the general case:

+        Result seed = 0;
+        return reduce!"a + b"(seed, r);

Besides the idea in Issue 7934, I suggest adding a "static if" that tests whether the input is an array (or a random-access range) and in that case, instead of reduce, uses a loop like this:

for (int i = 2; i < stop; i += 2) {
    total += array[i];
    total2 += array[i + 1];
}
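
A minimal sketch of that static-if dispatch (the names sketchSum and sumUnrolled and the exact predicate are assumptions for illustration, not code from the pull request):

import std.algorithm;  // reduce
import std.range;      // ElementType, hasLength, isInputRange, isRandomAccessRange
import std.traits;     // Unqual

// Hypothetical dispatching wrapper: random-access ranges with a known length
// get the two-accumulator loop; everything else falls back to reduce.
auto sketchSum(R)(R r)
    if (isInputRange!R)
{
    alias T = Unqual!(ElementType!R);
    static if (isRandomAccessRange!R && hasLength!R)
        return sumUnrolled!T(r);
    else
    {
        T seed = 0;
        return reduce!"a + b"(seed, r);
    }
}

// Two independent accumulators, folded together at the end (see comment #2).
private T sumUnrolled(T, R)(R r)
{
    T total = 0, total2 = 0;
    immutable size_t stop = r.length & ~cast(size_t)1;
    for (size_t i = 0; i < stop; i += 2)
    {
        total  += r[i];
        total2 += r[i + 1];
    }
    if (r.length % 2)
        total += r[r.length - 1];
    return total + total2;
}

unittest
{
    assert(sketchSum([1, 2, 3, 4, 5]) == 15);
    assert(sketchSum(iota(10)) == 45);
}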
Comment #19 by bearophile_hugs — 2013-03-25T10:49:55Z
Is it worth adding to sum() an optional argument with the start value, as in the Python sum()? In the first case in this program there is an array of floats and you want to accumulate in a real for maximum precision. In the second example there is an array of ints, and the programmer wants to sum inside a long to avoid overflow:

import std.stdio, std.algorithm;

void main() {
    float[] a1 = [1.21, 1.3, 1.4];
    real s1 = reduce!q{a + b}(0.0L, a1);
    writefln("%.19f", s1);
    writefln("%.19f", a1.sum);

    int[] a2 = [int.max, int.max, int.max];
    real s2 = reduce!q{a + b}(0L, a2);
    writeln(s2);
    writeln(a2.sum);
}
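
A minimal sketch of such a seed-taking overload, where the seed's type becomes the accumulator type (the name seededSum is hypothetical and this is only an illustration of the suggestion, not an actual Phobos overload):

import std.stdio;

// Hypothetical seed-taking overload: the accumulator has the seed's type,
// so passing 0.0L sums floats into a real, and 0L sums ints into a long.
E seededSum(R, E)(R r, E seed)
{
    E result = seed;
    foreach (x; r)
        result += x;
    return result;
}

void main()
{
    float[] a1 = [1.21, 1.3, 1.4];
    writefln("%.19f", seededSum(a1, 0.0L));  // accumulated in a real

    int[] a2 = [int.max, int.max, int.max];
    writeln(seededSum(a2, 0L));              // accumulated in a long, no overflow
}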
Comment #20 by bearophile_hugs — 2013-05-04T13:27:36Z
A use case. Given some permutations of the chars "ABCD", this program finds the missing one:

import std.stdio, std.string, std.algorithm, std.conv, std.range;

void main() {
    const perms = "ABCD CABD ACDB DACB BCDA ACBD ADCB CDAB DABC BCAD CADB CDBA CBAD ABDC ADBC BDCA DCBA BACD BADC BDAC CBDA DBCA DCAB".split;
    immutable rowSum = perms[0].reduce!q{a + b};

    foreach (immutable i; 0 .. perms[0].length) {
        immutable sumColumns = reduce!q{a + b}(0, perms.transversal(i));
        write(cast(char)(rowSum - sumColumns % rowSum));
    }
    writeln;
}

Output:

DBAC

Using a sum() function:

import std.stdio, std.string, std.algorithm, std.conv, std.range;

void main() {
    const perms = "ABCD CABD ACDB DACB BCDA ACBD ADCB CDAB DABC BCAD CADB CDBA CBAD ABDC ADBC BDCA DCBA BACD BADC BDAC CBDA DBCA DCAB".split;
    immutable rowSum = perms[0].sum(0);

    foreach (immutable i; 0 .. perms[0].length) {
        immutable sumColumns = perms.transversal(i).sum(0);
        write(cast(char)(rowSum - sumColumns % rowSum));
    }
    writeln;
}

As in the Python sum(), I have added a seed value, in this case the int 0.
Comment #21 by bearophile_hugs — 2013-06-08T15:15:10Z
Fortran has a built-in sum() that supports a useful second optional argument specifying the dimension along which to compute the sum; it's useful for 2D or nD arrays:

ke = 0.5d0 * dot_product(mass, sum(v ** 2, dim=1))

See: http://gcc.gnu.org/onlinedocs/gfortran/SUM.html
Comment #22 by andrei — 2013-06-08T15:24:12Z
Time to merge the pull request that's rotting in there...
Comment #23 by github-bugzilla — 2014-02-14T23:20:08Z
Comment #24 by bearophile_hugs — 2014-02-15T01:44:42Z
See Issue 12169
Comment #25 by bearophile_hugs — 2014-02-15T01:48:32Z
See Issue 12170 for another problem
Comment #26 by bearophile_hugs — 2014-02-15T01:53:08Z
See also Issue 12171
Comment #27 by bearophile_hugs — 2014-02-15T02:01:58Z
Another problem is in Issue 12172
Comment #28 by bearophile_hugs — 2014-02-15T02:35:00Z
See also Issue 12173
Comment #29 by bearophile_hugs — 2014-02-15T02:46:55Z
Probably not related: Issue 12174
Comment #30 by bearophile_hugs — 2014-02-15T02:53:59Z
See also Issue 12175
Comment #31 by bearophile_hugs — 2014-02-15T03:12:07Z
See also Issue 12176
Comment #32 by bearophile_hugs — 2014-02-25T14:30:48Z
The main request is now addressed. There are numerous related enhancements/bugs, but they all have their own issue numbers. Ready to close? Currently sum is not listed on this page: http://dlang.org/phobos/std_algorithm.html
Comment #33 by andrei — 2014-02-25T15:16:08Z
Yah. For the record, this sux :o).
Comment #34 by blah38621 — 2014-08-18T23:35:20Z
I'll assume that the fact that the only supported version of sum is an array-specialized one is among the known issues?
Comment #35 by andrei — 2014-08-18T23:38:37Z
sum should work with all ranges, and get specialized only for specific kinds of ranges. Is that not the case?
Comment #36 by blah38621 — 2014-08-18T23:43:42Z
Ack, whoops, sorry, I was thinking of the wrong function; it was max that I had an issue with, as it didn't work when I chained it after a map!().