Bug 4725 – std.algorithm.sum()

Status
RESOLVED
Resolution
FIXED
Severity
enhancement
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2010-08-25T18:10:00Z
Last change time
2014-08-18T23:43:42Z
Assigned to
andrei
Creator
bearophile_hugs

Comments

Comment #0 by bearophile_hugs — 2010-08-25T18:10:40Z
Writing Python code shows that computing the sum of a sequence of int/FP values is a very common operation. In D you may write it as:

import std.stdio: writeln;
import std.algorithm: reduce;

void main() {
    auto arr = new double[10];
    arr[] = 1.0;
    auto t = reduce!q{a + b}(arr);
    writeln(t);
}

But I suggest creating std.algorithm.sum() as a shorthand, because reduce carries a bit of cognitive burden that is out of place for such a common and basic operation:

import std.stdio: writeln;
import std.algorithm: sum;

void main() {
    auto arr = new double[10];
    arr[] = 1.0;
    auto t = sum(arr);
    writeln(t);
}

As a second optional argument, sum() may accept the starting value:

auto total = sum(range, 0.0);
Comment #1 by bearophile_hugs — 2010-10-11T13:33:52Z
One test case for sum():

import std.algorithm: sum;

void main() {
    bool[] array = [true, false, true, true];
    assert(sum(array) == 3);
}

Currently this doesn't compile:

reduce!q{a + b}(array)

You need to add a 0 start:

reduce!q{a + b}(0, array)
Comment #2 by bearophile_hugs — 2010-12-12T23:41:50Z
sum() may be implemented better than just using reduce(), because a smarter sum() can keep two accumulator variables and add into them in parallel on each loop iteration, then add the two partial sums together at the end and return the result. Experiments (and theory) show that on modern CPUs this is more efficient than a normal loop with a single accumulation variable. I mean code like this (specialized for random-access ranges; that is a very common case worth specializing for, because sum() is a very common operation that needs to be fast):

switch (array.length) {
    case 0:
        break;
    case 1:
        total = array[0];
        break;
    default:
        total = array[0];
        auto total2 = cast(typeof(total))array[1];
        int stop = array.length & (~1);
        for (int i = 2; i < stop; i += 2) {
            total += array[i];
            total2 += array[i + 1];
        }
        total += (array.length % 2 ? (total2 + array[$ - 1]) : total2);
        break;
}
// return total here
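
A minimal, runnable sketch of this two-accumulator idea (the helper name pairSum is hypothetical; this is an illustration, not the proposed Phobos code):

import std.stdio;

// Hypothetical helper illustrating the two-accumulator summation above.
double pairSum(const double[] array) {
    double total = 0.0;
    switch (array.length) {
        case 0:
            break;
        case 1:
            total = array[0];
            break;
        default: {
            total = array[0];
            double total2 = array[1];
            // Process two elements per iteration into independent accumulators.
            immutable size_t stop = array.length & ~cast(size_t)1;
            for (size_t i = 2; i < stop; i += 2) {
                total  += array[i];
                total2 += array[i + 1];
            }
            // Fold in the second accumulator (and the leftover element, if any).
            total += (array.length % 2 ? total2 + array[$ - 1] : total2);
            break;
        }
    }
    return total;
}

void main() {
    assert(pairSum(new double[0]) == 0.0);
    assert(pairSum([1.5]) == 1.5);
    assert(pairSum([1.0, 2.0, 3.0, 4.0, 5.0]) == 15.0);
    writeln("ok");
}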
Comment #3 by denis.spir — 2010-12-24T11:18:19Z
Comment #4 by denis.spir — 2010-12-24T11:23:28Z
(In reply to comment #2)
> sum() may be implemented better than just using reduce(), because a smarter
> sum() can keep two accumulator variables and add into them in parallel on
> each loop iteration [...]

You should add a 'case 2:' to avoid complicating a simple and common case. Maybe also avoid the // (parallel) computation below a given number of elements (to be defined).

denis
Comment #5 by bearophile_hugs — 2010-12-24T22:55:05Z
(In reply to comment #4)
> You should add a 'case 2:' to avoid complicating a simple and common case.
> Maybe also avoid the // (parallel) computation below a given number of
> elements (to be defined).

The code I have shown is tuned with timing experiments.
Comment #6 by bearophile_hugs — 2011-05-15T07:53:34Z
A first, slow, and not much tested implementation:

import std.stdio, std.traits, std.range;

template IsSummable(T) {
    enum bool IsSummable = __traits(compiles, {return T.init + T.init;});
}

auto sum(R, T)(R items, T start) if (IsSummable!T) {
    foreach (item; items)
        start += item;
    return start;
}

auto sum(R)(R items) if (IsSummable!(ForeachType!R)) {
    alias ForeachType!R T;
    static if (is(T == cfloat) || is(T == cdouble) || is(T == creal))
        T result = 0+0i;
    else static if (is(T == bool))
        int result;
    else static if (isSomeChar!T)
        uint result;
    else
        T result = 0;
    foreach (item; items)
        result += item;
    return result;
}

void main() {
    //assert(sum([]) == 0);
    assert(sum(new int[0]) == 0);
    assert(sum([1, 2, 3]) == 6);
    assert(sum([1.5L, 2.5L, 3.5L]) == 7.5L);
    assert(sum(iota(0)) == 0);
    assert(sum(iota(10)) == 45);
    assert(sum([true, false, true]) == 2);
    assert(sum([1+0i, 2+1i]) == 3+1i);
    ubyte[] a1 = [100, 200];
    assert(sum(a1, 0) == 300);
    assert(sum(a1) == 44); // overflow!
    assert(sum("hello") == 532);
    assert(sum([1:10, 2:20]) == 30);
}
Comment #7 by andrei — 2011-05-15T08:13:20Z
Hm, I don't get why we should start from scratch instead of reusing reduce.

auto sum(R)(R range)
    if (is(typeof(ElementType!(R).init + ElementType!(R).init)))
{
    return reduce!"a + b"(0, range);
}
Comment #8 by bearophile_hugs — 2011-05-15T09:10:38Z
(In reply to comment #7)
> Hm, I don't get why we should start from scratch instead of reusing reduce.
>
> auto sum(R)(R range)
>     if (is(typeof(ElementType!(R).init + ElementType!(R).init)))
> {
>     return reduce!"a + b"(0, range);
> }

I suggest avoiding reduce here to allow a more efficient implementation of sum() (keeping two summing variables in parallel); see Comment #2.

Note 1: I have shown two versions of sum() here; one also accepts a starting value.

Note 2: I think your code doesn't pass the little unittest (main) I have added, even if you comment out the sum(a1, 0) line.
Comment #9 by dsimcha — 2011-05-15T09:33:10Z
I'm going to have to agree with Bearophile on this one. Using multiple summation variables is both faster (because it breaks some dependency chains and uses instruction-level parallelism better) and more accurate (because you effectively have more precision). Here's a test program:

import std.stdio, std.datetime, std.random, std.range, std.traits;

// Adapted from dstats.summary.sum().
auto ilpSum(T)(T data) if (isRandomAccessRange!T) {
    enum nILP = 8;  // Empirically optimal on Athlon 64 X2.
    Unqual!(ElementType!T)[nILP] sum = 0;
    size_t i = 0;

    if (data.length > 2 * nILP) {
        for (; i + nILP < data.length; i += nILP) {
            foreach (j; 0..nILP) {
                sum[j] += data[i + j];
            }
        }

        foreach (j; 1..nILP) {
            sum[0] += sum[j];
        }
    }

    for (; i < data.length; i++) {
        sum[0] += data[i];
    }

    return sum[0];
}

auto naiveSum(T)(T data) if (isRandomAccessRange!T) {
    Unqual!(ElementType!T) ret = 0;
    foreach (elem; data) {
        ret += elem;
    }
    return ret;
}

void main() {
    auto nums = new double[10_000_000];
    foreach (ref x; nums) x = uniform(0.0, 1.0);

    auto sw = StopWatch(AutoStart.yes);
    auto s = naiveSum(nums);
    immutable naiveTime = sw.peek.msecs;

    // Printing out s to make sure the compiler doesn't optimize away the
    // whole function.
    writefln("naive: Sum = %s, %s milliseconds.", s, naiveTime);

    sw.reset();
    s = ilpSum(nums);
    immutable ilpTime = sw.peek.msecs;
    writefln("ilp: Sum = %s, %s milliseconds.", s, ilpTime);
}

Results:

naive: Sum = 4.9988e+06, 51 milliseconds.
ilp: Sum = 4.9988e+06, 33 milliseconds.
Comment #10 by andrei — 2011-05-15T11:36:24Z
No need to teach or convince me of the virtues of ILP. We've long discussed in the newsgroup how associative operations can be significantly accelerated. I was simply replying to the implementation shown, which is a duplication of reduce. Ideally we'd capture more operations than just summation in e.g. assoc_reduce. Then sum would become a simple alias.
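
A minimal sketch of that idea, assuming the caller guarantees the operation is associative (the name assocReduce, the unroll factor, and the overall shape are hypothetical, not an actual Phobos API):

import std.functional;  // binaryFun
import std.range;       // ElementType, hasLength, isRandomAccessRange
import std.traits;      // Unqual

// Hypothetical assocReduce: because `op` is associative, elements may be
// regrouped freely, so several independent accumulators ("lanes") can be
// used to expose instruction-level parallelism.
auto assocReduce(alias op, R)(R r)
    if (isRandomAccessRange!R && hasLength!R)
{
    alias fun = binaryFun!op;
    enum lanes = 4;  // unroll factor, picked arbitrarily for this sketch
    alias T = Unqual!(ElementType!R);
    assert(r.length > 0, "assocReduce: empty range");

    if (r.length < lanes)
    {
        T result = r[0];
        foreach (i; 1 .. r.length)
            result = fun(result, r[i]);
        return result;
    }

    // Seed each lane with one of the first `lanes` elements.
    T[lanes] acc;
    foreach (j; 0 .. lanes)
        acc[j] = r[j];

    size_t i = lanes;
    for (; i + lanes <= r.length; i += lanes)
        foreach (j; 0 .. lanes)
            acc[j] = fun(acc[j], r[i + j]);

    for (; i < r.length; ++i)
        acc[0] = fun(acc[0], r[i]);

    // Fold the lanes together at the end.
    T result = acc[0];
    foreach (j; 1 .. lanes)
        result = fun(result, acc[j]);
    return result;
}

// As suggested in this comment, sum could then be little more than an alias:
auto mySum(R)(R r) { return assocReduce!"a + b"(r); }

unittest
{
    assert(mySum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) == 55);
    assert(assocReduce!"a > b ? a : b"([3, 7, 2, 9, 4]) == 9);
}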
Comment #11 by dsimcha — 2011-05-15T11:53:59Z
(In reply to comment #10)
> No need to teach or convince me of the virtues of ILP. We've long discussed
> in the newsgroup how associative operations can be significantly accelerated.
> I was simply replying to the implementation shown, which is a duplication of
> reduce. Ideally we'd capture more operations than just summation in e.g.
> assoc_reduce. Then sum would become a simple alias.

Great idea! When we figure out how we want to do this, I should probably incorporate it into std.parallelism, too, since std.parallelism already assumes associativity anyhow.
Comment #12 by andrei — 2011-05-15T12:05:38Z
Probably we need two more reduce primitives, both of which assume associativity. One uses ILP and simple loop unrolling whereas the other uses full-blown threads. There are definitely needs for each.
Comment #13 by dsimcha — 2011-05-15T12:15:45Z
(In reply to comment #12)
> Probably we need two more reduce primitives, both of which assume
> associativity. One uses ILP and simple loop unrolling whereas the other uses
> full-blown threads. There are definitely needs for each.

Right. We already have the full-blown threads one in std.parallelism. What I'm saying is that, if you're using threads, you're already assuming associativity. Therefore, you may as well use ILP and loop unrolling, too. std.parallelism currently doesn't do this, and once we work out the details and create an assocReduce in std.algorithm, these techniques should be ported to std.parallelism.reduce, too.
Comment #14 by bearophile_hugs — 2011-11-24T02:39:05Z
Fortran 90+ has a sum(): http://www.nsc.liu.se/~boein/f77to90/a5.html#section14 I think sum() needs a specialization for things like float[4], int[4], double[4], etc. If a float[4] is represented in an SSE register, then sum(float[4]) needs just 2 sums.
Comment #15 by bearophile_hugs — 2011-11-25T01:04:35Z
A good signature for sum():

sum(sequence[, dim=0[, start]])

In Fortran's sum(), 'dim' is an optional value that defaults to 0 (well, to the first index of an array, which is 0 in D); it specifies along what dimension to perform the sum. By default it acts like the Python sum(). In Fortran you often have arrays of vectors (or arrays of tuples in D), and Fortran supports vector operations similar to D's, so:

VA = [[1,2,3], [4,5,6]]
sum(VA)    ==> [5, 7, 9]
sum(VA, 0) ==> [5, 7, 9]
sum(VA, 1) ==> [6, 15]
sum(VA, x) ==> Error if x > 1

Example usage, where points is an array of double[3] that represent 3D points:

totalDistance = sqrt(sum(points[] ^^ 2, dim=0));
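
A minimal sketch of what a dim-aware sum over an array of fixed-size arrays could look like in D (the name dimSum and the restriction to two dimensions are assumptions of this sketch, not a proposed API):

import std.stdio;

// Hypothetical dim-aware sum for a dynamic array of static arrays
// (a 2-D structure): dim == 0 sums across rows, giving one value per
// column; dim == 1 sums across columns, giving one value per row.
auto dimSum(size_t dim, T, size_t N)(const T[N][] data)
{
    static if (dim == 0)
    {
        T[N] result = 0;
        foreach (ref row; data)
            foreach (j; 0 .. N)
                result[j] += row[j];
        return result;
    }
    else static if (dim == 1)
    {
        auto result = new T[data.length];
        foreach (i, ref row; data)
            foreach (x; row)
                result[i] += x;
        return result;
    }
    else
        static assert(0, "dim must be 0 or 1 for a 2-D input");
}

void main()
{
    int[3][] va = [[1, 2, 3], [4, 5, 6]];
    assert(dimSum!0(va) == [5, 7, 9]);
    assert(dimSum!1(va) == [6, 15]);
    writeln("ok");
}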
Comment #16 by bearophile_hugs — 2012-04-17T14:14:24Z
See also Issue 7934 for an extra improvement.
Comment #17 by andrei — 2013-03-12T22:14:45Z
https://github.com/D-Programming-Language/phobos/pull/1205
Comment #18 by bearophile_hugs — 2013-03-13T11:46:47Z
(In reply to comment #17)
> https://github.com/D-Programming-Language/phobos/pull/1205

In the patch, this is the code used to sum in the general case:

+        Result seed = 0;
+        return reduce!"a + b"(seed, r);

Besides the idea in Issue 7934, I suggest adding a "static if" that tests whether the input is an array (or a random-access range) and in that case, instead of reduce, uses a loop like this:

for (int i = 2; i < stop; i += 2) {
    total += array[i];
    total2 += array[i + 1];
}
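
A minimal sketch of that static-if dispatch (the names sketchSum and sumUnrolled and the exact predicate are assumptions for illustration, not code from the pull request):

import std.algorithm;  // reduce
import std.range;      // ElementType, hasLength, isInputRange, isRandomAccessRange
import std.traits;     // Unqual

// Hypothetical dispatching wrapper: random-access ranges with a known length
// get the two-accumulator loop; everything else falls back to reduce.
auto sketchSum(R)(R r)
    if (isInputRange!R)
{
    alias T = Unqual!(ElementType!R);
    static if (isRandomAccessRange!R && hasLength!R)
        return sumUnrolled!T(r);
    else
    {
        T seed = 0;
        return reduce!"a + b"(seed, r);
    }
}

// Two independent accumulators, folded together at the end (see comment #2).
private T sumUnrolled(T, R)(R r)
{
    T total = 0, total2 = 0;
    immutable size_t stop = r.length & ~cast(size_t)1;
    for (size_t i = 0; i < stop; i += 2)
    {
        total  += r[i];
        total2 += r[i + 1];
    }
    if (r.length % 2)
        total += r[r.length - 1];
    return total + total2;
}

unittest
{
    assert(sketchSum([1, 2, 3, 4, 5]) == 15);
    assert(sketchSum(iota(10)) == 45);
}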
Comment #19 by bearophile_hugs — 2013-03-25T10:49:55Z
Is it worth adding to sum() an optional argument with the start value, as in the Python sum()? In the first case in this program there is an array of floats and you want to accumulate in a real for maximum precision. In the second example there is an array of ints, and the programmer wants to sum inside a long to avoid overflow:

import std.stdio, std.algorithm;

void main() {
    float[] a1 = [1.21, 1.3, 1.4];
    real s1 = reduce!q{a + b}(0.0L, a1);
    writefln("%.19f", s1);
    writefln("%.19f", a1.sum);

    int[] a2 = [int.max, int.max, int.max];
    real s2 = reduce!q{a + b}(0L, a2);
    writeln(s2);
    writeln(a2.sum);
}
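
A minimal sketch of such a seed-taking overload, where the seed's type becomes the accumulator type (the name seededSum is hypothetical and this is only an illustration of the suggestion, not an actual Phobos overload):

import std.stdio;

// Hypothetical seed-taking overload: the accumulator has the seed's type,
// so passing 0.0L sums floats into a real, and 0L sums ints into a long.
E seededSum(R, E)(R r, E seed)
{
    E result = seed;
    foreach (x; r)
        result += x;
    return result;
}

void main()
{
    float[] a1 = [1.21, 1.3, 1.4];
    writefln("%.19f", seededSum(a1, 0.0L));  // accumulated in a real

    int[] a2 = [int.max, int.max, int.max];
    writeln(seededSum(a2, 0L));              // accumulated in a long, no overflow
}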
Comment #20 by bearophile_hugs — 2013-05-04T13:27:36Z
A use case. Given some permutations of the chars "ABCD", this program finds the missing one:

import std.stdio, std.string, std.algorithm, std.conv, std.range;

void main() {
    const perms = "ABCD CABD ACDB DACB BCDA ACBD ADCB CDAB DABC BCAD CADB CDBA CBAD ABDC ADBC BDCA DCBA BACD BADC BDAC CBDA DBCA DCAB".split;
    immutable rowSum = perms[0].reduce!q{a + b};

    foreach (immutable i; 0 .. perms[0].length) {
        immutable sumColumns = reduce!q{a + b}(0, perms.transversal(i));
        write(cast(char)(rowSum - sumColumns % rowSum));
    }
    writeln;
}

Output:

DBAC

Using a sum() function:

import std.stdio, std.string, std.algorithm, std.conv, std.range;

void main() {
    const perms = "ABCD CABD ACDB DACB BCDA ACBD ADCB CDAB DABC BCAD CADB CDBA CBAD ABDC ADBC BDCA DCBA BACD BADC BDAC CBDA DBCA DCAB".split;
    immutable rowSum = perms[0].sum(0);

    foreach (immutable i; 0 .. perms[0].length) {
        immutable sumColumns = perms.transversal(i).sum(0);
        write(cast(char)(rowSum - sumColumns % rowSum));
    }
    writeln;
}

As in the Python sum(), I have added a seed value, in this case the int 0.
Comment #21 by bearophile_hugs — 2013-06-08T15:15:10Z
Fortran has a built-in sum() that supports a useful second optional argument specifying the dimension along which to compute the sum; it's useful for 2D or nD arrays:

ke = 0.5d0 * dot_product(mass, sum(v ** 2, dim=1))

See: http://gcc.gnu.org/onlinedocs/gfortran/SUM.html
Comment #22 by andrei — 2013-06-08T15:24:12Z
Time to merge the pull request that's rotting in there...
Comment #23 by github-bugzilla — 2014-02-14T23:20:08Z
Comment #24 by bearophile_hugs — 2014-02-15T01:44:42Z
See Issue 12169
Comment #25 by bearophile_hugs — 2014-02-15T01:48:32Z
See Issue 12170 for another problem
Comment #26 by bearophile_hugs — 2014-02-15T01:53:08Z
See also Issue 12171
Comment #27 by bearophile_hugs — 2014-02-15T02:01:58Z
Another problem is in Issue 12172
Comment #28 by bearophile_hugs — 2014-02-15T02:35:00Z
See also Issue 12173
Comment #29 by bearophile_hugs — 2014-02-15T02:46:55Z
Probably not related: Issue 12174
Comment #30 by bearophile_hugs — 2014-02-15T02:53:59Z
See also Issue 12175
Comment #31 by bearophile_hugs — 2014-02-15T03:12:07Z
See also Issue 12176
Comment #32 by bearophile_hugs — 2014-02-25T14:30:48Z
The main request is now addressed. There are numerous related enhancements/bugs, but they all have their own issue numbers. Ready to close? Currently sum is not listed on this page: http://dlang.org/phobos/std_algorithm.html
Comment #33 by andrei — 2014-02-25T15:16:08Z
Yah. For the record, this sux :o).
Comment #34 by blah38621 — 2014-08-18T23:35:20Z
I'll assume that the fact that the only supported version of sum is an array-specialized one is among the known issues?
Comment #35 by andrei — 2014-08-18T23:38:37Z
sum should work with all ranges, and get specialized only for specific kinds of ranges. Is that not the case?
Comment #36 by blah38621 — 2014-08-18T23:43:42Z
Ack, whoops, sorry, I was thinking of the wrong function; it was max that I had an issue with, as it didn't work when I chained it after a map!().