Bug 4125 – std.numeric.gcd can use a binary GCD

Status
RESOLVED
Resolution
FIXED
Severity
enhancement
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2010-04-24T16:57:00Z
Last change time
2017-01-16T23:25:44Z
Keywords
bootcamp
Assigned to
nobody
Creator
bearophile_hugs

Attachments

IDFilenameSummaryContent-TypeSize
1313gcd_bench.dFaster GCDapplication/octet-stream6539

Comments

Comment #0 by bearophile_hugs — 2010-04-24T16:57:35Z
std.numeric.gcd can use a faster Binary GCD algorithm, especially when the input type is unsigned. This page has both C code (and asm, but the C code is probably enough in many situations): http://en.wikipedia.org/wiki/Binary_GCD_algorithm
Comment #1 by Marco.Leise — 2012-01-31T12:11:35Z
Replace uint with ulong for the longer version ;) I use this and it is notably faster than what I used before. uint gcd(uint u, uint v) { int shift; if (u == 0 || v == 0) return u | v; for (shift = 0; ((u | v) & 1) == 0; ++shift) { u >>= 1; v >>= 1; } while ((u & 1) == 0) u >>= 1; do { while ((v & 1) == 0) v >>= 1; if (u < v) { v -= u; } else { uint diff = u - v; u = v; v = diff; } v >>= 1; } while (v != 0); return u << shift; }
Comment #2 by peter.alexander.au — 2013-01-05T11:15:04Z
(In reply to comment #1) > Replace uint with ulong for the longer version ;) > I use this and it is notably faster than what I used before. I implemented this (exactly as you have it) and it was slower than the algorithm that is already there. I tested on all pairs of integers below 10,000, and also on the pairs (x^2, y) for all x,y < 10,000. At best it was 50% slower, at worst 3x slower. All tests used dmd -O -release -inline I suspect the reason for the performance reduction is due to poor pipelining. The binary version involves a lot more branching, and more loop iterations than the standard algorithm. Also, the branches taken are highly unpredictable. Maybe I'll look at this again in the future to try and make it faster, but it's pretty low on my priority list.
Comment #3 by bearophile_hugs — 2013-01-06T03:44:33Z
(In reply to comment #2) > I implemented this (exactly as you have it) and it was slower than the > algorithm that is already there. I tested on all pairs of integers below > 10,000, and also on the pairs (x^2, y) for all x,y < 10,000. At best it was 50% > slower, at worst 3x slower. > ... > Maybe I'll look at this again in the future to try and make it faster, but it's > pretty low on my priority list. Thank you for doing some experiments. Once the experiments are conclusive, this enhancement can be closed. (Then at the moment a more important function for Phobos is an efficient GCD for bigints.)
Comment #4 by clugdbug — 2013-01-08T02:37:41Z
FWIW, you can get rid of most of the conditional branches by using: min(u,v) = v + ( (cast(int)(u-v)) >> (8*int.sizeof - 1)) & (u-v) the shift smears the sign bit of u-v so that it makes a mask either 0x0000_0000 or 0xFFFF_FFFF. I think the general consensus is that (at least if you use asm), binary GCD is faster on all known processors, but not necessarily by a large amount.
Comment #5 by lomereiter — 2013-01-08T06:42:51Z
(In reply to comment #0) > std.numeric.gcd can use a faster Binary GCD algorithm, especially when the > input type is unsigned. This page has both C code (and asm, but the C code is > probably enough in many situations): > > http://en.wikipedia.org/wiki/Binary_GCD_algorithm Maybe instead of reinventing the wheel LibTomMath library should be used? It is in public domain, has decent performance, and is stable enough to provide implementation of big integers in TCL and Rubinius.
Comment #6 by bearophile_hugs — 2014-01-07T18:04:17Z
Created attachment 1313 Faster GCD Code for a binary GCD that on LDC2 is about twice faster on uint values, and it's rather faster with dmd too. Timings ("gcd" is the Phobos one): DMD 2.061alpha (32 bit): gcd, time = 1924 gcd_recursive, time = 1937 gcd_iterative_sub, time = 2931 gcd_iterative_mod, time = 1948 gcd_binary, time = 3357 gcd_binary_mod, time = 2136 gcd_binary_2, time = 1406 gcd_binary_2, time = 1391 Results sum: 1284258816 LDC2 (32 bit): gcd, time = 1930 gcd_recursive, time = 1924 gcd_iterative_sub, time = 3148 gcd_iterative_mod, time = 1926 gcd_binary, time = 2635 gcd_binary_mod, time = 2036 gcd_binary_2, time = 1461 gcd_binary_3, time = 1026 Results sum: 1284258816
Comment #7 by alexandru.razvan.c — 2016-12-09T15:46:15Z
After conducting some benchmarks we arrived to the conclusion that currently the dmd compiler works best with Euclid's algorithm for GCD, otherwise we use Stein's algorithm for ldc or gdc. I tested both the previously shared benchmarks on the forum and a couple of my own and noted that gcd_binary2 implemented by [email protected] has the best results, so I used that. PR: https://github.com/dlang/phobos/pull/4940
Comment #8 by github-bugzilla — 2016-12-12T14:20:02Z
Commits pushed to master at https://github.com/dlang/phobos https://github.com/dlang/phobos/commit/6b4c2585fe5d25488a2df6a7b9b7c49031c63253 Fix Issue 4125 - std.numeric.gcd can use a binary GCD https://github.com/dlang/phobos/commit/19445fc71e8aabdbd42f0ad8a571a57601a5ff39 Merge pull request #4940 from Darredevil/issue-4125 Fix Issue 4125 - std.numeric.gcd can use a binary GCD
Comment #9 by github-bugzilla — 2017-01-07T03:03:03Z
Commits pushed to stable at https://github.com/dlang/phobos https://github.com/dlang/phobos/commit/6b4c2585fe5d25488a2df6a7b9b7c49031c63253 Fix Issue 4125 - std.numeric.gcd can use a binary GCD https://github.com/dlang/phobos/commit/19445fc71e8aabdbd42f0ad8a571a57601a5ff39 Merge pull request #4940 from Darredevil/issue-4125
Comment #10 by github-bugzilla — 2017-01-16T23:25:44Z
Commits pushed to newCTFE at https://github.com/dlang/phobos https://github.com/dlang/phobos/commit/6b4c2585fe5d25488a2df6a7b9b7c49031c63253 Fix Issue 4125 - std.numeric.gcd can use a binary GCD https://github.com/dlang/phobos/commit/19445fc71e8aabdbd42f0ad8a571a57601a5ff39 Merge pull request #4940 from Darredevil/issue-4125