← Back to index | Original Bugzilla link

Bug 4125 – std.numeric.gcd can use a binary GCD

Status: RESOLVED
Resolution: FIXED
Severity: enhancement
Priority: P2
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2010-04-24T16:57:00Z
Last change time: 2017-01-16T23:25:44Z
Keywords: bootcamp
Assigned to: nobody
Creator: bearophile_hugs

Attachments

ID	Filename	Summary	Content-Type	Size
1313	gcd_bench.d	Faster GCD	application/octet-stream	6539

Comments

Comment #0 by bearophile_hugs — 2010-04-24T16:57:35Z

std.numeric.gcd can use a faster Binary GCD algorithm, especially when the input type is unsigned. This page has both C code (and asm, but the C code is probably enough in many situations): http://en.wikipedia.org/wiki/Binary_GCD_algorithm

Comment #1 by Marco.Leise — 2012-01-31T12:11:35Z

Replace uint with ulong for the longer version ;) I use this and it is notably faster than what I used before. uint gcd(uint u, uint v) { int shift; if (u == 0 || v == 0) return u | v; for (shift = 0; ((u | v) & 1) == 0; ++shift) { u >>= 1; v >>= 1; } while ((u & 1) == 0) u >>= 1; do { while ((v & 1) == 0) v >>= 1; if (u < v) { v -= u; } else { uint diff = u - v; u = v; v = diff; } v >>= 1; } while (v != 0); return u << shift; }

Comment #2 by peter.alexander.au — 2013-01-05T11:15:04Z

(In reply to comment #1) > Replace uint with ulong for the longer version ;) > I use this and it is notably faster than what I used before. I implemented this (exactly as you have it) and it was slower than the algorithm that is already there. I tested on all pairs of integers below 10,000, and also on the pairs (x^2, y) for all x,y < 10,000. At best it was 50% slower, at worst 3x slower. All tests used dmd -O -release -inline I suspect the reason for the performance reduction is due to poor pipelining. The binary version involves a lot more branching, and more loop iterations than the standard algorithm. Also, the branches taken are highly unpredictable. Maybe I'll look at this again in the future to try and make it faster, but it's pretty low on my priority list.

Comment #3 by bearophile_hugs — 2013-01-06T03:44:33Z

(In reply to comment #2) > I implemented this (exactly as you have it) and it was slower than the > algorithm that is already there. I tested on all pairs of integers below > 10,000, and also on the pairs (x^2, y) for all x,y < 10,000. At best it was 50% > slower, at worst 3x slower. > ... > Maybe I'll look at this again in the future to try and make it faster, but it's > pretty low on my priority list. Thank you for doing some experiments. Once the experiments are conclusive, this enhancement can be closed. (Then at the moment a more important function for Phobos is an efficient GCD for bigints.)

Comment #4 by clugdbug — 2013-01-08T02:37:41Z

FWIW, you can get rid of most of the conditional branches by using: min(u,v) = v + ( (cast(int)(u-v)) >> (8*int.sizeof - 1)) & (u-v) the shift smears the sign bit of u-v so that it makes a mask either 0x0000_0000 or 0xFFFF_FFFF. I think the general consensus is that (at least if you use asm), binary GCD is faster on all known processors, but not necessarily by a large amount.

Comment #5 by lomereiter — 2013-01-08T06:42:51Z

(In reply to comment #0) > std.numeric.gcd can use a faster Binary GCD algorithm, especially when the > input type is unsigned. This page has both C code (and asm, but the C code is > probably enough in many situations): > > http://en.wikipedia.org/wiki/Binary_GCD_algorithm Maybe instead of reinventing the wheel LibTomMath library should be used? It is in public domain, has decent performance, and is stable enough to provide implementation of big integers in TCL and Rubinius.

Comment #6 by bearophile_hugs — 2014-01-07T18:04:17Z

Created attachment 1313 Faster GCD Code for a binary GCD that on LDC2 is about twice faster on uint values, and it's rather faster with dmd too. Timings ("gcd" is the Phobos one): DMD 2.061alpha (32 bit): gcd, time = 1924 gcd_recursive, time = 1937 gcd_iterative_sub, time = 2931 gcd_iterative_mod, time = 1948 gcd_binary, time = 3357 gcd_binary_mod, time = 2136 gcd_binary_2, time = 1406 gcd_binary_2, time = 1391 Results sum: 1284258816 LDC2 (32 bit): gcd, time = 1930 gcd_recursive, time = 1924 gcd_iterative_sub, time = 3148 gcd_iterative_mod, time = 1926 gcd_binary, time = 2635 gcd_binary_mod, time = 2036 gcd_binary_2, time = 1461 gcd_binary_3, time = 1026 Results sum: 1284258816

Comment #7 by alexandru.razvan.c — 2016-12-09T15:46:15Z

After conducting some benchmarks we arrived to the conclusion that currently the dmd compiler works best with Euclid's algorithm for GCD, otherwise we use Stein's algorithm for ldc or gdc. I tested both the previously shared benchmarks on the forum and a couple of my own and noted that gcd_binary2 implemented by [email protected] has the best results, so I used that. PR: https://github.com/dlang/phobos/pull/4940

Comment #8 by github-bugzilla — 2016-12-12T14:20:02Z

Commits pushed to master at https://github.com/dlang/phobos https://github.com/dlang/phobos/commit/6b4c2585fe5d25488a2df6a7b9b7c49031c63253 Fix Issue 4125 - std.numeric.gcd can use a binary GCD https://github.com/dlang/phobos/commit/19445fc71e8aabdbd42f0ad8a571a57601a5ff39 Merge pull request #4940 from Darredevil/issue-4125 Fix Issue 4125 - std.numeric.gcd can use a binary GCD

Comment #9 by github-bugzilla — 2017-01-07T03:03:03Z

Commits pushed to stable at https://github.com/dlang/phobos https://github.com/dlang/phobos/commit/6b4c2585fe5d25488a2df6a7b9b7c49031c63253 Fix Issue 4125 - std.numeric.gcd can use a binary GCD https://github.com/dlang/phobos/commit/19445fc71e8aabdbd42f0ad8a571a57601a5ff39 Merge pull request #4940 from Darredevil/issue-4125

Comment #10 by github-bugzilla — 2017-01-16T23:25:44Z

Commits pushed to newCTFE at https://github.com/dlang/phobos https://github.com/dlang/phobos/commit/6b4c2585fe5d25488a2df6a7b9b7c49031c63253 Fix Issue 4125 - std.numeric.gcd can use a binary GCD https://github.com/dlang/phobos/commit/19445fc71e8aabdbd42f0ad8a571a57601a5ff39 Merge pull request #4940 from Darredevil/issue-4125