Bug 12958 – core.checkedint.mulu is broken

Status
RESOLVED
Resolution
FIXED
Severity
critical
Priority
P1
Component
druntime
Product
D
Version
D2
Platform
x86
OS
All
Creation time
2014-06-20T11:45:00Z
Last change time
2014-08-22T07:17:22Z
Keywords
pull
Assigned to
nobody
Creator
code

Comments

Comment #0 by code — 2014-06-20T11:45:30Z
The current [1] version of core.checkedint.mulu is broken. For example, 2^(n/2) * 2^(n/2), where n is the bit width of the integer result, will produce r = 0, but no overflow will be detected. For the ulong case, I think that short of using floating point or trial division for verification, the best implementation might be to split up the numbers into 32 bit parts and perform the combining step manually (and checking for overflow there). [1] https://github.com/D-Programming-Language/druntime/blob/12a0deafe2b1a573b489bce1719971fae0b219ff/src/core/checkedint.d#L445
Comment #1 by safety0ff.bugz — 2014-06-22T20:17:51Z
I tried this, I tried checking the partial results individually and combining then checking. The latter worked better (though ldc still didn't combine the mul's) Attached is the code with the additional unit tests (tests already in checkedint are ommitted.) // CODE ulong mulu(ulong x, ulong y, ref bool overflow) { uint xlo = cast(uint)x; uint xhi = cast(uint)(x>>32); uint ylo = cast(uint)y; uint yhi = cast(uint)(y>>32); ulong xlo_ylo = cast(ulong)xlo * ylo; ulong xhi_ylo = cast(ulong)xhi * ylo; ulong xlo_yhi = cast(ulong)xlo * yhi; ulong xhi_yhi = cast(ulong)xhi * yhi; ulong res_lo = (xhi_ylo<<32) + (xlo_yhi<<32) + xlo_ylo; // '|' or '||' would suffice ulong res_hi = xhi_yhi + (xhi_ylo>>32) + (xlo_yhi>>32); if (res_hi) overflow = true; return res_lo; } unittest { bool overflow; assert(mulu(1uL<<32,1uL<<32, overflow) == 0); assert(overflow); overflow = false; assert(mulu((1uL<<32) +1,1uL<<32, overflow) == 1uL<<32); assert(overflow); overflow = false; assert(mulu((1uL<<32) +1,(1uL<<32)+1, overflow) == 1+(2uL<<32)); assert(overflow); overflow = false; }
Comment #2 by uber.daveb — 2014-06-23T23:02:32Z
That won't work because there can still be overflows when summing the partial products. Consider this test case: assert(mulu(2uL*uint.max,uint.max,overflow) == 18446744056529682434uL); assert(overflow); To do the full thing, I'm pretty sure we essentially need to use addu() (inlined of course) to sum the partials. I also noticed that addu should be simplified: ((x+y<x) || (x+y<y)) == (x+y < (x|y)) So here's my proposed solution (below): The overflow checks are written in such a way that DMD is able to generate branchless code (I checked the assembler). uint mulu(uint x, uint y, ref bool overflow) { ulong r = cast(ulong)x*y; overflow |= !!(r >> 32); return cast(uint)r; } ulong mulu(ulong x, ulong y, ref bool overflow) { // x = b:a, y = d:c immutable uint a = cast(uint)x; immutable uint b = x >> 32; immutable uint c = cast(uint)y; immutable uint d = y >> 32; immutable low = cast(ulong)a*c; immutable mid1 = cast(ulong)b*c; immutable mid2 = cast(ulong)a*d; immutable mid = mid1+mid2; overflow |= !!b & !!d; overflow |= mid < (mid1|mid2); overflow |= !!(mid >> 32); immutable mid_lo = mid<<32; immutable r = low + mid_lo; overflow |= r < (low|mid_lo); return r; }
Comment #3 by code — 2014-06-24T14:30:25Z
(In reply to David Bregman from comment #2) > To do the full thing, I'm pretty sure we essentially need to use addu() > (inlined of course) to sum the partials. I agree. > So here's my proposed solution (below): The overflow checks are written in > such a way that DMD is able to generate branchless code (I checked the > assembler). At first glance, this looks good, but chances are I've missed to check some other edge cases. To this end, it would be awesome if somebody could write a quick test harness that verifies random multiplication results using BigInt arithmetic. Let me also note that performance does not really matter here, as any such explicit implementation will be much slower than simply using the appropriate hardware instructions (mul/seto or mul/jno/mov). LDC will map the functions onto the respective LLVM intrinsics, and I suppose DMD will do something similar.
Comment #4 by uber.daveb — 2014-06-24T21:21:21Z
If performance is not a concern, then it's much easier - just use a slow division to check: ulong mulu(ulong x, ulong y, ref bool overflow) { immutable xy = x*y; if(x && xy/x != y) { overflow = true; } return xy; }
Comment #5 by safety0ff.bugz — 2014-06-25T15:12:11Z
(In reply to David Nadlinger from comment #3) > > At first glance, this looks good, but chances are I've missed to check some > other edge cases. To this end, it would be awesome if somebody could write a > quick test harness that verifies random multiplication results using BigInt > arithmetic. I written some randomized tests using inline assembly, so far so good.
Comment #6 by code — 2014-06-25T15:42:24Z
David, do you want to submit a druntime pull request then?
Comment #7 by uber.daveb — 2014-06-26T05:55:08Z
Thanks for asking, I'd like to but I'm not set up to create pull requests right now. Please go ahead and use any of my code snippets if you like, I don't require any credits for this.
Comment #8 by bugzilla — 2014-07-14T20:45:05Z
Comment #9 by bearophile_hugs — 2014-07-14T20:53:22Z
(In reply to Walter Bright from comment #8) > https://github.com/D-Programming-Language/druntime/pull/890 This looks like a case where this could be useful: https://d.puremagic.com/issues/show_bug.cgi?id=13014
Comment #10 by code — 2014-07-14T20:57:26Z
(In reply to bearophile_hugs from comment #9) > This looks like a case where this could be useful: > https://d.puremagic.com/issues/show_bug.cgi?id=13014 In theory, yes, but the problem is: What do you use as the reference implementation? The new implementation for detecting the overflow, trial division, is just about the simplest thing that can work. Of course, you could compare it with an inline asm implementation, but platform-specific tests for a platform-independent piece of code seem a bit strange.
Comment #11 by github-bugzilla — 2014-07-14T22:35:42Z
Commits pushed to master at https://github.com/D-Programming-Language/druntime https://github.com/D-Programming-Language/druntime/commit/3787f9b79d807631af37a986a8a164a192413a1b Fix issue 12958 - core.checkedint.mulu is broken. This implements the (uint, uint) case using ulong arithmetic and the (ulong, ulong) case using trial division. See Bugzilla for an ulong implementation by David Bregman that does not require a division. As the functions are expected to be implemented as compiler intrinsics to make use of architecture-specific primitives anyway, the simpler version has been chosen to reduce the potential for bugs, even though it is slower. https://github.com/D-Programming-Language/druntime/commit/7d2a2ef77fd07796ef1e96ec1ef5fac9016f511d Merge pull request #890 from klickverbot/fix-checkedint-umul Fix issue 12958 - core.checkedint.mulu is broken.
Comment #12 by github-bugzilla — 2014-07-15T10:26:22Z
Commit pushed to 2.066 at https://github.com/D-Programming-Language/druntime https://github.com/D-Programming-Language/druntime/commit/e2c9626e0668bdce88a2ec44034bbb1ed1528c61 Merge pull request #890 from klickverbot/fix-checkedint-umul Fix issue 12958 - core.checkedint.mulu is broken.
Comment #13 by github-bugzilla — 2014-08-22T07:17:22Z