The current [1] version of core.checkedint.mulu is broken. For example, 2^(n/2) * 2^(n/2), where n is the bit width of the integer result, will produce r = 0, but no overflow will be detected.
For the ulong case, I think that short of using floating point or trial division for verification, the best implementation might be to split up the numbers into 32 bit parts and perform the combining step manually (and checking for overflow there).
[1] https://github.com/D-Programming-Language/druntime/blob/12a0deafe2b1a573b489bce1719971fae0b219ff/src/core/checkedint.d#L445
Comment #1 by safety0ff.bugz — 2014-06-22T20:17:51Z
I tried this, I tried checking the partial results individually and combining then checking. The latter worked better (though ldc still didn't combine the mul's)
Attached is the code with the additional unit tests (tests already in checkedint are ommitted.)
// CODE
ulong mulu(ulong x, ulong y, ref bool overflow)
{
uint xlo = cast(uint)x;
uint xhi = cast(uint)(x>>32);
uint ylo = cast(uint)y;
uint yhi = cast(uint)(y>>32);
ulong xlo_ylo = cast(ulong)xlo * ylo;
ulong xhi_ylo = cast(ulong)xhi * ylo;
ulong xlo_yhi = cast(ulong)xlo * yhi;
ulong xhi_yhi = cast(ulong)xhi * yhi;
ulong res_lo = (xhi_ylo<<32) + (xlo_yhi<<32) + xlo_ylo;
// '|' or '||' would suffice
ulong res_hi = xhi_yhi + (xhi_ylo>>32) + (xlo_yhi>>32);
if (res_hi)
overflow = true;
return res_lo;
}
unittest
{
bool overflow;
assert(mulu(1uL<<32,1uL<<32, overflow) == 0);
assert(overflow);
overflow = false;
assert(mulu((1uL<<32) +1,1uL<<32, overflow) == 1uL<<32);
assert(overflow);
overflow = false;
assert(mulu((1uL<<32) +1,(1uL<<32)+1, overflow) == 1+(2uL<<32));
assert(overflow);
overflow = false;
}
Comment #2 by uber.daveb — 2014-06-23T23:02:32Z
That won't work because there can still be overflows when summing the partial products. Consider this test case:
assert(mulu(2uL*uint.max,uint.max,overflow) == 18446744056529682434uL);
assert(overflow);
To do the full thing, I'm pretty sure we essentially need to use addu() (inlined of course) to sum the partials.
I also noticed that addu should be simplified:
((x+y<x) || (x+y<y)) == (x+y < (x|y))
So here's my proposed solution (below): The overflow checks are written in such a way that DMD is able to generate branchless code (I checked the assembler).
uint mulu(uint x, uint y, ref bool overflow)
{
ulong r = cast(ulong)x*y;
overflow |= !!(r >> 32);
return cast(uint)r;
}
ulong mulu(ulong x, ulong y, ref bool overflow)
{
// x = b:a, y = d:c
immutable uint a = cast(uint)x;
immutable uint b = x >> 32;
immutable uint c = cast(uint)y;
immutable uint d = y >> 32;
immutable low = cast(ulong)a*c;
immutable mid1 = cast(ulong)b*c;
immutable mid2 = cast(ulong)a*d;
immutable mid = mid1+mid2;
overflow |= !!b & !!d;
overflow |= mid < (mid1|mid2);
overflow |= !!(mid >> 32);
immutable mid_lo = mid<<32;
immutable r = low + mid_lo;
overflow |= r < (low|mid_lo);
return r;
}
Comment #3 by code — 2014-06-24T14:30:25Z
(In reply to David Bregman from comment #2)
> To do the full thing, I'm pretty sure we essentially need to use addu()
> (inlined of course) to sum the partials.
I agree.
> So here's my proposed solution (below): The overflow checks are written in
> such a way that DMD is able to generate branchless code (I checked the
> assembler).
At first glance, this looks good, but chances are I've missed to check some other edge cases. To this end, it would be awesome if somebody could write a quick test harness that verifies random multiplication results using BigInt arithmetic.
Let me also note that performance does not really matter here, as any such explicit implementation will be much slower than simply using the appropriate hardware instructions (mul/seto or mul/jno/mov). LDC will map the functions onto the respective LLVM intrinsics, and I suppose DMD will do something similar.
Comment #4 by uber.daveb — 2014-06-24T21:21:21Z
If performance is not a concern, then it's much easier - just use a slow division to check:
ulong mulu(ulong x, ulong y, ref bool overflow)
{
immutable xy = x*y;
if(x && xy/x != y) {
overflow = true;
}
return xy;
}
Comment #5 by safety0ff.bugz — 2014-06-25T15:12:11Z
(In reply to David Nadlinger from comment #3)
>
> At first glance, this looks good, but chances are I've missed to check some
> other edge cases. To this end, it would be awesome if somebody could write a
> quick test harness that verifies random multiplication results using BigInt
> arithmetic.
I written some randomized tests using inline assembly, so far so good.
Comment #6 by code — 2014-06-25T15:42:24Z
David, do you want to submit a druntime pull request then?
Comment #7 by uber.daveb — 2014-06-26T05:55:08Z
Thanks for asking, I'd like to but I'm not set up to create pull requests right now. Please go ahead and use any of my code snippets if you like, I don't require any credits for this.
(In reply to bearophile_hugs from comment #9)
> This looks like a case where this could be useful:
> https://d.puremagic.com/issues/show_bug.cgi?id=13014
In theory, yes, but the problem is: What do you use as the reference implementation? The new implementation for detecting the overflow, trial division, is just about the simplest thing that can work. Of course, you could compare it with an inline asm implementation, but platform-specific tests for a platform-independent piece of code seem a bit strange.
Comment #11 by github-bugzilla — 2014-07-14T22:35:42Z