When comparing the DMD-style inline assembler path vs. the generic cephes implementation, observed the following.
These test points:
(1):
[ 0x1.1p+13L, 0x1.29aeffefc8ec645p+12557L ], // near overflow
http://www.wolframalpha.com/input/?i=exp%288704%29
1.256523125474349284688509465050423995871881189122581533e3780L
feqrel(cephes, iasm) = 52
feqrel(iasm, wolfram) = 52
feqrel(cephes, wolfram) = 64
=========
(2):
[-0x1.18p+13L, 0x1.5e4bf54b4806db9p-12927L ], // near underflow
http://www.wolframalpha.com/input/?i=exp%28-8960%29
5.26553067155989234457558352116540147577224389983906948e-3892L
feqrel(cephes, iasm) = 51
feqrel(iasm, wolfram) = 51
feqrel(cephes, wolfram) = 64
=========
(3):
[-0x1.625p+13L, 0x1.a6bd68a39d11f35cp-16358L ], // ditto
http://www.wolframalpha.com/input/?i=exp%28-11338%29
9.31459938556868263111811562594117020130237890448547800e-4925L
feqrel(cephes, iasm) = 53
feqrel(iasm, wolfram) = 53
feqrel(cephes, wolfram) = 64
=========
(4):
[-0x1.62dafp+13L, 0x1.96c53d30277021dp-16383L ], // near underflow - subnormal
http://www.wolframalpha.com/input/?i=exp%28-11355.4%29
2.58487886630776605706860595662918121578535663781714980e-4932L
feqrel(cephes, iasm) = 54
feqrel(iasm, wolfram) = 5
feqrel(cephes, wolfram) = 5
=========
In (1), (2), and (3) the generic version is more accurate than the IASM version, matching bit-for-bit the result given from wolfram.
What is interesting is that in each case, they share up to double precision in common. This suggests that whatever the IASM branch does (I assume this happens in L_subnormal) it truncates or rounds the value to double.
Test (4) is simply way off the mark for both IASM and generic, and should probably be scrapped as it's clear we can't guarantee any reasonable level of accuracy with any lower number. If I understand the documentation correctly, the domain that the exp() function has been tested with is +-10000.
Comment #1 by clugdbug — 2015-11-20T15:53:43Z
Note that these errors occur only in the slow path of the asm exp(). Thus, a more accurate, slower calculation can be added with no impact on speed.
Clearly, there is a loss of precision bug. I am not certain, but the problem may be in the extraction of the integer part, which is obviously limited to 64 bits.
I need to look into this.
This title of this bug report is a bit alarmist though, it's not very generous to describe 52 bit accurate as "utterly wrong". :)
Comment #2 by ibuclaw — 2015-11-20T16:13:38Z
(In reply to Don from comment #1)
> Note that these errors occur only in the slow path of the asm exp(). Thus, a
> more accurate, slower calculation can be added with no impact on speed.
>
> Clearly, there is a loss of precision bug. I am not certain, but the problem
> may be in the extraction of the integer part, which is obviously limited to
> 64 bits.
> I need to look into this.
>
> This title of this bug report is a bit alarmist though, it's not very
> generous to describe 52 bit accurate as "utterly wrong". :)
What about 5 bit accuracy? :-P
Comment #3 by ibuclaw — 2015-11-22T10:56:44Z
Stepped through the path dmd code uses for exp().
(1) 0x1.1p13L - Uses L_normal path
(2) -0x1.18p+13L - Uses L_normal path
(3) -0x1.625p+13L - Uses L_normal path
(4) -0x1.62dafp+13L - Uses L_normal path
Interestingly, the last test exp(-11398) uses the L_subnormal path, and returns results that are correct to wolfram in at least the bits that can be represented.
Also tried exp(-11370) and exp(-11387.4), and on the surface it looks like any near underflow/subnormal that represents a decimal will only be correct up to double precision.
Also tested the first iteration of exp() and it returns the same results.
Comment #4 by robert.schadek — 2024-12-01T16:25:24Z