Some analysis on speed differences between the three algorithms in charge
application/pdf
207739
Comments
Comment #0 by andrei — 2016-06-24T13:41:28Z
Stepanov discusses in http://www.stepanovpapers.com/PAM.pdf how the usual pow implementation does more computing than strictly necessary and proposes a better version. That has duplicated code, which can also be eliminated as follows:
double newPow(double b, uint e)
{
if (e <= 1)
{
if (e == 1) return b;
return 1;
}
double r = b;
--e;
// Loop invariant: r * (b ^^ e) is the actual result
for (;; e /= 2)
{
if (e % 2)
{
r *= b;
if (e == 1) break;
}
b *= b;
}
return r;
}
void main(string[] args)
{
import std.datetime: benchmark, Duration;
import std.stdio: writeln, writefln;
import std.conv: to;
auto a = 5.0;
auto b = 25;
auto l = 0.0;
void f0() { l += newPow(a, b); }
void f1() { import std.math; l += std.math.pow(a, b); }
void f2() { l += a ^^ b; }
auto rs = benchmark!(f0, f1, f2)(100_000_000);
foreach(j,r;rs)
{
version (GNU)
writefln("%d %d secs %d ms", j, r.seconds(), r.msecs());
else
writeln(j, " ", r.to!Duration);
}
// prevent any optimization
writeln(l);
}
When built with
dmd -O -inline -release gd/powtest.d&& ./powtest
the code above prints:
0 986 ms, 913 μs, and 4 hnsecs
1 2 secs, 321 ms, 606 μs, and 6 hnsecs
2 4 secs, 957 ms, 933 μs, and 5 hnsecs
8.9407e+25
i.e. a large improvement in performance. Therefore the code above should be included in the built-in ^^ operator and pow implementation.
See more discussion, including gdc/ldc, at http://forum.dlang.org/thread/[email protected].
Comment #1 by dlang-bugzilla — 2017-07-02T17:41:25Z
(In reply to Andrei Alexandrescu from comment #0)
> Stepanov discusses
(to clarify, that's Alex Stepanov, not related to D contributor Igor Stepanov)
Comment #2 by bugzilla — 2021-02-02T11:10:25Z
Results on my computer:
dmd:
0 1 sec, 331 ms, 327 μs, and 6 hnsecs
1 2 secs, 940 ms, 293 μs, and 9 hnsecs
2 7 secs, 428 ms, 953 μs, and 5 hnsecs
ldc:
0 330 ms, 3 μs, and 2 hnsecs
1 117 ms, 824 μs, and 1 hnsec
2 1 sec, 324 ms, 530 μs, and 6 hnsecs
IMHO the program compares apples with oranges, that is doubles with reals. Replacing double with real I get:
dmd:
0 2 secs, 335 ms, 325 μs, and 8 hnsecs
1 2 secs, 942 ms, 572 μs, and 9 hnsecs
2 7 secs, 421 ms, 349 μs, and 3 hnsecs
ldc:
0 530 ms, 947 μs, and 1 hnsec
1 117 ms, 768 μs, and 1 hnsec
2 1 sec, 324 ms, 124 μs, and 9 hnsecs
Not so clear, that the new algorithm is superior...
(I did a lot of more speed checking with these algorithms - I will write the results down and post a link here, once I'm done.)
See also https://github.com/dlang/phobos/pull/7783 for results on checking correctness.
Comment #3 by bugzilla — 2021-02-03T11:20:43Z
Created attachment 1815
Some analysis on speed differences between the three algorithms in charge
I added now an attachment with a speed analysis of the three algorithms (stepanovs is not identical to the one given above). Main result: None of the three algorithms is clearly better than the others. (But there are nice diagrams and an other idea on how to improve the speed of the pow function.)
Comment #4 by dlang-bot — 2021-03-01T19:41:14Z
@berni44 created dlang/phobos pull request #7823 "Fix Issue 16200 - Faster pow implementation for integral exponents" fixing this issue:
- Fix Issue 16200 - Faster pow implementation for integral exponents
https://github.com/dlang/phobos/pull/7823
Comment #5 by dlang-bot — 2021-03-09T06:12:11Z
dlang/phobos pull request #7823 "Fix Issue 16200 - Faster pow implementation for integral exponents" was merged into master:
- 1c5ab751434f9c7f2c8be786129cbdd4ad19e98d by berni44:
Fix Issue 16200 - Faster pow implementation for integral exponents
https://github.com/dlang/phobos/pull/7823