Comment #0 by bearophile_hugs — 2011-09-02T03:11:35Z
In the std.random module I suggest to use Xorshift as default generator (the one used when a generator is not specified, like in uniform(0,5)).
Two or three times I have had to hunt for performance problems in D code, and in the end the cause was the high-quality but slow Mersenne Twister used on default. Replacing it with the Xorshift has solved my problems, because it is significantly faster. And the quality of numbers generated by Xorshift was enough for all my purposes, it's better than the linear congruences generator.
The disadvantage of using Xorshift on default is that where you need really high quality random numbers, the default is not enough any more. But I think such situations are not common. If in your code you need very high quality random numbers, then I think you know it, and you know to use the Mersenne Twister (or even better).
Comment #1 by joseph.wakeling — 2013-07-05T06:58:35Z
Have to register a disagreement here. The default behaviour should be statistically reliable -- choosing a faster but lower-quality RNG should be done advisedly. We shouldn't reproduce the awful situation of C/C++ where the default RNG is awful and users have to be educated to avoid it.
However, we could think about easy ways to specify the default RNG type Random at compile-time.
Comment #2 by monarchdodra — 2013-07-05T07:41:11Z
MersenneTwister is supposed to be fast though...
I think the problem could have something to do with the whole "PRNG value semantic" fiasco:
pragma(msg, Mt19937.sizeof); //2504u
pragma(msg, Xorshift.sizeof); //16u
The mersenne twister range has a size of 2.5 KB (!)
The performance penalty most probably comes from passing it (by value) to algorithms or whatnot.
I know for a fact this *will* cause significant slowdown. I observed this while implementing a Lagged Fibonacci PRNG, whose size varies a lot. The bigger flavors of the PRNG would make the runtime grind to a halt, while the biggest versions simply stack overflowed.
----
All this to say, I think it is not the algorithm of mersenne itself that is slow, but rather the range implementation. Until this is fixed, we *may* want to change the default.
Comment #3 by joseph.wakeling — 2013-07-05T08:37:38Z
(In reply to comment #2)
> The performance penalty most probably comes from passing it (by value) to
> algorithms or whatnot.
If you're passing it by value, you have bigger problems than slowdown ... :-)
> I know for a fact this *will* cause significant slowdown. I observed this while
> implementing a Lagged Fibonacci PRNG, whose size varies a lot. The bigger
> flavors of the PRNG would make the runtime grind to a halt, while the biggest
> versions simply stack overflowed.
>
> ----
> All this to say, I think it is not the algorithm of mersenne itself that is
> slow, but rather the range implementation. Until this is fixed, we *may* want
> to change the default.
Fully agree that the real solution here is to change the RNGs to reference types. I don't think switching default to Xorshift is a good idea, though, even as a short term method. Apart from any other consideration, Issue #10550 should give us pause on that idea.
Comment #4 by bearophile_hugs — 2013-07-08T01:49:01Z
(In reply to comment #1)
> Have to register a disagreement here. The default behaviour should be
> statistically reliable -- choosing a faster but lower-quality RNG should be
> done advisedly.
> I don't think switching default to Xorshift is a good idea, though,
> even as a short term method. Apart from any other consideration, Issue
> #10550 should give us pause on that idea.
I have to agree.
Closed.
Comment #5 by joseph.wakeling — 2013-07-08T06:09:36Z
Comment #6 by bearophile_hugs — 2013-07-08T18:10:07Z
(In reply to comment #5)
> I also recently came across some further work extending MT to work with SIMD
> and GPU implementations, which might be worth looking into:
> http://www.math.sci.hiroshima-u.ac.jp/~saito/articles/sfmt.pdf
The SIMD MT seems very fit for Phobos (including its block-generation and the variant that generates directly two doubles at a time).