Bug 5849 – std.random.dice is better as a range

Status
NEW
Severity
enhancement
Priority
P4
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2011-04-16T18:06:59Z
Last change time
2024-12-01T16:14:03Z
Keywords
bootcamp
Assigned to
No Owner
Creator
bearophile_hugs
Moved to GitHub: phobos#9901 →

Attachments

IDFilenameSummaryContent-TypeSize
1060fast_dice4.dFirst draft of a fast dice rangetext/plain5989
1066fast_dice.dVersion 0.5 of the fast dicetext/plain6526

Comments

Comment #0 by bearophile_hugs — 2011-04-16T18:06:59Z
When I have to use std.random.dice the frequencies don't change often, but I usually have to compute many random values efficiently. Calling dice() many times is not efficient, because it performs some preprocessing of the given frequencies. So I suggest to replace the function dice(xxx) with a range that generates the results. So the current usage of: dice(70, 20, 10) gets replaced by: dice(70, 20, 10).front And you are able to write: take(dice(70, 20, 10), 5) The good thing of this generator is its performance compared to the function. See the performance difference between the two following implementations (5.3 seconds for the first version and 0.7 seconds for the second one). ------------------------ import std.stdio, std.random, std.string; void main() { enum int N = 10_000_000; enum pr = [1/5., 1/6., 1/7., 1/8., 1/9., 1/10., 1/11., 1759/27720.]; double[pr.length] counts = 0.0; foreach (i; 0 .. N) counts[dice(pr)]++; foreach (i, p; pr) writefln("%.8f %.8f", p, counts[i] / N); } ------------------------ import std.stdio, std.random, std.string; void main() { enum int N = 10_000_000; enum pr = [1/5., 1/6., 1/7., 1/8., 1/9., 1/10., 1/11., 1759/27720.]; double[pr.length] cumulatives = pr[]; foreach (i, ref c; cumulatives[1 .. $-1]) c += cumulatives[i]; cumulatives[$-1] = 1.0; double[pr.length] counts = 0.0; auto rnd = Xorshift(unpredictableSeed()); foreach (i; 0 .. N) { double rnum = rnd.front() / cast(double)typeof(rnd.front()).max; rnd.popFront(); int j; for ( ; rnum > cumulatives[j]; j++) {} counts[j]++; } foreach (i, p; pr) writefln("%.8f %.8f", p, counts[i] / N); } ------------------------- See also some other improvements I have suggested for std.random.
Comment #1 by bearophile_hugs — 2011-04-24T14:26:40Z
See also issue 5883
Comment #2 by bearophile_hugs — 2011-12-29T15:37:50Z
Comment #3 by andrei — 2011-12-29T16:00:18Z
You may want to package contributions as pull requests. Thanks!
Comment #4 by bearophile_hugs — 2012-01-04T14:58:04Z
Created attachment 1066 Version 0.5 of the fast dice > You may want to package contributions as pull requests. Thanks! Thank you. I think the current code is not yet fit for a pull request, there are some things to be done/solved first: - Create statistical unittests. - Is popLast() overkill/not useful enough? - Is the Range correct now? - Is uint[] enough for mAlias? - ddoc comments are missing still. - Currently the Dice constructor lacks a pre-condition: I think the input probabilities must sum to 1.0. But is this necessary? An alternative design is to accept any array of numbers (integers too), and normalize their sum to 1.0 inside the Dice constructor itself.
Comment #5 by bearophile_hugs — 2012-05-28T19:19:55Z
Comment #6 by bearophile_hugs — 2012-07-01T22:09:25Z
Comment #7 by bearophile_hugs — 2013-08-31T17:44:25Z
If we don't want to break the API of dice() making it a range, a simple solution is to call the dice range "diceRange" and keep both (but perhaps dice becomes a small function that returns diceRange.front). Another interesting suggestion: in my code I've seen that most times I want to use a dice/diceRange I have to map its result on the items of an array: import std.stdio, std.random; void main() { immutable probabilities = [0.2, 0.6, 0.1, 0.1]; immutable values = "ACGT"; foreach (_; 0 .. 10) values[probabilities.dice].write; } That outputs something similar to: CGCCAAACCC So an improvement for diceRange() is to accept as first argument the range of probabilities, and as second optional argument a range of items. If the second argument is not given, then it generates integers as dice(). If the second argument is given, then it yields the items, according to their probabilities. So that program becomes: import std.stdio, std.random, std.range; void main() { immutable probabilities = [0.2, 0.6, 0.1, 0.1]; immutable values = "ACGT"; probabilities.diceRange(values).take(10).writeln; } This is quite handy.
Comment #8 by bearophile_hugs — 2013-08-31T17:47:46Z
(In reply to comment #7) > import std.stdio, std.random, std.range; > void main() { > immutable probabilities = [0.2, 0.6, 0.1, 0.1]; > immutable values = "ACGT"; > probabilities.diceRange(values).take(10).writeln; > } > > > This is quite handy. If the second optional argument is not supported, then the code becomes: import std.stdio, std.random, std.range; void main() { immutable probabilities = [0.2, 0.6, 0.1, 0.1]; immutable values = "ACGT"; probabilities.diceRange.map!(i => values[i]).take(10).writeln; } But I am not sure it's strictly equivalent.
Comment #9 by bearophile_hugs — 2013-08-31T19:47:44Z
Also, why is the random generator engine the first optional argument of dice(), while it's the last for uniform()?
Comment #10 by bearophile_hugs — 2013-11-22T04:43:56Z
See also: http://www.keithschwarz.com/interesting/code/?dir=alias-method Once paid the time to create the range, all the successive popFront() become just something like this (where for speed nextDouble is not a call to uniform(), but it's more like uniform01): int column = random.nextInt(probability.length); boolean coinToss = random.nextDouble() < probability[column]; return coinToss ? column : alias[column];
Comment #11 by greensunny12 — 2018-03-31T15:28:02Z
FYI: Here's my implementation of the alias method for mir: https://github.com/libmir/mir/blob/da76cf406d06957e472b9ba90b4c90b917480cb9/source/mir/random/discrete.d (it's Boost-licensed)
Comment #12 by robert.schadek — 2024-12-01T16:14:03Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/phobos/issues/9901 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB