Working minimal example illustrating the inconsistencies described.
text/x-dsrc
765
Comments
Comment #0 by joseph.wakeling — 2012-06-14T12:27:53Z
Created attachment 1116
Working minimal example illustrating the inconsistencies described.
The randomSample function in std.random can be called with or without
specifying a random number generator to use.
If no RNG is specified, then each lazy evaluation of the sample evaluates
differently, i.e. if you do
sample1 = randomSample(iota(0, 100), 5);
writeln(sample1);
writeln(sample1);
writeln(sample1);
you will get 3 different samples.
Conversely, if a random number generator is specified, you will get 3 times the
same result:
sample2 = randomSample(iota(0, 100), 5, Random(unpredictableSeed));
writeln(sample2);
writeln(sample2);
writeln(sample2);
Note that the seeding of the RNG is important, because if an already-existing
RNG is provided to create multiple different samples, they will evaluate
identically, e.g.
sample3 = randomSample(iota(0, 100), 5, rndGen);
writeln(sample3);
sample4 = randomSample(iota(0, 100), 5, rndGen);
writeln(sample4);
sample5 = randomSample(iota(0, 100), 5, rndGen);
writeln(sample5);
... will produce the same output 3 times. This happens because the RNG passed
to randomSample is copied rather than used by reference.
These inconsistencies lead to a lot of potential confusion and sources of bugs.
So, first of all, we need a firm decision on how the lazy evaluation of
RandomSample should behave -- should it
(1) always evaluate to the same sample, or
(2) always evaluate to a different sample?
... and depending on the answer, we then need to address how to specify and
seed an RNG for RandomSample.
Comment #1 by joseph.wakeling — 2012-06-14T12:35:44Z
Comment #2 by jens.k.mueller — 2012-06-14T13:41:13Z
I opt for the returning the same sample (option 1). I want the sample to stay the same.
Comment #3 by issues.dlang — 2012-06-14T15:27:33Z
If you want randomSample to be consistent as to which you get, it needs to be made to handle both reference and value type random number generating ranges identically, since they could be either. At present, all of those in std.random are value types, which is actually a problem in general. They really should reference types. But regardless of which they are, there's nothing stopping someone from implementing either a value type or reference type range which is a random number generator, in which case you'll get inconsistent behavior if randomSample doesn't code for using both by using save where appropriate.
Comment #4 by jens.k.mueller — 2012-06-15T00:54:20Z
Now I see why you want to pass RNG by reference. Because you may want that two functions share the same generator.
But then I would go with passing them all by reference for consistency reasons. And all functions have as default argument rndGen() which could be renamed to defaultRNG().
randomShuffle is already doing it this way. Though I don't see why it sets the template argument RandomGen to Random by default. This should be inferred automatically by the default argument rndGen() anyway.
So randomCover and randomSample should follow the same approach.
I do not see why one needs to pass a RNG by value then. Admittedly I have never used std.random. So I may have wrong use cases in mind.
But having a thread local RNG that is used by default should be okay.
@Jonathan
Why should a RNG type have reference semantics? I think it's fine to pass them by reference where needed.
Comment #5 by issues.dlang — 2012-06-15T01:05:40Z
> Why should a RNG type have reference semantics? I think it's fine to pass them
by reference where needed.
Because it makes no sense for it to have value semantics. Take this for example
auto func(R)(R r)
{
r.popFront();
auto var1 = r.front;
///...
}
func(generator);
generator.popFront();
auto var2 = generator.front;
Both var1 and var2 will have the exact same value. This is an easy mistake to make, and since random number generators are supposed to be returning random numbers, having them return the _same_ number after popFront has been called is definitely problematic.
By making them reference types, the only time that you get the same number multiple times in a row is when you do it on purpose (e.g. by storing the value of front or by calling save on the range).
There's a discussion on this in issue# 7067.
Comment #6 by jens.k.mueller — 2012-06-15T01:22:31Z
(In reply to comment #5)
> > Why should a RNG type have reference semantics? I think it's fine to pass them
> by reference where needed.
>
> Because it makes no sense for it to have value semantics. Take this for example
>
> auto func(R)(R r)
> {
> r.popFront();
> auto var1 = r.front;
> ///...
> }
>
> func(generator);
> generator.popFront();
> auto var2 = generator.front;
>
> Both var1 and var2 will have the exact same value. This is an easy mistake to
> make, and since random number generators are supposed to be returning random
> numbers, having them return the _same_ number after popFront has been called is
> definitely problematic.
>
> By making them reference types, the only time that you get the same number
> multiple times in a row is when you do it on purpose (e.g. by storing the value
> of front or by calling save on the range).
>
> There's a discussion on this in issue# 7067.
I see. Thanks.
Since passing around RNGs should be by default by reference RNGs should be reference types. Otherwise everybody writing own functions accepting RNGs has to use ref which is error-prone.
Using ref when passing RNGs in std.random won't solve this general design issue.
Comment #7 by robert.schadek — 2024-12-01T16:15:10Z