
Bug 8247 – Inconsistent behaviour of randomSample depending on whether a random number generator is specified

Status: NEW
Severity: normal
Priority: P3
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2012-06-14T12:27:53Z
Last change time: 2024-12-01T16:15:10Z
Keywords: bootcamp
Assigned to: No Owner
Creator: Joseph Rushton Wakeling

Attachments

ID	Filename	Summary	Content-Type	Size
1116	sampleinconsistencies.d	Working minimal example illustrating the inconsistencies described.	text/x-dsrc	765

Comments

Comment #0 by joseph.wakeling — 2012-06-14T12:27:53Z

Created attachment 1116
Working minimal example illustrating the inconsistencies described.

The randomSample function in std.random can be called with or without specifying a random number generator to use. If no RNG is specified, then each lazy evaluation of the sample evaluates differently, i.e. if you do

    sample1 = randomSample(iota(0, 100), 5);
    writeln(sample1);
    writeln(sample1);
    writeln(sample1);

you will get 3 different samples. Conversely, if a random number generator is specified, you will get 3 times the same result:

    sample2 = randomSample(iota(0, 100), 5, Random(unpredictableSeed));
    writeln(sample2);
    writeln(sample2);
    writeln(sample2);

Note that the seeding of the RNG is important, because if an already-existing RNG is provided to create multiple different samples, they will evaluate identically, e.g.

    sample3 = randomSample(iota(0, 100), 5, rndGen);
    writeln(sample3);
    sample4 = randomSample(iota(0, 100), 5, rndGen);
    writeln(sample4);
    sample5 = randomSample(iota(0, 100), 5, rndGen);
    writeln(sample5);

... will produce the same output 3 times. This happens because the RNG passed to randomSample is copied rather than used by reference.

These inconsistencies lead to a lot of potential confusion and sources of bugs. So, first of all, we need a firm decision on how the lazy evaluation of RandomSample should behave -- should it

(1) always evaluate to the same sample, or
(2) always evaluate to a different sample?

... and depending on the answer, we then need to address how to specify and seed an RNG for RandomSample.
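The copy behaviour described in this comment can be checked with a short program. This is a minimal sketch, not the attached sampleinconsistencies.d: the helper name twoSamples and the seed 42 are made up for illustration, and whether the two samples come out identical depends on the Phobos version storing its own copy of the generator, as the report describes.

```d
import std.algorithm.comparison : equal;
import std.array : array;
import std.random : Random, randomSample;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper (not from the attachment): builds two samples
// from the same explicitly seeded generator variable.
int[][] twoSamples(uint seed)
{
    auto gen = Random(seed);
    // Each call receives `gen`; if randomSample keeps a private copy,
    // `gen` itself is never advanced and both samples start from the
    // same generator state.
    auto first = randomSample(iota(0, 100), 5, gen).array;
    auto second = randomSample(iota(0, 100), 5, gen).array;
    return [first, second];
}

void main()
{
    auto samples = twoSamples(42);
    writeln(samples[0]);
    writeln(samples[1]);
    // Reports whether the two samples coincide on this Phobos version.
    writeln("identical: ", equal(samples[0], samples[1]));
}
```

If the last line reports true, the generator was copied rather than shared, which is the inconsistency being reported.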

Comment #1 by joseph.wakeling — 2012-06-14T12:35:44Z

Online discussion on this: http://forum.dlang.org/thread/[email protected]

Comment #2 by jens.k.mueller — 2012-06-14T13:41:13Z

I opt for returning the same sample (option 1). I want the sample to stay the same.

Comment #3 by issues.dlang — 2012-06-14T15:27:33Z

If you want randomSample to be consistent as to which you get, it needs to be made to handle both reference and value type random number generating ranges identically, since they could be either. At present, all of those in std.random are value types, which is actually a problem in general. They really should be reference types. But regardless of which they are, there's nothing stopping someone from implementing either a value type or reference type range which is a random number generator, in which case you'll get inconsistent behavior if randomSample doesn't code for using both by using save where appropriate.
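To illustrate what "using save where appropriate" could look like, here is a minimal sketch. The class RefRandom and the function peek are hypothetical names invented for this example (nothing like them exists in std.random); the point is only that calling save first makes a generic function behave the same for value-type and reference-type generators.

```d
import std.random : Random;
import std.range.primitives : isForwardRange;

// Hypothetical reference-type generator: a class wrapping Phobos' Random.
final class RefRandom
{
    private Random impl;

    this(uint seed) { impl = Random(seed); }
    private this(Random state) { impl = state; }

    enum bool isUniformRandom = true;
    enum bool empty = false; // the generator never runs out

    @property uint front() { return impl.front; }
    void popFront() { impl.popFront(); }

    // save returns a new object holding an independent copy of the state.
    @property RefRandom save() { return new RefRandom(impl); }
}

// Hypothetical helper: reads the next n values without disturbing the
// caller's generator, whether Rng is a value type or a reference type.
uint[] peek(Rng)(Rng rng, size_t n)
if (isForwardRange!Rng)
{
    auto local = rng.save; // explicit, independent copy
    uint[] result;
    foreach (i; 0 .. n)
    {
        result ~= local.front;
        local.popFront();
    }
    return result;
}

void main()
{
    auto value = Random(7);            // value type (struct)
    auto reference = new RefRandom(7); // reference type (class)

    immutable valueFront = value.front;
    immutable referenceFront = reference.front;

    // Same seed, so both generators yield the same upcoming values.
    assert(peek(value, 3) == peek(reference, 3));

    // Neither caller-side generator was advanced by peek.
    assert(value.front == valueFront);
    assert(reference.front == referenceFront);
}
```

Without the call to save, the class-based generator would be advanced by peek while the struct-based one would not, which is the kind of inconsistency this comment warns about.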

Comment #4 by jens.k.mueller — 2012-06-15T00:54:20Z

Now I see why you want to pass the RNG by reference: you may want two functions to share the same generator. But then I would go with passing them all by reference, for consistency. All functions would then have rndGen() as the default argument, which could be renamed to defaultRNG(). randomShuffle is already doing it this way, though I don't see why it sets the template argument RandomGen to Random by default. This should be inferred automatically from the default argument rndGen() anyway. So randomCover and randomSample should follow the same approach.

I do not see why one needs to pass an RNG by value then. Admittedly I have never used std.random, so I may have the wrong use cases in mind. But having a thread-local RNG that is used by default should be okay.

@Jonathan Why should an RNG type have reference semantics? I think it's fine to pass them by reference where needed.
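The by-reference convention suggested here might look like the following sketch. The function nextValue is a hypothetical name used only for illustration; rndGen is the thread-local default generator from std.random. The ref overload advances whichever generator the caller shares with it, and the no-argument overload falls back to rndGen.

```d
import std.random : Random, rndGen;
import std.stdio : writeln;

// Hypothetical helper: consumes one value from the caller's generator.
// Because `gen` is taken by ref, the caller's generator is advanced.
uint nextValue(Rng)(ref Rng gen)
{
    immutable value = gen.front;
    gen.popFront();
    return value;
}

// Overload that falls back to the thread-local default generator.
uint nextValue()
{
    return nextValue(rndGen);
}

void main()
{
    auto gen = Random(2012);
    auto snapshot = gen; // by-value copy, used only for comparison

    // Two calls sharing `gen` see consecutive values of one sequence.
    immutable a = nextValue(gen);
    immutable b = nextValue(gen);

    assert(a == snapshot.front);
    snapshot.popFront();
    assert(b == snapshot.front);

    // Uses rndGen; the value differs from run to run.
    writeln(nextValue());
}
```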

Comment #5 by issues.dlang — 2012-06-15T01:05:40Z

> Why should a RNG type have reference semantics? I think it's fine to pass them by reference where needed.

Because it makes no sense for it to have value semantics. Take this for example

    auto func(R)(R r)
    {
        r.popFront();
        auto var1 = r.front;
        ///...
    }

    func(generator);
    generator.popFront();
    auto var2 = generator.front;

Both var1 and var2 will have the exact same value. This is an easy mistake to make, and since random number generators are supposed to be returning random numbers, having them return the _same_ number after popFront has been called is definitely problematic.

By making them reference types, the only time that you get the same number multiple times in a row is when you do it on purpose (e.g. by storing the value of front or by calling save on the range).

There's a discussion on this in issue# 7067.
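A self-contained version of the example in this comment, as a sketch: it assumes the generator is std.random's Random (a struct, hence a value type), the seed 1 is arbitrary, and func is changed to return the value it reads so that the two results can be compared.

```d
import std.random : Random;

// Takes the generator by value, so it works on a private copy.
uint func(R)(R r)
{
    r.popFront();
    return r.front; // second value of the sequence, read from the copy
}

void main()
{
    auto generator = Random(1);

    immutable var1 = func(generator); // the caller's generator is untouched
    generator.popFront();
    immutable var2 = generator.front; // second value again, from the original

    // Both reads land on the same position of the same sequence.
    assert(var1 == var2);
}
```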

Comment #6 by jens.k.mueller — 2012-06-15T01:22:31Z

(In reply to comment #5)
> > Why should a RNG type have reference semantics? I think it's fine to pass them
> by reference where needed.
>
> Because it makes no sense for it to have value semantics. Take this for example
>
> auto func(R)(R r)
> {
> r.popFront();
> auto var1 = r.front;
> ///...
> }
>
> func(generator);
> generator.popFront();
> auto var2 = generator.front;
>
> Both var1 and var2 will have the exact same value. This is an easy mistake to
> make, and since random number generators are supposed to be returning random
> numbers, having them return the _same_ number after popFront has been called is
> definitely problematic.
>
> By making them reference types, the only time that you get the same number
> multiple times in a row is when you do it on purpose (e.g. by storing the value
> of front or by calling save on the range).
>
> There's a discussion on this in issue# 7067.

I see. Thanks. Since RNGs should be passed by reference by default, RNGs should be reference types. Otherwise everybody writing their own functions that accept RNGs has to use ref, which is error-prone. Using ref when passing RNGs in std.random won't solve this general design issue.

Comment #7 by robert.schadek — 2024-12-01T16:15:10Z

THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/phobos/issues/9591 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB