Bug 7128 – Cartesian product of ranges

Status
RESOLVED
Resolution
FIXED
Severity
enhancement
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2011-12-18T09:20:32Z
Last change time
2022-04-03T19:03:06Z
Keywords
bootcamp
Assigned to
Seb
Creator
bearophile_hugs

Comments

Comment #0 by bearophile_hugs — 2011-12-18T09:20:32Z
It's useful to have a cartesian product of ranges, like a std.range.product: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=31050 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=31054 See also the handy design of Python itertools.product: http://docs.python.org/library/itertools.html#itertools.product itertools.product(*iterables[, repeat]) Cartesian product of input iterables. product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
Comment #1 by hsteoh — 2013-01-03T21:29:14Z
Comment #2 by bearophile_hugs — 2013-01-03T21:54:44Z
(In reply to comment #1) > https://github.com/D-Programming-Language/phobos/pull/856 I guess the Python "repeat" optional argument is not supported: >>> from itertools import product >>> list(product("abc", repeat=4)) [('a', 'a', 'a', 'a'), ('a', 'a', 'a', 'b'), ('a', 'a', 'a', 'c'), ('a', 'a', 'b', 'a'), ('a', 'a', 'b', 'b'), ...] So you have to write: array(cartesianProduct("abc", "abc", "abc", "abc"))
Comment #3 by bearophile_hugs — 2013-02-05T18:06:03Z
Partially implemented: http://forum.dlang.org/thread/[email protected] Currently three or more arguments are not accepted: import std.stdio, std.algorithm; void main() { cartesianProduct([0,1], [0,1], [0,1]).writeln(); } Another interesting feature of Python itertools.product() that's missing is the optional "repeat" argument: >>> from itertools import product >>> list(product([0,1], repeat=3)) [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)] >>> >>> list(product([0,1], repeat=4)) [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1), (0, 1, 0, 0), (0, 1, 0, 1), (0, 1, 1, 0), (0, 1, 1, 1 ), (1, 0, 0, 0), (1, 0, 0, 1), (1, 0, 1, 0), (1, 0, 1, 1), (1, 1, 0, 0), (1, 1, 0, 1), (1, 1, 1, 0), (1, 1, 1, 1)]
Comment #4 by hsteoh — 2013-02-05T18:26:05Z
Now that the basic 2-argument case is working, I'll start looking into the multi-argument case. The simplest implementation is to just alias the variadic version to the 2-argument version chained to a recursive invocation of itself, but this has the disadvantage that the resulting range will have nested tuples rather than a single tuple of multiple values at the top-level. For the repeated case, I'm thinking that with D's compile-time introspection, we can probably implement: auto cartesianPower(int n, R)(R range) { ... } which returns the cartesian product of n copies of the range with. Or maybe: auto cartesianProduct(R)(R range, int repeat) { ... } Don't really know which API is better.
Comment #5 by hsteoh — 2013-02-05T18:32:23Z
P.S. on second thoughts, probably the second version is not implementable right now because we can't compute the type of the return value at runtime.
Comment #6 by bearophile_hugs — 2013-02-05T18:34:12Z
(In reply to comment #4) Thank you for your work. I use cartesian often enough in Python. > but this has > the disadvantage that the resulting range will have nested tuples rather than a > single tuple of multiple values at the top-level. I think this solution is not good enough.
Comment #7 by bearophile_hugs — 2013-02-05T18:37:32Z
(In reply to comment #5) > P.S. on second thoughts, probably the second version is not implementable right > now because we can't compute the type of the return value at runtime. To support this API this cartesianProduct has to return a range of dynamic arrays(this is possible because the types of the items inside the arrays is uniform), but the others return a range of tuples... auto cartesianProduct(R)(R range, int repeat) { ... }
Comment #8 by bearophile_hugs — 2013-02-07T09:33:02Z
See also Issue 6788
Comment #9 by hsteoh — 2013-02-07T09:41:05Z
*** Issue 6788 has been marked as a duplicate of this issue. ***
Comment #10 by bearophile_hugs — 2013-02-07T10:17:40Z
(In reply to comment #9) > *** Issue 6788 has been marked as a duplicate of this issue. *** It's not a dupe.
Comment #11 by hsteoh — 2013-02-07T12:36:01Z
Comment #12 by andrei — 2013-02-07T15:01:41Z
Comment #13 by bearophile_hugs — 2013-02-07T18:01:50Z
The repetition number is not yet supported: auto cartesianProduct(size_t nRepetitions=0, R)(R range) { ... } If the name if the function is different, then it's even possible to support this API, where it returns a range of dynamic arrays: auto cartesianPower(R)(R range, in size_t n) { ... }
Comment #14 by hsteoh — 2013-02-07T21:14:59Z
The power must be a compile-time parameter, otherwise it won't compile. The variadic cartesianProduct only works if the number of arguments is known at compile-time.
Comment #15 by bearophile_hugs — 2013-02-08T04:26:20Z
(In reply to comment #14) > The power must be a compile-time parameter, otherwise it won't compile. I think it's possible to create a range like this: auto cartesianPower(R)(R range, in size_t n) { ... } If you call it with: int n = 3; auto result = cartesianPower([0, 1], n).array(); result should be: [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]] One use case: generate dstrings (dchar[][]) for a "Bulls and Cows" game, where they need to be composed of N distinct nonzero digits: dchar[][] genCases(in size_t nDigits) { return cartesianPower("123456789"d.dup, nDigits) .filter!(a => a.sort().uniq().walkLength() == nDigits)() .array(); } So genCases(4).writeln() should print: ["4321", "5321", "6321", "7321", "8321", "9321", "3421", "5421", "6421", "7421", "8421", "9421", "3521", "4521", "6521", "7521", "8521", "9521", "3621", "4621", "5621", "7621", "8621", "9621", "3721", "4721", "5721", "6721", "8721", "9721", "3821", "4821", "5821", "6821", "7821", "9821", "3921", "4921", "5921", ...]
Comment #16 by hsteoh — 2013-02-08T20:32:06Z
Hmm, you're right. The cartesian power has the advantage that the output will be tuples of homogenous elements, so we can use arrays instead of tuples, which will allow runtime variation.
Comment #17 by bearophile_hugs — 2013-02-09T03:17:30Z
(In reply to comment #16) > Hmm, you're right. The cartesian power has the advantage that the output will > be tuples of homogenous elements, so we can use arrays instead of tuples, which > will allow runtime variation. Right. On the other hand if we do this, cartesianProduct() and cartesianPower() will have different type (this means: they will be ranges that yield different class of types, the first tuples, the second dynamic arrays). This requires people that want to use such functions to remember this difference (a broken symmetry). So this design decision has a little associated cost. But in the end I prefer cartesianPower() to yield dynamic arrays because dynamic arrays are more flexible, they offer both the size_t n to be a run-time value, and dynamic arrays are more usable than tuples in D (I think tuples are not ranges). Python itertools.cartesian doesn't have such problems because Python is dynamically typed, so the items that itertools.cartesian are usable like normal sequences (unlike D tuples). If you use a cartesianProduct() to produce the data for the "Bulls and cows" game you write something like: auto d9 = "123456789"d.dup; auto choices = cartesianProduct(d9, d9, d9, d9) .map!(t => [t.tupleof])() .filter!(a => a.sort().uniq().walkLength() == 4)() .array(); The map!(t => [t.tupleof])() is used to convert a tuple into a dynamic array, so later on it you can perform more processing, with a filter() and an array(). D tuples don't play very well with ranges. - - - - - - - - - - - Maybe std.typecons.Tuple should define a handy std.typecons.Tuple.arrayof attribute (that returns [this.tupleof]) when all the items of the tuple have the same type.
Comment #18 by bearophile_hugs — 2013-02-09T03:22:18Z
Extra on cartesianProduct: maybe you should add a little benchmark, to be sure the way you have implemented the variadic cartesianProduct() (with the flattening of nested tuples) is efficient/fast enough: enum N = 10; // Change me. int[] items = iota(N).array(); immutable len = cartesianProduct(items, items, items, items, items).walkLength();
Comment #19 by bearophile_hugs — 2014-07-10T21:46:08Z
This pull has fixed the most glaring problems of cartesianProduct: https://github.com/D-Programming-Language/phobos/pull/2276 I leave this ER open for the "repeat" argument request (compile time argument or run-time one), and for the desire of benchmarking.
Comment #20 by bearophile_hugs — 2015-01-08T09:22:57Z
Thinking some more about this. Currently the "repeat" argument needs to be a template argument: cartesianProduct!2([0, 1]) ==> tuple(0,0), tuple(0,1), tuple(1,0), tuple(1,1) But it's probably better to change everything, so now I think it's better to have a differently named function with a different API. Something like: cartesianPower!true([0, 1], 2) => [0,0], [0,1], [1,0], [1,1] cartesianPower!false([0, 1], 2) => [0,0], [0,1], [1,0], [1,1] This means it yields arrays instead of tuples, and the power argument is a run-time argument. The boolean template argument is true by default, and it's named doCopy, if it's true every output array is a new array, otherwise if doCopy is false, it's always the same array with different contents. cartesianPower works efficiently, it's not recursive. If you like this idea, I can close down this issue, and open a new enhancement request for cartesianPower.
Comment #21 by bearophile_hugs — 2015-01-08T09:43:43Z
A third optional argument can be a "buffer" array, if it's large enough (as much as the "power" argument or larger), it gets used to make cartesianPower a @nogc function: cartesianPower!false([0, 1], 2, buffer)
Comment #22 by bearophile_hugs — 2015-01-08T09:50:55Z
The 3-arguments cartesianPower!false overload can't be annotated with @nogc because the given buffer array can be shorter than the "power" argument. Unless you generate a run-time error in this case.
Comment #23 by bearophile_hugs — 2015-01-10T23:45:44Z
This doesn't use an efficient algorithm, but it shows the kind of API I was thinking of (it can be a random access range): struct CartesianPower(bool doCopy=true, T) { T[] items; uint repeat; T[] row; uint i, maxN; this(T[] items_, in uint repeat_, T[] buffer) pure nothrow @safe @nogc { this.items = items_; this.repeat = repeat_; row = buffer[0 .. repeat]; row[] = items[0]; maxN = items.length ^^ repeat; } static if (doCopy) { @property T[] front() pure nothrow @safe @nogc { return row.dup; } } else { @property T[] front() pure nothrow @safe @nogc { return row; } } @property bool empty() pure nothrow @safe @nogc { return i >= maxN; } void popFront() pure nothrow @safe @nogc { i++; if (empty) return; uint n = i; size_t count = repeat - 1; while (n) { row[count] = items[n % items.length]; count--; n /= items.length; } } } auto cartesianPower(bool doCopy=true, T)(T[] items, in uint repeat) pure nothrow @safe { return CartesianPower!(doCopy, T)(items, repeat, new T[repeat]); } auto cartesianPower(bool doCopy=true, T)(T[] items, in uint repeat, T[] buffer) pure nothrow @safe @nogc { if (buffer.length >= repeat) { return CartesianPower!(doCopy, T)(items, repeat, buffer); } else { static immutable err = new Error("buffer.length < repeat"); throw err; } } void main() @nogc { import core.stdc.stdio; int[3] items = [10, 20, 30]; int[4] buf; foreach (p; cartesianPower!false(items, 4, buf)) printf("(%d, %d, %d, %d)\n", p[0], p[1], p[2], p[3]); }
Comment #24 by greeenify — 2016-03-01T01:12:01Z
As @quickfur linked to this: I put a PR for cartesianProduct together - so maybe this issue can be tackled :) https://github.com/D-Programming-Language/phobos/pull/4026
Comment #25 by greeenify — 2016-10-15T11:46:59Z
@andralex: as there was no positive decision on the PR it's now part of Mir (http://docs.mir.dlang.io/latest/mir_combinatorics.html) with a couple of other goodies (more functions, allocator support). So there is no need to use this as bootcamp task as this is already done and reviewed and it was only blocked by your "No" a couple of months ago. If you want to have it in Phobos, I am happy to reopen the regarding PR again ;-)
Comment #26 by andrei — 2016-10-15T12:25:01Z
@greenify thanks, I'v bootcamped it so it gets looked at again :o).
Comment #27 by josipp — 2022-04-03T19:01:52Z
*** Issue 6788 has been marked as a duplicate of this issue. ***
Comment #28 by josipp — 2022-04-03T19:03:06Z
Considered done with the introduction of `std.algorithm.setops.cartesianProduct`