← Back to index | Original Bugzilla link

Bug 6788 – std.algorithm.combinations

Status: RESOLVED
Resolution: DUPLICATE
Severity: enhancement
Priority: P2
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2011-10-07T16:19:10Z
Last change time: 2022-04-03T19:01:52Z
Keywords: bootcamp
Assigned to: No Owner
Creator: bearophile_hugs

Comments

Comment #0 by bearophile_hugs — 2011-10-07T16:19:10Z

This D2 program contains a very common iteration patten: import std.stdio, std.typecons; void main() { auto data = [1, 2, 3, 4]; foreach (i, x1; data) foreach (x2; data[i+1 .. $]) writeln(tuple(x1, x2)); // do something with x1 and x2 } It prints (dmd 2.056head): Tuple!(int,int)(1, 2) Tuple!(int,int)(1, 3) Tuple!(int,int)(1, 4) Tuple!(int,int)(2, 3) Tuple!(int,int)(2, 4) Tuple!(int,int)(3, 4) As you see it is quite different from "zip". So I suggest to add a std.range.pairwise range that does the same thing (same output): import std.stdio, std.range; void main() { auto data = [1, 2, 3, 4]; foreach (tup; pairwise(data)) writeln(tup); } This coding pattern is very common. It happens when you have a sequence of items, and you want to work (apply a function) on all the pairs, and your diadic function (function with two arguments) doesn't care for its arguments order (symmetric function). Often O(n ** 2) algorithms have to see all the pairs. But often the diadic function is symmetric. So you need to scan only half of the matrix, an upper triangle, or lower one. This is where pairwise is useful for. It's not a necessary range (even zip is not necessary in D), because two loops are enough to replace it, but it helps a bit because it avoids mistakes with the array indexes. I have done similar mistakes more than once in past. pairwise(sequence) is mostly meant to be used with a random-access input sequence. It's not hard to make it work with forward ranges too, but it becomes less efficient. In Python I use a similar generator, defined just like this: from itertools import combinations def pairwise(seq): return combinations(seq, 2) Usage examples: >>> list( pairwise([]) ) [] >>> list(pairwise([1,2])) [(1, 2)] >>> list( pairwise("abc") ) [('a', 'b'), ('a', 'c'), ('b', 'c')] >>> list(pairwise([0, 1, 2, 3])) [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)] >>> list(pairwise(range(5))) [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)] I expect Phobos to eventually have two very useful lazy ranges that generate permutations and combinations. But pairwise is supposed to be very efficient, here I'd like the compiler to generate code similar to two loops inside each other, like in the first code example. This is probably hard to do if pairwise is implemented with a call to a combinations() function. Why is pairwise just about pairs? Isn't a more general solution better? [20:33] bearophile A general function that takes n ranges and yields all possible unsorted n-tuples is possible, but in practice this is a very uncommon thing in my code. This is why I have limited it to pairs. It is just about pairs because this is the most common case. A potential problem: some users might wrongly think pairwise(sequence) is the same as std.range.chunks(sequence,2). The documentation has to specify they are two quite different things, to avoid some of such mistakes. This is an alternative design, similar to how lockstep works: import std.stdio, std.range; void main() { auto data = [1, 2, 3, 4]; foreach (x1, x2; pairwise(data)) writeln(x1, " ", x2); }

Comment #1 by bearophile_hugs — 2013-02-07T09:33:12Z

See also Issue 7128

Comment #2 by hsteoh — 2013-02-07T09:41:04Z

This is the same as the cartesianProduct of two ranges, which is already checked into git HEAD. *** This issue has been marked as a duplicate of issue 7128 ***

Comment #3 by bearophile_hugs — 2013-02-07T10:03:25Z

(In reply to comment #2) > This is the same as the cartesianProduct of two ranges, which is already > checked into git HEAD. > > *** This issue has been marked as a duplicate of issue 7128 *** See this test code: import std.stdio, std.algorithm; void main() { auto data = [1, 2, 3, 4]; foreach (xy; cartesianProduct(data, data)) writeln(xy); } Tuple!(int, int)(1, 1) Tuple!(int, int)(2, 1) Tuple!(int, int)(3, 1) Tuple!(int, int)(4, 1) Tuple!(int, int)(1, 2) Tuple!(int, int)(2, 2) Tuple!(int, int)(3, 2) Tuple!(int, int)(4, 2) Tuple!(int, int)(1, 3) Tuple!(int, int)(2, 3) Tuple!(int, int)(3, 3) Tuple!(int, int)(4, 3) Tuple!(int, int)(1, 4) Tuple!(int, int)(2, 4) Tuple!(int, int)(3, 4) Tuple!(int, int)(4, 4) import std.stdio, std.range; void main() { auto data = [1, 2, 3, 4]; foreach (tup; pairwise(data)) writeln(tup); } Should print: Tuple!(int,int)(1, 2) Tuple!(int,int)(1, 3) Tuple!(int,int)(1, 4) Tuple!(int,int)(2, 3) Tuple!(int,int)(2, 4) Tuple!(int,int)(3, 4)

Comment #4 by bearophile_hugs — 2013-03-12T12:04:54Z

Basic version for a random access range: import std.range: ForeachType, isRandomAccessRange, hasLength; import std.traits: Unqual, isNarrowString; import std.typecons: Tuple; struct Pairwise(Range) { alias R = Unqual!Range; alias Pair = Tuple!(ForeachType!R, "a", ForeachType!R, "b"); R _input; size_t i, j; this(Range r_) { this._input = r_; j = 1; } @property bool empty() { return j >= _input.length; } @property Pair front() { return Pair(_input[i], _input[j]); } void popFront() { if (j >= _input.length - 1) { i++; j = i + 1; } else { j++; } } } Pairwise!Range pairwise(Range)(Range r) if (isRandomAccessRange!Range && hasLength!Range && !isNarrowString!Range) { return typeof(return)(r); } import std.stdio: writeln; void main() { (new int[0]).pairwise.writeln; [10].pairwise.writeln; [10, 20].pairwise.writeln; [10, 20, 30].pairwise.writeln; [10, 20, 30, 40].pairwise.writeln; } Once combinations() is implemented that code can be replaced with something like: auto pairwise(Range)(Range r) if (...) { alias R = Unqual!Range; alias Pair = Tuple!(ForeachType!R, "a", ForeachType!R, "b"); return combinations(r, 2).map(t => Pair(t.tupleof)); }

Comment #5 by hsteoh — 2014-12-05T21:14:19Z

Hmm. Sounds like what you want is n-element subsets, rather than tuples specifically. I'll see if I can find any good algorithms for n-subset enumeration...

Comment #6 by bearophile_hugs — 2014-12-05T23:39:03Z

(In reply to hsteoh from comment #5) > Hmm. Sounds like what you want is n-element subsets, rather than tuples > specifically. I'll see if I can find any good algorithms for n-subset > enumeration... Isn't the code in comment #4 good enough? (A pairwise range needs to be very efficient, so people can use it instead of two nested loops in more cases).

Comment #7 by hsteoh — 2014-12-06T18:34:26Z

If order doesn't matter, then a series of nested loops should do what you want. However, we can't actually use nested loops for a generic algorithm, since (1) the nesting level depends on how many elements you want in the subset, and (2) we have to lazy-fy the loops to conform to the range API anyway. So I've to think of some equivalent way of enumerating these subsets... I'm thinking that a forward range should be sufficient for such an enumeration. So if we have a stack of .save'd ranges, that should do the trick. First, to generate k-element subsets, we prime the stack with .save'd copies of the range up to the k'th element, then .front simply walks the stack and retrieves .front from each item on the stack. In popFront(), we simply advance the range at the top of the stack. If it's empty, we pop it off, and advance the previous range on the stack, then fill up the stack again with the subsequent element. Each time, if advancing a range makes it empty, we pop it off and advance the previous range on the stack instead, then fill up the stack again. Yes, I think this will work... PR time! :-)

Comment #8 by bearophile_hugs — 2014-12-06T19:00:52Z

(In reply to hsteoh from comment #7) > If order doesn't matter, The order matters, this: foreach (tup; iota(1, 5).pairwise) tup.writeln; Should print only in this order: Tuple!(int,int)(1, 2) Tuple!(int,int)(1, 3) Tuple!(int,int)(1, 4) Tuple!(int,int)(2, 3) Tuple!(int,int)(2, 4) Tuple!(int,int)(3, 4) > However, we can't actually use nested loops for a generic algorithm, > since (1) the nesting level depends on how many elements you want in the > subset, and (2) we have to lazy-fy the loops to conform to the range API > anyway. The function name "pairwise" refers to "pairs", because 95% of the times I only need two indexes. And code like "[1, 2, 3, 4].pairwise" doesn't contain a "2" to specify two indexes. So if you want to reduce your implementation work, you can just use the version that yields pairs, like in #Comment 4. But if you want to also implement the generation of 3-tuples or more, then I guess you can use a syntax like: iota(1, 5).pairwise!3 But now the name "pairwise" sounds like a falsity. > I'm thinking that a forward range should be sufficient for such an > enumeration. We can make the range more powerful later, if necessary. > So if we have a stack of .save'd ranges, that should do the > trick. First, to generate k-element subsets, we prime the stack with .save'd > copies of the range up to the k'th element, then .front simply walks the > stack and retrieves .front from each item on the stack. In popFront(), we > simply advance the range at the top of the stack. If it's empty, we pop it > off, and advance the previous range on the stack, then fill up the stack > again with the subsequent element. Each time, if advancing a range makes it > empty, we pop it off and advance the previous range on the stack instead, > then fill up the stack again. Yes, I think this will work... PR time! :-) I suggest to add a compile-time optimization for the most general case (n=2) to make it fast, because it's the most common and it should be very fast.

Comment #9 by bearophile_hugs — 2014-12-06T20:18:57Z

This shows a simple use case of pairwise, to compute the pairs of closest points of an array 2D points. The function closestPair1 uses just a pair of loops. closestPair2 uses cartesianProduct of all index pairs plus a filter to avoid comparing already seen pairs (because a distance is comutative), the code is slow, it looks bad and it's noisy. closestPair3 uses pairwise, and it's much nicer and simpler to write. closestPair4 is similar, but it uses the optional "key" function argument (as in the Python min/max functions) to further simplify the code. I'd really like min/max of Phobos to be extended to support such optional key function. Currently some /*in*/ are commented out to avoid troubles with immutable seeds of reduce. import std.algorithm, std.traits, std.typecons, std.range; struct Pairwise(Range) { alias R = Unqual!Range; alias Pair = Tuple!(ForeachType!R, "a", ForeachType!R, "b"); R _input; size_t i, j; this(Range r_) { this._input = r_; j = 1; } @property bool empty() { return j >= _input.length; } @property Pair front() { return Pair(_input[i], _input[j]); } void popFront() { if (j >= _input.length - 1) { i++; j = i + 1; } else { j++; } } } Pairwise!Range pairwise(Range)(Range r) if (isRandomAccessRange!Range && hasLength!Range && !isNarrowString!Range) { return typeof(return)(r); } //----------------------- struct V2 { double x, y; } double distance(in V2 p1, in V2 p2) pure nothrow @nogc @safe { return (p1.x - p2.x) ^^ 2 + (p1.y - p2.y) ^^ 2; } auto closestPair1(in V2[] points) pure nothrow @nogc @safe { auto minD = Unqual!(typeof(V2.x)).infinity; V2 minP1, minP2; foreach (immutable i, const p1; points.dropBackOne) foreach (const p2; points[i + 1 .. $]) { immutable d = distance(p1, p2); if (d < minD) { minD = d; minP1 = p1; minP2 = p2; } } return tuple(minP1, minP2); } auto closestPair2(/*in*/ V2[] points) pure nothrow @nogc @safe { auto pairs = cartesianProduct(points.length.iota, points.length.iota) .filter!(ij => ij[1] > ij[0]); immutable ij = reduce!((ij1, ij2) => distance(points[ij1[0]], points[ij1[1]]) < distance(points[ij2[0]], points[ij2[1]]) ? ij1 : ij2) (pairs.front, pairs.dropOne); return tuple(points[ij[0]], points[ij[1]]); } auto closestPair3(/*in*/ V2[] points) pure @safe { return points .pairwise .reduce!((t1, t2) => t1[].distance < t2[].distance ? t1 : t2); } template min(alias F) { T min(T)(T x, T y) { return F(x) < F(y) ? x : y; } } auto closestPair4(/*in*/ V2[] points) pure @safe { return points.pairwise.reduce!(min!(t => t[].distance)); } void main() { import std.stdio, std.random; auto points = new V2[10]; foreach (ref p; points) p = V2(uniform01, uniform01); points.writeln; points.closestPair1.writeln; points.closestPair2.writeln; points.closestPair3.writeln; points.closestPair4.writeln; }

Comment #10 by bearophile_hugs — 2015-01-05T12:23:37Z

Elsewhere I have suggested to add to cartesianProduct an optional template argument that specifies the "repetitions", just like the Python version of product: https://docs.python.org/2/library/itertools.html#itertools.product Python code: product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111 Equivalently for the D code (the "repeat" argument "3" needs to be a template argument because cartesianProduct yields tuples, that must have their structure known at compile-time): cartesianProduct!3([0, 1]) => 000 001 010 011 100 101 110 111 Now I've seen the same optional repeat template argument is useful for pairwise too, because if you are inside a UFCS chain and you want to perform a pairwise on the sequence items, you don't need to break the chain: auto seq = 10.iota .map!(...) .filter!(...); auto result = pairwise(seq, seq) .filter!(...); Becomes a single nicer chain: auto result = 10.iota .map!(...) .filter!(...) .pairwise!2 .filter!(...); Such use case "pairwise(seq,seq)" with repeated argument is probably common.

Comment #11 by bearophile_hugs — 2015-01-05T13:03:20Z

(In reply to bearophile_hugs from comment #10) > .pairwise!2 pairwise takes only one argument sequence, so that comment was bad. The template argument is used to denote how many items you want in your "pairs" (where 2 is the default). So: [1, 2, 3, 4].pairwise Or: [1, 2, 3, 4].pairwise!2 should yield: (1, 2) (1, 3) (1, 4) (2, 3) (2, 4) (3, 4) [1, 2, 3, 4].pairwise!3 should yield: (1, 2, 3) (1, 2, 4) (1, 3, 4) (2, 3, 4) Like in this lower level code: void main() { import std.stdio, std.range, std.algorithm; auto a = [1, 2, 3, 4]; foreach (immutable i; 0 .. a.length) foreach (immutable j; i + 1 .. a.length) writefln("(%d, %d)", a[i], a[j]); writeln; foreach (immutable i; 0 .. a.length) foreach (immutable j; i + 1 .. a.length) foreach (immutable k; j + 1 .. a.length) writefln("(%d, %d, %d)", a[i], a[j], a[k]); }

Comment #12 by greeenify — 2016-10-15T11:50:10Z

@andralex: I think there is also a decision-blocked PR pending in the Phobos queue: https://github.com/dlang/phobos/pull/4027

Comment #13 by andrei — 2017-11-07T17:11:36Z

I see in https://docs.python.org/2/library/itertools.html that the recipe given for pairwise would produce the pairs (1, 2), (2, 3), and (3, 4) starting from [1, 2, 3, 4]. So defining it with different semantics is potentially confusing. The functionality proposed is really "combinations of 2 elements in the given range". In general we'd be looking at combinations of k elements of the given range, yielding a range of k-tuples. See https://en.wikipedia.org/wiki/Combination Fittingly enough, Python offers https://docs.python.org/2/library/itertools.html#itertools.combinations and https://docs.python.org/2/library/itertools.html#itertools.combinations_with_replacement I think ws should define a function as follows: auto combinations(uint k, Flag!"Duplicates" f = Duplicates.no, R)(R range); The function returns a range of k-tuples containing combinations of elements in the range. Depending on the flag, duplicates are present or not.

Comment #14 by greensunny12 — 2018-02-09T11:54:47Z

> I see in https://docs.python.org/2/library/itertools.html that the recipe given for pairwise would produce the pairs (1, 2), (2, 3), and (3, 4) starting from [1, 2, 3, 4]. So defining it with different semantics is potentially confusing. Yes, especially now that we have pairwise as slides. > I think ws should define a function as follows: > auto combinations(uint k, Flag!"Duplicates" f = Duplicates.no, R)(R range); > The function returns a range of k-tuples containing combinations of elements in the range. Depending on the flag, duplicates are present or not. +1 (renamed accordingly) @ students - see also: http://docs.mir.dlang.io/latest/mir_combinatorics.html (this is my module, it can be ported)

Comment #15 by greensunny12 — 2018-02-09T12:07:02Z

*** Issue 13039 has been marked as a duplicate of this issue. ***

Comment #16 by josipp — 2022-04-03T19:01:52Z

Obviated by the introduction of `std.algorithm.setops.cartesianProduct` into the standard library. *** This issue has been marked as a duplicate of issue 7128 ***