Created attachment 1497
example code
This example code, compiled with -debug, triggers an AssertError.
-----
import std.algorithm, std.conv, std.range, std.stdio;
void main () {
auto arr = [7, 5, 7, 3, 3, 5, 3, 3, 0, 3, 1, 1, 5, 1,
1, 1, 2, 2, 8, 5, 8, 8];
arr.sort !((x, y) => arr.count (x) > arr.count (y) ||
(arr.count (x) == arr.count (y) && x < y))
.map !(to !(string))
.join (" ")
.writeln;
// prints 1 1 1 1 1 3 3 3 3 3 5 5 5 5 8 8 8 2 2 0 7 7
// should 1 1 1 1 1 3 3 3 3 3 5 5 5 5 8 8 8 2 2 7 7 0
}
-----
Reproducible on Win64 with DMD 2.066 or 2.067, -m32 or -m64.
Without -debug, it produces wrong output with -m32 and crashes with -m64.
The D.learn discussion resulting in this example:
http://forum.dlang.org/thread/[email protected]
Comment #1 by gassa — 2015-04-03T10:06:58Z
The culprit is optimisticInsertionSort.
When the range hasAssignableElements, it changes the array too heavily while calling the predicate:
https://github.com/D-Programming-Language/phobos/blob/4abe95ef/std/algorithm/sorting.d#L799-L806
The problem is:
1. The predicate (count) depends on array integrity (order does not matter, contents do) at all times it is called.
2. The library is too optimistic taking an element to a temporary variable and then overwriting the elements, all the way calling the predicate.
I'm unsure what should be done.
On one hand, the user has the right to abstract away from sort implementation.
On the other hand, the speedup for a range which hasAssignableElements may be significant for more trivial cases.
I'd suggest to be on the safe side. First, find the greatest value of j using pred, just like now:
-----
for (; j < maxJ && pred(r[j + 1], temp); ++j) {}
-----
Only after that, perform all swaps:
-----
auto temp = r[i];
for (size_t k = i; k < j; ++k)
{
r[k] = r[k + 1];
}
r[j] = temp;
-----
After all, the last thing one wants is a failing library sort function, no matter how weird its usage may be.
Another solution would be to note in the documentation that, with unstable sort, the predicate must not depend on the range. But most users won't care to read or remember that unless the sort goes wrong.
Ivan Kazmenko.
This is just outright wrong. The sort function was never intended to work in this way nor is there any reasonable expectation for it to do so either. The predicate is expected to define a total order [1] which your predicate does not. To put it in simpler terms, the predicate is expected to return the same output given the same inputs.
From a theoretical perspective, this problem is impossible to solve using a sorting algorithm which runs in polynomial time. In the general case, given a predicate we know nothing about, the only way to find the correct order (or any correct order) is to test all permutations which would require O(n!) time. More importantly, the predicate (really, I should be calling it an invariant) may be impossible to satisfy meaning there is no solution.
On a side note, for ranges with length <= 25, it bypasses quick sort and jumps straight into insertion sort. So even if we fix insertion sort to work with your specific predicate here, given a larger range, it will likely still be broken.
[1] https://en.wikipedia.org/wiki/Total_order
I won't close this issue but I advise marking this as WONTFIX.
Comment #4 by gassa — 2015-07-20T00:26:45Z
Have you followed the previous discussion?
Note that "count" is not just any predicate, but:
(1) a useful one,
(2) one that just requires the array to contain its elements, in any order, whenever the predicate is evaluated,
(3) one that seems to work flawlessly in competitor languages in similar situation.
Regarding (1), please refer to the original discussion. The link again:
http://forum.dlang.org/thread/[email protected]
This is indeed one of the easiest (yet slow) ways to express sorting by the number of occurrences. If it is not intended to work, a way that does work must be easy to see and somehow encouraged over this one.
Comment #5 by gassa — 2015-07-20T00:32:32Z
(In reply to Xinok from comment #3)
> ... The predicate is expected to define a total order [1]
> which your predicate does not.
But it does! The order is "x < y if:
(1) number of xs in the array is greater than number of ys,
(2) if (1) is equal, just use ascending order".
Nothing wrong about that.
There's an even simpler example in the added unittest here involving only count:
https://github.com/GassaFM/phobos/commit/fb6e4ddfcad656b8093aa51cf59726cf38e2cfbe
What's wrong is optimisticInsertionSort overwriting elements while hiding one of them in a temporary variable, and calling the predicate all the way while the array is in an invalid state.
Comment #6 by xinok — 2015-07-20T01:00:37Z
Gah, sorry, I misunderstood the problem. >_< For some reason, I thought your predicate was dependent on the order of the elements. Ignore my last post...
I still feel that this is bad practice and should be discouraged in general. The simpler solution would be to dup' the range and refer to that, in which case you could have a strongly pure predicate instead. However, if you all feel this is worth fixing, then I won't argue anymore.
I'll just add that this is an issue for the stable sort as well but that won't be so easy to fix. The algorithm used is Timsort, which is a variant of merge sort, meaning it allocates a buffer to store elements temporarily. The simplest solution would be to move to an in-place algorithm like block sort [1] but then you'll be losing the benefits of Timsort.
[1] https://en.wikipedia.org/wiki/Block_sort
Comment #7 by gassa — 2015-07-20T01:13:34Z
Hmm. If you are sure Timsort suffers from the same problem, it should be visible in other languages as well. We have to test that in Java (sorting objects) and Python, too.
Comment #8 by xinok — 2015-07-24T15:13:26Z
I can confirm this doesn't work in Python either:
https://ideone.com/oEkJPi
This might be justification to add a "sorted" function to Phobos so we can exhibit similar behavior in D, but I don't think the solution is to "fix" the sorting implementations that we already have.
Comment #9 by andrei — 2015-10-29T14:15:03Z
I'll close this. There is no guarantee we offer that the array being sorted has the same content at all times when the predicate is being used.