After Phobos pull #3921 (https://github.com/D-Programming-Language/phobos/pull/3921), topN is deterministic. As a result, it can now exhibit quadratic running time on certain pre-generated inputs. This breaks the stronger guarantee it had before, when quadratic behavior on a pre-generated input was highly unlikely.
Here is a test that shows the quadratic behavior with the new version:
http://dpaste.dzfl.pl/e4b3bc26c3cf
(dpaste kills the slow code before it completes)
My local timings with "-O -release -noboundscheck -inline":
building Element array: 4989 msecs
checking Element array: 5018 msecs
checking int array: 626 msecs
With "-debug -unittest":
building Element array: 8384 msecs
checking Element array: 8380 msecs
checking int array: 2987 msecs
If we change the length MAX_N to something small enough to inspect, say, 50, the resulting array is:
[0, 536870911, 2, 536870911, 4, 536870911, 6, 36, 8, 33, 10, 35, 12, 536870911, 14, 32, 16, 34, 18, 536870911, 536870911, 22, 23, 21, 20, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 31, 30, 536870911, 29, 536870911, 28, 536870911, 27, 536870911, 26, 536870911, 25, 536870911, 24, 536870911]
From this forum post:
http://forum.dlang.org/post/[email protected]
Comment #1 by gassa — 2016-01-18T20:01:30Z
Created attachment 1575
fail_topn
Added the version of the code that I used with Digger to check that the culprit is Phobos pull #3921.
Comment #2 by gassa — 2016-01-18T20:10:16Z
(copied from forum post:
http://forum.dlang.org/post/[email protected]
- as it is more relevant here)
Perhaps I should include a textual summary as well.
The code on DPaste starts by constructing an array of Elements of size MAX_N; in the code, MAX_N is 50_000. After that, we run the victim function on our array. Here, the victim is topN(array, MAX_N / 2); it could just as well be sort(array) or something else.
An Element contains, or rather pretends to contain, an int value. Here is how an Element is special: the result of comparing two Elements is decided on the fly. An Element is either UNDECIDED or has a fixed value. Initially, all Elements are UNDECIDED. When we compare two Elements and at least one of them has a fixed value, the comparison is resolved naturally, and an UNDECIDED Element is greater than any fixed Element. When we compare two UNDECIDED Elements, the one that has participated in more comparisons so far gets a fixed value: greater than any value fixed so far, but still less than UNDECIDED. This way, the results of old comparisons remain consistent with the newly fixed value.
Now, what do we achieve by running the victim function? It turns out that algorithms built on the idea of QuickSort or QuickSelect tend to make most of their comparisons against the current pivot value. Our Element responds by fixing the pivot to one of the lowest available values. After that, a partition around such a pivot has only a few, O(1), elements before the pivot and all the rest after it. In total, this leads to quadratic performance.
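To make this concrete, below is a minimal self-contained D sketch of such an adversarial Element. The names and layout here (Element, UNDECIDED, the value/uses bookkeeping, nextFixed, MAX_N) are mine and need not match the attached fail_topn code; the point is only to illustrate the lazily-decided comparison described above.

import std.algorithm : topN;
import std.stdio : writeln;

enum size_t MAX_N = 50_000;
enum int UNDECIDED = int.max;   // stand-in for "value not decided yet"

// Adversary state is kept outside the elements, keyed by original index,
// so copies made inside the algorithm still see the same decisions.
int[MAX_N] value;               // current value of each element
size_t[MAX_N] uses;             // how many comparisons each element took part in
int nextFixed = 0;              // next value to hand out; grows monotonically
size_t comparisons = 0;         // total number of comparisons performed

struct Element
{
    size_t index;               // original position; also the key into the state above

    int opCmp(const Element other) const
    {
        comparisons++;
        uses[index]++;
        uses[other.index]++;
        if (value[index] == UNDECIDED && value[other.index] == UNDECIDED)
        {
            // Fix the busier element (typically the current pivot) to a value
            // greater than everything fixed so far, but still below UNDECIDED.
            if (uses[index] >= uses[other.index])
                value[index] = nextFixed++;
            else
                value[other.index] = nextFixed++;
        }
        if (value[index] < value[other.index]) return -1;
        if (value[index] > value[other.index]) return 1;
        return 0;
    }
}

void main()
{
    value[] = UNDECIDED;
    auto a = new Element[MAX_N];
    foreach (i, ref e; a)
        e.index = i;

    // The victim: with the deterministic topN, the adversary drives it
    // towards a quadratic number of comparisons instead of O(MAX_N).
    topN(a, MAX_N / 2);
    writeln("comparisons on the Element array: ", comparisons);
}

Keeping the adversary state outside the struct, keyed by original index, means opCmp still behaves consistently even if the algorithm compares copies of elements rather than the array slots themselves.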
After running the victim function on our array of Elements (which, careful here, already takes quadratic time), we restore their original order (to do that, each Element also stores its original index).
Now, we can re-run the algorithm on the array obtained so far. If the victim function is (strongly) pure, it will inevitably make the same comparisons in the same order. The only difference is that their results will already be decided.
Alternatively, we can build an int array of the current values of our Elements (also in their original order). Running the victim function on that int array must then make the same (quadratic number of) comparisons in the same order.
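Continuing the sketch above (same hypothetical names), the reorder-and-replay step could look like the following: restore the original order via the stored indices, freeze the decided values into a plain int array, and re-run the victim while counting comparisons.

// ... continuing main() from the sketch above ...
import std.algorithm : map, sort, topN;
import std.array : array;
import std.stdio : writeln;

// Put the Elements back into their original order; the predicate compares
// stored indices only, so it does not go through the adversarial opCmp.
sort!((x, y) => x.index < y.index)(a);

// Freeze the lazily-decided values into a plain int array, same order.
int[] frozen = a.map!(e => value[e.index]).array;

// Re-run the victim on the int array, counting comparisons explicitly:
// a (strongly) pure victim repeats the same quadratic comparison sequence.
size_t counted = 0;
topN!((x, y) { counted++; return x < y; })(frozen, MAX_N / 2);
writeln("comparisons on the frozen int array: ", counted);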
The inspiration is M. D. McIlroy's paper "A Killer Adversary for Quicksort":
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf