Bug 15553 – topN very inefficient [slower than sort, even for topN(0)] but should be O(n)

Status
RESOLVED
Resolution
FIXED
Severity
blocker
Priority
P1
Component
phobos
Product
D
Version
D2
Platform
x86
OS
All
Creation time
2016-01-11T12:21:00Z
Last change time
2016-03-19T20:22:23Z
Assigned to
andrei
Creator
timothee.cour2

Comments

Comment #0 by timothee.cour2 — 2016-01-11T12:21:08Z
It's a serious bug because people are either generating slow code with topN or are using sort instead of topN (as in http://jackstouffer.com/blog/nd_slice.html) ---- module tests.bench_topn; /+ testing: test_sort, test_topn, test_topn_zeroth, test_max $ldc_X -O3 -inline -release -boundscheck=off -mcpu=native -L=-L$dmd_build_D -I=$phobos_D -of=/tmp/benchmark_ndslice $code/tests/bench_topn.d /tmp/benchmark_ndslice [640, 1248, 731, 146] $dmd_069_X -O -inline -release -noboundscheck $code/tests/bench_topn.d -of/tmp/benchmark_ndslice /benchmark_ndslice [746, 2357, 1440, 281] => topN slower than sort, and even topN(0) is slower than sort (but should be same speed as max) +/ import std.algorithm; import std.random; void test_sort(T)(T a) { a.sort(); } void test_topn(T)(T a) { a.topN(a.length / 2); } // should be roughly as fast as test_max void test_topn_zeroth(T)(T a) { a.topN(0); } void test_max(T)(T a) { static int counter; counter = a.reduce!max; } void fun() { alias T = ubyte; int n = 100; auto a = new T[n]; foreach (ref ai; a) ai = cast(T)(uniform(0, T.max)); auto iter = 1000000; import std.datetime; import std.stdio; import std.conv : to; import std.array; iter.benchmark!({ test_sort(a.dup); }, { test_topn(a.dup); }, { test_topn_zeroth(a.dup); }, { test_max(a.dup); })[].map!(a => a.to!Duration.total!`msecs`).array.writeln; } void main() { fun; } ----
Comment #1 by timothee.cour2 — 2016-01-11T12:29:09Z
haven't profiled, but: could uniform be slow https://github.com/D-Programming-Language/phobos/commit/7291d2e47d4a7fb17565a7d7cd04ba8f893c1a27 not sure if those could help: #5514 Maybe this is most relevant: [std.algorithm: implement deterministic topN] https://issues.dlang.org/show_bug.cgi?id=12141 At the very least, it shouldn't be slower than sort
Comment #2 by andrei — 2016-01-11T16:35:54Z
Comment #3 by andrei — 2016-01-11T16:36:34Z
BTW uniform() isn't the problem; replacing it with r.length / 2 keeps things slow.
Comment #4 by github-bugzilla — 2016-01-13T18:07:15Z
Commit pushed to master at https://github.com/D-Programming-Language/phobos https://github.com/D-Programming-Language/phobos/commit/10da3381c1d8d92ba8f0ef96e157beb1ada83f49 Merge pull request #3921 from andralex/15553 Fix Issue 15553 - topN very inefficient [slower than sort, even for topN(0)] but should be O(n)
Comment #5 by gassa — 2016-01-18T20:48:32Z
(In reply to Andrei Alexandrescu from comment #3) > BTW uniform() isn't the problem; replacing it with r.length / 2 keeps things > slow. Can uniform() call be returned, then? Otherwise, topN can now show quadratic performance on a pre-generated input: https://issues.dlang.org/show_bug.cgi?id=15583
Comment #6 by github-bugzilla — 2016-03-19T20:22:23Z