← Back to index | Original Bugzilla link

Bug 4489 – std.array.insert is slow

Status: RESOLVED
Resolution: FIXED
Severity: regression
Priority: P2
Component: phobos
Product: D
Version: D2
Platform: Other
OS: Windows
Creation time: 2010-07-20T10:14:00Z
Last change time: 2012-12-13T09:23:17Z
Keywords: performance
Assigned to: andrei
Creator: torhu

Comments

Comment #0 by torhu — 2010-07-20T10:14:18Z

DMD 2.046. std.array.insert seemed slow when I tried it, so I made a simple benchmark comparing it to a simple alternative using memmove. When compiled with -O -release -inline, the lowest numbers I got where 127 ms for myInsert and 254 ms for std.array.insert. That's a 100% speedup just from using the most obvious implementation. --- import std.array; import std.perf; import std.stdio; import std.c.string; enum COUNT = 100_000; void myInsert(T)(ref T[] arr, size_t where, T value) { assert(where <= arr.length); size_t oldLen = arr.length; arr.length = arr.length + 1; T* ptr = arr.ptr + where; memmove(ptr + 1, ptr, (oldLen - where) * T.sizeof); arr[where] = value; } void bench(alias insert)(ref int[] arr) { auto pc = new PerformanceCounter; pc.start(); foreach (_; 0..10) { arr.length = 0; foreach (i; 0..COUNT) insert(arr, i, i); } pc.stop(); writeln(pc.milliseconds); } void main() { int[] arr = new int[COUNT]; bench!(myInsert)(arr); bench!(std.array.insert)(arr); } ---

Comment #1 by torhu — 2012-08-01T15:56:03Z

I just revisited this. I tried with with DMD 2.046 again, just to be sure I'm doing the same benchmark, and I got: myInsert(T) 130 insert(T,Range) 240 Basically the same numbers. Then I tried DMD 2.059 and got this: myInsert(T) 123 insert(T,Range) if (isInputRange!(Range) && is(ElementEncodingType!(Range) : T)) 78716 Wow. About 700 times slower. Not a very realistic use case, though. So I tried inserting in the middle instead (i/2 instead if i): myInsert(T) 19694 insert(T,Range) if (isInputRange!(Range) && is(ElementEncodingType!(Range) : T)) 210606 Well, that's only about 20 times slower. I don't what's going on here, but this is pretty much in line with my general impression of Phobos 2. There should be a big ALPHA sticker on the whole thing. Signed, Disgruntled D1/Tango user

Comment #2 by andrei — 2012-08-01T16:46:38Z

Brought the code to date with: ---- import std.array; import std.datetime; import std.stdio; import std.c.string; enum COUNT = 100_000; void myInsert(T)(ref T[] arr, size_t where, T value) { assert(where <= arr.length); size_t oldLen = arr.length; arr.length = arr.length + 1; T* ptr = arr.ptr + where; memmove(ptr + 1, ptr, (oldLen - where) * T.sizeof); arr[where] = value; } void bench(alias insert)(ref int[] arr) { auto sw = StopWatch(AutoStart.yes); foreach (_; 0..10) { arr.length = 0; foreach (i; 0..COUNT) insert(arr, i, i); } writeln(sw.peek.msecs); } void main() { int[] arr = new int[COUNT]; bench!(myInsert)(arr); bench!(std.array.insertInPlace)(arr); } ---- I was able to measure a notable the performance mismatch. The culprit is: ---- void insertInPlace(T, U...)(ref T[] array, size_t pos, U stuff) if(!isSomeString!(T[]) && allSatisfy!(isInputRangeOrConvertible!T, U)) { T[staticConvertible!(T, U)] stackSpace = void; auto range = chain(makeRangeTuple(stackSpace[], stuff).expand); insertInPlaceImpl(array, pos, range); } ---- I replaced that with: ---- void insertInPlace(T, U...)(ref T[] array, size_t pos, U stuff) if(!isSomeString!(T[]) && allSatisfy!(isInputRangeOrConvertible!T, U)) { immutable oldLen = array.length; array.length += stuff.length; auto ptr = array.ptr + pos; import core.stdc.string; memmove(ptr + stuff.length, ptr, (oldLen - pos) * T.sizeof); ptr = array.ptr + pos; foreach (i, Unused; U) { emplace(ptr + i, stuff[i]); } } ---- and was able to measure similar performance. Clearly the code once approved belongs to the entire team, not only to its author, and I should be a better man than this, but I can't stop noticing the irony of this given http://forum.dlang.org/thread/[email protected] I'll make a pull request soon.

Comment #3 by dmitry.olsh — 2012-09-28T11:27:53Z

Actually the culprit is the final step and not the simmingly complicated packing of parameters into a range. I'm talking of this one: private void insertInPlaceImpl(T, Range)(ref T[] array, size_t pos, Range stuff) if(isInputRange!Range && (is(ElementType!Range : T) || isSomeString!(T[]) && is(ElementType!Range : dchar))) { auto app = appender!(T[])(); app.put(array[0 .. pos]); app.put(stuff); app.put(array[pos .. $]); array = app.data; } I'm actually surprized that there is no other specialization but surely there was was one. Basically the thing above allocates an appender and does up to 3 resizes (1 per each put). To taste waters I tried this: static if(hasLength!Range) { import core.stdc.string; immutable to_insert = stuff.length; immutable len = array.length; T* ptr = array.ptr+pos; array.length += to_insert; memmove(ptr+to_insert, ptr, (len-pos)*T.sizeof); //TODO: need to blit T.init over vacant places if T.__dtor exists? copy(stuff, array[pos..pos+to_insert]); } else {// Generic implementation auto app = appender!(T[])(); app.put(array[0 .. pos]); app.put(stuff); app.put(array[pos .. $]); array = app.data; } And for inserting at front the numbers were 23387 23599 For inserting at i-th (last) index I got: 58 92 These last ones were not very stable but indicate that the this packing into a range also adds up. I'll get myself busy with pull request since I was adding this packing thingy. Still no idea when and how the underlying insertInPlaceImpl become 'create a copy and return'. Sorry wasn't on my watch :) Back when I last touched it, it looked like this: void insertInPlaceImpl(T, Range)(ref T[] array, size_t pos, Range stuff) static if(hasLength!Range && is(ElementEncodingType!Range : T) && !is(T == const T) && !is(T == immutable T)) { immutable delta = stuff.length, oldLength = array.length, newLength = oldLength + delta; // Reallocate the array to make space for new content array = (cast(T*) core.memory.GC.realloc(array.ptr, newLength * array[0].sizeof))[0 .. newLength]; assert(array.length == newLength); // Move data in pos .. pos + stuff.length to the end of the array foreach_reverse (i; pos .. oldLength) { // This will be guaranteed to not throw move(array[i], array[i + delta]); } // Copy stuff into array copy(stuff, array[pos .. pos + stuff.length]); } else { auto app = appender!(T[])(); app.put(array[0 .. pos]); app.put(stuff); app.put(array[pos .. $]); array = app.data; }

Comment #4 by dmitry.olsh — 2012-10-15T13:33:58Z

Pending a bugfix in compiler: https://github.com/D-Programming-Language/phobos/pull/858

Comment #5 by dmitry.olsh — 2012-12-10T11:14:03Z

Given the magnitude of slowdown I belive it worths the title of regression.

Comment #6 by bugzilla — 2012-12-10T22:24:00Z

(In reply to comment #4) > Pending a bugfix in compiler: > https://github.com/D-Programming-Language/phobos/pull/858 Phobos pulls are not part of the compiler.

Comment #7 by dmitry.olsh — 2012-12-11T00:19:55Z

(In reply to comment #6) > (In reply to comment #4) > > Pending a bugfix in compiler: > > https://github.com/D-Programming-Language/phobos/pull/858 > > Phobos pulls are not part of the compiler. Sorry, my comment must have been poorly phrased. What I meant was that there is a pull and it's waiting on a fix for the compiler on 64 bits. Either way I worked it around just a moment before you fixed the bug :) Now I going to revert the workaround in this pull and so that it's hopefully merged in time for 2.061.

Comment #8 by github-bugzilla — 2012-12-13T08:53:24Z

Commit pushed to master at https://github.com/D-Programming-Language/phobos https://github.com/D-Programming-Language/phobos/commit/14e457b37d0a870a8ed7dd901e10679456edc6b3 fix issue 4489 std.array.insert is slow

Comment #9 by dmitry.olsh — 2012-12-13T09:23:17Z

(In reply to comment #8) > Commit pushed to master at https://github.com/D-Programming-Language/phobos > > https://github.com/D-Programming-Language/phobos/commit/14e457b37d0a870a8ed7dd901e10679456edc6b3 > fix issue 4489 std.array.insert is slow With latest master and compile options: -noboundscheck -O -inline -release I see no measurable difference in inserting at front or end as it should have been.