DMD 2.046.
std.array.insert seemed slow when I tried it, so I made a simple benchmark comparing it to a simple alternative using memmove.
When compiled with -O -release -inline, the lowest numbers I got where 127 ms for myInsert and 254 ms for std.array.insert. That's a 100% speedup just from using the most obvious implementation.
---
import std.array;
import std.perf;
import std.stdio;
import std.c.string;
enum COUNT = 100_000;
void myInsert(T)(ref T[] arr, size_t where, T value)
{
assert(where <= arr.length);
size_t oldLen = arr.length;
arr.length = arr.length + 1;
T* ptr = arr.ptr + where;
memmove(ptr + 1, ptr, (oldLen - where) * T.sizeof);
arr[where] = value;
}
void bench(alias insert)(ref int[] arr)
{
auto pc = new PerformanceCounter;
pc.start();
foreach (_; 0..10) {
arr.length = 0;
foreach (i; 0..COUNT)
insert(arr, i, i);
}
pc.stop();
writeln(pc.milliseconds);
}
void main()
{
int[] arr = new int[COUNT];
bench!(myInsert)(arr);
bench!(std.array.insert)(arr);
}
---
Comment #1 by torhu — 2012-08-01T15:56:03Z
I just revisited this. I tried with with DMD 2.046 again, just to be sure I'm doing the same benchmark, and I got:
myInsert(T) 130
insert(T,Range) 240
Basically the same numbers. Then I tried DMD 2.059 and got this:
myInsert(T) 123
insert(T,Range) if (isInputRange!(Range) && is(ElementEncodingType!(Range) : T))
78716
Wow. About 700 times slower. Not a very realistic use case, though.
So I tried inserting in the middle instead (i/2 instead if i):
myInsert(T) 19694
insert(T,Range) if (isInputRange!(Range) && is(ElementEncodingType!(Range) : T))
210606
Well, that's only about 20 times slower.
I don't what's going on here, but this is pretty much in line with my general impression of Phobos 2. There should be a big ALPHA sticker on the whole thing.
Signed,
Disgruntled D1/Tango user
Comment #2 by andrei — 2012-08-01T16:46:38Z
Brought the code to date with:
----
import std.array;
import std.datetime;
import std.stdio;
import std.c.string;
enum COUNT = 100_000;
void myInsert(T)(ref T[] arr, size_t where, T value)
{
assert(where <= arr.length);
size_t oldLen = arr.length;
arr.length = arr.length + 1;
T* ptr = arr.ptr + where;
memmove(ptr + 1, ptr, (oldLen - where) * T.sizeof);
arr[where] = value;
}
void bench(alias insert)(ref int[] arr)
{
auto sw = StopWatch(AutoStart.yes);
foreach (_; 0..10) {
arr.length = 0;
foreach (i; 0..COUNT)
insert(arr, i, i);
}
writeln(sw.peek.msecs);
}
void main()
{
int[] arr = new int[COUNT];
bench!(myInsert)(arr);
bench!(std.array.insertInPlace)(arr);
}
----
I was able to measure a notable the performance mismatch. The culprit is:
----
void insertInPlace(T, U...)(ref T[] array, size_t pos, U stuff)
if(!isSomeString!(T[]) && allSatisfy!(isInputRangeOrConvertible!T, U))
{
T[staticConvertible!(T, U)] stackSpace = void;
auto range = chain(makeRangeTuple(stackSpace[], stuff).expand);
insertInPlaceImpl(array, pos, range);
}
----
I replaced that with:
----
void insertInPlace(T, U...)(ref T[] array, size_t pos, U stuff)
if(!isSomeString!(T[]) && allSatisfy!(isInputRangeOrConvertible!T, U))
{
immutable oldLen = array.length;
array.length += stuff.length;
auto ptr = array.ptr + pos;
import core.stdc.string;
memmove(ptr + stuff.length, ptr, (oldLen - pos) * T.sizeof);
ptr = array.ptr + pos;
foreach (i, Unused; U)
{
emplace(ptr + i, stuff[i]);
}
}
----
and was able to measure similar performance.
Clearly the code once approved belongs to the entire team, not only to its author, and I should be a better man than this, but I can't stop noticing the irony of this given http://forum.dlang.org/thread/[email protected]
I'll make a pull request soon.
Comment #3 by dmitry.olsh — 2012-09-28T11:27:53Z
Actually the culprit is the final step and not the simmingly complicated packing of parameters into a range. I'm talking of this one:
private void insertInPlaceImpl(T, Range)(ref T[] array, size_t pos, Range stuff)
if(isInputRange!Range &&
(is(ElementType!Range : T) ||
isSomeString!(T[]) && is(ElementType!Range : dchar)))
{
auto app = appender!(T[])();
app.put(array[0 .. pos]);
app.put(stuff);
app.put(array[pos .. $]);
array = app.data;
}
I'm actually surprized that there is no other specialization but surely there was was one. Basically the thing above allocates an appender and does up to 3 resizes (1 per each put).
To taste waters I tried this:
static if(hasLength!Range)
{
import core.stdc.string;
immutable to_insert = stuff.length;
immutable len = array.length;
T* ptr = array.ptr+pos;
array.length += to_insert;
memmove(ptr+to_insert, ptr, (len-pos)*T.sizeof);
//TODO: need to blit T.init over vacant places if T.__dtor exists?
copy(stuff, array[pos..pos+to_insert]);
}
else
{// Generic implementation
auto app = appender!(T[])();
app.put(array[0 .. pos]);
app.put(stuff);
app.put(array[pos .. $]);
array = app.data;
}
And for inserting at front the numbers were
23387
23599
For inserting at i-th (last) index I got:
58
92
These last ones were not very stable but indicate that the this packing into a range also adds up.
I'll get myself busy with pull request since I was adding this packing thingy.
Still no idea when and how the underlying insertInPlaceImpl become 'create a copy and return'. Sorry wasn't on my watch :)
Back when I last touched it, it looked like this:
void insertInPlaceImpl(T, Range)(ref T[] array, size_t pos, Range stuff)
static if(hasLength!Range &&
is(ElementEncodingType!Range : T) &&
!is(T == const T) &&
!is(T == immutable T))
{
immutable
delta = stuff.length,
oldLength = array.length,
newLength = oldLength + delta;
// Reallocate the array to make space for new content
array = (cast(T*) core.memory.GC.realloc(array.ptr,
newLength * array[0].sizeof))[0 .. newLength];
assert(array.length == newLength);
// Move data in pos .. pos + stuff.length to the end of the array
foreach_reverse (i; pos .. oldLength)
{
// This will be guaranteed to not throw
move(array[i], array[i + delta]);
}
// Copy stuff into array
copy(stuff, array[pos .. pos + stuff.length]);
}
else
{
auto app = appender!(T[])();
app.put(array[0 .. pos]);
app.put(stuff);
app.put(array[pos .. $]);
array = app.data;
}
(In reply to comment #6)
> (In reply to comment #4)
> > Pending a bugfix in compiler:
> > https://github.com/D-Programming-Language/phobos/pull/858
>
> Phobos pulls are not part of the compiler.
Sorry, my comment must have been poorly phrased. What I meant was that there is a pull and it's waiting on a fix for the compiler on 64 bits.
Either way I worked it around just a moment before you fixed the bug :)
Now I going to revert the workaround in this pull and so that it's hopefully merged in time for 2.061.
Comment #8 by github-bugzilla — 2012-12-13T08:53:24Z