The following is an excerpt from the newsgroup thread that's in the URL field.
> Steven Schveighoffer wrote:
> > newCapacity isn't called when setting the length explicitly, because
> > it's not considered to be an append.
>
> You are right.
>
> Looks like I was successful in locating the culprit. :) I took the liberty of writing ++a.length, and I also printed the number of elements in reserve at each allocation:
>
> import std.stdio;
> import std.array;
>
> void main()
> {
>     int[] a;
>     size_t oldCapacity;
>
>     foreach (i; 0 .. 100_000_000) {
>         ++a.length;
>         a[$-1] = i;
>
>         if (capacity(a) != oldCapacity) {
>             writeln("length: ", a.length,
>                     " capacity: ", capacity(a),
>                     " ratio: ", cast(double)capacity(a) / oldCapacity,
>                     " reserve: ", capacity(a) - a.length);
>             oldCapacity = capacity(a);
>         }
>     }
> }
>
> Now the spikes are gone, and the allocations settle at one per 1024 elements for an int array:
>
> length: 1 capacity: 3 ratio: inf reserve: 2
> length: 4 capacity: 7 ratio: 2.33333 reserve: 3
> length: 8 capacity: 15 ratio: 2.14286 reserve: 7
> length: 16 capacity: 31 ratio: 2.06667 reserve: 15
> length: 32 capacity: 63 ratio: 2.03226 reserve: 31
> length: 64 capacity: 127 ratio: 2.01587 reserve: 63
> length: 128 capacity: 255 ratio: 2.00787 reserve: 127
> length: 256 capacity: 509 ratio: 1.99608 reserve: 253
> length: 512 capacity: 1021 ratio: 2.00589 reserve: 509
> length: 1022 capacity: 2045 ratio: 2.00294 reserve: 1023
> length: 2046 capacity: 3069 ratio: 1.50073 reserve: 1023
> length: 3070 capacity: 4093 ratio: 1.33366 reserve: 1023
> ...
>
> In the long run, there is one allocation per 1024 elements. That still feels like amortized constant time, but considering that a relocation would cost O(N), I think the reserve should still grow with N, though maybe not too aggressively. (I am not sure about this last point at all.)
>
4 bytes × 1024 elements = 4096 bytes == the page size.
The newCapacity function is supposed to reduce the effect of this phenomenon: when appending, it's assumed you are going to continue appending, so to reduce the overhead, the amount added to the memory block is supposed to be related to the total number of bytes already allocated. It doesn't seem like the function is working very well... It could have been something I did, or it could have always been broken :) I honestly did not examine the code at all; I only ever read the comments and assumed it works.
Please add these comments to the bug report; they will help when I look at it. I don't have a lot of time to look at it right now, so I'm afraid I'll forget what we were doing later :)
-Steve
Comment #4 by bearophile_hugs — 2010-07-02T09:31:02Z
This is a simplified version of that code:
import std.stdio: writeln;

void main() {
    int[] array;
    size_t oldCapacity;

    foreach (i; 0 .. 20_000) {
        array ~= i;
        size_t newCapacity = array.capacity;
        if (newCapacity != oldCapacity) {
            writeln(cast(double)newCapacity / oldCapacity);
            oldCapacity = newCapacity;
        }
    }
}
It prints:
inf
2.33333
2.14286
2.06667
2.03226
2.01587
2.00787
1.99608
2.00589
2.00294
1.50073
1.33366
1.25018
1.20012
1.16675
1.14292
1.12505
1.11115
1.10003
1.09093
1.08335
1.07694
1.07144
1.06668
1.06251
1.05883
1.05556
1.05264
This output shows that there is a bug: to allow O(1) amortized appends, the capacity has to grow geometrically. With a constant reserve, N appends trigger O(N) relocations of O(N) elements each, costing O(N^2) overall; with a geometric factor r, the total number of elements copied is bounded by about N/(r-1), which is O(N).
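To make this concrete, here is a small counting sketch (illustrative only, not the druntime implementation) comparing the element copies performed by N appends under a fixed +1024-element reserve versus plain doubling:

import std.stdio;

void main()
{
    // Count the element copies done by N appends under two policies:
    // a fixed reserve of +1024 elements vs. capacity doubling.
    enum N = 1_000_000;
    ulong copiesFixed = 0, copiesDoubling = 0;
    ulong capFixed = 0, capDoubling = 0;

    foreach (len; 1 .. N + 1)
    {
        if (len > capFixed)
        {
            copiesFixed += len - 1; // a relocation copies the old contents
            capFixed += 1024;
        }
        if (len > capDoubling)
        {
            copiesDoubling += len - 1;
            capDoubling = capDoubling ? capDoubling * 2 : 1;
        }
    }

    writeln("fixed +1024: ", copiesFixed);    // ~ N^2 / 2048: quadratic
    writeln("doubling:    ", copiesDoubling); // ~ N: linear
}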
So I suggest using a geometric growth factor of 2 until the memory block is "large enough", something like 200 MB or 300 MB, and beyond that absolute memory limit using a smaller geometric growth factor, like 1.3, to avoid wasting too much memory.
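A minimal sketch of that suggested policy (the nextCapacity name, the 256 MB threshold, and the exact factors are illustrative choices, not tested druntime values):

size_t nextCapacity(size_t currentBytes)
{
    enum size_t largeEnough = 256 * 1024 * 1024; // the absolute memory limit

    if (currentBytes == 0)
        return 16; // arbitrary small starting allocation

    if (currentBytes < largeEnough)
        return currentBytes * 2; // geometric growth, factor 2

    // Beyond the limit, keep growing geometrically but with a smaller
    // factor (about 1.3) to avoid wasting too much memory.
    return currentBytes + currentBytes * 3 / 10;
}

Because both phases are geometric, the O(1) amortized append cost is preserved; only the memory overhead shrinks for very large arrays.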
Comment #5 by bearophile_hugs — 2010-07-02T09:48:24Z
This is a modified program, after a suggestion by Andrei:
import std.stdio: writeln;

void main() {
    int[] array;
    int* old_ptr;

    foreach (i; 0 .. 100_000_000) {
        array ~= i;
        int* new_ptr = array.ptr;
        if (new_ptr != old_ptr) {
            old_ptr = new_ptr;
            writeln(cast(double)array.capacity / array.length);
        }
    }
}
It outputs:
3
1.75
1.875
1.9375
1.96875
1.98438
1.99219
1.98828
1.99414
3.00001
2.7305
2.60017
2.48002
2.37
It shows that when a relocation happens, the capacity grows significantly. But that growth factor looks quite large.
Comment #6 by schveiguy — 2010-07-02T10:30:19Z
OK, here's the deal:
When the current block can be extended into the next page, it is. This does not take into account any scaling factors. That is, if you ask for one more byte, and a page can be lumped onto the end, then only 1 page is added. This explains the behavior most of the time.
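To see that in action, here is a quick sketch (using the same ptr/capacity inspection as the earlier test programs) that reports whether each capacity change extended the block in place or relocated it:

import std.stdio;

void main()
{
    // When the pointer stays the same across a capacity change, the block
    // was extended in place (page by page); when it changes, the array was
    // relocated and the newCapacity function was consulted.
    int[] a;
    int* lastPtr = null;
    size_t lastCap = 0;

    foreach (i; 0 .. 100_000)
    {
        a ~= i;
        if (a.capacity != lastCap)
        {
            writeln("capacity: ", a.capacity,
                    a.ptr == lastPtr ? "  (extended in place)"
                                     : "  (relocated)");
            lastCap = a.capacity;
            lastPtr = a.ptr;
        }
    }
}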
If a page cannot be added, then it uses the newCapacity function. However, I think this function may return too high a value.
Essentially, the function looks like this:
100 + (1000L * size) / log2plus1(newcap);
where size is the size of a single element in bytes, newcap is the *total* memory being requested (old size plus appended size) in bytes, and log2plus1 essentially returns 1 + the index of the highest set bit. The result is the percentage of the total requested size to actually allocate.
I think the problem is the use of size in this equation. I don't really understand why size has to be taken into account separately, since newcap already incorporates the element size.
For example, say you have an array of bytes and you request 100 bytes: size is 1, newcap is 100, and log2plus1(100) is 7. So the formula evaluates to:
100 + (1000L * 1) / 7 => 242, meaning 242% of the original size requested.
Now, if we use int, which is 4 bytes, as our array element type and request 100 elements, newcap is 400 and size is 4, resulting in:
100 + (1000L * 4) / 9 => 544, meaning 544% of the original size requested.
I'm not sure why larger array element types should need more relative space; it doesn't make a lot of sense, but the comment doesn't explain why the formula is the way it is.
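For reference, here is the formula transcribed as a runnable sketch (log2plus1 is reimplemented from its description above, so the actual druntime code may differ in details); it reproduces the two percentages just computed:

import std.stdio;

// 1 + index of the highest set bit, per the description above.
long log2plus1(long n)
{
    long result = 0;
    while (n >>= 1)
        ++result;
    return result + 1;
}

// Percentage of the total requested size to actually allocate.
long growthPercent(long size, long newcap)
{
    return 100 + (1000L * size) / log2plus1(newcap);
}

void main()
{
    writeln(growthPercent(1, 100)); // 242: byte array, 100 bytes requested
    writeln(growthPercent(4, 400)); // 544: int array, 100 elements requested
}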
I'm unsure how to change this; any ideas?
For one, I think we can always use the newCapacity function, even when appending pages (I think it will just append as many pages as it can, up to the requested size).
Comment #7 by fawzi — 2010-07-14T05:29:31Z
When this issue was discovered in Tango, I invested some time thinking about it.
What I came up with is
http://github.com/fawzi/blip/blob/master/blip/util/Grow.d
a version of which I had also used in Tango.
Looking at it again, I think I decided to grow based on memory usage, and to give "extra" space to arrays with large elements simply by rounding up.
Particularly on 32-bit systems, one should be aware that some calculations could overflow in extreme cases.
I release the growing code under whatever license is needed.
Comment #8 by schveiguy — 2010-07-14T13:36:01Z
Fixed in changeset http://www.dsource.org/projects/druntime/changeset/332
This should grow at a more normal rate (logarithmic in relation to the size of the array). There will be short growths; those happen when appending already-free pages can satisfy the append, versus having to commit more pages from the OS.
More tweaking may get this to be a better curve, but I'm pretty happy with it now.
Your test program's output now looks like this:
length: 1 capacity: 3 ratio: inf
length: 4 capacity: 7 ratio: 2.33333
length: 8 capacity: 15 ratio: 2.14286
length: 16 capacity: 31 ratio: 2.06667
length: 32 capacity: 63 ratio: 2.03226
length: 64 capacity: 127 ratio: 2.01587
length: 128 capacity: 255 ratio: 2.00787
length: 256 capacity: 511 ratio: 2.00392
length: 512 capacity: 1019 ratio: 1.99413
length: 1020 capacity: 2043 ratio: 2.00491
length: 2044 capacity: 4091 ratio: 2.00245
length: 4092 capacity: 7163 ratio: 1.75092
length: 7164 capacity: 8187 ratio: 1.14296
length: 8188 capacity: 9211 ratio: 1.12508
length: 9212 capacity: 15355 ratio: 1.66703
length: 15356 capacity: 24571 ratio: 1.6002
length: 24572 capacity: 25595 ratio: 1.04168
length: 25596 capacity: 40955 ratio: 1.60012
length: 40956 capacity: 41979 ratio: 1.025
length: 41980 capacity: 65531 ratio: 1.56104
length: 65532 capacity: 73723 ratio: 1.12501
length: 73724 capacity: 74747 ratio: 1.01389
length: 74748 capacity: 113659 ratio: 1.52058
length: 113660 capacity: 122875 ratio: 1.08108
length: 122876 capacity: 123899 ratio: 1.00833
length: 123900 capacity: 188411 ratio: 1.52068
length: 188412 capacity: 282619 ratio: 1.50001
length: 282620 capacity: 294907 ratio: 1.04348
length: 294908 capacity: 295931 ratio: 1.00347
length: 295932 capacity: 435195 ratio: 1.4706
length: 435196 capacity: 442363 ratio: 1.01647
length: 442364 capacity: 651259 ratio: 1.47223
length: 651260 capacity: 655355 ratio: 1.00629
length: 655356 capacity: 950267 ratio: 1.45
length: 950268 capacity: 951291 ratio: 1.00108
length: 951292 capacity: 1380347 ratio: 1.45102
length: 1380348 capacity: 1392635 ratio: 1.0089
...
Comment #9 by acehreli — 2010-07-14T14:00:49Z
Thanks! Looks pretty smooth now... :)
Comment #10 by bearophile_hugs — 2010-07-14T14:55:42Z
After this change, what is the output of the program in comment 5? Is it unchanged?
Comment #11 by schveiguy — 2010-07-15T07:22:10Z
The output from comment 5 looks like this:
3
1.75
1.875
1.9375
1.96875
1.98438
1.99219
1.99609
1.99023
1.50001
1.47222
1.45
1.43015
1.41
1.40007
1.38002
1.37002
and then my system starts thrashing :) I only have 1G of RAM