Bug 1804 – Severe GC leaks with repetitive array allocations
Status
RESOLVED
Resolution
FIXED
Severity
major
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
x86
OS
Windows
Creation time
2008-01-24T15:18:00Z
Last change time
2015-06-09T05:11:52Z
Keywords
wrong-code
Assigned to
nobody
Creator
webmaster
Comments
Comment #0 by webmaster — 2008-01-24T15:18:29Z
Linux (Fedora 6), 64 bit
The following program runs out of memory when the allocations reach about 8 MB. The problem happens on *both* DMD 1.026 and 2.010.
FYI: I tried running fullCollect() periodically, just in case...no luck.
import std.stdio;

void main()
{
    for (int i = 1; ; i++)
    {
        auto tmp = new char[i * 1024];
        if (i % 1_000 == 0)
            writefln("%d iterations, roughly %.1f MB", i, tmp.length / (1024 * 1024.0));
    }
}
Comment #1 by sean — 2008-01-24T15:50:19Z
[email protected] wrote:
>
> FYI: I tried running fullCollect() periodically, just in case...no luck.
>
> import std.stdio;
>
> void main()
> {
>     for (int i = 1; ; i++)
>     {
>         auto tmp = new char[i * 1024];
>         if (i % 1_000 == 0)
>             writefln("%d iterations, roughly %.1f MB", i, tmp.length / (1024 * 1024.0));
>     }
> }
Interestingly, this app crashes in the same place if 'tmp' is deleted on
each iteration. If I had to guess, I'd say that empty pools aren't
being added to the free list after collections, even though there's code
in place that's supposed to do this. Perhaps it's just pools dedicated
to big allocations that are the problem. This portion of the GC isn't
really my area of expertise, but I'll submit a patch if I can figure
this one out.
Comment #2 by webmaster — 2008-01-24T17:18:22Z
This came up in a program where I had a large array containing all of the prime numbers up through some value. As I found more primes, I would append to the array. This works fine for a while (meaning that *sometimes* buffers are being collected correctly), but past some point memory usage would spike: whenever an append operation required a copy, the old array wasn't getting GC'd.
I originally theorized that the prime numbers in the array happened to match the addresses of arrays (and that the bug was the GC scanning non-pointer memory). But that doesn't quite play out. My gut is still that there is a false pointer out there somewhere. Are the memory allocator's metadata structures scanned by the GC? Maybe, after lots of allocations, we randomly get metadata that looks like a pointer, which pins some memory, and then it gradually builds from there. Dunno.
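[Editor's note: a minimal sketch of the growing-prime-array pattern described above, assuming a D2 runtime. The bound 1_000_000 and the reserve size are arbitrary illustration values; reserving capacity up front is a hypothetical mitigation, not the bug fix, but it avoids the repeated reallocate-and-copy cycle that leaves abandoned buffers behind for the GC.]

-------------- Begin Sketch ----------------------
import std.stdio;

void main()
{
    size_t[] primes;
    // Reserve capacity once so appends rarely reallocate; without this,
    // each growth step copies the array and orphans the old buffer,
    // which is exactly the allocation pattern that triggered this bug.
    primes.reserve(100_000);

    outer: foreach (size_t candidate; 2 .. 1_000_000)
    {
        foreach (p; primes)
        {
            if (p * p > candidate)
                break;
            if (candidate % p == 0)
                continue outer; // composite, skip it
        }
        primes ~= candidate; // append usually reuses reserved capacity
    }
    writefln("found %d primes", primes.length);
}
-------------- End Sketch -------------------------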
RECREATE UPDATE:
Update the writefln line to also trace the address:
writefln("%d iterations, roughly %.1f MB at %x", i, ..... , cast(uint)tmp.ptr);
Then change the loop to start at 1014, 1015, or 1016 (possibly others as well): the first buffer allocated is garbage collected only once. After it is allocated a second time, it never appears to be collected again.
Comment #3 by sean — 2008-02-05T12:48:54Z
Okay, I think I know what's going on. For big allocations, the GC will look for a pre-existing set of contiguous pages within a single pool to hold the block. If one cannot be found, then it creates a new pool of the requested size to hold the block. However, in this app the allocation size just continuously increases, so as time goes on the GC is unable to re-use existing pools and so allocates successively larger pools to hold the requested block. The problem is that all these old pools which were only used once are held by the GC for future use, so the total memory used by the application steadily increases, with most of this memory going completely unused.
The simplest fix in this case would be for the GC to always release pools obtained for big allocations back to the OS when they are no longer needed. This should address the pathological case demonstrated in this test app. I'm going to make this change in Tango and see if it helps.
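[Editor's note: a sketch of a user-level mitigation for the pool-growth behavior described above, assuming D2's core.memory API. GC.collect and GC.minimize are real druntime calls; whether minimize actually releases the big-allocation pools depends on the GC implementation in use, so this is a workaround to try, not the fix Sean describes.]

-------------- Begin Sketch ----------------------
import core.memory;
import std.stdio;

void main()
{
    foreach (i; 1 .. 20_001)
    {
        auto tmp = new char[i * 1024];
        if (i % 1_000 == 0)
        {
            GC.collect();  // free unreachable blocks
            GC.minimize(); // ask the GC to return empty pools to the OS
            writefln("%d iterations, roughly %.1f MB", i, tmp.length / (1024 * 1024.0));
        }
    }
}
-------------- End Sketch -------------------------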
Comment #4 by sean — 2008-02-07T20:25:30Z
I've created a ticket for this in Tango:
http://www.dsource.org/projects/tango/ticket/878
The fix is done and will be checked in once DSource is back online. The
diff is pretty succinct, but I'd be happy to supply a patch for Phobos
if that would help.
Comment #5 by webmaster — 2008-03-31T12:34:27Z
FYI: I hit this problem on Phobos, not Tango.
Comment #6 by sean — 2008-04-03T18:20:43Z
Sure. But the same problem existed in Tango so I thought I'd diagnose it there and report my findings to save Walter some legwork with Phobos. Just trying to be helpful here :-)
Comment #7 by mk — 2013-04-01T11:17:06Z
Looks like it has been fixed somewhere between v1.056 and current v1.076.
Comment #8 by mk — 2013-06-01T05:12:20Z
I'm reopening this, because I noticed the problem (core.exception.OutOfMemoryError after about 20000 iterations) still exists on Windows XP 32bit with latest dmd 2.063. Works ok on Linux though.
Comment #9 by safety0ff.bugz — 2014-02-14T20:07:41Z
(In reply to comment #8)
> I'm reopening this, because I noticed the problem
> (core.exception.OutOfMemoryError after about 20000 iterations) still exists on
> Windows XP 32bit with latest dmd 2.063. Works ok on Linux though.
The original bug was fixed. Your observation is due to false pointers, consider the code at the end of this comment:
When version = A is enabled, the false pointers cause memory to be considered alive much longer, exactly like the code in the OP.
When version = A is disabled, only pointers to the start of the memory block make the GC consider it "alive," so less memory is retained by false pointers.
-------------- Begin Code ----------------------
import std.stdio;
import core.memory;

version = A;

void main()
{
    for (uint i = 1; i != 0; i++)
    {
        version (A)
            auto tmp = cast(char*)GC.malloc(i * 1024, GC.BlkAttr.NO_SCAN);
        else
            auto tmp = cast(char*)GC.malloc(i * 1024, GC.BlkAttr.NO_SCAN | GC.BlkAttr.NO_INTERIOR);
        for (uint j = 0; j < i * 1024; j += 1024)
            tmp[j] = 0; // poke the memory
        if (i % 1_000 == 0)
            writefln("%d iterations, roughly %.1f MiB", i, i / 1024.0);
    }
}
-------------- End Code -------------------------
Comment #10 by safety0ff.bugz — 2014-02-14T23:22:30Z
*** Issue 1980 has been marked as a duplicate of this issue. ***