Bug 17914 – [Reg 2.075] Fibers guard page uses a lot more memory mappings

Status
RESOLVED
Resolution
FIXED
Severity
regression
Priority
P1
Component
druntime
Product
D
Version
D2
Platform
x86_64
OS
Linux
Creation time
2017-10-19T00:25:39Z
Last change time
2017-12-18T22:56:13Z
Keywords
industry
Assigned to
No Owner
Creator
briancschott

Comments

Comment #0 by briancschott — 2017-10-19T00:25:39Z
The fiber stack overflow protection introduced in 2.075 imposes a per-process limit on the number of times a function implemented using fibers can be called. The limit is configured in `/proc/sys/vm/max_map_count`. The problem is caused by unbalanced calls to `allocStack` and `freeStack` in core.thread: there is one call to mmap and one call to mprotect per fiber use, but no corresponding call to munmap to release the memory-protection bookkeeping inside the kernel. This happens because Fiber is a class and `freeStack` is only called in Fiber's destructor (which of course means there is no guarantee that it is run). Once the limit (65530 by default) is hit, the program fails with SIGABRT.
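
For illustration, a minimal sketch of the allocation pattern described above, assuming the POSIX interface from core.sys.posix.sys.mman; this is not the actual core.thread code, and the sizes are made up:

    import core.sys.posix.sys.mman;

    void main()
    {
        enum pageSize  = 4096;
        enum stackSize = 4 * pageSize;

        // Roughly the per-fiber pattern: one mapping for stack plus guard page,
        // then revoke all access on the guard page. The mprotect call splits
        // the single mapping into two kernel VMAs that can no longer be merged
        // with neighbouring fiber stacks.
        void* p = mmap(null, stackSize + pageSize,
                       PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
        assert(p != MAP_FAILED);
        mprotect(p, pageSize, PROT_NONE);   // guard page at the low end

        // The matching munmap only happens in freeStack, i.e. in Fiber's
        // destructor; until then both VMAs count against max_map_count.
        munmap(p, stackSize + pageSize);
    }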
Comment #1 by briancschott — 2017-10-24T22:40:45Z
Minimal test case:

    void main()
    {
        import std.concurrency : Generator, yield;
        import std.stdio : File, writeln;

        auto f = File("/proc/sys/vm/max_map_count", "r");
        ulong n;
        f.readf("%d", &n);
        writeln("/proc/sys/vm/max_map_count = ", n);
        foreach (i; 0 .. n + 1000)
        {
            if (i % 1000 == 0)
                writeln("i = ", i);
            new Generator!int({ yield(1); });
        }
    }
Comment #2 by code — 2017-10-26T17:18:22Z
Here is what I'm seeing: calling mprotect splits the mapped range into two, a 16K rw--- and a 4K ----- mapping. Calling GC.collect finalizes all unreachable Fibers and unmaps *both* regions with a single `munmap(p, 20K)` call, so we don't really leak anything here. What's actually causing the limit is the fact that the kernel can no longer merge the mappings, hence the limit gets reached much quicker.

    foreach (i; 0 .. 32 * 1_024)
    {
        new Fiber(function{}, 4 * 4096, 0);
    }

    00007f63304e9000  123184K rw---   [ anon ]

vs.

    foreach (i; 0 .. 32 * 1_024)
    {
        new Fiber(function{}, 3 * 4096, 4096);
    }

    00007f2819cbf000      12K rw---   [ anon ]
    00007f2819cc2000       4K -----   [ anon ]
    00007f2819cc3000      12K rw---   [ anon ]
    00007f2819cc6000       4K -----   [ anon ]
    00007f2819cc7000      12K rw---   [ anon ]
    00007f2819cca000       4K -----   [ anon ]
    00007f2819ccb000      12K rw---   [ anon ]
    00007f2819cce000       4K -----   [ anon ]
    00007f2819ccf000      12K rw---   [ anon ]
    00007f2819cd2000       4K -----   [ anon ]
    00007f2819cd3000      12K rw---   [ anon ]
    00007f2819cd6000       4K -----   [ anon ]

I'd err on the side of memory safety. Stack overflows are an actual problem with Fibers and hard and time-consuming to debug. For a program using 32K+ Fibers it doesn't seem unreasonable to require custom stack and guard sizes; e.g. a fairly big stack size without a guard page would prevent most issues. Any better idea to detect/prevent stack overflows?
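
One way to observe this from a program is to count the lines of /proc/self/maps before and after creating a batch of fibers. An illustrative sketch (not part of the original report; vmaCount is a made-up helper):

    import core.thread : Fiber;
    import std.algorithm : count;
    import std.file : readText;
    import std.stdio : writeln;

    // Number of virtual memory areas (one line per mapping in /proc/self/maps).
    size_t vmaCount()
    {
        return readText("/proc/self/maps").count('\n');
    }

    void main()
    {
        writeln("mappings before: ", vmaCount());

        Fiber[] keep;   // keep references so the GC cannot finalize the fibers
        foreach (i; 0 .. 1_000)
            keep ~= new Fiber(function {}, 4 * 4096, 4096);  // 16K stack + 4K guard

        // With a PROT_NONE guard page per fiber the kernel cannot merge the
        // per-fiber mappings, so this grows by roughly two VMAs per fiber.
        writeln("mappings after:  ", vmaCount());
    }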
Comment #3 by schveiguy — 2017-10-26T17:37:28Z
(In reply to Martin Nowak from comment #2)

> What's actually causing the limit is the fact that the kernel can no longer
> merge the mappings, hence the limit gets reached much quicker.

This would explain why the limit wasn't being reached before the change!

> I'd err on the side of memory safety. Stack overflows are an actual problem
> with Fibers and hard and time-consuming to debug.

Except this breaks existing code that may not have a stack overflow problem. I hate to say it, but unfortunately, we are stuck with the status quo for now.

> Any better idea to detect/prevent stack overflows?

A possible idea is putting a pattern in the guard data (leaving it read/write) and checking the guard data on fiber deallocation, and optionally on every yield (could be a performance issue, but maybe only used in debug mode?). You can also check the current SP on yield to make sure it's in bounds (cheap, but doesn't cover everything).

Another option is to put a read-only guard page on only some of the fibers. If you are allocating a lot of similar fibers, you are bound to hit one eventually.

Still another option is to allocate the guard page on as many fibers as you can, and then back off when you can't. This could be difficult, however, as the relationship between mmap'd regions and the number of fibers may not be linear.

None of this prevents 100% of corruption, but it's better than nothing.
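
A rough sketch of the pattern-in-the-guard idea, written against a raw mmap'd stack rather than core.thread internals (allocStackWithCanary and canaryIntact are made-up names, and the page size is an assumption):

    import core.sys.posix.sys.mman;

    enum pageSize    = 4096;
    enum canaryValue = 0xDEAD_BEEF_0BAD_F00DUL;

    // Allocate a stack with a read/write "soft guard" page at its low end,
    // pre-filled with a known pattern instead of being PROT_NONE.
    void* allocStackWithCanary(size_t stackSize)
    {
        void* p = mmap(null, stackSize + pageSize,
                       PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
        assert(p != MAP_FAILED);
        (cast(ulong*) p)[0 .. pageSize / ulong.sizeof] = canaryValue;
        return p;
    }

    // Corruption check: cheap enough to run on deallocation, or on every
    // yield in debug builds. Unlike PROT_NONE it adds no extra kernel
    // mapping, but it only detects an overflow after the fact.
    bool canaryIntact(const void* stackBase)
    {
        foreach (w; (cast(const ulong*) stackBase)[0 .. pageSize / ulong.sizeof])
            if (w != canaryValue)
                return false;
        return true;
    }

    void main()
    {
        enum stackSize = 4 * pageSize;
        auto stack = allocStackWithCanary(stackSize);
        // ... fiber code would run on [stack + pageSize, stack + pageSize + stackSize) ...
        assert(canaryIntact(stack));
        munmap(stack, stackSize + pageSize);
    }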
Comment #4 by code — 2017-10-26T19:14:30Z
(In reply to Steven Schveighoffer from comment #3)

> > I'd err on the side of memory safety. Stack overflows are an actual problem
> > with Fibers and hard and time-consuming to debug.
>
> Except this breaks existing code that may not have a stack overflow problem.
> I hate to say it, but unfortunately, we are stuck with the status quo for
> now.

The problem is, this either breaks toy code that just creates lots of useless Fibers or heavy production code that actually needs 30K Fibers. For the latter it's reasonable to ask for manual tuning of either the kernel or the Fiber stack and guard size. For the former it's fairly hard to justify that the default has such a severe memory and safety hole.

Why isn't the number of guard pages a problem on Windows? That kernel also needs to keep track of lots of memory regions.

If we really need a compromise here, I'd definitely increase the stack size further, e.g. to 64KiB (vibe.d's default).
Comment #5 by briancschott — 2017-10-26T20:30:40Z
> The problem is, this either breaks toy code that just
> creates lots of useless Fibers or heavy production code
> that actually needs 30K Fibers.

That's not entirely accurate. I found this because the range implementation for a complex multi-dimensional tree structure was implemented as a std.concurrency.Generator. The problem is caused by doing many tree traversals in a loop on a machine that has 512GB of memory. We don't actually need 30k fibers. We need one fiber at a time, 30k times in a row. By the time we start using the second fiber we don't need the first one anymore.

It's possible to re-implement the range in question, but the amount of code required to do so is much greater than the current implementation using recursion and fibers.
Comment #6 by schveiguy — 2017-10-26T21:41:10Z
(In reply to briancschott from comment #5)

> That's not entirely accurate. I found this because the range implementation
> for a complex multi-dimensional tree structure was implemented as a
> std.concurrency.Generator.

Right, the issue here is not so much that you need a certain number of concurrent fibers, it's that the limited resource (64K items per process) is tied to memory management (essentially an infinite number of items per process, and no requirement to release). It's the same argument as to why File is ref-counted with malloc and not a class.

So toy examples may fail, large concurrent webservers may fail (but arguably, those can be tuned to work), OR reasonable, innocuous code may fail if it falls into the right trap using GC-managed data. Worth noting that simply turning off GC collections can trigger this error.

A heavy-handed solution might be that if there is a limit on Fiber creation, then you should have an object to manage them. But that would be a drastic design change.

Another possibility is that we make a type change to track the "good" behavior and the "bad" behavior separately. For example, SafeFiber, which does the guard page. Then at least it's opt-in.

(In reply to Martin Nowak from comment #4)

> Why isn't the number of guard pages a problem on Windows? That kernel also
> needs to keep track of lots of memory regions.

The limit is 64k *per process*, so obviously the kernel can handle quite a bit more. Worth noting that Windows has a much different memory layout than Unix.

> If we really need a compromise here, I'd definitely increase the stack size
> further, e.g. to 64KiB (vibe.d's default).

That's also a possible option. We could do that along with the reversal of the guard page.
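
Purely hypothetically, the opt-in type could look something like this (SafeFiber is not an existing druntime class, and the default sizes are assumptions for the sketch):

    import core.thread : Fiber;

    // Hypothetical opt-in type: plain Fiber would default to no guard page,
    // and code that wants the PROT_NONE protection asks for it explicitly.
    class SafeFiber : Fiber
    {
        enum size_t defaultStack = 4 * 4096;  // assumed defaults
        enum size_t defaultGuard = 4096;

        this(void function() fn, size_t sz = defaultStack, size_t guard = defaultGuard)
        {
            super(fn, sz, guard);
        }

        this(void delegate() dg, size_t sz = defaultStack, size_t guard = defaultGuard)
        {
            super(dg, sz, guard);
        }
    }

    void main()
    {
        auto f = new SafeFiber(function { /* fiber body */ });
        f.call();
    }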
Comment #7 by 4burgos — 2017-10-27T09:32:09Z
(In reply to briancschott from comment #5)

> > The problem is, this either breaks toy code that just
> > creates lots of useless Fibers or heavy production code
> > that actually needs 30K Fibers.
>
> That's not entirely accurate. I found this because the range implementation
> for a complex multi-dimensional tree structure was implemented as a
> std.concurrency.Generator. The problem is caused by doing many tree
> traversals in a loop on a machine that has 512GB of memory. We don't
> actually need 30k fibers. We need one fiber at a time, 30k times in a row.
> By the time we start using the second fiber we don't need the first one
> anymore.
>
> It's possible to re-implement the range in question, but the amount of code
> required to do so is much greater than the current implementation using
> recursion and fibers.

Is it not possible to just use a simple pool for the Fibers? So, instead of allocating a new fiber, you reset the recycled one to the new state. Is this not possible with std.concurrency.Generator?

> If we really need a compromise here, I'd definitely increase the stack size
> further, e.g. to 64KiB (vibe.d's default).

This could be a problem on 32-bit systems, where it would eat up the available address space very quickly, if I'm not mistaken. Of course, we could still make it the default for 64-bit systems.

> A possible idea is putting a pattern in the guard data (leaving it
> read/write) and checking the guard data on fiber deallocation, and
> optionally on every yield (could be a performance issue, but maybe only
> used in debug mode?)

One problem I can see with this is that it doesn't catch the problem on the spot - the good thing about a protected region is that we die as soon as we try accessing it. If the fiber does a lot of calls before it yields, we may be far from the place that actually corrupted memory, and it's not that easy to debug. Of course, it still catches the memory corruption (but at a cost that grows linearly with the number of fibers).

> I'd err on the side of memory safety. Stack overflows are an actual problem
> with Fibers and hard and time-consuming to debug.

Yes. I can't emphasize enough how many times we had these issues. And it's never been a problem for us to confirm that this is what's happening with the guard page in. It was always hell to confirm what's going on with the previous runtime, mostly because it's not the fiber that's actually corrupting memory that crashes, but some other poor fiber that happens to have a stack adjacent to the rogue one.

Having said all this - why not just keep the default guard page on, and if somebody is confident they don't need it (they could even do a `version(debug){}else{guardPageOff}`), simply providing 0 for the guard size in the Fiber's constructor will turn this entire thing off? Perhaps we could document this better (I didn't expect this issue to arise when implementing this, so there isn't much documentation). However, I also think Steven has a valid point when he says:

> Except this breaks existing code that may not have a stack overflow problem.
> I hate to say it, but unfortunately, we are stuck with the status quo for
> now.
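
The pooling suggestion above might look roughly like this for a plain core.thread.Fiber; whether std.concurrency.Generator can be recycled the same way is the open question raised in this comment. An illustrative sketch:

    import core.thread : Fiber;

    // Recycling sketch: reuse one terminated Fiber for many short-lived tasks
    // instead of allocating a fresh stack (and guard page mapping) each time.
    void main()
    {
        auto fiber = new Fiber(function { /* placeholder, replaced below */ });
        fiber.call();   // run to completion so the fiber can be reset

        foreach (i; 0 .. 30_000)
        {
            // reset() requires a terminated (or never started) fiber and reuses
            // its existing stack, so no new mmap/mprotect calls are made.
            fiber.reset(function { /* per-iteration work would go here */ });
            fiber.call();
        }
    }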
Comment #8 by code — 2017-10-27T12:46:14Z
We could keep track of the number of unfinalized Fibers with guard pages and trigger a GC collection when reaching 30K, but that's too hacky for my taste.

> We don't actually need 30k fibers. We need one fiber at a time, 30k times in
> a row.

As for your specific problem Brian, could you use `std.typecons.scoped!(Generator!T)`, which provides deterministic destruction? Scoped is only moveable, not copyable.

    auto gen = scoped!(Generator!int)({ yield(42); });
    auto gen2 = move(gen); // not copyable
    assert(gen2.front == 42);

Potentially, if you have unclear ownership over that Generator and cannot restrict yourself to `move`, you could combine it with `refCounted`; seems like that combination doesn't work atm. though :/

    auto gen = refCounted(scoped!(Generator!int)({ yield(42); }));
    auto gen2 = gen; // 2nd reference to generator
    assert(gen2.front == 42);
Comment #9 by code — 2017-10-27T23:18:52Z
If scoped isn't enough, you could use automem (~master or the next release) for ref-counted classes. http://code.dlang.org/packages/automem

    import automem.ref_counted;

    auto gen = RefCounted!(Generator!int)({ yield(42); });
    writeln(gen.front);
    gen.popFront;
    assert(gen.empty);
Comment #10 by dfj1esp02 — 2017-10-30T12:44:33Z
Judging by the code, Generator can just destroy itself on termination. It's not reusable, is it?
Comment #11 by github-bugzilla — 2017-11-02T21:48:27Z
Commit pushed to stable at https://github.com/dlang/druntime

https://github.com/dlang/druntime/commit/9fee0ca6365408d753aedb517c0896af36966b65
Merge pull request #1966 from nemanja-boric-sociomantic/fiber-docs

Fix issue 17914: Document connection between fiber guard and mmap limit

merged-on-behalf-of: Steven Schveighoffer <[email protected]>
Comment #12 by schveiguy — 2017-11-02T21:53:14Z
Not fixed yet, still waiting for: https://github.com/dlang/phobos/pull/5835
Comment #13 by github-bugzilla — 2017-11-21T13:19:45Z
Commit pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/1c12d6d59e6a3eccf42efe379066c0a9c658fa36
Add std.concurrency.Generator overloads to accept Fiber's stack guard's size

As of 2.075 Fiber's stack protection pages are on by default, preventing OS to merge mmaped ranges, so the total number of mmaped ranges is much higher than before. In case there's no need for the stack overflow protection and the high number of fibers, it may be wise to change this setting, so the Generator's constructor overloads which support this are added.

Fix issue 17914
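
With these overloads, code that hits the mapping limit can opt out of the guard page per generator. A minimal usage sketch (the 64 KiB stack size is just an example value):

    import std.concurrency : Generator, yield;

    void main()
    {
        // Third argument is the guard page size; passing 0 disables the
        // PROT_NONE page so the kernel can merge the per-fiber mappings again.
        auto gen = new Generator!int({ yield(1); }, 64 * 1024, 0);
        assert(gen.front == 1);
    }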
Comment #14 by github-bugzilla — 2017-11-21T14:59:19Z
Commit pushed to master at https://github.com/dlang/druntime

https://github.com/dlang/druntime/commit/9fee0ca6365408d753aedb517c0896af36966b65
Merge pull request #1966 from nemanja-boric-sociomantic/fiber-docs
Comment #15 by github-bugzilla — 2017-12-18T22:56:13Z
Commit pushed to stable at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/1c12d6d59e6a3eccf42efe379066c0a9c658fa36
Add std.concurrency.Generator overloads to accept Fiber's stack guard's size