Bug 16352 – dead-lock in std.allocator.free_list unittest

Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P1
Component
phobos
Product
D
Version
D2
Platform
x86_64
OS
Linux
Creation time
2016-08-05T08:11:00Z
Last change time
2017-01-16T23:26:06Z
Assigned to
edi33416
Creator
code

Attachments

ID    Filename     Summary                           Content-Type  Size
1606  faillog.txt  Capture of log from failed test.  text/plain    156688

Comments

Comment #0 by code — 2016-08-05T08:11:07Z
I don't have more information than a failed auto-tester run in which this test didn't complete within a minute (it usually takes only a few ms).
https://auto-tester.puremagic.com/show-run.ghtml?projectid=14&runid=2129483&isPull=true
It was the release64 build that failed.
Commits:
  dmd: 5a16fbbd9bcc65e52aabd517e6be8a77130cbc40
  druntime: 0eade7404fa8bdea0d5088c3367eae7f7805ddce
  phobos: 01eb06bb3897cd359d01a6c268785e5ee42789c0
Comment #1 by schveiguy — 2016-08-05T17:44:05Z
Created attachment 1606 Capture of log from failed test. I'm pretty sure those logs go away. I've attached the log in any case.
Comment #2 by code — 2016-08-06T06:17:35Z
They do, thanks. There is not much information in the log other than that the test hung at those commit hashes.
Comment #3 by john.loughran.colvin — 2016-12-12T22:25:31Z
After a bunch of testing I've managed to reproduce this reliably, stop it, attach gdb and get a backtrace. The hang happens here:
https://github.com/dlang/phobos/blob/19445fc71e8aabdbd42f0ad8a571a57601a5ff39/std/experimental/allocator/building_blocks/free_list.d#L1025
In the backtrace you'll see std.experimental.allocator.building_blocks.free_list.__unittestL1020_10; that's just a consequence of some accidental reformatting before I tested. The real line number is 1025, as in the link above.

#0 0x0000667f4afa810f in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1 0x000000000042e8cd in core.sync.condition.Condition.wait() ()
#2 0x0000000000414b80 in std.concurrency.MessageBox.get!(void(bool) pure nothrow @nogc @safe delegate, void(std.concurrency.LinkTerminated) pure @nogc @safe function, void(std.concurrency.OwnerTerminated) pure @nogc @safe function, void(std.variant.VariantN!(32uL).VariantN) function).get(void(bool) pure nothrow @nogc @safe delegate, void(std.concurrency.LinkTerminated) pure @nogc @safe function, void(std.concurrency.OwnerTerminated) pure @nogc @safe function, void(std.variant.VariantN!(32uL).VariantN) function) ()
#3 0x0000000000414146 in std.concurrency.receiveOnly!(bool).receiveOnly() ()
#4 0x0000000000402faa in std.experimental.allocator.building_blocks.free_list.__unittestL1020_10() ()
#5 0x0000000000419eba in std.experimental.allocator.building_blocks.free_list.__modtest() ()
#6 0x000000000042c5a1 in core.runtime.runModuleUnitTests().__foreachbody2(object.ModuleInfo*) ()
#7 0x000000000041bb6c in object.ModuleInfo.opApply(scope int(object.ModuleInfo*) delegate).__lambda2(immutable(object.ModuleInfo*)) ()
#8 0x0000000000421fb3 in rt.minfo.moduleinfos_apply(scope int(immutable(object.ModuleInfo*)) delegate).__foreachbody2(ref rt.sections_elf_shared.DSO) ()
#9 0x00000000004221b5 in rt.sections_elf_shared.DSO.opApply(scope int(ref rt.sections_elf_shared.DSO) delegate) ()
#10 0x0000000000421f44 in rt.minfo.moduleinfos_apply(scope int(immutable(object.ModuleInfo*)) delegate) ()
#11 0x000000000041bb48 in object.ModuleInfo.opApply(scope int(object.ModuleInfo*) delegate) ()
#12 0x000000000042c493 in runModuleUnitTests ()
#13 0x000000000041eab3 in rt.dmain2._d_run_main(int, char**, extern(C) int(char[][]) function).runAll() ()
#14 0x000000000041ea51 in rt.dmain2._d_run_main(int, char**, extern(C) int(char[][]) function).tryExec(scope void() delegate) ()
#15 0x000000000041e9cb in _d_run_main ()
#16 0x0000000000419ff6 in main ()
#17 0x0000667f4a4fa291 in __libc_start_main () from /usr/lib/libc.so.6
#18 0x000000000040280a in _start ()
Comment #4 by john.loughran.colvin — 2016-12-12T22:34:55Z
To reproduce on linux x86_64:

    % ../dmd/src/dmd -conf= -I../druntime/import -w -dip25 -m64 -O -release -main -unittest generated/linux/release/64/libphobos2.a -defaultlib= -debuglib= -L-ldl std/experimental/allocator/building_blocks/free_list.d
    % seq 10000 | xargs -Iz ./free_list
Comment #5 by john.loughran.colvin — 2016-12-13T12:48:09Z
It seems that all the threads exit but (in my tests) one message either is never sent or is never received by the main thread, so the main thread sits in receiveOnly!bool forever.
Comment #6 by safety0ff.bugz — 2016-12-22T10:37:30Z
SharedFreeList.allocate looks ABA prone. A thread does:

    do
    {
        oldRoot = _root; // atomic load
        if (!oldRoot) return allocateFresh(bytes);
        next = oldRoot.next; // atomic load
    }
    while (!cas(&_root, oldRoot, next));

But the value of `next` could have changed between the load and the cas.
Comment #7 by safety0ff.bugz — 2016-12-22T10:46:36Z
(In reply to safety0ff.bugz from comment #6)
> But the value of `next` could have changed between the load and the cas.

I meant `oldRoot.next`, i.e. next != oldRoot.next after the cas succeeds.
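
For readers following along, the interleaving described in comments #6 and #7 can be replayed deterministically. This is only a sketch under stated assumptions: `Node`, `fakeCas` and `replayAba` are hypothetical names made up for illustration, the "threads" are simulated by ordering the statements by hand, and `fakeCas` is a plain single-threaded stand-in for an atomic compare-and-swap. It is not the SharedFreeList code.

```d
import std.stdio : writeln;

// Hypothetical node type; the real free-list node layout may differ.
struct Node
{
    Node* next;
    string name;
}

// Single-threaded stand-in for an atomic CAS, so the replay is deterministic.
bool fakeCas(Node** here, Node* ifThis, Node* writeThis)
{
    if (*here !is ifThis)
        return false;
    *here = writeThis;
    return true;
}

// Replays the ABA interleaving and returns the root left by the stale CAS.
Node* replayAba(Node* a, Node* b, Node* c)
{
    // Initial free list: root -> A -> B -> C
    a.next = b;
    b.next = c;
    c.next = null;
    Node* root = a;

    // "Thread 1" starts an allocate: loads root and root.next, then stalls.
    Node* oldRoot = root;      // A
    Node* next = oldRoot.next; // B

    // "Thread 2" runs in the gap: pops A, pops B (B is now in use),
    // then frees A again.
    root = a.next; // pop A, root -> B -> C
    root = b.next; // pop B, root -> C
    a.next = root; // push A back
    root = a;      // root -> A -> C

    // "Thread 1" resumes. The root is A again, so the CAS succeeds,
    // even though `next` (B) is no longer A's successor.
    const ok = fakeCas(&root, oldRoot, next);
    assert(ok);
    return root;
}

void main()
{
    auto a = new Node(null, "A");
    auto b = new Node(null, "B");
    auto c = new Node(null, "C");
    auto root = replayAba(a, b, c);
    writeln("root after stale CAS: ", root.name); // prints "root after stale CAS: B"
}
```

After the replay the list head is B, a block that is currently handed out, and C is no longer reachable from the root.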
Comment #8 by r.sagitario — 2016-12-22T17:17:38Z
> SharedFreeList.allocate looks ABA prone:

I agree. The actual pattern to use depends on the hardware, but x86 usually uses a modification counter modified in lock step.
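
A minimal sketch of the modification-counter idea, again as a deterministic single-threaded replay rather than real lock-free code. `TaggedRoot`, `fakeCas` and `staleCasSucceeds` are hypothetical names, and `fakeCas` only imitates what a double-width atomic compare-and-swap would do on hardware that provides one.

```d
import std.stdio : writeln;

// Hypothetical node type for illustration only.
struct Node
{
    Node* next;
}

// Root pointer paired with a counter that is bumped on every update.
struct TaggedRoot
{
    Node* ptr;
    size_t tag;
}

// Single-threaded stand-in for a double-width CAS: pointer AND tag must match.
bool fakeCas(TaggedRoot* here, TaggedRoot ifThis, TaggedRoot writeThis)
{
    if (here.ptr !is ifThis.ptr || here.tag != ifThis.tag)
        return false;
    *here = writeThis;
    return true;
}

// Replays the ABA interleaving; returns whether the stale CAS went through.
bool staleCasSucceeds()
{
    auto a = new Node;
    auto b = new Node;
    auto c = new Node;
    a.next = b;
    b.next = c;
    auto root = TaggedRoot(a, 0); // root -> A -> B -> C, tag 0

    // "Thread 1" snapshots root and root.next, then stalls.
    TaggedRoot seen = root;
    Node* next = seen.ptr.next; // B

    // "Thread 2": pop A, pop B, push A. Every update bumps the tag.
    root = TaggedRoot(a.next, root.tag + 1); // root -> B, tag 1
    root = TaggedRoot(b.next, root.tag + 1); // root -> C, tag 2
    a.next = root.ptr;
    root = TaggedRoot(a, root.tag + 1);      // root -> A -> C, tag 3

    // "Thread 1" resumes: same pointer (A), but tag 3 != 0, so the CAS fails
    // and a real implementation would retry with fresh values.
    return fakeCas(&root, seen, TaggedRoot(next, seen.tag + 1));
}

void main()
{
    writeln("stale CAS succeeded: ", staleCasSucceeds()); // prints "stale CAS succeeded: false"
}
```

The counter turns "same pointer" into "same pointer and no intervening updates", which is the condition the allocate loop actually needs.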
Comment #9 by safety0ff.bugz — 2016-12-22T18:12:19Z
(In reply to Rainer Schuetze from comment #8)
> I agree. The actual pattern to use depends on the hardware, but x86 usually
> uses a modification counter modified in lock step.

I'm just going to slap core.internal.spinlock on it for now. Somebody else can improve it later. I just don't want the autotester choking on unrelated changes.

There's also the issue on x86_64 that we can't use the upper bits (because ParentAllocator could be GCAllocator), and not all x86_64 machines have cmpxchg16b.

AFAIK shared free lists aren't very good for high contention regardless.
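
The lock-based approach can be sketched as follows. This is not the code from the eventual pull request: `TinySpinLock`, `push`, `pop` and `churn` are hypothetical names, and the lock is a hand-rolled stand-in built on core.atomic rather than core.internal.spinlock. The point is only that once both the load of `root.next` and the update of `root` happen under one lock, the ABA window disappears.

```d
import core.atomic : atomicStore, cas;
import core.thread : Thread;
import std.stdio : writeln;

// Hypothetical node type for illustration only.
struct Node
{
    Node* next;
}

// Hypothetical minimal spin lock; a stand-in, not core.internal.spinlock.
struct TinySpinLock
{
    shared bool locked;

    void lock()
    {
        // Spin until we flip the flag from false to true.
        while (!cas(&locked, false, true)) {}
    }

    void unlock()
    {
        atomicStore(locked, false);
    }
}

__gshared TinySpinLock listLock;
__gshared Node* root;

// Push a node onto the shared list under the lock.
void push(Node* n)
{
    listLock.lock();
    scope (exit) listLock.unlock();
    n.next = root;
    root = n;
}

// Pop a node (or null); root and root.next are read under the same lock.
Node* pop()
{
    listLock.lock();
    scope (exit) listLock.unlock();
    Node* n = root;
    if (n !is null)
        root = n.next;
    return n;
}

// Several threads pop and re-push nodes; returns the list length afterwards.
size_t churn(size_t nodes, size_t threads, size_t rounds)
{
    root = null;
    foreach (i; 0 .. nodes)
        push(new Node);

    Thread[] workers;
    foreach (t; 0 .. threads)
    {
        workers ~= new Thread({
            foreach (r; 0 .. rounds)
            {
                Node* n = pop();
                if (n !is null)
                    push(n);
            }
        });
    }
    foreach (w; workers) w.start();
    foreach (w; workers) w.join();

    // Count the nodes; the bound guards against a corrupted (cyclic) list.
    size_t count = 0;
    for (Node* p = root; p !is null && count <= nodes; p = p.next)
        ++count;
    return count;
}

void main()
{
    writeln("nodes left on the list: ", churn(8, 4, 10_000));
}
```

If the list stays intact, the count returned by `churn` equals the number of nodes pushed at the start. The trade-off is the one noted above: every operation serializes on the lock, which is acceptable as a stopgap but does nothing for contention.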
Comment #10 by safety0ff.bugz — 2016-12-22T18:23:28Z
(In reply to safety0ff.bugz from comment #9)
> I'm just going to slap core.internal.spinlock on it for now.

https://github.com/dlang/phobos/pull/4988
Comment #11 by github-bugzilla — 2016-12-23T19:13:28Z
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/5caa66ef31e40725c5548d40ead7dbefd19a0d79
Fix issue 16352 - dead-lock in std.allocator.free_list unittest

https://github.com/dlang/phobos/commit/05e9cba20e2455cb94b5d70a0d6e873bf45cec14
Merge pull request #4988 from WalterWaldron/fix16352
Fix issue 16352 - dead-lock in std.allocator.free_list unittest
Comment #12 by github-bugzilla — 2016-12-24T16:08:30Z
Commit pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/f7e14e905b9d668ac788024f3c03c600adf0d84f
Fix issue 16352 - dead-lock in std.allocator.free_list unittest
This fixes the actual unittest.
Comment #13 by github-bugzilla — 2017-01-07T03:03:17Z
Commits pushed to stable at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/5caa66ef31e40725c5548d40ead7dbefd19a0d79
Fix issue 16352 - dead-lock in std.allocator.free_list unittest

https://github.com/dlang/phobos/commit/05e9cba20e2455cb94b5d70a0d6e873bf45cec14
Merge pull request #4988 from WalterWaldron/fix16352

https://github.com/dlang/phobos/commit/f7e14e905b9d668ac788024f3c03c600adf0d84f
Fix issue 16352 - dead-lock in std.allocator.free_list unittest
Comment #14 by github-bugzilla — 2017-01-16T23:26:06Z
Commits pushed to newCTFE at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/5caa66ef31e40725c5548d40ead7dbefd19a0d79
Fix issue 16352 - dead-lock in std.allocator.free_list unittest

https://github.com/dlang/phobos/commit/05e9cba20e2455cb94b5d70a0d6e873bf45cec14
Merge pull request #4988 from WalterWaldron/fix16352

https://github.com/dlang/phobos/commit/f7e14e905b9d668ac788024f3c03c600adf0d84f
Fix issue 16352 - dead-lock in std.allocator.free_list unittest