Bug 20270 – [REG2.087] Deadlock in garbage collection when running processes in parallel

Status: RESOLVED
Resolution: FIXED
Severity: regression
Priority: P1
Component: druntime
Product: D
Version: D2
Platform: x86_64
OS: Linux
Creation time: 2019-10-06T09:33:49Z
Last change time: 2019-11-28T19:22:58Z
Keywords: pull
Assigned to: No Owner
Creator: Vladimir Panteleev

Comments

Comment #0 by dlang-bugzilla — 2019-10-06T09:33:49Z
///////////// test.d /////////////
import std.parallelism;
import std.process;
import std.range;

void main()
{
    foreach (i; 200.iota.parallel)
        execute(["true"]);
}
//////////////////////////////////

This program has a roughly 60% chance to deadlock and never finish executing on my machine.

Inspecting the program's state with a debugger shows that the threads are generally in one of these states:

Thread 11 (Thread 0x7f2a80ff9700 (LWP 424924)):
#0 0x00007f2a89b82b12 in sigsuspend () from /usr/lib/libc.so.6
#1 0x0000563bd079bb08 in core.thread.thread_suspendHandler(int).op(void*) ()
#2 0x0000563bd079bb68 in core.thread.callWithStackShell(scope void(void*) nothrow delegate) ()
#3 0x0000563bd079ba95 in thread_suspendHandler ()
#4 <signal handler called>
#5 0x00007f2a89e8da6a in read () from /usr/lib/libpthread.so.0
...

Thread 10 (Thread 0x7f2a817fa700 (LWP 424923)):
#0 0x00007f2a89b82b12 in sigsuspend () from /usr/lib/libc.so.6
#1 0x0000563bd079bb08 in core.thread.thread_suspendHandler(int).op(void*) ()
#2 0x0000563bd079bb68 in core.thread.callWithStackShell(scope void(void*) nothrow delegate) ()
#3 0x0000563bd079ba95 in thread_suspendHandler ()
#4 <signal handler called>
#5 0x00007f2a89c11414 in fork () from /usr/lib/libc.so.6
...

Thread 9 (Thread 0x7f2a81ffb700 (LWP 424922)):
#0 0x00007f2a89b82b12 in sigsuspend () from /usr/lib/libc.so.6
#1 0x0000563bd079bb08 in core.thread.thread_suspendHandler(int).op(void*) ()
#2 0x0000563bd079bb68 in core.thread.callWithStackShell(scope void(void*) nothrow delegate) ()
#3 0x0000563bd079ba95 in thread_suspendHandler ()
#4 <signal handler called>
#5 0x00007f2a89c515c9 in __lll_lock_wait_private () from /usr/lib/libc.so.6
#6 0x00007f2a89c51a88 in __run_fork_handlers () from /usr/lib/libc.so.6
#7 0x00007f2a89c113e9 in fork () from /usr/lib/libc.so.6
...

Thread 8 (Thread 0x7f2a827fc700 (LWP 424921)):
#0 0x00007f2a89b82b12 in sigsuspend () from /usr/lib/libc.so.6
#1 0x0000563bd079bb08 in core.thread.thread_suspendHandler(int).op(void*) ()
#2 0x0000563bd079bb68 in core.thread.callWithStackShell(scope void(void*) nothrow delegate) ()
#3 0x0000563bd079ba95 in thread_suspendHandler ()
#4 <signal handler called>
#5 0x00007f2a89e8e145 in nanosleep () from /usr/lib/libpthread.so.0
#6 0x0000563bd077370e in _D4core6thread6Thread5sleepFNbNiSQBf4time8DurationZv ()
#7 0x0000563bd07b3e2e in core.internal.spinlock.SpinLock.yield(ulong) shared ()
#8 0x0000563bd07b3dc4 in core.internal.spinlock.SpinLock.lock() shared ()
#9 0x0000563bd07c9307 in _D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl ()
#10 0x0000563bd07c1456 in _D2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_ ()
#11 0x0000563bd0787fe7 in gc_qalloc ()
...

Thread 7 (Thread 0x7f2a82ffd700 (LWP 424920)):
#0 0x00007f2a89c515cb in __lll_lock_wait_private () from /usr/lib/libc.so.6
#1 0x00007f2a89bd06b3 in calloc () from /usr/lib/libc.so.6
#2 0x0000563bd07c61ad in _D2gc4impl12conservativeQw3Gcx16startScanThreadsMFNbZv ()
#3 0x0000563bd07c5f44 in _D2gc4impl12conservativeQw3Gcx12markParallelMFNbbZv ()
#4 0x0000563bd07c5862 in _D2gc4impl12conservativeQw3Gcx11fullcollectMFNbbZm ()
#5 0x0000563bd07c4050 in _D2gc4impl12conservativeQw3Gcx8bigAllocMFNbmKmkxC8TypeInfoZPv ()
#6 0x0000563bd07c935a in _D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl ()
#7 0x0000563bd07c1456 in _D2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_ ()
#8 0x0000563bd0787fe7 in gc_qalloc ()
...
Comment #1 by igor.khasilev — 2019-10-06T09:49:58Z
May (or may not) be related to https://issues.dlang.org/show_bug.cgi?id=20256 if the scan threads do not block SIGUSR1 and SIGUSR2.
Comment #2 by dlang-bugzilla — 2019-10-06T09:56:49Z
(In reply to igor.khasilev from comment #1)
> May (or may not) be related https://issues.dlang.org/show_bug.cgi?id=20256
> if scanthread do not block SIGUSR1 and SIGUSR2

Unfortunately `digger run stable+druntime#2813 -- dmd -run test` still hangs.
Comment #3 by r.sagitario — 2019-10-06T10:17:11Z
I cannot reproduce locally in a VM. Does the problem go away with --DRT-gcopt=parallel:0 ?
Comment #4 by dlang-bugzilla — 2019-10-06T10:18:50Z
(In reply to Rainer Schuetze from comment #3)
> Does the problem go away with --DRT-gcopt=parallel:0 ?

Yes.
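
For anyone who needs the workaround without passing a runtime flag on every invocation, the same GC option can be embedded in the binary through druntime's `rt_options` array. The following is a minimal sketch, assuming a druntime new enough to know the `parallel` GC option (2.087 or later); it is the test program from comment #0 with one added declaration, not a fix for the underlying bug.

```d
import std.parallelism : parallel;
import std.process : execute;
import std.range : iota;

// Read by druntime at startup; intended to have the same effect as passing
// --DRT-gcopt=parallel:0 on the command line (parallel marking disabled).
extern (C) __gshared string[] rt_options = ["gcopt=parallel:0"];

void main()
{
    // Same workload as test.d in comment #0.
    foreach (i; 200.iota.parallel)
        execute(["true"]);
}
```

This only disables the parallel marking path in which the hang was observed; the GC itself stays enabled.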
Comment #5 by dlang-bugzilla — 2019-10-06T10:38:19Z
(In reply to Rainer Schuetze from comment #3)
> I cannot reproduce locally in a VM.

From experimenting with taskset, it seems that there need to be at least 5 physical cores to run threads on for this bug to be reproduced. (Does not reproduce with `taskset f` but does reproduce with `taskset 1f`.)
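
As a reading aid for the two masks above: `taskset` interprets its mask argument as a hexadecimal bitmask in which each set bit allows one CPU as numbered by the kernel, so the number of set bits is the number of CPUs the process may use. A small sketch of that arithmetic follows; whether those CPUs map to distinct physical cores depends on the machine's topology, which this does not check.

```d
import core.bitop : popcnt;
import std.stdio : writefln;

void main()
{
    // The two affinity masks from the comment above, written as hex literals.
    foreach (uint mask; [0xfu, 0x1fu])
        writefln("taskset %x -> %s CPUs allowed", mask, popcnt(mask));
}
```

So `taskset f` allows CPUs 0 to 3 (four CPUs) and `taskset 1f` allows CPUs 0 to 4 (five CPUs).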
Comment #6 by dlang-bot — 2019-10-06T10:45:39Z
@rainers created dlang/druntime pull request #2816 "fix Issue 20270 - [REG2.087] Deadlock in garbage collection when runn…" fixing this issue:

- fix Issue 20270 - [REG2.087] Deadlock in garbage collection when running processes in parallel

  start scan threads while the world isn't suspended

https://github.com/dlang/druntime/pull/2816
Comment #7 by r.sagitario — 2019-10-06T10:47:41Z
I have reproduced the issue by running the test a larger number of times. Not sure why this doesn't appear more often. Please try https://github.com/dlang/druntime/pull/2816
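
Since the hang is intermittent, checking a candidate fix means running the test many times and treating a run that never exits as a failure. Below is a rough sketch of such a harness, not anything used in this thread: the helper `finishesWithin`, the default `./test` path, the run count of 100 and the 10 second limit are all assumptions chosen for illustration, and it is POSIX-only because it uses `SIGKILL`.

```d
import core.sys.posix.signal : SIGKILL;
import core.thread : Thread;
import core.time : Duration, MonoTime, msecs, seconds;
import std.process : kill, spawnProcess, tryWait, wait;
import std.stdio : writefln;

/// Hypothetical helper: runs `cmd` once and returns true if it exits
/// within `limit`; otherwise kills it and returns false.
bool finishesWithin(string[] cmd, Duration limit)
{
    auto pid = spawnProcess(cmd);
    immutable deadline = MonoTime.currTime + limit;
    while (MonoTime.currTime < deadline)
    {
        if (tryWait(pid).terminated)
            return true;
        Thread.sleep(10.msecs); // poll instead of blocking in wait()
    }
    kill(pid, SIGKILL); // a deadlocked process will not exit on its own
    wait(pid);          // reap it so no zombie is left behind
    return false;
}

void main(string[] args)
{
    // Assumed defaults: the compiled test.d is at ./test, run 100 times.
    auto cmd = args.length > 1 ? args[1 .. $] : ["./test"];
    enum runs = 100;
    size_t hangs;
    foreach (i; 0 .. runs)
        if (!finishesWithin(cmd, 10.seconds))
            ++hangs;
    writefln("%s of %s runs did not finish in time", hangs, runs);
}
```

Build `test.d` against the druntime under test first, then run the harness with the path to the resulting binary as its argument. A time limit can only suggest a deadlock, so a flagged run is still worth inspecting with a debugger.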
Comment #8 by r.sagitario — 2019-11-28T19:22:58Z
Not sure why this wasn't closed by the bot when https://github.com/dlang/druntime/pull/2816 got merged.