Bug 20270 – [REG2.087] Deadlock in garbage collection when running processes in parallel
Status: RESOLVED
Resolution: FIXED
Severity: regression
Priority: P1
Component: druntime
Product: D
Version: D2
Platform: x86_64
OS: Linux
Creation time: 2019-10-06T09:33:49Z
Last change time: 2019-11-28T19:22:58Z
Keywords: pull
Assigned to: No Owner
Creator: Vladimir Panteleev
Comments
Comment #0 by dlang-bugzilla — 2019-10-06T09:33:49Z
///////////// test.d /////////////
import std.parallelism;
import std.process;
import std.range;
void main()
{
    foreach (i; 200.iota.parallel)
        execute(["true"]);
}
//////////////////////////////////
This program has a roughly 60% chance to deadlock and never finish executing on my machine.
Inspecting the program's state with a debugger shows that the threads are generally in one of these states:
Thread 11 (Thread 0x7f2a80ff9700 (LWP 424924)):
#0 0x00007f2a89b82b12 in sigsuspend () from /usr/lib/libc.so.6
#1 0x0000563bd079bb08 in core.thread.thread_suspendHandler(int).op(void*) ()
#2 0x0000563bd079bb68 in core.thread.callWithStackShell(scope void(void*) nothrow delegate) ()
#3 0x0000563bd079ba95 in thread_suspendHandler ()
#4 <signal handler called>
#5 0x00007f2a89e8da6a in read () from /usr/lib/libpthread.so.0
...
Thread 10 (Thread 0x7f2a817fa700 (LWP 424923)):
#0 0x00007f2a89b82b12 in sigsuspend () from /usr/lib/libc.so.6
#1 0x0000563bd079bb08 in core.thread.thread_suspendHandler(int).op(void*) ()
#2 0x0000563bd079bb68 in core.thread.callWithStackShell(scope void(void*) nothrow delegate) ()
#3 0x0000563bd079ba95 in thread_suspendHandler ()
#4 <signal handler called>
#5 0x00007f2a89c11414 in fork () from /usr/lib/libc.so.6
...
Thread 9 (Thread 0x7f2a81ffb700 (LWP 424922)):
#0 0x00007f2a89b82b12 in sigsuspend () from /usr/lib/libc.so.6
#1 0x0000563bd079bb08 in core.thread.thread_suspendHandler(int).op(void*) ()
#2 0x0000563bd079bb68 in core.thread.callWithStackShell(scope void(void*) nothrow delegate) ()
#3 0x0000563bd079ba95 in thread_suspendHandler ()
#4 <signal handler called>
#5 0x00007f2a89c515c9 in __lll_lock_wait_private () from /usr/lib/libc.so.6
#6 0x00007f2a89c51a88 in __run_fork_handlers () from /usr/lib/libc.so.6
#7 0x00007f2a89c113e9 in fork () from /usr/lib/libc.so.6
...
Thread 8 (Thread 0x7f2a827fc700 (LWP 424921)):
#0 0x00007f2a89b82b12 in sigsuspend () from /usr/lib/libc.so.6
#1 0x0000563bd079bb08 in core.thread.thread_suspendHandler(int).op(void*) ()
#2 0x0000563bd079bb68 in core.thread.callWithStackShell(scope void(void*) nothrow delegate) ()
#3 0x0000563bd079ba95 in thread_suspendHandler ()
#4 <signal handler called>
#5 0x00007f2a89e8e145 in nanosleep () from /usr/lib/libpthread.so.0
#6 0x0000563bd077370e in _D4core6thread6Thread5sleepFNbNiSQBf4time8DurationZv ()
#7 0x0000563bd07b3e2e in core.internal.spinlock.SpinLock.yield(ulong) shared ()
#8 0x0000563bd07b3dc4 in core.internal.spinlock.SpinLock.lock() shared ()
#9 0x0000563bd07c9307 in _D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl ()
#10 0x0000563bd07c1456 in _D2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_ ()
#11 0x0000563bd0787fe7 in gc_qalloc ()
...
Thread 7 (Thread 0x7f2a82ffd700 (LWP 424920)):
#0 0x00007f2a89c515cb in __lll_lock_wait_private () from /usr/lib/libc.so.6
#1 0x00007f2a89bd06b3 in calloc () from /usr/lib/libc.so.6
#2 0x0000563bd07c61ad in _D2gc4impl12conservativeQw3Gcx16startScanThreadsMFNbZv ()
#3 0x0000563bd07c5f44 in _D2gc4impl12conservativeQw3Gcx12markParallelMFNbbZv ()
#4 0x0000563bd07c5862 in _D2gc4impl12conservativeQw3Gcx11fullcollectMFNbbZm ()
#5 0x0000563bd07c4050 in _D2gc4impl12conservativeQw3Gcx8bigAllocMFNbmKmkxC8TypeInfoZPv ()
#6 0x0000563bd07c935a in _D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl ()
#7 0x0000563bd07c1456 in _D2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_ ()
#8 0x0000563bd0787fe7 in gc_qalloc ()
...
Comment #1 by igor.khasilev — 2019-10-06T09:49:58Z
May (or may not) be related https://issues.dlang.org/show_bug.cgi?id=20256 if scanthread do not block SIGUSR1 and SIGUSR2
Comment #2 by dlang-bugzilla — 2019-10-06T09:56:49Z
(In reply to igor.khasilev from comment #1)
> May (or may not) be related https://issues.dlang.org/show_bug.cgi?id=20256
> if scanthread do not block SIGUSR1 and SIGUSR2
Unfortunately `digger run stable+druntime#2813 -- dmd -run test` still hangs.
Comment #3 by r.sagitario — 2019-10-06T10:17:11Z
I cannot reproduce locally in a VM. Does the problem go away with --DRT-gcopt=parallel:0 ?
Comment #4 by dlang-bugzilla — 2019-10-06T10:18:50Z
(In reply to Rainer Schuetze from comment #3)
> Does the problem go away with --DRT-gcopt=parallel:0 ?
Yes.
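For anyone who needs the workaround without passing the flag on every run, the same option can be embedded in the binary. This is a minimal sketch, assuming druntime's `rt_options` mechanism (not mentioned in this thread) and a compiler release that has the `parallel` GC option (2.087 or later). It only disables parallel marking; it does not fix the underlying deadlock.

```d
// Assumption: druntime reads the extern(C) symbol `rt_options` at startup
// and treats its entries like --DRT-... command-line switches.
// "gcopt=parallel:0" asks the GC to use no additional marking threads.
extern(C) __gshared string[] rt_options = ["gcopt=parallel:0"];

import std.parallelism;
import std.process;
import std.range;

void main()
{
    // Same reproduction loop as in comment #0.
    foreach (i; 200.iota.parallel)
        execute(["true"]);
}
```

Built this way, the program should behave as if it had been started with `--DRT-gcopt=parallel:0`.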
Comment #5 by dlang-bugzilla — 2019-10-06T10:38:19Z
(In reply to Rainer Schuetze from comment #3)
> I cannot reproduce locally in a VM.
Experimenting with taskset suggests that at least 5 physical cores must be available to the threads for this bug to reproduce: it does not reproduce with `taskset f`, but does reproduce with `taskset 1f`.
Comment #6 by dlang-bot — 2019-10-06T10:45:39Z
@rainers created dlang/druntime pull request #2816 "fix Issue 20270 - [REG2.087] Deadlock in garbage collection when runn…" fixing this issue:
- fix Issue 20270 - [REG2.087] Deadlock in garbage collection when running processes in parallel
start scan threads while the world isn't suspended
https://github.com/dlang/druntime/pull/2816
Comment #7 by r.sagitario — 2019-10-06T10:47:41Z
I have reproduced the issue by running the test a larger number of times. I'm not sure why it doesn't show up more often. Please try https://github.com/dlang/druntime/pull/2816