Bug 13416 – dead-lock in FreeBSD suspend handler

Status
RESOLVED
Resolution
DUPLICATE
Severity
blocker
Priority
P1
Component
druntime
Product
D
Version
D2
Platform
All
OS
FreeBSD
Creation time
2014-09-01T22:42:12Z
Last change time
2022-12-30T23:35:02Z
Keywords
pull
Assigned to
No Owner
Creator
Brad Roberts

Comments

Comment #0 by braddr — 2014-09-01T22:42:12Z
The unit test "obj/64/test_runner core.thread" semi-frequently deadlocks on the new build server. It's an 8-core system, versus the older boxes' 2-core systems.

(gdb) bt
#0  0x0000000800a0a7ac in sigsuspend () from /lib/libc.so.7
#1  0x0000000800786db5 in sigsuspend () from /lib/libthr.so.3
#2  0x000000000048c55d in core.thread.thread_suspendHandler() ()
#3  0x000000000048db2c in core.thread.callWithStackShell() ()
#4  0x000000000048c4c9 in thread_suspendHandler ()
#5  <signal handler called>
#6  0x000000080078956c in ?? () from /lib/libthr.so.3
#7  0x000000080078c5f0 in pthread_attr_get_np () from /lib/libthr.so.3
#8  0x000000000048e64d in core.thread.getStackBottom() ()
#9  0x000000000048c34a in thread_entryPoint ()
#10 0x00000008007835e1 in ?? () from /lib/libthr.so.3
#11 0x00007ffffeffa000 in ?? ()
Cannot access memory at address 0x7fffff1fa000
(gdb) thr 2
[Switching to thread 2 (Thread 800c041c0 (LWP 100229 initial thread))]
#0  0x000000080078d64c in ?? () from /lib/libthr.so.3
(gdb) bt
#0  0x000000080078d64c in ?? () from /lib/libthr.so.3
#1  0x000000080078d33c in ?? () from /lib/libthr.so.3
#2  0x00000008007894bd in ?? () from /lib/libthr.so.3
#3  0x000000080078902d in pthread_kill () from /lib/libthr.so.3
#4  0x000000000048db6b in core.thread.suspend() ()
#5  0x000000000048dd67 in thread_suspendAll ()
#6  0x00000000004e565a in gc.gc.Gcx.fullcollect() ()
#7  0x00000000004e3d5b in gc.gc.GC.fullCollect() ()
#8  0x00000000004e7df3 in gc_collect ()
#9  0x000000000048b22d in core.memory.GC.collect() ()
#10 0x000000000048fb5a in core.thread.__unittestL4780_99() ()
#11 0x00000000004900d6 in core.thread.__modtest() ()
#12 0x0000000000472d22 in test_runner.tester() ()
#13 0x000000000048b90a in runModuleUnitTests ()
#14 0x0000000000503903 in rt.dmain2._d_run_main() ()
#15 0x00000000005038b6 in rt.dmain2._d_run_main() ()
#16 0x0000000000503837 in _d_run_main ()
#17 0x0000000000472e53 in main ()
Comment #1 by braddr — 2014-09-02T06:17:26Z
Same behavior and stack traces on the new 8-core FreeBSD 32-bit box as well. Both are running FreeBSD 8.4, same as the other FreeBSD testers.
Comment #2 by dfj1esp02 — 2014-09-10T11:56:55Z
pthread_kill hangs? Shouldn't it be asynchronous?
Comment #3 by monarchdodra — 2014-10-10T20:29:19Z
Upgraded to "BLOCKER", as this (relatively frequently) trips up the auto-testers.
Comment #4 by code — 2014-11-22T22:52:04Z
That's a dead-lock in the pthread library. Both pthread_attr_get_np and pthread_kill lock the same thread mutex.
_pthread_attr_get_np: https://github.com/freebsd/freebsd/blob/428b45aa532260e8c6ddf0217ec31db2234d29a8/lib/libthr/thread/thr_attr.c#L154
_pthread_kill: https://github.com/freebsd/freebsd/blob/428b45aa532260e8c6ddf0217ec31db2234d29a8/lib/libthr/thread/thr_kill.c#L64
_thr_find_thread: https://github.com/freebsd/freebsd/blob/428b45aa532260e8c6ddf0217ec31db2234d29a8/lib/libthr/thread/thr_list.c#L351
We should try to use pthread_suspend_np or pthread_suspend_all_np instead. Without a signal handler we'd still need to obtain the stack top. There seems to be a function on OpenBSD pthread_stackseg_np, not sure how to do it on FreeBSD.
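The lock conflict described in comment 4 can be modelled without libthr at all. The sketch below is an illustration only, not druntime or libthr code. It assumes that `threadLock` is a hypothetical stand-in for the per-thread mutex, that the target's wait on `resume` stands in for the suspend handler blocking in sigsuspend, and that the sender's `tryLock` stands in for the lock pthread_kill needs. It uses `tryLock` so that the program reports the conflict instead of hanging.

```d
import core.stdc.stdio : printf;
import core.sync.mutex : Mutex;
import core.sync.semaphore : Semaphore;
import core.thread : Thread;

// Returns whether the "sender" could take the lock while the "target"
// was parked inside its simulated signal handler.
bool senderCanLock()
{
    auto threadLock = new Mutex;      // hypothetical stand-in for the per-thread lock
    auto parked = new Semaphore(0);   // target reports that it is parked
    auto resume = new Semaphore(0);   // sender lets the target continue

    auto target = new Thread({
        threadLock.lock();            // models pthread_attr_get_np taking the lock
        parked.notify();              // the "signal handler" starts running here
        resume.wait();                // and parks, as the real handler does in sigsuspend
        threadLock.unlock();
    });
    target.start();

    parked.wait();
    // Models pthread_kill needing the same lock; a blocking lock() here
    // would wait forever, because only the sender can wake the target.
    immutable got = threadLock.tryLock();
    if (got)
        threadLock.unlock();

    resume.notify();                  // release the target so the model terminates
    target.join();
    return got;
}

void main()
{
    printf("sender could take the lock: %s\n",
           senderCanLock() ? "yes".ptr : "no".ptr);
}
```

In this model the sender's `tryLock` is expected to fail for as long as the target stays parked. With a blocking `lock()` in its place, neither thread could make progress, which matches the two backtraces in comment 0.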
Comment #5 by code — 2014-12-07T00:26:57Z
Fairly simple to reproduce the problem.

cat > bug.d << CODE
import core.thread, core.sys.posix.pthread, core.stdc.stdio;

void loop()
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    auto thr = pthread_self();
    while (true)
        pthread_attr_get_np(thr, &attr);
}

void main()
{
    auto thr = new Thread(&loop).start();
    while (true)
    {
        thread_suspendAll();
        thread_resumeAll();
        printf(".");
    }
}
CODE
dmd -run bug
Comment #6 by code — 2014-12-07T01:37:53Z
Using pthread_suspend_np didn't work out, because there is no way to get the current stack top of a suspended thread. I also tried to override SIGCANCEL which is used for pthread_suspend_np but that didn't work. https://github.com/D-Programming-Language/druntime/pull/1061
Comment #7 by github-bugzilla — 2014-12-15T16:55:55Z
Commits pushed to master at https://github.com/D-Programming-Language/druntime

https://github.com/D-Programming-Language/druntime/commit/ad8662d65fe8f24be2c64c721eabe4da7f78b31f
fix Issue 13416 - dead-lock in FreeBSD suspend handler
- use pthread internal THR_IN_CRITICAL to retry suspend

https://github.com/D-Programming-Language/druntime/commit/513ba191f3e8b78aeb99336e27212dfdcacb39c5
Merge pull request #1061 from MartinNowak/fix13416
fix Issue 13416 - dead-lock in FreeBSD suspend handler
Comment #8 by github-bugzilla — 2015-02-18T03:38:33Z
Comment #9 by dbugz — 2015-05-12T21:20:40Z
This fix doesn't seem to work on 9.1 i386, as the new FreeBSD test introduced in this PR hangs 90+% of the time.
Comment #10 by code — 2015-10-31T03:58:20Z
(In reply to Joakim from comment #9)
> This fix doesn't seem to work on 9.1 i386, as the new FreeBSD test
> introduced in this PR hangs 90+% of the time.

> I also tried to override SIGCANCEL
> which is used for pthread_suspend_np but that didn't work.

SIGCANCEL is the signal used by pthread_suspend_np internally. The signal handler already deals with being in critical regions, hence it doesn't suffer from the deadlock. As it isn't allowed to override SIGCANCEL, we imitated the behavior by poking in pthread guts (THR_IN_CRITICAL).
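The "retry suspend" idea can be sketched as a small single-threaded model. This is not the druntime patch. It assumes the handler is the side that notices the critical region and declines to park, and that the sender then delivers the signal again. `ModelThread`, `criticalDepth`, `onSuspendSignal` and `suspendWithRetry` are hypothetical names, with `criticalDepth` standing in for whatever state THR_IN_CRITICAL inspects inside libthr.

```d
import core.stdc.stdio : printf;

struct ModelThread
{
    int criticalDepth;  // hypothetical stand-in for libthr's critical-region state
    bool suspended;
}

// Models the suspend signal handler: decline to park while the thread
// is inside a critical region, so no internal lock is held while parked.
bool onSuspendSignal(ref ModelThread t)
{
    if (t.criticalDepth > 0)
        return false;           // tell the sender to try again later
    t.suspended = true;         // a real handler would block in sigsuspend here
    return true;
}

// Models the sender: deliver the signal again until the handler accepts.
// Returns the number of deliveries that were needed.
int suspendWithRetry(ref ModelThread t, void delegate() letTargetRun)
{
    int attempts = 1;
    while (!onSuspendSignal(t))
    {
        letTargetRun();         // real code would yield so the target can leave the region
        ++attempts;
    }
    return attempts;
}

void main()
{
    auto t = ModelThread(2);    // target starts two levels deep in critical regions
    immutable attempts = suspendWithRetry(t, { --t.criticalDepth; });
    printf("suspended after %d attempts\n", attempts);
}
```

The point of the design is that the target never parks while it holds a library-internal lock, at the cost of reading state that libthr does not expose as public API.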
Comment #11 by code — 2015-10-31T03:59:43Z
(In reply to Joakim from comment #9)
> This fix doesn't seem to work on 9.1 i386, as the new FreeBSD test
> introduced in this PR hangs 90+% of the time.

Any further details? It doesn't seem like the pthread layout changed from 8.x to 9.1. Is it easy to reproduce with the test case of comment 5?
Comment #12 by dbugz — 2018-08-26T04:07:58Z
Sorry, only seeing your question now. I was simply checking the D tests on FreeBSD back then, but I haven't used that OS in years, so I can't look into it further now.
Comment #13 by dlang-bot — 2022-01-18T18:23:38Z
@ibuclaw created dlang/druntime pull request #3682 "Issue 13416: Remove libthr hack from core.thread.osthread" mentioning this issue:
- Issue 13416: Remove libthr hack from core.thread.osthread
https://github.com/dlang/druntime/pull/3682
Comment #14 by dlang-bot — 2022-01-20T12:36:15Z
dlang/druntime pull request #3682 "Issue 13416: Remove libthr hack from core.thread.osthread" was merged into master:
- 3ac665c49d7aae1893c4e4535f60d1b4e2d427a3 by Iain Buclaw: Issue 13416: Remove libthr hack from core.thread.osthread
https://github.com/dlang/druntime/pull/3682
Comment #15 by ibuclaw — 2022-12-30T23:35:02Z
Suspend signals changed to SIGRTMIN. https://github.com/dlang/druntime/pull/3617

*** This issue has been marked as a duplicate of issue 15939 ***