The unit test "obj/64/test_runner core.thread" semi-frequently deadlocks on the new build server. It's an 8-core system, whereas the older boxes are 2-core systems.
(gdb) bt
#0 0x0000000800a0a7ac in sigsuspend () from /lib/libc.so.7
#1 0x0000000800786db5 in sigsuspend () from /lib/libthr.so.3
#2 0x000000000048c55d in core.thread.thread_suspendHandler() ()
#3 0x000000000048db2c in core.thread.callWithStackShell() ()
#4 0x000000000048c4c9 in thread_suspendHandler ()
#5 <signal handler called>
#6 0x000000080078956c in ?? () from /lib/libthr.so.3
#7 0x000000080078c5f0 in pthread_attr_get_np () from /lib/libthr.so.3
#8 0x000000000048e64d in core.thread.getStackBottom() ()
#9 0x000000000048c34a in thread_entryPoint ()
#10 0x00000008007835e1 in ?? () from /lib/libthr.so.3
#11 0x00007ffffeffa000 in ?? ()
Cannot access memory at address 0x7fffff1fa000
(gdb) thr 2
[Switching to thread 2 (Thread 800c041c0 (LWP 100229 initial thread))]
#0 0x000000080078d64c in ?? () from /lib/libthr.so.3
(gdb) bt
#0 0x000000080078d64c in ?? () from /lib/libthr.so.3
#1 0x000000080078d33c in ?? () from /lib/libthr.so.3
#2 0x00000008007894bd in ?? () from /lib/libthr.so.3
#3 0x000000080078902d in pthread_kill () from /lib/libthr.so.3
#4 0x000000000048db6b in core.thread.suspend() ()
#5 0x000000000048dd67 in thread_suspendAll ()
#6 0x00000000004e565a in gc.gc.Gcx.fullcollect() ()
#7 0x00000000004e3d5b in gc.gc.GC.fullCollect() ()
#8 0x00000000004e7df3 in gc_collect ()
#9 0x000000000048b22d in core.memory.GC.collect() ()
#10 0x000000000048fb5a in core.thread.__unittestL4780_99() ()
#11 0x00000000004900d6 in core.thread.__modtest() ()
#12 0x0000000000472d22 in test_runner.tester() ()
#13 0x000000000048b90a in runModuleUnitTests ()
#14 0x0000000000503903 in rt.dmain2._d_run_main() ()
#15 0x00000000005038b6 in rt.dmain2._d_run_main() ()
#16 0x0000000000503837 in _d_run_main ()
#17 0x0000000000472e53 in main ()
Comment #1 by braddr — 2014-09-02T06:17:26Z
Same behavior and stack traces on the new 8-core FreeBSD 32-bit box as well. Both are running FreeBSD 8.4, same as the other FreeBSD testers.
Comment #2 by dfj1esp02 — 2014-09-10T11:56:55Z
pthread_kill hangs? Shouldn't it be asynchronous?
Comment #3 by monarchdodra — 2014-10-10T20:29:19Z
Upgraded to "BLOCKER", as this (relatively frequently) trips up the auto-testers.
Fairly simple to reproduce the problem.
cat > bug.d << CODE
import core.thread, core.sys.posix.pthread, core.stdc.stdio;

void loop()
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    auto thr = pthread_self();
    while (true)
        pthread_attr_get_np(thr, &attr);
}

void main()
{
    auto thr = new Thread(&loop).start();
    while (true)
    {
        thread_suspendAll();
        thread_resumeAll();
        printf(".");
    }
}
CODE
dmd -run bug
Comment #6 by code — 2014-12-07T01:37:53Z
Using pthread_suspend_np didn't work out, because there is no way to get the current stack top of a suspended thread. I also tried to override SIGCANCEL which is used for pthread_suspend_np but that didn't work.
https://github.com/D-Programming-Language/druntime/pull/1061
Comment #7 by github-bugzilla — 2014-12-15T16:55:55Z
This fix doesn't seem to work on 9.1 i386, as the new FreeBSD test introduced in this PR hangs 90+% of the time.
Comment #10 by code — 2015-10-31T03:58:20Z
(In reply to Joakim from comment #9)
> This fix doesn't seem to work on 9.1 i386, as the new FreeBSD test
> introduced in this PR hangs 90+% of the time.
> I also tried to override SIGCANCEL
> which is used for pthread_suspend_np but that didn't work.
SIGCANCEL is the signal used by pthread_suspend_np internally.
The signal handler already deals with being in critical regions, hence it doesn't suffer from the deadlock. As overriding SIGCANCEL isn't allowed, we imitated the behavior by poking in pthread guts (THR_IN_CRITICAL).
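For readers unfamiliar with the idea, here is a minimal single-threaded sketch in D of a handler that defers its work while the interrupted code is inside a critical region. It is only an illustration under stated assumptions, not libthr's or druntime's actual code: every name in it (criticalDepth, suspendPending, suspendsHandled, doSuspend, onSuspendSignal, enterCritical, leaveCritical, installHandler) is hypothetical, SIGHUP is an arbitrary stand-in for the real suspend signal, and a counter replaces the sigsuspend call seen in the backtrace.

```d
import core.stdc.signal : raise;
import core.sys.posix.signal;

// Hypothetical names throughout; this models the idea, not libthr's internals.
__gshared int criticalDepth;    // > 0 while inside a critical region
__gshared bool suspendPending;  // a suspend request arrived while critical
__gshared int suspendsHandled;  // how many times the suspend action ran

// Stand-in for the real work (the real handler would block in sigsuspend).
void doSuspend() nothrow @nogc
{
    ++suspendsHandled;
}

// Signal handler: never act while the interrupted code is in a critical region.
extern (C) void onSuspendSignal(int sig) nothrow @nogc
{
    if (criticalDepth > 0)
        suspendPending = true;  // remember the request and return immediately
    else
        doSuspend();
}

void enterCritical() nothrow @nogc
{
    ++criticalDepth;
}

void leaveCritical() nothrow @nogc
{
    // Run a deferred request once the outermost region has been left.
    if (--criticalDepth == 0 && suspendPending)
    {
        suspendPending = false;
        doSuspend();
    }
}

void installHandler()
{
    sigaction_t sa;
    sa.sa_handler = &onSuspendSignal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGHUP, &sa, null);
}

void main()
{
    installHandler();

    enterCritical();            // e.g. while an internal lock is held
    raise(SIGHUP);              // request arrives inside the region
    assert(suspendsHandled == 0 && suspendPending);

    leaveCritical();            // the deferred request runs here
    assert(suspendsHandled == 1 && !suspendPending);

    raise(SIGHUP);              // outside any region: handled at once
    assert(suspendsHandled == 2);
}
```

The point of the sketch is that the handler returns right away when it interrupts a critical region, so whatever the interrupted code holds is released before the thread actually stops. A handler that blocks unconditionally, as in the first backtrace, can stop a thread in the middle of pthread_attr_get_np.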
Comment #11 by code — 2015-10-31T03:59:43Z
(In reply to Joakim from comment #9)
> This fix doesn't seem to work on 9.1 i386, as the new FreeBSD test
> introduced in this PR hangs 90+% of the time.
Any further details? It doesn't seem like the pthread layout changed from 8.x to 9.1. Is it easy to reproduce w/ the test case of comment 5?
Comment #12 by dbugz — 2018-08-26T04:07:58Z
Sorry, only seeing your question now. I was simply checking the D tests on FreeBSD back then but I haven't used that OS in years, so can't look into it further now.
Comment #13 by dlang-bot — 2022-01-18T18:23:38Z
@ibuclaw created dlang/druntime pull request #3682 "Issue 13416: Remove libthr hack from core.thread.osthread" mentioning this issue:
- Issue 13416: Remove libthr hack from core.thread.osthread
https://github.com/dlang/druntime/pull/3682
Comment #14 by dlang-bot — 2022-01-20T12:36:15Z
dlang/druntime pull request #3682 "Issue 13416: Remove libthr hack from core.thread.osthread" was merged into master:
- 3ac665c49d7aae1893c4e4535f60d1b4e2d427a3 by Iain Buclaw:
Issue 13416: Remove libthr hack from core.thread.osthread
https://github.com/dlang/druntime/pull/3682