Bug 21919 – darwin: SEGV in core.thread tests on OSX 11

Status
RESOLVED
Resolution
FIXED
Severity
major
Priority
P1
Component
druntime
Product
D
Version
D2
Platform
x86_64
OS
Mac OS X
Creation time
2021-05-13T12:37:17Z
Last change time
2021-11-22T14:14:15Z
Keywords
pull
Assigned to
No Owner
Creator
Iain Buclaw
See also
https://issues.dlang.org/show_bug.cgi?id=15779

Comments

Comment #0 by ibuclaw — 2021-05-13T12:37:17Z
Native configuration is x86_64-apple-darwin20

=== gdc tests ===

Running target unix
FAIL: gdc.test/runnable/eh.d execution test
FAIL: gdc.test/runnable/eh.d -O2 execution test
FAIL: gdc.test/runnable/eh.d -O2 -fPIC execution test
FAIL: gdc.test/runnable/eh.d -O2 -fPIC -shared-libphobos execution test
FAIL: gdc.test/runnable/eh.d -O2 -shared-libphobos execution test
FAIL: gdc.test/runnable/eh.d -fPIC execution test
FAIL: gdc.test/runnable/eh.d -fPIC -shared-libphobos execution test
FAIL: gdc.test/runnable/eh.d -shared-libphobos execution test
FAIL: gdc.test/runnable/test4.d execution test
FAIL: gdc.test/runnable/test4.d -shared-libphobos execution test
FAIL: gdc.test/runnable/testdstress.d execution test
FAIL: gdc.test/runnable/testdstress.d -shared-libphobos execution test

=== gdc Summary ===

# of expected passes 10388
# of unexpected failures 12

=== libphobos tests ===

Running target unix
FAIL: libphobos.druntime/core/thread.d execution test
FAIL: libphobos.exceptions/chain.d execution test
FAIL: libphobos.phobos/std/concurrency.d execution test

=== libphobos Summary ===

# of expected passes 394
# of unexpected failures 3
# of unsupported tests 1
Comment #1 by ibuclaw — 2021-05-13T12:37:45Z
Confirmed on DMD when running the unittests.

generated/osx/release/64/unittest/test_runner core.thread.threadgroup
make[1]: *** [generated/osx/release/64/unittest/core/thread/fiber] Bus error: 10
make[1]: *** Deleting file `generated/osx/release/64/unittest/core/thread/fiber'
make[1]: *** Waiting for unfinished jobs....
generated/osx/release/64/unittest/test_runner core.thread.types
make: *** [unittest-release] Error 2
Comment #2 by ibuclaw — 2021-05-13T12:38:12Z
$ sw_vers
ProductName: macOS
ProductVersion: 11.1
BuildVersion: 20C69

$ clang --version
Apple clang version 12.0.0 (clang-1200.0.32.27)

$ xcodebuild -version
Xcode 12.2
Build version 12B45b

$ uname -v
Darwin Kernel Version 20.2.0: Wed Dec 2 20:39:59 PST 2020; root:xnu-7195.60.75~1/RELEASE_X86_64

druntime: a79bb0eb0424f77159eb72e1c527db3b2ae2a57d
dmd: 97aa2ae5ee19ce6a2979ca1627479df713f99252
Comment #3 by ibuclaw — 2021-05-13T12:38:38Z
Process 65900 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x100c8cc70)
    frame #0: 0x00007fff203794e0 libsystem_pthread.dylib`___chkstk_darwin + 96
libsystem_pthread.dylib`___chkstk_darwin:
->  0x7fff203794e0 <+96>: testq %rcx, (%rcx)
    0x7fff203794e3 <+99>: popq %rcx
    0x7fff203794e4 <+100>: retq
libsystem_pthread.dylib`pthread_getspecific:
    0x7fff203794e5 <+0>: movq %gs:(,%rdi,8), %rax
Target 0: (test_runner) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x100c8cc70)
  * frame #0: 0x00007fff203794e0 libsystem_pthread.dylib`___chkstk_darwin + 96
    frame #1: 0x00007fff20379480 libsystem_pthread.dylib`thread_start + 20
    frame #2: 0x00007fff2a542a9c libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::step() + 476
    frame #3: 0x00007fff2a5446ee libunwind.dylib`_Unwind_RaiseException + 189
    frame #4: 0x00000001001bbeb8 test_runner`_d_throwdwarf at dwarfeh.d:317
    frame #5: 0x0000000100188b84 test_runner`_D4core6thread5fiber19__unittest_L1679_C1FZ9__lambda1MFNaNbNfZv at fiber.d:1686
    frame #6: 0x000000010018fb85 test_runner`_D4core6thread7context8Callable6opCallMFZv at context.d:46
    frame #7: 0x0000000100187ac5 test_runner`_D4core6thread5fiber5Fiber3runMFZv at fiber.d:869
    frame #8: 0x000000010018749f test_runner`fiber_entryPoint at fiber.d:157
Comment #4 by ibuclaw — 2021-05-13T12:39:33Z
This was discovered in December, hence the git commit hashes are 5 months old.
Comment #5 by ibuclaw — 2021-05-13T14:34:27Z
To describe what appears to be happening:

1. A D fiber context switch occurs.
2. An exception is thrown.
3. libunwind's entry point for raising exceptions is called.
4. Segfault somewhere deep in libc/pthread.

The unittest block that matches the encoded line numbers in the function name is:
---
// Test exception handling inside fibers.
unittest
{
    enum MSG = "Test message.";
    string caughtMsg;
    (new Fiber({
        try
        {
            throw new Exception(MSG);
        }
        catch (Exception e)
        {
            caughtMsg = e.msg;
        }
    })).call();
    assert(caughtMsg == MSG);
}
Comment #6 by lio+bugzilla — 2021-09-11T23:13:58Z
I suspect I'm running into this same bug while running the DMD 2.097.2 test suite on Big Sur:

$ test_results/runnable/test15779_0
fish: “test_results/runnable/test15779…” terminated by signal SIGBUS (Misaligned address error)

$ lldb test_results/runnable/test15779_0
(lldb) target create "test_results/runnable/test15779_0"
Current executable set to '/Users/llunesu/repos/d/dmd/test/test_results/runnable/test15779_0' (x86_64).
(lldb) r
Process 854 launched: '/Users/llunesu/repos/d/dmd/test/test_results/runnable/test15779_0' (x86_64)
Process 854 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x1001edc60)
    frame #0: 0x00007fff2031b4a8 libsystem_pthread.dylib`___chkstk_darwin + 96
libsystem_pthread.dylib`___chkstk_darwin:
->  0x7fff2031b4a8 <+96>: testq %rcx, (%rcx)
    0x7fff2031b4ab <+99>: popq %rcx
    0x7fff2031b4ac <+100>: retq
libsystem_pthread.dylib`pthread_getspecific:
    0x7fff2031b4ad <+0>: movq %gs:(,%rdi,8), %rax
Target 0: (test15779_0) stopped.
(lldb)
Comment #7 by lio+bugzilla — 2021-09-11T23:15:21Z
Stack trace for previous crash:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x1001e9c60)
  * frame #0: 0x00007fff2031b4a8 libsystem_pthread.dylib`___chkstk_darwin + 96
    frame #1: 0x00007fff2031b448 libsystem_pthread.dylib`thread_start + 20
    frame #2: 0x00007fff2a4bfb2d libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::getInfoFromDwarfSection(unsigned long, libunwind::UnwindInfoSections const&, unsigned int) + 191
    frame #3: 0x00007fff2a4bfa01 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::setInfoBasedOnIPRegister(bool) + 999
    frame #4: 0x00007fff2a4c1ec9 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::step() + 461
    frame #5: 0x00007fff2a4c3a18 libunwind.dylib`_Unwind_RaiseException + 189
    frame #6: 0x000000010002fca5 test15779_0`_d_throwdwarf + 185
    frame #7: 0x0000000100002410 test15779_0`_D9test157793barFZ9__lambda1FNaNfZv + 80
    frame #8: 0x000000010002cc2f test15779_0`_D4core6thread7context8Callable6opCallMFZv + 27
    frame #9: 0x00000001000293a7 test15779_0`fiber_entryPoint + 99
Comment #8 by lio+bugzilla — 2021-09-11T23:17:42Z
$ sw_vers
ProductName: macOS
ProductVersion: 11.5.2
BuildVersion: 20G95

$ clang --version
Apple clang version 12.0.5 (clang-1205.0.22.11)
Target: x86_64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ xcodebuild -version
Xcode 12.5.1
Build version 12E507

$ uname -v
Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64

dmd, druntime, Phobos: tag v2.097.2
Comment #9 by ibuclaw — 2021-11-07T18:44:18Z
After some prodding around, the root cause is that darwin's libunwind now overflows the Fiber's small 16k stack. The fix, then, is to bump the stack allocated for Fibers.

 version (Windows)
     // exception handling walks the stack, invoking DbgHelp.dll which
     // needs up to 16k of stack space depending on the version of DbgHelp.dll,
     // the existence of debug symbols and other conditions. Avoid causing
     // stack overflows by defaulting to a larger stack size
     enum defaultStackPages = 8;
+else version (OSX)
+{
+    version (X86_64)
+        enum defaultStackPages = 8;
+    else
+        enum defaultStackPages = 4;
+}
 else
     enum defaultStackPages = 4;

Darwin x86 pagesize is 4k, whilst arm64 is 16k (so 4 pages there already give 64k), so this fix should only be applied to x86_64 code.
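For code that has to run against a druntime without this change, one possible workaround is to request a larger stack explicitly when constructing the Fiber. The following is only a minimal sketch and not part of the patch: `runInFiber` and `largerStack` are hypothetical names, and the 32k figure simply mirrors the 8 pages of 4k chosen above. Whether that is enough for a given libunwind version is an assumption.

```d
import core.thread : Fiber;

// Hypothetical constant: 8 pages of 4k, mirroring the patched x86_64 default.
enum size_t largerStack = 8 * 4096;

// Hypothetical helper: throws and catches an exception inside a fiber
// created with an explicit stack size, then returns the caught message.
string runInFiber(size_t stackSize)
{
    enum MSG = "Test message.";
    string caughtMsg;

    // The second constructor argument is the fiber's stack size in bytes.
    auto fib = new Fiber({
        try
        {
            throw new Exception(MSG);
        }
        catch (Exception e)
        {
            caughtMsg = e.msg;
        }
    }, stackSize);

    fib.call();
    return caughtMsg;
}

void main()
{
    // Same check as the druntime unittest, but with the larger stack.
    assert(runInFiber(largerStack) == "Test message.");
}
```

This only affects fibers created through the helper. Fibers constructed elsewhere with the default size would still get the old 16k stack on an unpatched druntime.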
Comment #10 by dlang-bot — 2021-11-07T22:13:25Z
@ibuclaw updated dlang/druntime pull request #3612 "fix Issue 21919 - darwin: SEGV in core.thread tests on OSX 11" fixing this issue:

- fix Issue 21919 - darwin: SEGV in core.thread tests on OSX 11

https://github.com/dlang/druntime/pull/3612
Comment #11 by dlang-bot — 2021-11-08T06:33:49Z
dlang/druntime pull request #3612 "fix Issue 21919 - darwin: SEGV in core.thread tests on OSX 11" was merged into stable:

- ad6583ff842694a07ecb0464aaf5efde13f5c67c by Iain Buclaw:
  fix Issue 21919 - darwin: SEGV in core.thread tests on OSX 11

https://github.com/dlang/druntime/pull/3612
Comment #12 by dlang-bot — 2021-11-08T15:54:11Z
dlang/druntime pull request #3615 "Merge `stable` in `mater`" was merged into master:

- 17f51ab99725494c449257a89637e827721becde by Iain Buclaw:
  fix Issue 21919 - darwin: SEGV in core.thread tests on OSX 11

https://github.com/dlang/druntime/pull/3615
Comment #13 by ibuclaw — 2021-11-22T14:14:15Z
*** Issue 22025 has been marked as a duplicate of this issue. ***