Bug 16856 – D does not work on FreeBSD current (what will eventually be 12) due to libunwind
Status: RESOLVED
Resolution: FIXED
Severity: critical
Priority: P1
Component: druntime
Product: D
Version: D2
Platform: x86_64
OS: FreeBSD
Creation time: 2016-11-30T03:17:21Z
Last change time: 2017-08-16T13:23:06Z
Assigned to: Nemanja Boric
Creator: Jonathan M Davis
Comments
Comment #0 by issues.dlang — 2016-11-30T03:17:21Z
I know that this is a problem on TrueOS (which is based on FreeBSD current with some additional patches applied), so I'm pretty sure that this happens on FreeBSD current in general, but I still need to verify that. In any case, on FreeBSD current, many programs die with bus errors when they run. I know that it happens when an exception is thrown. I don't know what else triggers it, but enough programs fail that I don't _think_ it's just because of exceptions. For instance, running druntime's unit tests for 2.071.2 results in this:
```
../dmd/src/dmd -conf= -Isrc -Iimport -w -dip25 -m64 -g -debug -ofgenerated/freebsd/debug/64/unittest/test_runner src/test_runner.d -Lgenerated/freebsd/debug/64/unittest/libdruntime-ut.so -debuglib= -defaultlib=
generated/freebsd/debug/64/unittest/test_runner object
0x800b2b622 <_D4core7runtime18runModuleUnitTestsUZ19unittestSegvHandlerUNbiPS4core3sys5posix6signal9siginfo_tPvZv+58> at generated/freebsd/debug/64/unittest/libdruntime-ut.so
0x800ea984d <pthread_sigmask+1293> at /lib/libthr.so.3
0x800ea8e1f <pthread_getspecific+3743> at /lib/libthr.so.3
0x7ffffffff003 <???+0> at ???
0x800bdfe70 <_d_throwdwarf+72> at generated/freebsd/debug/64/unittest/libdruntime-ut.so
0x800b0bd58 <_D6object18__unittestL2628_28FZ5Inner10__postblitMFNfZv+184> at generated/freebsd/debug/64/unittest/libdruntime-ut.so
0x800b0c091 <_D6object18__unittestL2628_28FZ5Outer15__fieldPostblitMFNeZv+33> at generated/freebsd/debug/64/unittest/libdruntime-ut.so
0x800b0c15d <_D6object18__unittestL2628_28FZ5Outer14__aggrPostblitMFNeZv+21> at generated/freebsd/debug/64/unittest/libdruntime-ut.so
0x800b116b5 <_D6object59__T16_postblitRecurseTS6object18__unittestL2628_28FZ5OuterZ16_postblitRecurseFNfKS6object18__unittestL2628_28FZ5OuterZv+21> at generated/freebsd/debug/64/unittest/libdruntime-ut.so
0x800b0b7fa <_D6object18__unittestL2628_28FZv+66> at generated/freebsd/debug/64/unittest/libdruntime-ut.so
0x800b1410b <_D6object9__modtestFZv+139> at generated/freebsd/debug/64/unittest/libdruntime-ut.so
0x401758 <???+0> at /usr/home/jmdavis/Programming/github/druntime/generated/freebsd/debug/64/unittest/test_runner
0x800b2b574 <runModuleUnitTests+172> at generated/freebsd/debug/64/unittest/libdruntime-ut.so
0x800bdf81e <_D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZv+30> at generated/freebsd/debug/64/unittest/libdruntime-ut.so
0x800bdf7ac <_D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ7tryExecMFMDFZvZv+52> at generated/freebsd/debug/64/unittest/libdruntime-ut.so
0x800bdf718 <_d_run_main+808> at generated/freebsd/debug/64/unittest/libdruntime-ut.so
0x401a6c <_D4core4time48__T7convertVAyaa6_686e73656373VAyaa5_6d73656373Z7convertFNaNbNiNflZl+208> at /usr/home/jmdavis/Programming/github/druntime/generated/freebsd/debug/64/unittest/test_runner
0x40138f <???+0> at /usr/home/jmdavis/Programming/github/druntime/generated/freebsd/debug/64/unittest/test_runner
gmake[1]: *** [posix.mak:243: generated/freebsd/debug/64/unittest/object] Bus error
gmake[1]: *** Deleting file 'generated/freebsd/debug/64/unittest/object'
gmake[1]: Leaving directory '/usr/home/jmdavis/Programming/github/druntime'
gmake: *** [posix.mak:200: unittest-debug] Error 2
```
And running the dmd test suite results in this:
```
Creating output directory: test_results
Building d_do_test tool
OS: freebsd
0x478926 <???+0> at /var/tmp//dmd_runqjmU8P
0x80092584d <pthread_sigmask+1293> at /lib/libthr.so.3
0x800924e1f <pthread_getspecific+3743> at /lib/libthr.so.3
0x7ffffffff003 <???+0> at ???
0x459900 <???+0> at /var/tmp//dmd_runqjmU8P
0x40b32e <???+0> at /var/tmp//dmd_runqjmU8P
0x4523f9 <???+0> at /var/tmp//dmd_runqjmU8P
0x408606 <???+0> at /var/tmp//dmd_runqjmU8P
0x4526a1 <???+0> at /var/tmp//dmd_runqjmU8P
0x408570 <???+0> at /var/tmp//dmd_runqjmU8P
0x454c7a <???+0> at /var/tmp//dmd_runqjmU8P
0x47896d <???+0> at /var/tmp//dmd_runqjmU8P
0x457388 <???+0> at /var/tmp//dmd_runqjmU8P
0x45dfc3 <???+0> at /var/tmp//dmd_runqjmU8P
0x45e052 <???+0> at /var/tmp//dmd_runqjmU8P
0x45df54 <???+0> at /var/tmp//dmd_runqjmU8P
0x457364 <???+0> at /var/tmp//dmd_runqjmU8P
0x478867 <???+0> at /var/tmp//dmd_runqjmU8P
0x45973b <???+0> at /var/tmp//dmd_runqjmU8P
0x4596dd <???+0> at /var/tmp//dmd_runqjmU8P
0x45964e <???+0> at /var/tmp//dmd_runqjmU8P
0x45503a <???+0> at /var/tmp//dmd_runqjmU8P
0x4031cf <???+0> at /var/tmp//dmd_runqjmU8P
--- killed by signal 10
gmake: *** [Makefile:194: test_results/d_do_test] Error 1
```
Some simple programs (like hello world) do run successfully, but many do not. I have verified that on FreeBSD 11 the druntime and Phobos unit tests and the dmd test suite all pass, whereas every version of dmd that I've tried on FreeBSD current has had this problem. So, I think it's clear that whatever broke things was a change in FreeBSD current after FreeBSD 11 was forked off of it. What is not clear is whether this is a bug in FreeBSD or a bug in our code. Based on the stack traces, I'm _guessing_ that we're doing something wrong with pthreads, but I don't know. Either way, as it stands, D programs don't currently work on FreeBSD current.
When I have time, I'm going to try to at least narrow down the FreeBSD commit that broke things, which will hopefully give better insight into the problem. I have no idea whether this problem is specific to 64-bit, since that's all I'm running. I would guess that it's not, but I'm also guessing that this is a druntime bug. More research is required to know for sure.
Comment #1 by issues.dlang — 2016-12-04T23:16:08Z
Okay. I've confirmed that this is a problem with FreeBSD current in general and not just TrueOS, and I've narrowed it down to the commit in the FreeBSD source tree that broke us. Specifically, it's this one:
```
commit d20793840b5b74acebe80ec710522f7386b452cf
Author: emaste <[email protected]>
Date: Wed Jul 27 16:01:44 2016 +0000

    Enable LLVM libunwind by default on amd64 and i386

    It is a maintained and updated runtime exception stack unwinder that
    should be a drop-in replacement.

    It can be disabled by setting WITHOUT_LLVM_LIBUNWIND in src.conf.

    PR: 206039 [exp-run]
    Sponsored by: The FreeBSD Foundation
```
And if I rebuild the OS with WITHOUT_LLVM_LIBUNWIND=1, then everything works again. So, clearly, the problem is that FreeBSD switched to LLVM libunwind from whatever unwinder it was using before, and whatever dmd and druntime do is not compatible with that. The commit message implies that libunwind _should_ be compatible with what was there before, but in our case, it clearly isn't. I don't know whether libunwind truly isn't a drop-in replacement, or whether we're doing something wrong that happened to work before but doesn't with libunwind. Unfortunately, I know almost nothing about libunwind (just that it handles stack unwinding when exceptions are thrown), so I really have no idea what the problem could be or what the correct solution is (I don't even know whether this involves dmd or just druntime). My guess, though, is that we need to do something to become compatible with libunwind.
Since libunwind is not specific to FreeBSD, this may affect Linux as well; I don't know. But we clearly don't work with FreeBSD 12-to-be right now because of this.
Comment #2 by issues.dlang — 2016-12-05T05:50:37Z
If I compile and run this program:
```
void main()
{
    throw new Exception("blah");
}
```
I get a bus error, and if I run it in gdb, I get this stacktrace:
```
#0 0x0000000800cd91bf in _Unwind_RaiseException () from /lib/libgcc_s.so.1
#1 0x000000000042a994 in _d_throwdwarf ()
#2 0x000000000042a216 in _Dmain ()
#3 0x000000000042a827 in _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZ9__lambda1MFZv ()
#4 0x000000000042a76d in _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ7tryExecMFMDFZvZv ()
#5 0x000000000042a7e3 in _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZv ()
#6 0x000000000042a76d in _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ7tryExecMFMDFZvZv ()
#7 0x000000000042a6e7 in _d_run_main ()
#8 0x000000000042a2aa in main ()
```
So, I guess that whatever is going wrong relates to _Unwind_RaiseException, which I suppose makes sense, since that would appear to relate to libunwind.
Comment #3 by doob — 2016-12-05T07:39:04Z
I would expect libunwind to be used on macOS, where exceptions work fine. But I'm not sure if that's the case.
Comment #4 by 4burgos — 2017-05-07T13:23:59Z
I've looked into this, and it's an alignment issue.
The faulting instruction is generated from this line:
https://github.com/llvm-mirror/libunwind/blob/master/src/UnwindLevel1.c#L351
```
exception_object->private_1 = 0;
```
On FreeBSD-Current, this is executed as:
```
xorps xmm0, xmm0
movaps XMMWORD PTR [r14+0x10], xmm0
```
where r14 holds the pointer to the `_Unwind_Exception` object. The structure is declared both in libunwind (https://github.com/llvm-mirror/libunwind/blob/master/include/unwind.h#L119-L124) and in the D runtime (https://github.com/dlang/druntime/blob/master/src/rt/unwind.d#L51-L57). The D runtime version is:
```
struct _Unwind_Exception
{
    align(8) _Unwind_Exception_Class exception_class;
    _Unwind_Exception_Cleanup_Fn exception_cleanup;
    _Unwind_Word private_1;
    _Unwind_Word private_2;
}
```
and the libunwind version is:
```
struct _Unwind_Exception {
uint64_t exception_class;
void (*exception_cleanup)(_Unwind_Reason_Code reason,
_Unwind_Exception *exc);
uintptr_t private_1; // non-zero means forced unwind
uintptr_t private_2; // holds sp that phase1 found for phase2 to use
#ifndef __LP64__
// The implementation of _Unwind_Exception uses an attribute mode on the
// above fields which has the side effect of causing this whole struct to
// round up to 32 bytes in size. To be more explicit, we add pad fields
// added for binary compatibility.
uint32_t reserved[3];
#endif
// The Itanium ABI requires that _Unwind_Exception objects are "double-word
// aligned". GCC has interpreted this to mean "use the maximum useful
// alignment for the target"; so do we.
} __attribute__((__aligned__));
```
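This happens because `movaps` requires its memory operand to be 16-byte aligned. The C compiler is entitled to assume that here: the whole libunwind struct carries `__attribute__((__aligned__))`, which means 16 bytes on x86_64, and `private_1` sits at offset 16 (0x10), so it must be 16-byte aligned as well. The D definition, however, only guarantees 8-byte alignment, so a D-allocated `_Unwind_Exception` can leave `private_1` at an address that is only 8-byte aligned, and the `movaps` store faults.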
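A small C sketch of the address arithmetic behind the fault, assuming an LP64 target such as x86_64. The names `libunwind_exception`, `d_old_exception`, and `private_1_mod16` are hypothetical, and the cleanup callback signature is simplified. In both layouts `private_1` is at offset 16, so it is 16-byte aligned only if the object itself is; an object that is merely 8-byte aligned may start at an address that is 8 mod 16, which leaves `private_1` at 8 mod 16 too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors libunwind's LP64 layout: the bare aligned attribute gives the
   whole struct the target's default "maximum useful" alignment
   (16 bytes on x86_64 with GCC and Clang). */
struct libunwind_exception {
    uint64_t  exception_class;
    void    (*exception_cleanup)(int reason, void *exc); /* simplified signature */
    uintptr_t private_1;
    uintptr_t private_2;
} __attribute__((__aligned__));

/* Same fields, but only 8-byte alignment, like the old D declaration
   with align(8) on the first field. */
struct d_old_exception {
    _Alignas(8) uint64_t exception_class;
    void    (*exception_cleanup)(int reason, void *exc);
    uintptr_t private_1;
    uintptr_t private_2;
};

/* Address of private_1 modulo 16 for a d_old_exception placed at `base`.
   Only addresses are computed; nothing is dereferenced. A result of 0
   means a 16-byte movaps store to private_1 would be legal. */
static unsigned private_1_mod16(uintptr_t base)
{
    assert(base % _Alignof(struct d_old_exception) == 0);
    return (unsigned)((base + offsetof(struct d_old_exception, private_1)) % 16);
}
```

For example, an 8-byte-aligned object at 0x1008 puts `private_1` at 0x1018, which a `movaps` store cannot target, while an object at 0x1010 puts it at 0x1020, which is fine.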
Aligning the D definition to 16 bytes (which covers at least the D-allocated instances) fixes the entire problem, and exception handling works:
```
align(16)
struct _Unwind_Exception
{
_Unwind_Exception_Class exception_class;
_Unwind_Exception_Cleanup_Fn exception_cleanup;
_Unwind_Word private_1;
_Unwind_Word private_2;
}
```
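The same idea expressed in C, as a sketch (the struct `d_fixed_exception` and the helper `new_exception` are hypothetical). `_Alignas(16)` on the first member raises the alignment of the whole struct to 16, playing the role of `align(16)` above. Raising the type's alignment only helps if allocations honour it, so the sketch allocates with C11 `aligned_alloc`; how the D side allocates these objects is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* C counterpart of the fixed D declaration: _Alignas(16) on the first
   member raises the alignment of the whole struct to 16 bytes. */
struct d_fixed_exception {
    _Alignas(16) uint64_t exception_class;
    void    (*exception_cleanup)(int reason, void *exc);
    uintptr_t private_1;
    uintptr_t private_2;
};

/* Hypothetical helper: heap-allocate a zeroed exception object with the
   alignment the type demands, so 16-byte stores to private_1 are safe.
   Returns NULL on allocation failure; release the result with free(). */
static struct d_fixed_exception *new_exception(uint64_t exception_class)
{
    struct d_fixed_exception *e =
        aligned_alloc(_Alignof(struct d_fixed_exception), sizeof *e);
    if (e == NULL)
        return NULL;
    /* Designated initializer: unspecified members become zero / NULL. */
    *e = (struct d_fixed_exception){ .exception_class = exception_class };
    assert((uintptr_t)&e->private_1 % 16 == 0);
    return e;
}
```

The size (32 bytes) is a multiple of the alignment (16), which is what C11 `aligned_alloc` expects.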
Now, the problem is that I don't know how to reliably compute, on the D side, the same alignment that the C/C++ compiler would choose. I guess this is necessary, because we want C++ exceptions (thrown from code built by the C++ compiler) to work as well.
The GCC documentation (https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/Type-Attributes.html) states:
```
As in the preceding example, you can explicitly specify the alignment (in bytes) that you wish the compiler to use for a given struct or union type. Alternatively, you can leave out the alignment factor and just ask the compiler to align a type to the maximum useful alignment for the target machine you are compiling for. For example, you could write:
struct S { short f[3]; } __attribute__ ((aligned));
Whenever you leave out the alignment factor in an aligned attribute specification, the compiler automatically sets the alignment for the type to the largest alignment which is ever used for any data type on the target machine you are compiling for. Doing this can often make copy operations more efficient, because the compiler can use whatever instructions copy the biggest chunks of memory when performing copies to or from the variables which have types that you have aligned this way.
In the example above, if the size of each short is 2 bytes, then the size of the entire struct S type is 6 bytes. The smallest power of two which is greater than or equal to that is 8, so the compiler sets the alignment for the entire struct S type to 8 bytes.
```
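One way to learn what the compiler actually picks, rather than computing it by hand, is to ask it. The following C sketch (the `aligned_probe` struct and both helpers are hypothetical) measures the alignment of a struct with libunwind's fields and the bare `__attribute__((__aligned__))`, and reads the `__BIGGEST_ALIGNMENT__` macro that GCC and Clang predefine for comparison. A binding could use such a probe, for instance in a build-time check, to confirm the value it hardcodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Probe with the same fields as libunwind's struct and the bare attribute,
   so the compiler fills in its default "maximum useful" alignment. */
struct aligned_probe {
    uint64_t  exception_class;
    void    (*exception_cleanup)(int reason, void *exc);
    uintptr_t private_1;
    uintptr_t private_2;
} __attribute__((__aligned__));

/* Arrays of the probe stay aligned because the size is padded to a multiple
   of the alignment. */
static_assert(sizeof(struct aligned_probe) % _Alignof(struct aligned_probe) == 0,
              "size must be a multiple of alignment");

/* Alignment the compiler actually chose for the bare attribute. */
static size_t default_attribute_alignment(void)
{
    return _Alignof(struct aligned_probe);
}

/* GCC/Clang predefined macro: the largest alignment used for any type on
   the target. Shown for comparison only; this sketch trusts the probe. */
static size_t biggest_alignment(void)
{
    return __BIGGEST_ALIGNMENT__;
}
```

On x86_64 with default flags, both values should be 16 for GCC and Clang. The sketch measures the attribute directly because, as far as I know, the two are not guaranteed to coincide under every set of target flags (for example, GCC raises `__BIGGEST_ALIGNMENT__` when AVX is enabled).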
Any ideas?
Comment #5 by github-bugzilla — 2017-05-07T22:20:49Z
Commits pushed to master at https://github.com/dlang/druntime

https://github.com/dlang/druntime/commit/c56e8e0d8d599b1742fe85210f07adacf07e5e2a
Fix issue 16856: Apply correct alignment on the Unwind_Exception structure
In libunwind, the _Unwind_Exception structure is defined as follows:
```
struct _Unwind_Exception
{
uint64_t exception_class;
_Unwind_Exception_Cleanup_Fn exception_cleanup;
unsigned long private_1;
unsigned long private_2;
} __attribute__((__aligned__));
```
so the alignment applies to the entire structure, and it
depends on the architecture. This sets the structure
to be 16-byte aligned on x86_64, so that the binary layout
matches and the C++ compiler's optimizations remain
valid (for example, on FreeBSD 12, exception handling was broken
because libunwind assumes correct alignment, so the fast but fragile
instructions were used).
https://github.com/dlang/druntime/commit/c7182eb2ef3d6cc57c3e3366028753306b4dceb7
Merge pull request #1823 from Burgos/exception_alignment
Fix issue 16856: Apply correct alignment on the Unwind_Exception stru…
merged-on-behalf-of: Jonathan M Davis <[email protected]>
Comment #6 by issues.dlang — 2017-05-14T11:39:50Z
Well, while the fix seems to have improved the situation, I'm sorry to report that the unit tests for druntime and Phobos still result in bus errors on FreeBSD CURRENT. So, it looks like the fix was insufficient. :(
Comment #7 by 4burgos — 2017-05-21T07:49:12Z
Ha, that's a bummer. I'm on a holiday right now with limited access to workstation, but I'll give it a look in early June.
Comment #8 by github-bugzilla — 2017-06-17T11:34:29Z
Thanks to the GitHub bot, I've been reminded about this issue. I'm back and will start looking into it next week.
Comment #10 by 4burgos — 2017-07-08T18:17:03Z
Ok, I finally got some time to get back to this issue.
There is indeed a SIGBUS at runtime, but it's caused by the GC, because the runtime asserts in the shared library finalizers; it seems that the instance is no longer there.
I've traced the failing druntime assert (the only one so far; who knows
what else is waiting for us) to this particular commit in rtld.c:
https://github.com/freebsd/freebsd/commit/3ff2e66ecba2094f5c1c1efe7f2d009649527195
So, this is the problem. At the end of the program, the fini() functions are called for the shared library and the main executable. These call `_do_global_dtors_aux`, which in turn calls _d_dso_registry, which (and this is where the problem is) calls dlopen (albeit with RTLD_NOLOAD) to obtain the object's handle by name.
However, since this particular commit, this no longer works (and it's
questionable whether it should): you can't bump the reference count of an object
that's about to die, and dlopen still bumps the reference count even when RTLD_NOLOAD is passed.
I guess we need to figure out how to skip the dlopen calls in this scenario:
maybe skip them just for the current object, or cache the handles when they're
first obtained (I'm not sure whether they can change on their own; I don't think so, but
still). Now that I know where the problem is, I'll try to submit a PR tomorrow.
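The handle-caching idea could look roughly like this C sketch. Everything in it (`cached_handle`, the fixed-size table, its capacity) is hypothetical and is not druntime's actual fix. Each handle is looked up once, while the object is known to be alive, and the reference that `dlopen` takes is dropped immediately so the cache does not keep the object loaded; later lookups, such as during teardown, are answered from the table without another `dlopen`. This assumes, as suspected above, that a handle does not change while its object stays loaded.

```c
#define _GNU_SOURCE /* exposes RTLD_NOLOAD on glibc; harmless elsewhere */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

#define MAX_CACHED_HANDLES 16 /* hypothetical capacity */

struct handle_entry {
    const char *path;   /* NULL means the main program; caller keeps it alive */
    void       *handle;
};

static struct handle_entry handle_cache[MAX_CACHED_HANDLES];
static size_t handle_cache_len;

static int same_path(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Return the handle of an already-loaded object, calling dlopen at most
   once per path. Returns NULL if the object is not loaded or the cache is
   full. Misses are not cached. */
static void *cached_handle(const char *path)
{
    for (size_t i = 0; i < handle_cache_len; ++i)
        if (same_path(handle_cache[i].path, path))
            return handle_cache[i].handle;

    if (handle_cache_len == MAX_CACHED_HANDLES)
        return NULL;

    void *handle = dlopen(path, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == NULL)
        return NULL;

    /* Drop the reference dlopen just took; only the pointer value is kept. */
    dlclose(handle);

    handle_cache[handle_cache_len++] = (struct handle_entry){ path, handle };
    assert(handle_cache_len <= MAX_CACHED_HANDLES);
    return handle;
}
```

Dropping the reference right away is the design choice that matters here: holding it would keep the library from ever being unloaded.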
It was quite a ride finding this out. At first, the dlopen call was failing for the
main executable, and since the documentation says "use NULL if you want the main executable instead", I tried that and got it working, so I thought there was something
special about this path. Interestingly, my confusion was caused by a bug
in FreeBSD's code: if the current limitation applies (don't reference a "doomed" object),
one shouldn't be able to work around it by passing NULL. I'll look into sending a patch there as well.
Comment #13 by issues.dlang — 2017-07-14T18:29:26Z
I confirm that this works with the latest TrueOS, though I expect that it wouldn't work on the latest FreeBSD 12 because of the 64-bit inode issue (even though TrueOS is based on FreeBSD CURRENT, it hasn't pulled in those changes yet precisely because of the breakage that they cause). That's a separate bug though: bug #17596.
Thanks!
Comment #14 by 4burgos — 2017-07-14T18:58:45Z
Thank you for writing back, and you're very welcome! Thanks for
pointing out that issue; I'll follow it closely.
Comment #15 by dlang-bugzilla — 2017-07-15T06:02:28Z
Comment #16 by issues.dlang — 2017-07-15T07:11:23Z
(In reply to Vladimir Panteleev from comment #15)
> (In reply to Jonathan M Davis from comment #13)
> > That's a separate bug though: bug #17596.
>
> Clickable link: issue 17596
>
> See the Bugzilla manual:
> https://www.bugzilla.org/docs/4.4/en/html/hintsandtips.html#idp6611456
Thanks. I can never remember the exact syntax, and the software on different sites that links issue numbers, posts, or other things like this tends to do it slightly differently. And since Bugzilla doesn't let you preview or edit your comments, it's hard to get right unless you're lucky enough to remember correctly.
Comment #17 by dfj1esp02 — 2017-07-17T17:25:30Z
(In reply to Vladimir Panteleev from comment #15)
> (In reply to Jonathan M Davis from comment #13)
> > That's a separate bug though: bug #17596.
>
> Clickable link: issue 17596
>
> See the Bugzilla manual:
> https://www.bugzilla.org/docs/4.4/en/html/hintsandtips.html#idp6611456
Autolinkification of the "bug 17596" syntax used to work, but no longer does.
Comment #18 by github-bugzilla — 2017-08-07T12:25:54Z