Bug 13726 – Build Phobos and Druntime with stack frames enabled (-gs)

Status
RESOLVED
Resolution
WONTFIX
Severity
enhancement
Priority
P1
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2014-11-13T01:31:00Z
Last change time
2017-07-02T02:01:30Z
Assigned to
nobody
Creator
dlang-bugzilla
See also
https://issues.dlang.org/show_bug.cgi?id=8841

Comments

Comment #0 by dlang-bugzilla — 2014-11-13T01:31:25Z
Background: enabling stack frames causes the compiler to emit a few extra prologue/epilogue instructions for each function, which save the stack pointer right next to the function's return address. This creates a linked list which, when traversed, provides each function's address in the call stack (and where within the function it invokes the next function), and the size of its stack frame.

Stack frames are a good debugging aid and have a minimal impact on performance. I suggest building Phobos and Druntime with them enabled.

An example use case where stack frames are useful is debugging InvalidMemoryOperation errors. These errors are currently difficult to debug. Consider the following program:

//////////// test.d ////////////
import core.memory;

class C
{
    ~this() { new ubyte[1024]; }
}

void faulty()
{
    foreach (n; 0..1000)
        new C();
    GC.collect();
}

void main()
{
    faulty();
}
////////////////////////////////

This program allocates in a destructor, which is not allowed. When run, it only prints core.exception.InvalidMemoryOperationError@(0), because a stack trace is purposefully not allocated for memory operations (and even if it were, it would still be incorrect, as seen below).

When the program is launched from a debugger on Windows, the runtime disables its own exception handler, and the program breaks at the point where the InvalidMemoryOperationError is thrown. However, the stack trace looks as follows:

> KernelBase.dll!_RaiseException@16() + 0x58 bytes
  test.exe!_D2rt3deh9throwImplFC6ObjectZv() + 0x1a bytes
  test.exe!_onInvalidMemoryOperationError() + 0xc bytes
  test.exe!_gc_malloc() + 0x1e bytes
  test.exe!_D2gc3gcx3Gcx11fullcollectMFZk() + 0x617 bytes
  test.exe!_D2gc3gcx2GC11fullCollectMFZk() + 0x45 bytes

Because _gc_malloc does not create a stack frame, the stack trace is corrupt: it incorrectly shows that Gcx.fullcollect invokes gc_malloc. The stack trace ends abruptly at GC.fullCollect, and does not contain any useful information on why the problem occurred.

If Phobos and Druntime are rebuilt with -gs, we see the following stack trace instead:

  KernelBase.dll!_RaiseException@16() + 0x58 bytes
  test.exe!_D2rt9deh_win329throwImplFC6ObjectZv() + 0x23 bytes
  test.exe!__d_throwc() + 0xc bytes
  test.exe!_D2gc2gc2GC12mallocNoSyncMFNbkkKkxC8TypeInfoZPv() + 0x1f bytes
  test.exe!_D2gc2gc2GC6mallocMFNbkkPkxC8TypeInfoZPv() + 0x51 bytes
  test.exe!_gc_malloc() + 0x21 bytes
  test.exe!__d_newclass() + 0x66 bytes
  test.exe!_rt_finalize2() + 0xee bytes
  test.exe!_D2gc2gc3Gcx11fullcollectMFNbZk() + 0x8c8 bytes
  test.exe!_D2gc2gc2GC11fullCollectMFNbZk() + 0x1f bytes
  test.exe!_gc_collect() + 0x16 bytes
  test.exe!_D4core6memory2GC7collectFNbZv() + 0x8 bytes
> test.exe!test.faulty() Line 13 C
  test.exe!D main() Line 17 + 0x5 bytes C
  (rest omitted)

We don't see the destructor in the stack trace because of issue 13723.

With issue 13725 fixed, onInvalidMemoryOperationError can be breakpointed instead (this approach also requires stack frames for a useful stack trace).
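The linked list described in comment #0 can be sketched as a small model in D. This is only an illustration under stated assumptions: the `Frame` struct, the `walk` function, and the addresses are made up for the example, and real frames live on the machine stack rather than in D structs.

```d
import std.stdio;

// Hypothetical model of one stack frame when frames are enabled:
// the caller's saved frame pointer sits next to the return address.
struct Frame
{
    const(Frame)* callerFrame; // saved frame pointer (null at the outermost frame)
    size_t returnAddress;      // address in the caller where execution resumes
}

// Walk the chain from the innermost frame outwards, collecting return addresses.
size_t[] walk(const(Frame)* innermost)
{
    size_t[] trace;
    for (auto f = innermost; f !is null; f = f.callerFrame)
        trace ~= f.returnAddress;
    return trace;
}

void main()
{
    // Made-up addresses standing in for main -> faulty -> GC.collect.
    auto mainFrame    = Frame(null, 0x1000);
    auto faultyFrame  = Frame(&mainFrame, 0x2000);
    auto collectFrame = Frame(&faultyFrame, 0x3000);

    foreach (addr; walk(&collectFrame))
        writefln("0x%x", addr);
}
```

In this model, a function compiled without a stack frame would contribute no `Frame` node at all, so a walker has no way to see it. That is consistent with the truncated first trace in comment #0.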
Comment #1 by dlang-bugzilla — 2014-11-13T01:32:45Z
Comment #2 by bugzilla — 2014-11-13T04:05:40Z
Every bit of performance matters. D is constantly being compared for speed with other tools. Building phobos for maximum debugging support is completely at odds with performance.
Comment #3 by dlang-bugzilla — 2014-11-13T10:41:58Z
(In reply to Walter Bright from comment #2)
> Every bit of performance matters. D is constantly being compared for speed
> with other tools.

One of the most common criticisms of D (if not the most common) has always been the maturity of its toolchain. You can't concentrate on bare performance while jeopardizing everything else.

This issue and its pull requests are a follow-up to a conversation with a #d IRC user whose program was crashing due to the above-mentioned invalid memory operations. I assisted them and tried to get them to run the program in a debugger and breakpoint onInvalidMemoryOperationError, which didn't help. I suggested that they use Digger to build a version of D with stack frames enabled, and even though this was simpler than setting up the source code, patching makefiles and invoking DMC make, IIRC at that point the user just gave up on D and left the channel.

> Building phobos for maximum debugging support is completely at odds with
> performance.

As I said, stack frames have a minimal impact on performance. They are a very common instruction sequence, so I suspect modern CPUs recognize and optimize their execution. I believe the release versions of the Microsoft C/C++ runtime are also built with stack frames enabled.

Here are the times for running the Phobos unittests (best of 10):

30.180 seconds - Currently (no -gs)
30.211 seconds - With -gs added

The difference (31ms) is 0.1%, and at this scale and sample size, it might very well be noise.

That said, I think it would be OK to remove stack frames from the release version of Phobos once we start shipping a debug version which works automatically with -debug via -debuglib.

BTW, speaking of performance, Phobos is currently built without -inline. However, enabling it does not produce any visible effect when running the benchmarks.
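The 0.1% figure in comment #3 follows from the two timings quoted there. A small D sketch of the arithmetic, where the helper name `overheadPercent` is made up for illustration:

```d
import std.stdio;

// Relative slowdown of the second timing against the first, in percent.
double overheadPercent(double baselineSeconds, double withFramesSeconds)
{
    return (withFramesSeconds - baselineSeconds) / baselineSeconds * 100.0;
}

void main()
{
    // Timings from comment #3: Phobos unittests, best of 10.
    writefln("%.2f%%", overheadPercent(30.180, 30.211));
}
```

A single best-of-10 pair like this says nothing about variance, which is why the comment itself allows that the difference may be noise.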
Comment #4 by code — 2014-11-13T23:20:05Z
(In reply to Walter Bright from comment #2)
> Every bit of performance matters. D is constantly being compared for speed
> with other tools.
>
> Building phobos for maximum debugging support is completely at odds with
> performance.

I agree with Walter here. Instead we should link a debug version of phobos when building with -debug.
Comment #5 by code — 2014-11-13T23:21:52Z
It's also possible to improve the exception unwinding to use additional info [1] for functions without a stack frame.

[1]: http://wiki.dwarfstd.org/index.php?title=Exception_Handling
Comment #6 by dlang-bugzilla — 2014-11-13T23:43:58Z
Martin, do you have a benchmark for which the performance difference is non-negligible? I've been building D from source code on my server for a while now, just to enable stack frames. They are invaluable for debugging, I use them for production all the time. I am confident that at the moment, D will benefit more from stack frames in Phobos/Druntime than 0.1% in performance.
Comment #7 by dlang-bugzilla — 2014-11-13T23:50:35Z
(In reply to Vladimir Panteleev from comment #3)
> That said, I think it would be OK to remove stack frames from the release
> version of Phobos once we start shipping a debug version which works
> automatically with -debug via -debuglib.

Thinking about this more, I don't think this is the right approach. A debug build of Phobos/Druntime (with optimizations/inlining disabled and/or assertions enabled) would definitely have a huge performance impact. We just need stack frames, so that the user/debugger can read any stack traces that go through Phobos.

(In reply to Martin Nowak from comment #5)
> It's also possible to improve the exception unwinding to use additional info
> [1] for functions without a stack frame.
>
> [1]: http://wiki.dwarfstd.org/index.php?title=Exception_Handling

I don't see how this is relevant. This is not about exception unwinding, but about seeing a stack trace in the debugger or Druntime output so you know why your program crashed.
Comment #8 by code — 2014-11-14T01:32:19Z
(In reply to Vladimir Panteleev from comment #7)
> > [1]: http://wiki.dwarfstd.org/index.php?title=Exception_Handling
>
> I don't see how this is relevant. This is not about exception unwinding

Yes it is. C++ exception unwinding uses the DWARF format on most ELF architectures. It encodes in great detail how to unwind from the stack, which is a requirement for nearly-zero-overhead exceptions. For example, you don't have to move all registers to the stack like dmd currently does.
Comment #9 by dlang-bugzilla — 2014-11-14T01:37:57Z
I don't understand... I'm not sure what the scheme is on POSIX, but on Win32, exception frames and stack frames are completely separate (they form two distinct linked lists which do not overlap). Last I checked, gdb was not satisfied with just -g, you still need to rebuild Phobos/Druntime with -gs to get proper stack traces. I still don't understand what this has to do with exception handling. In either case, that only affects POSIX. The problem remains on Windows (despite PDB files supposedly having enough information for proper stack traces without stack frames).
Comment #10 by dfj1esp02 — 2014-11-17T09:03:06Z
(In reply to Martin Nowak from comment #8)
> It encodes in great detail how to unwind from the stack, which is a
> requirement for nearly-zero-overhead exceptions. For example you don't have
> to move all registers to the stack like dmd does currently.

AFAIK, DWARF works at the function level, but the ability to find the stack without a frame pointer requires working at the instruction level. It's also completely ignorant of hardware exceptions, which happen at the instruction level too.
Comment #11 by code — 2014-11-29T15:55:40Z
(In reply to Vladimir Panteleev from comment #9)
> I don't understand... I'm not sure what the scheme is on POSIX, but on
> Win32, exception frames and stack frames are completely separate (they form
> two distinct linked lists which do not overlap).

https://gnu.wildebeest.org/blog/mjw/2007/08/23/stack-unwinding/
Comment #12 by code — 2014-11-29T16:01:19Z
(In reply to Vladimir Panteleev from comment #6)
> Martin, do you have a benchmark for which the performance difference is
> non-negligible?

It would affect any leaf function that is not inlined. For example, this would kill what Manu had in mind for std.simd, a module where each function consists of a few SIMD instructions.

> I've been building D from source code on my server for a while now, just to
> enable stack frames. They are invaluable for debugging, I use them for
> production all the time. I am confident that at the moment, D will benefit
> more from stack frames in Phobos/Druntime than 0.1% in performance.

It's a 0.1% performance loss for you, but maybe 1 or 2% for someone else. There is so much we can do to improve stack traces, but enforcing stack frames is a performance trade-off.
Comment #13 by dlang-bugzilla — 2015-01-25T00:02:59Z
(In reply to Martin Nowak from comment #12)
> There is so much we can do to improve stack traces, but enforcing stack
> frames is a performance trade-off.

On Windows, breakpointing onInvalidMemoryOperationError does not result in a readable stack trace when using Visual Studio with any of Win32, Win32+cv2pdb, or Win64. This means that we are not emitting debug information that can replace stack frames for either the CV or PDB format.
Comment #14 by dlang-bugzilla — 2015-01-25T00:05:40Z
(In reply to Vladimir Panteleev from comment #0)
> An example use case where stack frames are useful is debugging
> InvalidMemoryOperation errors.

Right now, there are TWO threads on the front page of digitalmars.D.learn asking about this error. Nobody can give good advice on reliably determining what exactly causes it. I'm writing a wiki page with a guide on how to use Digger to rebuild D so you can get a good stack trace, but right now the situation is miserable.
Comment #15 by dfj1esp02 — 2015-01-27T09:29:17Z
(In reply to Walter Bright from comment #2)
> Every bit of performance matters. D is constantly being compared for speed
> with other tools.
>
> Building phobos for maximum debugging support is completely at odds with
> performance.

Not for debugging, but for a decent newbie user experience.