Bug 9242 – Add stack stomping code to flush out heisenbugs
Status
RESOLVED
Resolution
FIXED
Severity
enhancement
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2012-12-29T16:42:28Z
Last change time
2020-03-21T03:56:37Z
Assigned to
No Owner
Creator
Walter Bright
Comments
Comment #0 by bugzilla — 2012-12-29T16:42:28Z
We've lately had some very hard-to-track-down heisenbugs that ultimately turned out to be references to stack frames that had gone out of scope. This happens particularly when there are bugs in the lambda implementation, but it is quite possible for the same thing to happen in user code.
It's not possible to always detect these at runtime, but their incidence can be reduced, and bugs should be easier to track down because those references will not randomly appear to work.
The first part is to replace the stack frame cleanup code:
mov ESP,EBP
pop EBP
ret
with:
call __stack_frame_smash
mov ESP,EBP
pop EBP
ret
What __stack_frame_smash does is:
1. set all memory [ESP..EBP] to something like 0xDEADBEEF
2. set to 0xDEADBEEF all registers that are not guaranteed to be preserved across function calls.
Unfortunately, this won't smash the parameter stack, and it can't because the callee cannot know how many parameters are on that stack (according to the ABI). But, ya can't have everything.
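As a rough illustration of step 1 only, here is a minimal D sketch that simulates the fill on an ordinary array instead of the real stack. The names stompPattern and stompFrame are hypothetical and are not part of dmd or druntime; the proposed __stack_frame_smash would work on [ESP..EBP] directly and would also clobber the scratch registers, which plain D code cannot show.

```d
/// Fill value suggested in the description above.
enum uint stompPattern = 0xDEADBEEF;

/// Hypothetical stand-in for step 1: overwrite every word of a dead frame.
/// `frame` is an ordinary slice here, not the real [ESP..EBP] region.
void stompFrame(uint[] frame) pure nothrow @nogc @safe
{
    frame[] = stompPattern; // array-wide assignment fills the whole slice
}

unittest
{
    uint[4] frame = [1, 2, 3, 4]; // pretend these are the callee's locals
    stompFrame(frame[]);
    foreach (word; frame)
        assert(word == stompPattern); // stale values can no longer be read back
}
```

The unittest block can be exercised with dmd -unittest -main -run on the file. The point of the fill is that a stale reference reads an obviously bogus value rather than whatever the dead frame happened to contain.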
The second part is, when a pointer, reference, dynamic array, or delegate is returned from a function, add the following code to the epilog before the call to __stack_frame_smash:
cmp EAX,EBP
ja Ok
cmp EAX,ESP
jb Ok
hlt
Ok:
For dynamic arrays, the check uses EDX instead of EAX. This will halt the machine if a pointer into the deallocated stack frame is returned.
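The two comparisons in that epilog amount to a closed-range test on the returned address. A minimal D sketch of the same logic follows, using plain integers for the addresses. The function name pointsIntoFrame and the sample bounds are hypothetical; in the proposed scheme the values would come from EAX (or EDX), ESP and EBP.

```d
/// Hypothetical helper: true when address `p` lies in [frameLow, frameHigh],
/// i.e. when the epilog check above would fall through to the halt.
bool pointsIntoFrame(size_t p, size_t frameLow, size_t frameHigh)
    pure nothrow @nogc @safe
{
    if (p > frameHigh) return false; // cmp EAX,EBP ; ja Ok
    if (p < frameLow)  return false; // cmp EAX,ESP ; jb Ok
    return true;                     // neither branch taken: halt
}

unittest
{
    // Simulated frame bounds; real values would be ESP and EBP.
    enum size_t esp = 0x1000;
    enum size_t ebp = 0x1040;

    assert(pointsIntoFrame(0x1010, esp, ebp));  // inside the frame
    assert(pointsIntoFrame(ebp, esp, ebp));     // ja is strict, so EBP itself is caught
    assert(!pointsIntoFrame(0x2000, esp, ebp)); // above the frame: Ok
    assert(!pointsIntoFrame(0x0800, esp, ebp)); // below the frame: Ok
}
```

Both bounds are inclusive because ja and jb only branch on strictly greater and strictly less.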
Insertion of this code is done if the -gh switch is thrown.
Comment #1 by bearophile_hugs — 2012-12-29T17:28:45Z
(In reply to comment #0)
All this sounds very nice.
> Insertion of this code is done if the -gh switch is thrown.
Maybe it's better to name/syntax it differently, to make it more future-proof in case we'll want to add other related runtime safeties.
Example: in FreePascal there is an option for stack checking:
http://www.freepascal.org/docs-html/prog/progsu101.html#x108-1090001.2.25
Comment #2 by bearophile_hugs — 2012-12-29T18:02:38Z
(In reply to comment #1)
> Maybe it's better to name/syntax it differently, to make it more future-proof
> in case we'll want to add other related runtime safeties.
A simpler possibility is to not add a switch, and just turn on this feature (and the feature in Issue 9243) when the "-debug" switch is used.
Comment #3 by issues.dlang — 2012-12-29T18:08:16Z
> A simpler possibility is not add a switch, and just turn on this feature (and
> the feature in Issue 9243 ) when the "-debug" switch is used.
Let's not overload the -debug flag. All it does is enable debug blocks, which are used primarily for debug output. That's fundamentally different from something like stack smashing. Depending on the situation, someone might actually want this feature with -release. While you _can_ use -debug with -release, you generally don't want to, and I wouldn't expect anyone who wanted this feature with -release to also want debug blocks enabled.
Comment #4 by bearophile_hugs — 2013-01-01T05:46:06Z
(In reply to comment #1)
> Example: in FreePascal there is an option for stack checking:
Stack overflow checking is common on CPUs with no virtual memory, such as 16 bit DOS. Virtual memory systems get stack overflow checking "for free", by marking the memory page beyond the end of the stack as neither readable nor writeable. Then the hardware does the check for you.
The stack smashing thing is completely different.
Comment #7 by bearophile_hugs — 2013-01-24T01:33:36Z
(In reply to comment #6)
> Stack overflow checking is common on CPUs with no virtual memory, such as 16
> bit DOS. Virtual memory systems get stack overflow checking "for free", by
> marking the memory page beyond the end of the stack as neither readable nor
> writeable. Then the hardware does the check for you.
OK. (I'd like an error message plus a stack trace when D programs overflow the stack.)
> The stack smashing thing is completely different.
I am aware of this.
See also the pages on StackGuard, which is a third, different thing.
Comment #8 by bugzilla — 2013-01-24T02:10:33Z
(In reply to comment #7)
> OK. (I'd like an error message plus a stack trace when D programs overflow the
> stack.)
Why not try it and see what happens? In any case, discussing stack overflow here is not the right place, as this issue has nothing in common with it.
> See also those pages on StackGuard, that is a third different thing.
I know what stackguard is. This is not stackguard, and has nothing to do with it.
Comment #9 by bearophile_hugs — 2013-01-24T18:24:04Z
(In reply to comment #8)
> Why not try it and see what happens?
I have just tried this code with and without -gx, and on Windows32 the program segfaults with no error message and no stack trace:
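import core.stdc.stdio; // std.c.stdio is no longer available in Phobos
// Unbounded recursion: each call adds a frame until the stack is exhausted.
void recurse(in uint i = 0) {
    printf("%u ", i);
    recurse(i + 1);
}
void main() {
    recurse();
}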
> In any case, discussing stack overflow
> here is not the right place, as this issue has nothing in common with it.
OK.
Comment #10 by bugzilla — 2013-01-24T21:59:02Z
(In reply to comment #9)
> (In reply to comment #8)
>
> > Why not try it and see what happens?
>
> I have just tried this code with and without -gx,
What do I need to do to convince you that stack overflow has nothing at all to do with what -gx does?
> and on Windows32 the program
> segfaults with no error message and no stack trace:
And it segfaults because the stack runs into the guard page.
A stack trace would be problematic, because there'd be thousands of entries, all the same. Generally, once you run out of stack, there's not a whole lot of function calling you can do to do more processing.