Bug 16106 – Calling a fiber from itself causes hard-to-debug stack corruption

Status: NEW
Severity: enhancement
Priority: P4
Component: druntime
Product: D
Version: D2
Platform: x86_64
OS: All
Creation time: 2016-05-31T14:28:52Z
Last change time: 2024-12-07T13:36:37Z
Assigned to: No Owner
Creator: Don
Moved to GitHub: dmd#17329

Comments

Comment #0 by clugdbug — 2016-05-31T14:28:52Z
If you are in fiber `f`, and you call `f.call()`, then you are switching the context to yourself. There is an `in` contract in `Fiber.call()` which is meant to prevent this: `assert( m_state == State.HOLD );` That assert will fail, because the fiber is running. But if contracts are disabled, then execution will continue anyway.

Conceptually, switching context to yourself is a no-op. The function could simply return. I'm not sure that would be a good idea, but it's certainly possible. And it *almost* behaves that way.

The function `fiber_switchContext()` pushes the registers onto the existing stack, then pops them from the new stack. In this case, the old stack and new stack are the same. Except that the new stack pointer is the top of the new stack *before the pushes were made*. So, it loads the registers from the completely wrong place. If you're lucky, you get a segfault. If you don't, you'll end up in a completely unrelated place. Either way it is quite difficult to diagnose why it has happened.

Admittedly, this only happens after an `in` contract violation. But I think we should do something a bit more robust. The check for `m_state == State.HOLD` is not expensive (one CMP and a predictable branch). It should be moved out of the contract into the function body. The consequences of a totally corrupt stack are _extremely_ severe. It's worth sacrificing a single clock cycle to prevent this debugging nightmare.
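The push/pop mismatch described above can be illustrated with a toy model. This is only a sketch: `ToyCpu`, its two "registers", the 16-slot memory, and `selfSwitchDemo` are all hypothetical names made up for illustration, and the code does not reproduce the real `fiber_switchContext()` assembly in druntime. It only mimics the sequence "push registers, save the stack pointer, load the new stack pointer, pop registers" for the case where the new stack pointer is a stale value on the same stack.

```d
import std.stdio : writeln;

// Toy model only: NOT the druntime implementation.
struct ToyCpu
{
    size_t[16] mem;   // toy stack memory; the stack grows toward lower indices
    size_t sp = 12;   // index of the current top of the stack
    size_t[2] regs;   // stand-ins for the callee-saved registers

    void push(size_t v) { mem[--sp] = v; }
    size_t pop() { return mem[sp++]; }

    // Mimics the shape of a context switch: save registers on the old
    // stack, record the old stack pointer, then restore from the new one.
    void switchContext(ref size_t oldSp, size_t newSp)
    {
        foreach (r; regs)
            push(r);
        oldSp = sp;
        sp = newSp;
        foreach_reverse (ref r; regs)
            r = pop();
    }
}

// Performs a "switch to yourself" with a stack pointer captured before
// the pushes, and returns the register values that were loaded.
size_t[2] selfSwitchDemo()
{
    ToyCpu cpu;
    cpu.mem[12] = 0xBEEF;   // unrelated data already on the stack
    cpu.mem[13] = 0xDEAD;
    cpu.regs = [1, 2];      // the values we would like to get back

    immutable staleSp = cpu.sp;   // taken before the registers are pushed
    size_t savedSp;
    cpu.switchContext(savedSp, staleSp);
    return cpu.regs;
}

void main()
{
    // The registers come back holding the unrelated stack data,
    // not the original values 1 and 2.
    writeln(selfSwitchDemo());
}
```

In this model the pops read the slots above the freshly pushed registers, so the "restored" registers hold whatever unrelated data was there, and the stack pointer ends up in the wrong place as well.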
Comment #1 by clugdbug — 2016-05-31T15:04:36Z
To clarify: I'm asking for the situation where you call a fiber which is not in state HOLD to be detected even when compiled without contracts. I'm not asking for fiber.call() to ever be a no-op. I.e., fiber.call() should either switch to a different fiber, or else it should halt execution. It should never cause stack corruption.
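Until something like this is done inside druntime, the requested behavior can be approximated on the caller's side. The following is a minimal sketch, not a proposed patch: `checkedCall` and `selfCallIsRejected` are hypothetical helpers, and throwing an `Exception` is an assumption made so that the example can be observed; halting with `assert(0)` would be closer to "halt execution". It relies only on the public `Fiber.state` property and `Fiber.State.HOLD`.

```d
import core.thread : Fiber;
import std.stdio : writeln;

/// Hypothetical helper, not part of druntime: performs the HOLD check in
/// ordinary code, so it is present whether or not contracts are compiled in.
void checkedCall(Fiber f)
{
    if (f.state != Fiber.State.HOLD)
        throw new Exception("attempt to call a fiber that is not in State.HOLD");
    f.call();
}

/// Returns true if a fiber's attempt to call itself was rejected by the guard.
bool selfCallIsRejected()
{
    Fiber f;
    bool rejected = false;
    f = new Fiber({
        try
        {
            checkedCall(f);   // f is running here, so it is not in State.HOLD
        }
        catch (Exception e)
        {
            rejected = true;
        }
    });
    checkedCall(f);           // f is in State.HOLD: an ordinary switch
    return rejected;
}

void main()
{
    writeln(selfCallIsRejected() ? "self-call rejected" : "self-call not rejected");
}
```

The guard costs one comparison per call, which is the trade-off argued for in Comment #0.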
Comment #2 by dlang-bugzilla — 2017-07-02T02:37:12Z
A minimal test case exhibiting the problem would be nice!
Comment #3 by robert.schadek — 2024-12-07T13:36:37Z
This issue has been moved to GitHub: https://github.com/dlang/dmd/issues/17329. Do not comment here anymore; nobody will see it.