Bug 22555 – Recursively locked mutexes are not fully released by Condition.wait

Status
NEW
Severity
normal
Priority
P3
Component
druntime
Product
D
Version
D2
Platform
x86_64
OS
Linux
Creation time
2021-11-30T11:54:38Z
Last change time
2024-12-07T13:41:33Z
Assigned to
No Owner
Creator
FeepingCreature
Moved to GitHub: dmd#17199 →

Comments

Comment #0 by default_357-line — 2021-11-30T11:54:38Z
Consider this code: import std; import core.sync.condition; import core.sync.mutex; import core.sync.semaphore; import core.thread; class A { int value; Condition condition; this() { condition = new Condition(new Mutex(this)); } void set(int i) { synchronized (this) { value = i; this.condition.notifyAll; } } void wait(int i) { synchronized (this) { while (this.value != i) { condition.wait; } } } } void main() { auto done = new Semaphore; auto a = new A; task({ synchronized (a) { writefln!"Wait for a to be 5"; a.wait(5); writefln!"Wait done."; } done.notify; }).executeInNewThread; writefln!"Let task start up."; Thread.sleep(100.msecs); writefln!"Set a to 5."; a.set(5); writefln!"Wait for task to be done."; done.wait; writefln!"Done."; } Naively we would expect this sequence of operations: Thread starts. Thread waits on a.value == 5 main sets a.value to 5 Thread finishes waiting and signals done. Main awaits done and returns. This does not happen on Posix, because pthreads recursive mutexes have a dangerous interaction with condition variables. See https://linux.die.net/man/3/pthread_cond_wait "These functions atomically release _mutex_" in conjunction with https://linux.die.net/man/3/pthread_mutex_lock "Each time the thread unlocks the mutex, the lock count shall be decremented by one. When the lock count reaches zero, the mutex shall become available for other threads to acquire." As demonstrated, the effect of this is that after pthread_cond_wait, the mutex is *not* available for other threads to acquire! This causes the example to block forever.
Comment #1 by default_357-line — 2021-11-30T14:04:21Z
Seems to happen on Linux and Windows.
Comment #2 by default_357-line — 2021-11-30T17:21:34Z
I don't see any good way in fact to fix this while Mutex is recursive. I've been playing with approaches all day, but the problem is that the interaction of pthread_cond_wait and mutexes leaves it fairly open whether the mutex is actually locked after pthread_mutex_wait has returned - and there is no way to check!
Comment #3 by stanislav.blinov — 2021-11-30T18:28:29Z
I might be wrong, but isn't what's happening here this? - task starts, locks a's monitor and blocks in wait - main thread calls set and blocks trying to lock a's monitor, which is already locked by task I.e. you've a deadlock, but not on condition variable :)
Comment #4 by stanislav.blinov — 2021-11-30T18:31:30Z
Disregard, I'm blind.
Comment #5 by stanislav.blinov — 2021-11-30T18:59:01Z
Actually, no, I'm not blind :) task { synchronized (a) // lock, counter = 1 { a.wait(5) { // `this` is a synchronized (this) // already locked, counter = 2 { // So at this point, you've recursed twice // hence main thread blocks in set() condition.wait; // counter = 1, not unlocked // counter = 2 } } } } --- synchronized(a) in task should be removed
Comment #6 by default_357-line — 2021-11-30T19:13:35Z
No, because the way conditions work is that they unlock their associated mutex while they wait as an atomic operation, ie. the condition and the mutex are linked. You can check this by removing the synchronized in the *task* in main, which should be redundant with the one in wait - but then it works. The problem is recursive mutexes interacting badly with conditions.
Comment #7 by default_357-line — 2021-11-30T19:15:43Z
Owait yes your observation is correct. But that's not something that shows up in D's semantics for Mutex; and would be annoying at any rate because D core.sync does not have non-recursive mutexes in the first place. Ie. since synchronized is supposed to allow double locking, it should still work with core.sync.Condition. D's impl leaks a pthread problem in other words.
Comment #8 by robert.schadek — 2024-12-07T13:41:33Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/17199 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB