
Commit dbb2605

KAGA-KOKO authored and ingomolnar committed
locking/rtmutex: Prevent dequeue vs. unlock race

David reported a futex/rtmutex state corruption. It's caused by the
following problem:

CPU0                            CPU1                                    CPU2

l->owner=T1
                                rt_mutex_lock(l)
                                lock(l->wait_lock)
                                l->owner = T1 | HAS_WAITERS;
                                enqueue(T2)
                                boost()
                                  unlock(l->wait_lock)
                                schedule()

                                                                        rt_mutex_lock(l)
                                                                        lock(l->wait_lock)
                                                                        l->owner = T1 | HAS_WAITERS;
                                                                        enqueue(T3)
                                                                        boost()
                                                                          unlock(l->wait_lock)
                                                                        schedule()
                                signal(->T2)                            signal(->T3)
                                lock(l->wait_lock)
                                dequeue(T2)
                                deboost()
                                  unlock(l->wait_lock)
                                                                        lock(l->wait_lock)
                                                                        dequeue(T3)
                                                                          ===> wait list is now empty
                                                                        deboost()
                                                                          unlock(l->wait_lock)
                                lock(l->wait_lock)
                                fixup_rt_mutex_waiters()
                                  if (wait_list_empty(l)) {
                                    owner = l->owner & ~HAS_WAITERS;
                                    l->owner = owner
                                      ==> l->owner = T1
                                  }

                                                                        lock(l->wait_lock)
rt_mutex_unlock(l)                                                      fixup_rt_mutex_waiters()
                                                                          if (wait_list_empty(l)) {
                                                                            owner = l->owner & ~HAS_WAITERS;
cmpxchg(l->owner, T1, NULL)
 ===> Success (l->owner = NULL)
                                                                            l->owner = owner
                                                                              ==> l->owner = T1
                                                                          }

That means the problem is caused by fixup_rt_mutex_waiters() which does the
RMW to clear the waiters bit unconditionally when there are no waiters in
the rtmutex's rbtree.

This can be fatal: A concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set. If the cmpxchg() gets in the
middle of the RMW operation then the previous owner, which just unlocked
the rtmutex, is set as the owner again when the write takes place after the
successful cmpxchg().

The solution is rather trivial: verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.

It's remarkable that the test program provided by David triggers on ARM64
and MIPS64 really quickly, but it refuses to reproduce on x86-64, while the
problem exists there as well. That refusal might explain why this was not
discovered earlier despite the bug existing from day one of the rtmutex
implementation more than 10 years ago.

Thanks to David for meticulously instrumenting the code and providing the
information which made it possible to decode this subtle problem.

Reported-by: David Daney <ddaney@caviumnetworks.com>
Tested-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Fixes: 23f78d4 ("[PATCH] pi-futex: rt mutex core")
Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

1 parent 2513940 commit dbb2605

1 file changed

Lines changed: 66 additions & 2 deletions


kernel/locking/rtmutex.c

@@ -65,8 +65,72 @@ static inline void clear_rt_mutex_waiters(struct rt_mutex *lock)
 
 static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
 {
-	if (!rt_mutex_has_waiters(lock))
-		clear_rt_mutex_waiters(lock);
+	unsigned long owner, *p = (unsigned long *) &lock->owner;
+
+	if (rt_mutex_has_waiters(lock))
+		return;
+
+	/*
+	 * The rbtree has no waiters enqueued, now make sure that the
+	 * lock->owner still has the waiters bit set, otherwise the
+	 * following can happen:
+	 *
+	 * CPU 0           CPU 1           CPU2
+	 * l->owner=T1
+	 *                 rt_mutex_lock(l)
+	 *                 lock(l->lock)
+	 *                 l->owner = T1 | HAS_WAITERS;
+	 *                 enqueue(T2)
+	 *                 boost()
+	 *                   unlock(l->lock)
+	 *                 block()
+	 *
+	 *                                 rt_mutex_lock(l)
+	 *                                 lock(l->lock)
+	 *                                 l->owner = T1 | HAS_WAITERS;
+	 *                                 enqueue(T3)
+	 *                                 boost()
+	 *                                   unlock(l->lock)
+	 *                                 block()
+	 *                 signal(->T2)    signal(->T3)
+	 *                 lock(l->lock)
+	 *                 dequeue(T2)
+	 *                 deboost()
+	 *                   unlock(l->lock)
+	 *                                 lock(l->lock)
+	 *                                 dequeue(T3)
+	 *                                  ==> wait list is empty
+	 *                                 deboost()
+	 *                                  unlock(l->lock)
+	 *                 lock(l->lock)
+	 *                 fixup_rt_mutex_waiters()
+	 *                   if (wait_list_empty(l) {
+	 *                     l->owner = owner
+	 *                     owner = l->owner & ~HAS_WAITERS;
+	 *                       ==> l->owner = T1
+	 *                   }
+	 *                                 lock(l->lock)
+	 * rt_mutex_unlock(l)              fixup_rt_mutex_waiters()
+	 *                                   if (wait_list_empty(l) {
+	 *                                     owner = l->owner & ~HAS_WAITERS;
+	 * cmpxchg(l->owner, T1, NULL)
+	 *  ===> Success (l->owner = NULL)
+	 *
+	 *                                     l->owner = owner
+	 *                                       ==> l->owner = T1
+	 *                                   }
+	 *
+	 * With the check for the waiter bit in place T3 on CPU2 will not
+	 * overwrite. All tasks fiddling with the waiters bit are
+	 * serialized by l->lock, so nothing else can modify the waiters
+	 * bit. If the bit is set then nothing can change l->owner either
+	 * so the simple RMW is safe. The cmpxchg() will simply fail if it
+	 * happens in the middle of the RMW because the waiters bit is
+	 * still set.
+	 */
+	owner = READ_ONCE(*p);
+	if (owner & RT_MUTEX_HAS_WAITERS)
+		WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
 }
 
 /*
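
The final steps of the interleaving described in the comment can be replayed deterministically in userspace. The following is a minimal sketch, not kernel code: `struct toy_lock`, `fast_unlock()`, `fixup_read()`, `fixup_write_old()`, `fixup_write_new()`, `replay()` and the value chosen for `T1` are all hypothetical names invented for illustration, and C11 atomics stand in for the kernel's `cmpxchg()`, `READ_ONCE()` and `WRITE_ONCE()`. The fixup is split into its read half and its write half so that the fast-path unlock can be placed between them, which is the window the commit closes.

```c
#include <stdatomic.h>
#include <stdint.h>

#define HAS_WAITERS ((uintptr_t)1)       /* low bit of the owner word */
#define T1          ((uintptr_t)0x1000)  /* hypothetical owner task pointer */

/* Hypothetical stand-in for struct rt_mutex: only the owner word. */
struct toy_lock {
	_Atomic uintptr_t owner;
};

/* Fast-path unlock: succeeds only if owner == me with no waiters bit set. */
static int fast_unlock(struct toy_lock *l, uintptr_t me)
{
	uintptr_t expected = me;

	return atomic_compare_exchange_strong(&l->owner, &expected, 0);
}

/* Read half of the fixup RMW. */
static uintptr_t fixup_read(struct toy_lock *l)
{
	return atomic_load(&l->owner);
}

/* Old write half: stores unconditionally, even if the bit was already clear. */
static void fixup_write_old(struct toy_lock *l, uintptr_t seen)
{
	atomic_store(&l->owner, seen & ~HAS_WAITERS);
}

/* New write half: stores only if the value that was read had the bit set. */
static void fixup_write_new(struct toy_lock *l, uintptr_t seen)
{
	if (seen & HAS_WAITERS)
		atomic_store(&l->owner, seen & ~HAS_WAITERS);
}

/*
 * Replay: the fixup reads the owner word, the owner then attempts the
 * fast-path unlock, and finally the fixup performs its write half.
 * Returns the resulting owner word.
 */
static uintptr_t replay(int fixed, uintptr_t initial)
{
	struct toy_lock l;
	uintptr_t seen;

	atomic_init(&l.owner, initial);
	seen = fixup_read(&l);          /* CPU2: read half of the RMW */
	fast_unlock(&l, T1);            /* CPU0: cmpxchg(l->owner, T1, NULL) */
	if (fixed)
		fixup_write_new(&l, seen);  /* CPU2: write half, with the check */
	else
		fixup_write_old(&l, seen);  /* CPU2: write half, unconditional */
	return atomic_load(&l.owner);
}
```

Under these assumptions, `replay(0, T1)` ends with the owner word back at `T1` although the lock was released, which is the corruption described above, while `replay(1, T1)` leaves it at 0. Starting from `T1 | HAS_WAITERS`, the fast-path unlock fails in both variants and the fixup clears the bit, leaving `T1` as the legitimate owner.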
