
Commit 10d20bd

shmem: fix shm fallocate() list corruption
The shmem hole punching with fallocate(FALLOC_FL_PUNCH_HOLE) does not want to race with generating new pages by faulting them in.

However, the wait queue used to delay the page faulting has a serious problem: the wait queue head (in shmem_fallocate()) is allocated on the stack, and the code expects that "wake_up_all()" will make sure that all the queue entries are gone before the stack frame is de-allocated.

And that is not at all necessarily the case.

Yes, a normal wake-up sequence will remove the wait-queue entry that caused the wakeup (see "autoremove_wake_function()"), but the key wording there is "that caused the wakeup". When there are multiple possible wakeup sources, the wait queue entry may well stay around.

And _particularly_ in a page fault path, we may be faulting in new pages from user space while we also have other things going on, and there may well be other pending wakeups.

So despite the "wake_up_all()", it's not at all guaranteed that all list entries are removed from the wait queue head on the stack.

Fix this by introducing a new wakeup function that removes the list entry unconditionally, even if the target process had already woken up for other reasons. Use that "synchronous" function to set up the waiters in shmem_fault().

This problem has never been seen in the wild afaik, but Dave Jones has reported it on and off while running trinity. We thought we fixed the stack corruption with the blk-mq rq_list locking fix (commit 7fe3113: "blk-mq: update hardware and software queues for sleeping alloc"), but it turns out there was _another_ stack corruptor hiding in the trinity runs.

Vegard Nossum (also running trinity) was able to trigger this one fairly consistently, and made us look once again at the shmem code due to the faults often being in that area.

Reported-and-tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
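For reference, the wake function the message contrasts against, autoremove_wake_function() in kernel/sched/wait.c, reads roughly as follows in this kernel version (quoted for illustration, not part of this diff): it unlinks the wait queue entry only when default_wake_function() reports that it actually woke the task, so an entry whose owner was already woken by something else stays on the list.

	int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
	{
		int ret = default_wake_function(wait, mode, sync, key);

		/* Only unlink the entry if this call performed the wakeup. */
		if (ret)
			list_del_init(&wait->task_list);
		return ret;
	}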
1 parent d9d0452 commit 10d20bd

1 file changed: mm/shmem.c (14 additions, 1 deletion)
@@ -1848,6 +1848,18 @@ alloc_nohuge:		page = shmem_alloc_and_acct_page(gfp, info, sbinfo,
 	return error;
 }
 
+/*
+ * This is like autoremove_wake_function, but it removes the wait queue
+ * entry unconditionally - even if something else had already woken the
+ * target.
+ */
+static int synchronous_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
+{
+	int ret = default_wake_function(wait, mode, sync, key);
+	list_del_init(&wait->task_list);
+	return ret;
+}
+
 static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct inode *inode = file_inode(vma->vm_file);
@@ -1883,7 +1895,7 @@ static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 			    vmf->pgoff >= shmem_falloc->start &&
 			    vmf->pgoff < shmem_falloc->next) {
 				wait_queue_head_t *shmem_falloc_waitq;
-				DEFINE_WAIT(shmem_fault_wait);
+				DEFINE_WAIT_FUNC(shmem_fault_wait, synchronous_wake_function);
 
 				ret = VM_FAULT_NOPAGE;
 				if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) &&
@@ -2665,6 +2677,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		spin_lock(&inode->i_lock);
 		inode->i_private = NULL;
 		wake_up_all(&shmem_falloc_waitq);
+		WARN_ON_ONCE(!list_empty(&shmem_falloc_waitq.task_list));
 		spin_unlock(&inode->i_lock);
 		error = 0;
 		goto out;
