md/raid5: skip 2-failure compute when other disk is R5_LOCKED

FengWeiShih · hailan94 · commit 52e4324935be · 2026-03-22T09:57:33.000+08:00
When skip_copy is enabled on a doubly-degraded RAID6, a device that is
being written to will be in R5_LOCKED state with R5_UPTODATE cleared.
If a new read triggers fetch_block() while the write is still in
flight, the 2-failure compute path may select this locked device as a
compute target because it is not R5_UPTODATE.

Because skip_copy makes the device page point directly to the bio page,
reconstructing data into it might be risky. Also, since the compute
marks the device R5_UPTODATE, it triggers WARN_ON in ops_run_io() which
checks that R5_SkipCopy and R5_UPTODATE are not both set.

This can be reproduced by running small-range concurrent read/write on
a doubly-degraded RAID6 with skip_copy enabled, for example:

  mdadm -C /dev/md0 -l6 -n6 -R -f /dev/loop[0-3] missing missing
  echo 1 > /sys/block/md0/md/skip_copy
  fio --filename=/dev/md0 --rw=randrw --bs=4k --numjobs=8 \
      --iodepth=32 --size=4M --runtime=30 --time_based --direct=1

Fix by checking R5_LOCKED before proceeding with the compute. The
compute will be retried once the lock is cleared on IO completion.

Signed-off-by: FengWei Shih <dannyshih@synology.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Link: https://lore.kernel.org/linux-raid/20260319053351.3676794-1-dannyshih@synology.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
@@ -3916,6 +3916,8 @@ static int fetch_block(struct stripe_head *sh, struct stripe_head_state *s,
 					break;
 			}
 			BUG_ON(other < 0);
+			if (test_bit(R5_LOCKED, &sh->dev[other].flags))
+				return 0;
 			pr_debug("Computing stripe %llu blocks %d,%d\n",
 			       (unsigned long long)sh->sector,
 			       disk_idx, other);
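
The effect of the added check can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct toy_dev`, the `TOY_*` flag names, and every `toy_*` function are hypothetical stand-ins for the `R5_*` flags and the compute path described in the commit message. The model assumes that the compute marks both targets up to date immediately and that write completion clears the lock and skip-copy marks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-device flag bits named in the commit. */
enum {
	TOY_UPTODATE = 1u << 0, /* models R5_UPTODATE */
	TOY_LOCKED   = 1u << 1, /* models R5_LOCKED */
	TOY_SKIPCOPY = 1u << 2, /* models R5_SkipCopy */
};

struct toy_dev {
	unsigned int flags;
};

/* The condition the commit message says ops_run_io() warns about. */
static bool toy_skipcopy_conflict(const struct toy_dev *d)
{
	return (d->flags & TOY_SKIPCOPY) && (d->flags & TOY_UPTODATE);
}

/*
 * Toy 2-failure compute. Returns 1 if the compute ran, 0 if it was deferred.
 * Simplification: the compute marks both targets up to date immediately.
 */
static int toy_compute_two(struct toy_dev *devs, int disk_idx, int other,
			   bool check_locked)
{
	/* Mirrors the added guard: do not compute into a locked device. */
	if (check_locked && (devs[other].flags & TOY_LOCKED))
		return 0;
	devs[disk_idx].flags |= TOY_UPTODATE;
	devs[other].flags |= TOY_UPTODATE;
	return 1;
}

/* Assumed completion step: the write finished, drop lock and skip-copy. */
static void toy_write_done(struct toy_dev *d)
{
	d->flags &= ~(unsigned int)(TOY_LOCKED | TOY_SKIPCOPY);
}

/* Walks through the race with and without the guard; returns 0 on success. */
static int toy_demo(void)
{
	/* Index 0 is the block the read wants; index 1 has a write in flight. */
	struct toy_dev unguarded[2] = { { 0 }, { TOY_LOCKED | TOY_SKIPCOPY } };
	struct toy_dev guarded[2]   = { { 0 }, { TOY_LOCKED | TOY_SKIPCOPY } };

	/* Without the check, the compute runs and creates the flag conflict. */
	assert(toy_compute_two(unguarded, 0, 1, false) == 1);
	assert(toy_skipcopy_conflict(&unguarded[1]));

	/* With the check, the compute is deferred and no conflict appears. */
	assert(toy_compute_two(guarded, 0, 1, true) == 0);
	assert(!toy_skipcopy_conflict(&guarded[1]));

	/* After the write completes, the retried compute goes through. */
	toy_write_done(&guarded[1]);
	assert(toy_compute_two(guarded, 0, 1, true) == 1);
	assert(!toy_skipcopy_conflict(&guarded[1]));
	return 0;
}
```

Calling `toy_demo()` from a `main()` exercises all three cases. The model leaves out the actual page aliasing between the device page and the bio page, which is the second hazard the commit message mentions.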
