Skip to content

Commit 6282675

Browse files
adam900710kdave
authored andcommitted
btrfs: relocation: fix reloc_root lifespan and access
[BUG] There are several different KASAN reports for balance + snapshot workloads. Involved call paths include: should_ignore_root+0x54/0xb0 [btrfs] build_backref_tree+0x11af/0x2280 [btrfs] relocate_tree_blocks+0x391/0xb80 [btrfs] relocate_block_group+0x3e5/0xa00 [btrfs] btrfs_relocate_block_group+0x240/0x4d0 [btrfs] btrfs_relocate_chunk+0x53/0xf0 [btrfs] btrfs_balance+0xc91/0x1840 [btrfs] btrfs_ioctl_balance+0x416/0x4e0 [btrfs] btrfs_ioctl+0x8af/0x3e60 [btrfs] do_vfs_ioctl+0x831/0xb10 create_reloc_root+0x9f/0x460 [btrfs] btrfs_reloc_post_snapshot+0xff/0x6c0 [btrfs] create_pending_snapshot+0xa9b/0x15f0 [btrfs] create_pending_snapshots+0x111/0x140 [btrfs] btrfs_commit_transaction+0x7a6/0x1360 [btrfs] btrfs_mksubvol+0x915/0x960 [btrfs] btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs] btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs] btrfs_ioctl+0x241b/0x3e60 [btrfs] do_vfs_ioctl+0x831/0xb10 btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs] create_pending_snapshot+0x209/0x15f0 [btrfs] create_pending_snapshots+0x111/0x140 [btrfs] btrfs_commit_transaction+0x7a6/0x1360 [btrfs] btrfs_mksubvol+0x915/0x960 [btrfs] btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs] btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs] btrfs_ioctl+0x241b/0x3e60 [btrfs] do_vfs_ioctl+0x831/0xb10 [CAUSE] All these call sites are only relying on root->reloc_root, which can undergo btrfs_drop_snapshot(), and since we don't have real refcount based protection to reloc roots, we can reach already dropped reloc root, triggering KASAN. [FIX] To avoid such access to unstable root->reloc_root, we should check BTRFS_ROOT_DEAD_RELOC_TREE bit first. This patch introduces wrappers that provide the correct way to check the bit with memory barriers protection. Most callers don't distinguish merged reloc tree and no reloc tree. The only exception is should_ignore_root(), as merged reloc tree can be ignored, while no reloc tree shouldn't. [CRITICAL SECTION ANALYSIS] Although test_bit()/set_bit()/clear_bit() doesn't imply a barrier, the DEAD_RELOC_TREE bit has extra help from transaction as a higher level barrier, the lifespan of root::reloc_root and DEAD_RELOC_TREE bit are: NULL: reloc_root is NULL PTR: reloc_root is not NULL 0: DEAD_RELOC_ROOT bit not set DEAD: DEAD_RELOC_ROOT bit set (NULL, 0) Initial state __ | /\ Section A btrfs_init_reloc_root() \/ | __ (PTR, 0) reloc_root initialized /\ | | btrfs_update_reloc_root() | Section B | | (PTR, DEAD) reloc_root has been merged \/ | __ === btrfs_commit_transaction() ==================== | /\ clean_dirty_subvols() | | | Section C (NULL, DEAD) reloc_root cleanup starts \/ | __ btrfs_drop_snapshot() /\ | | Section D (NULL, 0) Back to initial state \/ Every have_reloc_root() or test_bit(DEAD_RELOC_ROOT) caller holds transaction handle, so none of such caller can cross transaction boundary. In Section A, every caller just found no DEAD bit, and grab reloc_root. In the cross section A-B, caller may get no DEAD bit, but since reloc_root is still completely valid thus accessing reloc_root is completely safe. No test_bit() caller can cross the boundary of Section B and Section C. In Section C, every caller found the DEAD bit, so no one will access reloc_root. In the cross section C-D, either caller gets the DEAD bit set, avoiding access reloc_root no matter if it's safe or not. Or caller get the DEAD bit cleared, then access reloc_root, which is already NULL, nothing will be wrong. The memory write barriers are between the reloc_root updates and bit set/clear, the pairing read side is before test_bit. Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Fixes: d2311e6 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ barriers ] Signed-off-by: David Sterba <dsterba@suse.com>
1 parent 26ef849 commit 6282675

1 file changed

Lines changed: 46 additions & 5 deletions

File tree

fs/btrfs/relocation.c

Lines changed: 46 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -517,6 +517,34 @@ static int update_backref_cache(struct btrfs_trans_handle *trans,
517517
return 1;
518518
}
519519

520+
static bool reloc_root_is_dead(struct btrfs_root *root)
521+
{
522+
/*
523+
* Pair with set_bit/clear_bit in clean_dirty_subvols and
524+
* btrfs_update_reloc_root. We need to see the updated bit before
525+
* trying to access reloc_root
526+
*/
527+
smp_rmb();
528+
if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
529+
return true;
530+
return false;
531+
}
532+
533+
/*
534+
* Check if this subvolume tree has valid reloc tree.
535+
*
536+
* Reloc tree after swap is considered dead, thus not considered as valid.
537+
* This is enough for most callers, as they don't distinguish dead reloc root
538+
* from no reloc root. But should_ignore_root() below is a special case.
539+
*/
540+
static bool have_reloc_root(struct btrfs_root *root)
541+
{
542+
if (reloc_root_is_dead(root))
543+
return false;
544+
if (!root->reloc_root)
545+
return false;
546+
return true;
547+
}
520548

521549
static int should_ignore_root(struct btrfs_root *root)
522550
{
@@ -525,6 +553,10 @@ static int should_ignore_root(struct btrfs_root *root)
525553
if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
526554
return 0;
527555

556+
/* This root has been merged with its reloc tree, we can ignore it */
557+
if (reloc_root_is_dead(root))
558+
return 1;
559+
528560
reloc_root = root->reloc_root;
529561
if (!reloc_root)
530562
return 0;
@@ -1439,7 +1471,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
14391471
* The subvolume has reloc tree but the swap is finished, no need to
14401472
* create/update the dead reloc tree
14411473
*/
1442-
if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
1474+
if (reloc_root_is_dead(root))
14431475
return 0;
14441476

14451477
if (root->reloc_root) {
@@ -1478,8 +1510,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
14781510
struct btrfs_root_item *root_item;
14791511
int ret;
14801512

1481-
if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) ||
1482-
!root->reloc_root)
1513+
if (!have_reloc_root(root))
14831514
goto out;
14841515

14851516
reloc_root = root->reloc_root;
@@ -1489,6 +1520,11 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
14891520
if (fs_info->reloc_ctl->merge_reloc_tree &&
14901521
btrfs_root_refs(root_item) == 0) {
14911522
set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
1523+
/*
1524+
* Mark the tree as dead before we change reloc_root so
1525+
* have_reloc_root will not touch it from now on.
1526+
*/
1527+
smp_wmb();
14921528
__del_reloc_root(reloc_root);
14931529
}
14941530

@@ -2201,6 +2237,11 @@ static int clean_dirty_subvols(struct reloc_control *rc)
22012237
if (ret2 < 0 && !ret)
22022238
ret = ret2;
22032239
}
2240+
/*
2241+
* Need barrier to ensure clear_bit() only happens after
2242+
* root->reloc_root = NULL. Pairs with have_reloc_root.
2243+
*/
2244+
smp_wmb();
22042245
clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
22052246
btrfs_put_fs_root(root);
22062247
} else {
@@ -4718,7 +4759,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
47184759
struct btrfs_root *root = pending->root;
47194760
struct reloc_control *rc = root->fs_info->reloc_ctl;
47204761

4721-
if (!root->reloc_root || !rc)
4762+
if (!rc || !have_reloc_root(root))
47224763
return;
47234764

47244765
if (!rc->merge_reloc_tree)
@@ -4752,7 +4793,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
47524793
struct reloc_control *rc = root->fs_info->reloc_ctl;
47534794
int ret;
47544795

4755-
if (!root->reloc_root || !rc)
4796+
if (!rc || !have_reloc_root(root))
47564797
return 0;
47574798

47584799
rc = root->fs_info->reloc_ctl;

0 commit comments

Comments
 (0)