Skip to content

Commit 1f6ee9b

Browse files
Fei Lvamir73il
authored andcommitted
ovl: make fsync after metadata copy-up opt-in mount option
Commit 7d6899f ("ovl: fsync after metadata copy-up") was done to fix durability of overlayfs copy up on an upper filesystem which does not enforce ordering on storing of metadata changes (e.g. ubifs). In an earlier revision of the regressing commit by Lei Lv, the metadata fsync behavior was opt-in via a new "fsync=strict" mount option. We were hoping that the opt-in mount option could be avoided, so the change was only made to depend on metacopy=off, in the hope of not hurting performance of metadata heavy workloads, which are more likely to be using metacopy=on. This hope was proven wrong by a performance regression report from Google COS workload after upgrade to kernel 6.12. This is an adaptation of Lei's original "fsync=strict" mount option to the existing upstream code. The new mount option is mutually exclusive with the "volatile" mount option, so the latter is now an alias to the "fsync=volatile" mount option. Reported-by: Chenglong Tang <chenglongtang@google.com> Closes: https://lore.kernel.org/linux-unionfs/CAOdxtTadAFH01Vui1FvWfcmQ8jH1O45owTzUcpYbNvBxnLeM7Q@mail.gmail.com/ Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxgKC1SgjMWre=fUb00v8rxtd6sQi-S+dxR8oDzAuiGu8g@mail.gmail.com/ Fixes: 7d6899f ("ovl: fsync after metadata copy-up") Depends: 50e638b ("ovl: Use str_on_off() helper in ovl_show_options()") Cc: stable@vger.kernel.org # v6.12+ Signed-off-by: Fei Lv <feilv@asrmicro.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
1 parent 53a7c17 commit 1f6ee9b

6 files changed

Lines changed: 104 additions & 15 deletions

File tree

Documentation/filesystems/overlayfs.rst

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -783,6 +783,56 @@ controlled by the "uuid" mount option, which supports these values:
783783
mounted with "uuid=on".
784784

785785

786+
Durability and copy up
787+
----------------------
788+
789+
The fsync(2) system call ensures that the data and metadata of a file
790+
are safely written to the backing storage, which is expected to
791+
guarantee the existence of the information post system crash.
792+
793+
Without an fsync(2) call, there is no guarantee that the observed
794+
data after a system crash will be either the old or the new data, but
795+
in practice, the observed data after crash is often the old or new data
796+
or a mix of both.
797+
798+
When an overlayfs file is modified for the first time, copy up will
799+
create a copy of the lower file and its parent directories in the upper
800+
layer. Since the Linux filesystem API does not enforce any particular
801+
ordering on storing changes without explicit fsync(2) calls, in case
802+
of a system crash, the upper file could end up with no data at all
803+
(i.e. zeros), which would be an unusual outcome. To avoid this
804+
experience, overlayfs calls fsync(2) on the upper file before completing
805+
data copy up with rename(2) or link(2) to make the copy up "atomic".
806+
807+
By default, overlayfs does not explicitly call fsync(2) on copied up
808+
directories or on metadata-only copy up, so it provides no guarantee to
809+
persist the user's modification unless the user calls fsync(2).
810+
The fsync during copy up only guarantees that if a copy up is observed
811+
after a crash, the observed data is not zeroes or intermediate values
812+
from the copy up staging area.
813+
814+
On traditional local filesystems with a single journal (e.g. ext4, xfs),
815+
fsync on a file also persists the parent directory changes, because they
816+
are usually modified in the same transaction, so metadata durability during
817+
data copy up effectively comes for free. Overlayfs further limits risk by
818+
disallowing network filesystems as upper layer.
819+
820+
Overlayfs can be tuned to prefer performance or durability when storing
821+
to the underlying upper layer. This is controlled by the "fsync" mount
822+
option, which supports these values:
823+
824+
- "auto": (default)
825+
Call fsync(2) on upper file before completion of data copy up.
826+
No explicit fsync(2) on directory or metadata-only copy up.
827+
- "strict":
828+
Call fsync(2) on upper file and directories before completion of any
829+
copy up.
830+
- "volatile": [*]
831+
Prefer performance over durability (see `Volatile mount`_)
832+
833+
[*] The mount option "volatile" is an alias to "fsync=volatile".
834+
835+
786836
Volatile mount
787837
--------------
788838

fs/overlayfs/copy_up.c

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1146,15 +1146,15 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
11461146
return -EOVERFLOW;
11471147

11481148
/*
1149-
* With metacopy disabled, we fsync after final metadata copyup, for
1149+
* With "fsync=strict", we fsync after final metadata copyup, for
11501150
* both regular files and directories to get atomic copyup semantics
11511151
* on filesystems that do not use strict metadata ordering (e.g. ubifs).
11521152
*
1153-
* With metacopy enabled we want to avoid fsync on all meta copyup
1153+
* By default, we want to avoid fsync on all meta copyup, because
11541154
* that will hurt performance of workloads such as chown -R, so we
11551155
* only fsync on data copyup as legacy behavior.
11561156
*/
1157-
ctx.metadata_fsync = !OVL_FS(dentry->d_sb)->config.metacopy &&
1157+
ctx.metadata_fsync = ovl_should_sync_metadata(OVL_FS(dentry->d_sb)) &&
11581158
(S_ISREG(ctx.stat.mode) || S_ISDIR(ctx.stat.mode));
11591159
ctx.metacopy = ovl_need_meta_copy_up(dentry, ctx.stat.mode, flags);
11601160

fs/overlayfs/overlayfs.h

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -99,6 +99,12 @@ enum {
9999
OVL_VERITY_REQUIRE,
100100
};
101101

102+
enum {
103+
OVL_FSYNC_VOLATILE,
104+
OVL_FSYNC_AUTO,
105+
OVL_FSYNC_STRICT,
106+
};
107+
102108
/*
103109
* The tuple (fh,uuid) is a universal unique identifier for a copy up origin,
104110
* where:
@@ -656,6 +662,21 @@ static inline bool ovl_xino_warn(struct ovl_fs *ofs)
656662
return ofs->config.xino == OVL_XINO_ON;
657663
}
658664

665+
static inline bool ovl_should_sync(struct ovl_fs *ofs)
666+
{
667+
return ofs->config.fsync_mode != OVL_FSYNC_VOLATILE;
668+
}
669+
670+
static inline bool ovl_should_sync_metadata(struct ovl_fs *ofs)
671+
{
672+
return ofs->config.fsync_mode == OVL_FSYNC_STRICT;
673+
}
674+
675+
static inline bool ovl_is_volatile(struct ovl_config *config)
676+
{
677+
return config->fsync_mode == OVL_FSYNC_VOLATILE;
678+
}
679+
659680
/*
660681
* To avoid regressions in existing setups with overlay lower offline changes,
661682
* we allow lower changes only if none of the new features are used.

fs/overlayfs/ovl_entry.h

Lines changed: 1 addition & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ struct ovl_config {
1818
int xino;
1919
bool metacopy;
2020
bool userxattr;
21-
bool ovl_volatile;
21+
int fsync_mode;
2222
};
2323

2424
struct ovl_sb {
@@ -120,11 +120,6 @@ static inline struct ovl_fs *OVL_FS(struct super_block *sb)
120120
return (struct ovl_fs *)sb->s_fs_info;
121121
}
122122

123-
static inline bool ovl_should_sync(struct ovl_fs *ofs)
124-
{
125-
return !ofs->config.ovl_volatile;
126-
}
127-
128123
static inline unsigned int ovl_numlower(struct ovl_entry *oe)
129124
{
130125
return oe ? oe->__numlower : 0;

fs/overlayfs/params.c

Lines changed: 28 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -58,6 +58,7 @@ enum ovl_opt {
5858
Opt_xino,
5959
Opt_metacopy,
6060
Opt_verity,
61+
Opt_fsync,
6162
Opt_volatile,
6263
Opt_override_creds,
6364
};
@@ -140,6 +141,23 @@ static int ovl_verity_mode_def(void)
140141
return OVL_VERITY_OFF;
141142
}
142143

144+
static const struct constant_table ovl_parameter_fsync[] = {
145+
{ "volatile", OVL_FSYNC_VOLATILE },
146+
{ "auto", OVL_FSYNC_AUTO },
147+
{ "strict", OVL_FSYNC_STRICT },
148+
{}
149+
};
150+
151+
static const char *ovl_fsync_mode(struct ovl_config *config)
152+
{
153+
return ovl_parameter_fsync[config->fsync_mode].name;
154+
}
155+
156+
static int ovl_fsync_mode_def(void)
157+
{
158+
return OVL_FSYNC_AUTO;
159+
}
160+
143161
const struct fs_parameter_spec ovl_parameter_spec[] = {
144162
fsparam_string_empty("lowerdir", Opt_lowerdir),
145163
fsparam_file_or_string("lowerdir+", Opt_lowerdir_add),
@@ -155,6 +173,7 @@ const struct fs_parameter_spec ovl_parameter_spec[] = {
155173
fsparam_enum("xino", Opt_xino, ovl_parameter_xino),
156174
fsparam_enum("metacopy", Opt_metacopy, ovl_parameter_bool),
157175
fsparam_enum("verity", Opt_verity, ovl_parameter_verity),
176+
fsparam_enum("fsync", Opt_fsync, ovl_parameter_fsync),
158177
fsparam_flag("volatile", Opt_volatile),
159178
fsparam_flag_no("override_creds", Opt_override_creds),
160179
{}
@@ -665,8 +684,11 @@ static int ovl_parse_param(struct fs_context *fc, struct fs_parameter *param)
665684
case Opt_verity:
666685
config->verity_mode = result.uint_32;
667686
break;
687+
case Opt_fsync:
688+
config->fsync_mode = result.uint_32;
689+
break;
668690
case Opt_volatile:
669-
config->ovl_volatile = true;
691+
config->fsync_mode = OVL_FSYNC_VOLATILE;
670692
break;
671693
case Opt_userxattr:
672694
config->userxattr = true;
@@ -800,6 +822,7 @@ int ovl_init_fs_context(struct fs_context *fc)
800822
ofs->config.nfs_export = ovl_nfs_export_def;
801823
ofs->config.xino = ovl_xino_def();
802824
ofs->config.metacopy = ovl_metacopy_def;
825+
ofs->config.fsync_mode = ovl_fsync_mode_def();
803826

804827
fc->s_fs_info = ofs;
805828
fc->fs_private = ctx;
@@ -870,9 +893,9 @@ int ovl_fs_params_verify(const struct ovl_fs_context *ctx,
870893
config->index = false;
871894
}
872895

873-
if (!config->upperdir && config->ovl_volatile) {
896+
if (!config->upperdir && ovl_is_volatile(config)) {
874897
pr_info("option \"volatile\" is meaningless in a non-upper mount, ignoring it.\n");
875-
config->ovl_volatile = false;
898+
config->fsync_mode = ovl_fsync_mode_def();
876899
}
877900

878901
if (!config->upperdir && config->uuid == OVL_UUID_ON) {
@@ -1070,8 +1093,8 @@ int ovl_show_options(struct seq_file *m, struct dentry *dentry)
10701093
seq_printf(m, ",xino=%s", ovl_xino_mode(&ofs->config));
10711094
if (ofs->config.metacopy != ovl_metacopy_def)
10721095
seq_printf(m, ",metacopy=%s", str_on_off(ofs->config.metacopy));
1073-
if (ofs->config.ovl_volatile)
1074-
seq_puts(m, ",volatile");
1096+
if (ofs->config.fsync_mode != ovl_fsync_mode_def())
1097+
seq_printf(m, ",fsync=%s", ovl_fsync_mode(&ofs->config));
10751098
if (ofs->config.userxattr)
10761099
seq_puts(m, ",userxattr");
10771100
if (ofs->config.verity_mode != ovl_verity_mode_def())

fs/overlayfs/super.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -776,7 +776,7 @@ static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
776776
* For volatile mount, create a incompat/volatile/dirty file to keep
777777
* track of it.
778778
*/
779-
if (ofs->config.ovl_volatile) {
779+
if (ovl_is_volatile(&ofs->config)) {
780780
err = ovl_create_volatile_dirty(ofs);
781781
if (err < 0) {
782782
pr_err("Failed to create volatile/dirty file.\n");

0 commit comments

Comments
 (0)