Commit a5f9980

Merge tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Benjamin Marzinski:
 "There are fixes for some corner case crashes in dm-cache and dm-mirror,
  new setup functionality for dm-vdo, and miscellaneous minor fixes and
  cleanups, especially to dm-verity.

  dm-vdo:
   - Make dm-vdo able to format the device itself, like other dm
     targets, instead of needing a userspace formatting program
   - Add some sanity checks and code cleanup

  dm-cache:
   - Fix crashes and hangs when operating in passthrough mode (which
     have been around, unnoticed, since 4.12), as well as a late
     arriving fix for an error path bug in the passthrough fix
   - Fix a corner case memory leak

  dm-verity:
   - Another set of minor bugfixes and code cleanups to the forward
     error correction code

  dm-mirror:
   - Fix minor initialization bug
   - Fix overflow crash on large devices with small region sizes

  dm-crypt:
   - Reimplement elephant diffuser using AES library and minor cleanups

  dm-core:
   - Claude found a buffer overflow in /dev/mapper/control ioctl handling
   - make dm_mod.wait_for correctly wait for partitions
   - minor code fixes and cleanups"

* tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
  dm cache: fix missing return in invalidate_committed's error path
  dm: fix a buffer overflow in ioctl processing
  dm-crypt: Make crypt_iv_operations::post return void
  dm vdo: Fix spelling mistake "postive" -> "positive"
  dm: provide helper to set stacked limits
  dm-integrity: always set the io hints
  dm-integrity: fix mismatched queue limits
  dm-bufio: use kzalloc_flex
  dm vdo: save the formatted metadata to disk
  dm vdo: add formatting logic and initialization
  dm vdo: add synchronous metadata I/O submission helper
  dm vdo: add geometry block structure
  dm vdo: add geometry block encoding
  dm vdo: add upfront validation for logical size
  dm vdo: add formatting parameters to table line
  dm vdo: add super block initialization to encodings.c
  dm vdo: add geometry block initialization to encodings.c
  dm-crypt: Make crypt_iv_operations::wipe return void
  dm-crypt: Reimplement elephant diffuser using AES library
  dm-verity-fec: warn even when there were no errors
  ...

2 parents f1d26d7 + 8c0ee19 commit a5f9980

73 files changed

Lines changed: 1348 additions & 860 deletions

Documentation/admin-guide/device-mapper/verity.rst

Lines changed: 102 additions & 20 deletions
@@ -102,29 +102,42 @@ ignore_zero_blocks
     that are not guaranteed to contain zeroes.

 use_fec_from_device <fec_dev>
-    Use forward error correction (FEC) to recover from corruption if hash
-    verification fails. Use encoding data from the specified device. This
-    may be the same device where data and hash blocks reside, in which case
-    fec_start must be outside data and hash areas.
+    Use forward error correction (FEC) parity data from the specified device to
+    try to automatically recover from corruption and I/O errors.

-    If the encoding data covers additional metadata, it must be accessible
-    on the hash device after the hash blocks.
+    If this option is given, then <fec_roots> and <fec_blocks> must also be
+    given. <hash_block_size> must also be equal to <data_block_size>.

-    Note: block sizes for data and hash devices must match. Also, if the
-    verity <dev> is encrypted the <fec_dev> should be too.
+    <fec_dev> can be the same as <dev>, in which case <fec_start> must be
+    outside the data area. It can also be the same as <hash_dev>, in which case
+    <fec_start> must be outside the hash and optional additional metadata areas.
+
+    If the data <dev> is encrypted, the <fec_dev> should be too.
+
+    For more information, see `Forward error correction`_.

 fec_roots <num>
-    Number of generator roots. This equals to the number of parity bytes in
-    the encoding data. For example, in RS(M, N) encoding, the number of roots
-    is M-N.
+    The number of parity bytes in each 255-byte Reed-Solomon codeword. The
+    Reed-Solomon code used will be an RS(255, k) code where k = 255 - fec_roots.
+
+    The supported values are 2 through 24 inclusive. Higher values provide
+    stronger error correction. However, the minimum value of 2 already provides
+    strong error correction due to the use of interleaving, so 2 is the
+    recommended value for most users. fec_roots=2 corresponds to an
+    RS(255, 253) code, which has a space overhead of about 0.8%.

 fec_blocks <num>
-    The number of encoding data blocks on the FEC device. The block size for
-    the FEC device is <data_block_size>.
+    The total number of <data_block_size> blocks that are error-checked using
+    FEC. This must be at least the sum of <num_data_blocks> and the number of
+    blocks needed by the hash tree. It can include additional metadata blocks,
+    which are assumed to be accessible on <hash_dev> following the hash blocks.
+
+    Note that this is *not* the number of parity blocks. The number of parity
+    blocks is inferred from <fec_blocks>, <fec_roots>, and <data_block_size>.

 fec_start <offset>
-    This is the offset, in <data_block_size> blocks, from the start of the
-    FEC device to the beginning of the encoding data.
+    This is the offset, in <data_block_size> blocks, from the start of <fec_dev>
+    to the beginning of the parity data.

 check_at_most_once
     Verify data blocks only the first time they are read from the data device,
@@ -180,11 +193,6 @@ per-block basis. This allows for a lightweight hash computation on first read
 into the page cache. Block hashes are stored linearly, aligned to the nearest
 block size.

-If forward error correction (FEC) support is enabled any recovery of
-corrupted data will be verified using the cryptographic hash of the
-corresponding data. This is why combining error correction with
-integrity checking is essential.
-
 Hash Tree
 ---------

@@ -212,6 +220,80 @@ The tree looks something like:
 / ... \ / . . . \ / \
 blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767

+Forward error correction
+------------------------
+
+dm-verity's optional forward error correction (FEC) support adds strong error
+correction capabilities to dm-verity. It allows systems that would be rendered
+inoperable by errors to continue operating, albeit with reduced performance.
+
+FEC uses Reed-Solomon (RS) codes that are interleaved across the entire
+device(s), allowing long bursts of corrupt or unreadable blocks to be recovered.
+
+dm-verity validates any FEC-corrected block against the wanted hash before using
+it. Therefore, FEC doesn't affect the security properties of dm-verity.
+
+The integration of FEC with dm-verity provides significant benefits over a
+separate error correction layer:
+
+- dm-verity invokes FEC only when a block's hash doesn't match the wanted hash
+  or the block cannot be read at all. As a result, FEC doesn't add overhead to
+  the common case where no error occurs.
+
+- dm-verity hashes are also used to identify erasure locations for RS decoding.
+  This allows correcting twice as many errors.
+
+FEC uses an RS(255, k) code where k = 255 - fec_roots. fec_roots is usually 2.
+This means that each k (usually 253) message bytes have fec_roots (usually 2)
+bytes of parity data added to get a 255-byte codeword. (Many external sources
+call RS codewords "blocks". Since dm-verity already uses the term "block" to
+mean something else, we'll use the clearer term "RS codeword".)
+
+FEC checks fec_blocks blocks of message data in total, consisting of:
+
+1. The data blocks from the data device
+2. The hash blocks from the hash device
+3. Optional additional metadata that follows the hash blocks on the hash device
+
+dm-verity assumes that the FEC parity data was computed as if the following
+procedure were followed:
+
+1. Concatenate the message data from the above sources.
+2. Zero-pad to the next multiple of k blocks. Let msg be the resulting byte
+   array, and msglen its length in bytes.
+3. For 0 <= i < msglen / k (for each RS codeword):
+   a. Select msg[i + j * msglen / k] for 0 <= j < k.
+      Consider these to be the 'k' message bytes of an RS codeword.
+   b. Compute the corresponding 'fec_roots' parity bytes of the RS codeword,
+      and concatenate them to the FEC parity data.
+
+Step 3a interleaves the RS codewords across the entire device using an
+interleaving degree of data_block_size * ceil(fec_blocks / k). This is the
+maximal interleaving, such that the message data consists of a region containing
+byte 0 of all the RS codewords, then a region containing byte 1 of all the RS
+codewords, and so on up to the region for byte 'k - 1'. Note that the number of
+codewords is set to a multiple of data_block_size; thus, the regions are
+block-aligned, and there is an implicit zero padding of up to 'k - 1' blocks.
+
+This interleaving allows long bursts of errors to be corrected. It provides
+much stronger error correction than storage devices typically provide, while
+keeping the space overhead low.
+
+The cost is slow decoding: correcting a single block usually requires reading
+254 extra blocks spread evenly across the device(s). However, that is
+acceptable because dm-verity uses FEC only when there is actually an error.
+
+The list below contains additional details about the RS codes used by
+dm-verity's FEC. Userspace programs that generate the parity data need to use
+these parameters for the parity data to match exactly:
+
+- Field used is GF(256)
+- Bytes are mapped to/from GF(256) elements in the natural way, where bits 0
+  through 7 (low-order to high-order) map to the coefficients of x^0 through x^7
+- Field generator polynomial is x^8 + x^4 + x^3 + x^2 + 1
+- The codes used are systematic, BCH-view codes
+- Primitive element alpha is 'x'
+- First consecutive root of code generator polynomial is 'x^0'

 On-disk format
 ==============
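
The layout arithmetic in the new "Forward error correction" section can be sketched in userspace C as follows. This is only an illustration of the documented procedure, not code from the kernel: the function names are hypothetical, and the total parity size is an inference from steps 2 and 3 of the procedure (one group of fec_roots parity bytes per codeword). gf256_mul() shows multiplication in GF(256) with the listed field polynomial x^8 + x^4 + x^3 + x^2 + 1, using the natural bit-to-coefficient mapping.

```c
#include <stdint.h>

/*
 * Number of RS codewords, i.e. the interleaving degree from the docs:
 * data_block_size * ceil(fec_blocks / k), with k = 255 - fec_roots.
 * This equals msglen / k for the zero-padded message.
 */
static uint64_t fec_num_codewords(uint64_t fec_blocks, unsigned int fec_roots,
				  uint32_t data_block_size)
{
	uint64_t k = 255 - fec_roots;
	uint64_t rounds = (fec_blocks + k - 1) / k;	/* ceil(fec_blocks / k) */

	return rounds * data_block_size;
}

/*
 * Offset in the padded message of message byte j (0 <= j < k) of
 * codeword i, following step 3a: msg[i + j * msglen / k].
 */
static uint64_t fec_msg_offset(uint64_t i, unsigned int j,
			       uint64_t num_codewords)
{
	return i + (uint64_t)j * num_codewords;
}

/* Total parity bytes, assuming fec_roots parity bytes per codeword. */
static uint64_t fec_parity_bytes(uint64_t num_codewords,
				 unsigned int fec_roots)
{
	return num_codewords * fec_roots;
}

/*
 * Multiply two GF(256) elements modulo x^8 + x^4 + x^3 + x^2 + 1.
 * Bit n of a byte is the coefficient of x^n.
 */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
	uint8_t p = 0;

	while (b) {
		if (b & 1)
			p ^= a;
		b >>= 1;
		/* Multiply a by x; reduce with 0x1d if x^8 appears. */
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
	}
	return p;
}
```

With fec_blocks = 1000, fec_roots = 2 and 4096-byte blocks, this sketch gives 16384 codewords and 32768 parity bytes, which is 8 blocks of parity.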

drivers/md/Kconfig

Lines changed: 2 additions & 0 deletions
@@ -226,6 +226,7 @@ config BLK_DEV_DM
 	select BLOCK_HOLDER_DEPRECATED if SYSFS
 	select BLK_DEV_DM_BUILTIN
 	select BLK_MQ_STACKING
+	select CRYPTO_LIB_SHA256 if IMA
 	depends on DAX || DAX=n
 	help
 	  Device-mapper is a low level volume manager. It works by allowing
@@ -299,6 +300,7 @@ config DM_CRYPT
 	select CRYPTO
 	select CRYPTO_CBC
 	select CRYPTO_ESSIV
+	select CRYPTO_LIB_AES
 	select CRYPTO_LIB_MD5 # needed by lmk IV mode
 	help
 	  This device-mapper target allows you to create a device that

drivers/md/dm-bufio.c

Lines changed: 2 additions & 2 deletions
@@ -391,7 +391,7 @@ struct dm_buffer_cache {
 	 */
 	unsigned int num_locks;
 	bool no_sleep;
-	struct buffer_tree trees[];
+	struct buffer_tree trees[] __counted_by(num_locks);
 };

 static DEFINE_STATIC_KEY_FALSE(no_sleep_enabled);
@@ -2511,7 +2511,7 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
 	}

 	num_locks = dm_num_hash_locks();
-	c = kzalloc(sizeof(*c) + (num_locks * sizeof(struct buffer_tree)), GFP_KERNEL);
+	c = kzalloc_flex(*c, cache.trees, num_locks);
 	if (!c) {
 		r = -ENOMEM;
 		goto bad_client;
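
The change above replaces an open-coded size calculation for a structure that ends in a flexible array. As a rough userspace analogue (not the kernel helper, whose internals are not shown in this diff), the sketch below allocates such a structure with calloc() and rejects counts whose size would overflow. The struct and function names are hypothetical, and the structure is simplified: in the diff the flexible array lives in a nested member (cache.trees).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-lock tree structure. */
struct tree {
	void *root;
};

/* Header followed by a flexible array sized by num_locks. */
struct cache {
	unsigned int num_locks;
	struct tree trees[];
};

/*
 * Allocate a zeroed struct cache with room for num_locks trees.
 * Returns NULL if the total size would overflow size_t or if the
 * allocation fails.
 */
static struct cache *cache_alloc(unsigned int num_locks)
{
	struct cache *c;
	size_t n = num_locks;

	/* sizeof does not evaluate its operand, so c is not dereferenced. */
	if (n > (SIZE_MAX - sizeof(*c)) / sizeof(c->trees[0]))
		return NULL;

	c = calloc(1, sizeof(*c) + n * sizeof(c->trees[0]));
	if (c)
		c->num_locks = num_locks;	/* record the array bound */
	return c;
}
```

Keeping the element count in the header next to the array is what lets an annotation like __counted_by(num_locks) tie the array bound to a field the compiler can see.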

drivers/md/dm-cache-metadata.c

Lines changed: 17 additions & 16 deletions
@@ -1023,6 +1023,12 @@ static bool cmd_write_lock(struct dm_cache_metadata *cmd)
 			return; \
 	} while (0)

+#define WRITE_LOCK_OR_GOTO(cmd, label) \
+	do { \
+		if (!cmd_write_lock((cmd))) \
+			goto label; \
+	} while (0)
+
 #define WRITE_UNLOCK(cmd) \
 	up_write(&(cmd)->root_lock)

@@ -1714,17 +1720,6 @@ int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *
 	return r;
 }

-int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result)
-{
-	int r;
-
-	READ_LOCK(cmd);
-	r = blocks_are_unmapped_or_clean(cmd, 0, cmd->cache_blocks, result);
-	READ_UNLOCK(cmd);
-
-	return r;
-}
-
 void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd)
 {
 	WRITE_LOCK_VOID(cmd);
@@ -1791,11 +1786,8 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)
 	new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
 					 CACHE_MAX_CONCURRENT_LOCKS);

-	WRITE_LOCK(cmd);
-	if (cmd->fail_io) {
-		WRITE_UNLOCK(cmd);
-		goto out;
-	}
+	/* cmd_write_lock() already checks fail_io with cmd->root_lock held */
+	WRITE_LOCK_OR_GOTO(cmd, out);

 	__destroy_persistent_data_objects(cmd, false);
 	old_bm = cmd->bm;
@@ -1824,3 +1816,12 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)

 	return r;
 }
+
+int dm_cache_metadata_clean_when_opened(struct dm_cache_metadata *cmd, bool *result)
+{
+	READ_LOCK(cmd);
+	*result = cmd->clean_when_opened;
+	READ_UNLOCK(cmd);
+
+	return 0;
+}
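
The WRITE_LOCK_OR_GOTO pattern above can be illustrated with a small userspace sketch. Everything here is hypothetical: the lock is a plain counter rather than a real rw-semaphore, and meta_write_lock() is only assumed to behave the way the added comment describes cmd_write_lock() (check fail_io while holding the lock, and drop the lock when refusing). The point is that a failed lock attempt jumps straight to the cleanup label without the lock held.

```c
#include <stdbool.h>

/* Hypothetical metadata object; 'held' mocks a write lock. */
struct meta {
	int held;		/* 1 while the mock write lock is held */
	bool fail_io;		/* set once metadata I/O has failed */
	int generation;		/* state modified under the lock */
};

/* Take the mock lock; refuse and release it if fail_io is set. */
static bool meta_write_lock(struct meta *m)
{
	m->held++;
	if (m->fail_io) {
		m->held--;
		return false;
	}
	return true;
}

static void meta_write_unlock(struct meta *m)
{
	m->held--;
}

/* Jump to 'label' if the lock could not be taken. */
#define META_WRITE_LOCK_OR_GOTO(m, label) \
	do { \
		if (!meta_write_lock((m))) \
			goto label; \
	} while (0)

/* Returns 0 on success, -1 if the object is in the failed state. */
static int meta_abort(struct meta *m)
{
	int r = -1;

	META_WRITE_LOCK_OR_GOTO(m, out);
	m->generation++;	/* work done with the lock held */
	r = 0;
	meta_write_unlock(m);
out:
	return r;
}
```

Wrapping the check and the goto in do { ... } while (0) keeps the macro usable as a single statement, matching the style of the other lock macros in the file.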

drivers/md/dm-cache-metadata.h

Lines changed: 5 additions & 5 deletions
@@ -135,17 +135,17 @@ int dm_cache_get_metadata_dev_size(struct dm_cache_metadata *cmd,
  */
 int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *p);

-/*
- * Query method. Are all the blocks in the cache clean?
- */
-int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result);
-
 int dm_cache_metadata_needs_check(struct dm_cache_metadata *cmd, bool *result);
 int dm_cache_metadata_set_needs_check(struct dm_cache_metadata *cmd);
 void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd);
 void dm_cache_metadata_set_read_write(struct dm_cache_metadata *cmd);
 int dm_cache_metadata_abort(struct dm_cache_metadata *cmd);

+/*
+ * Query method. Was the metadata cleanly shut down when opened?
+ */
+int dm_cache_metadata_clean_when_opened(struct dm_cache_metadata *cmd, bool *result);
+
 /*----------------------------------------------------------------*/

 #endif /* DM_CACHE_METADATA_H */

drivers/md/dm-cache-policy-smq.c

Lines changed: 4 additions & 0 deletions
@@ -1589,14 +1589,18 @@ static int smq_invalidate_mapping(struct dm_cache_policy *p, dm_cblock_t cblock)
 {
 	struct smq_policy *mq = to_smq_policy(p);
 	struct entry *e = get_entry(&mq->cache_alloc, from_cblock(cblock));
+	unsigned long flags;

 	if (!e->allocated)
 		return -ENODATA;

+	spin_lock_irqsave(&mq->lock, flags);
 	// FIXME: what if this block has pending background work?
 	del_queue(mq, e);
 	h_remove(&mq->table, e);
 	free_entry(&mq->cache_alloc, e);
+	spin_unlock_irqrestore(&mq->lock, flags);
+
 	return 0;
 }
