Add buffer allocation support for Confidential Computing (CoCo) guests #1748
Open
jpirko wants to merge 21 commits into
jpirko added 21 commits on June 8, 2026 13:00
…oCo) guests
In a Confidential Computing (CoCo) guest, guest memory is encrypted, and a device that cannot access encrypted memory requires DMA bounce buffering. For RDMA to work in such guests, the DMA buffers (both provider-internal buffers and user data buffers) must live in unprotected/shared (decrypted) memory.
This PR adds the infrastructure to allocate such buffers from a `system_cc_shared` DMA-buf heap and wires it into the mlx5 provider. Using shared memory is strictly opt-in and only takes effect when the device reports that it needs DMA bounce; otherwise normal memory is used and behavior is unchanged.
libibverbs core:
- `struct ibv_buf`: a common buffer descriptor (owning pd, addr, size, optional dmabuf fd) that providers embed in their internal buffer structures, with `ibv_buf_init()` helpers.
- An internal DMA-buf heap allocator (`libibverbs/dmabuf_heap.c`) backed by the `system_cc_shared` heap.
- `IBV_DEVICE_CC_DMA_BOUNCE` device capability flag, reported by the kernel when the device is in a CoCo guest and requires DMA bounce.
- `IBV_PARENT_DOMAIN_INIT_ATTR_ALLOW_CC_UNPROTECTED_ALLOC` parent-domain flag for applications to opt in to unprotected/shared allocation.
- New provider-aware buffer verbs `ibv_alloc_buf()`, `ibv_free_buf()` and `ibv_reg_buf_mr()`, plus the backing `alloc_buf`/`free_buf` provider ops.
- Per-buffer dmabuf UMEM plumbing: `ibv_cmd_create_qp_ex3()` with a driver attribute chain and a per-buffer UMEM helper.
- Man pages and pyverbs enum bindings.
mlx5 provider:
- Adopt `struct ibv_buf` for internal buffer allocations.
- When a parent domain is created with `ALLOW_CC_UNPROTECTED_ALLOC` and the device reports `CC_DMA_BOUNCE`, allocate all provider-internal buffers (CQ, QP, SRQ, RWQ, doorbell records) from the dmabuf heap through the existing preferred-allocation path.
- Pass per-buffer dmabuf UMEM descriptors to the kernel on QP/CQ creation.
- Implement the `ibv_alloc_buf`/`ibv_free_buf` ops.
Example:
- `rc_pingpong`: add `-U`/`--allow-cc-unprotected` to allocate the data buffer and MR from unprotected/shared memory via a parent domain and the new `ibv_alloc_buf()`/`ibv_free_buf()`/`ibv_reg_buf_mr()` helpers.
Notes:
- Kernel headers are refreshed in the first commit; this depends on the matching kernel UAPI support.
- Adds new public symbols (`ibv_alloc_buf`, `ibv_free_buf`, `ibv_reg_buf_mr`).
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
To commit: d7a40b519497 ("RDMA/uverbs: Expose CoCo DMA bounce requirement to userspace"). Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add struct ibv_buf with addr and size fields to provide a common abstraction for buffer metadata that providers can embed in their internal buffer structures. Introduce an init helper alongside. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add a flag the kernel sets when the device is in a CoCo guest and requires DMA bounce buffering. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add a dmabuf sub-struct with an fd field to struct ibv_buf so that providers can store the DMA-buf file descriptor directly in the common buffer abstraction. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
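A minimal sketch of the descriptor these two commits describe (addr and size, plus the owning pd and a dmabuf sub-struct with an fd). The type `sketch_ibv_buf`, its helpers, and the convention that `dmabuf.fd == -1` means "plain memory" are assumptions for illustration, not the definitions from the patch; `struct ibv_pd` is left opaque so the fragment compiles without libibverbs.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque here; the real definition lives in <infiniband/verbs.h>. */
struct ibv_pd;

/* Hypothetical mirror of the common buffer descriptor. */
struct sketch_ibv_buf {
	struct ibv_pd *pd;	/* protection domain that owns the buffer */
	void *addr;		/* CPU virtual address of the mapping */
	size_t size;		/* length of the mapping in bytes */
	struct {
		int fd;		/* DMA-buf fd; -1 for plain memory (assumed) */
	} dmabuf;
};

/* Hypothetical init helper: records metadata, no DMA-buf by default. */
static void sketch_ibv_buf_init(struct sketch_ibv_buf *buf, struct ibv_pd *pd,
				void *addr, size_t size)
{
	assert(buf != NULL);
	buf->pd = pd;
	buf->addr = addr;
	buf->size = size;
	buf->dmabuf.fd = -1;
}

/* Attach a DMA-buf fd after the buffer was carved out of a heap. */
static void sketch_ibv_buf_set_dmabuf(struct sketch_ibv_buf *buf, int fd)
{
	buf->dmabuf.fd = fd;
}

/* True when the buffer is backed by a DMA-buf rather than plain memory. */
static int sketch_ibv_buf_is_dmabuf(const struct sketch_ibv_buf *buf)
{
	return buf->dmabuf.fd >= 0;
}
```

Embedding one such struct in each provider buffer type is what lets generic code (registration, UMEM descriptors) inspect any buffer without knowing the provider's own layout.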
Add alloc_buf/free_buf provider ops and corresponding helpers to allow applications to allocate buffers using the provider's configured allocation method. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
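The op/helper split can be pictured as a small dispatch table. Everything below (`sketch_buf_ops`, `sketch_alloc_buf()`, the `EOPNOTSUPP` fallback when a provider lacks the op, and the `malloc`-backed provider) is hypothetical; it shows the pattern of a generic verb forwarding to a provider callback, not the real `ibv_alloc_buf()` signature.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_pd;

/* Hypothetical provider ops table. */
struct sketch_buf_ops {
	void *(*alloc_buf)(struct sketch_pd *pd, size_t size);
	void (*free_buf)(struct sketch_pd *pd, void *addr);
};

struct sketch_pd {
	const struct sketch_buf_ops *ops;
};

/* Generic helper: forward to the provider, or fail if unsupported. */
static void *sketch_alloc_buf(struct sketch_pd *pd, size_t size)
{
	assert(pd != NULL && pd->ops != NULL);
	if (!pd->ops->alloc_buf) {
		errno = EOPNOTSUPP;
		return NULL;
	}
	return pd->ops->alloc_buf(pd, size);
}

static void sketch_free_buf(struct sketch_pd *pd, void *addr)
{
	if (pd->ops->free_buf)
		pd->ops->free_buf(pd, addr);
}

/* Example provider: plain heap memory stands in for its preferred allocator. */
static void *malloc_alloc_buf(struct sketch_pd *pd, size_t size)
{
	(void)pd;
	return malloc(size);
}

static void malloc_free_buf(struct sketch_pd *pd, void *addr)
{
	(void)pd;
	free(addr);
}

static const struct sketch_buf_ops malloc_ops = { malloc_alloc_buf, malloc_free_buf };
static const struct sketch_buf_ops no_ops = { NULL, NULL };
```

Routing through the PD is what lets a parent domain configured for unprotected allocation change where application buffers come from without the application changing its calls.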
Add ibv_reg_buf_mr() to register memory returned by ibv_alloc_buf(). Register plain buffers with ibv_reg_mr(). Register DMA-buf backed buffers with ibv_reg_dmabuf_mr() after validating the offset against the stored buffer metadata. Reject PD mismatches so a buffer is registered only with the protection domain that allocated it. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
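The registration rules in this commit reduce to a small decision: reject a foreign PD, use `ibv_reg_mr()` for plain memory, and use `ibv_reg_dmabuf_mr()` with a validated offset for DMA-buf memory. The sketch below only returns which path would be taken instead of calling libibverbs. The struct layout, the `-1` sentinel, and the assumption that the DMA-buf offset equals `addr - buf->addr` (i.e. the buffer maps its DMA-buf from offset 0) are illustrative, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct ibv_pd;	/* opaque; only compared by pointer */

/* Hypothetical buffer metadata. */
struct sketch_buf {
	struct ibv_pd *pd;
	void *addr;
	size_t size;
	int dmabuf_fd;	/* -1 when not DMA-buf backed (assumed) */
};

enum sketch_reg_path {
	SKETCH_REG_FAIL = -1,
	SKETCH_REG_PLAIN,	/* would call ibv_reg_mr(pd, addr, length, access) */
	SKETCH_REG_DMABUF,	/* would call ibv_reg_dmabuf_mr(pd, offset, length, iova, fd, access) */
};

static enum sketch_reg_path sketch_pick_reg_path(struct ibv_pd *pd,
						 const struct sketch_buf *buf,
						 void *addr, size_t length,
						 uint64_t *offset)
{
	uintptr_t start = (uintptr_t)buf->addr;
	uintptr_t a = (uintptr_t)addr;

	assert(offset != NULL);
	/* A buffer may only be registered with the PD that allocated it. */
	if (pd != buf->pd) {
		errno = EINVAL;
		return SKETCH_REG_FAIL;
	}
	if (buf->dmabuf_fd < 0)
		return SKETCH_REG_PLAIN;
	/* The requested range must lie entirely inside the buffer. */
	if (a < start || a - start > buf->size ||
	    length > buf->size - (a - start)) {
		errno = EINVAL;
		return SKETCH_REG_FAIL;
	}
	*offset = a - start;
	return SKETCH_REG_DMABUF;
}
```

If a provider sub-allocates several buffers from one larger DMA-buf, the real offset would also need that buffer's base offset within the DMA-buf; the sketch ignores that case.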
Add a CoCo unprotected-allocation flag so applications can opt in to unprotected/shared memory allocation at parent domain creation. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add a man page for ibv_alloc_buf(), ibv_free_buf() and ibv_reg_buf_mr(). Signed-off-by: Jiri Pirko <jiri@nvidia.com>
The new umem UAPI gives every dmabuf-backed buffer (CQ ring, QP main/RQ/SQ, mlx5 doorbell record) its own ioctl attribute of type UVERBS_ATTR_UMEM whose payload is a single struct ib_uverbs_buffer_desc. Add a per-attribute helper, fill_attr_in_buf_umem(), that providers call once per buffer they want to register through the kernel. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Mirror ibv_cmd_create_cq_ex2() by introducing a new private QP create helper that accepts a driver command-buffer chain. Providers use the chain to attach driver-namespace and per-buffer UMEM attributes (UVERBS_ATTR_CREATE_QP_*_BUF_UMEM) to the create-QP ioctl. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add dmabuf_heap.c/h with a set of internal helpers for providers to allocate memory from Linux DMA-buf heaps (/dev/dma_heap/<name>). Add a helper to initialize the "system_cc_shared" heap allocator. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
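Allocating from a DMA-buf heap goes through the standard Linux interface: open `/dev/dma_heap/<name>`, issue `DMA_HEAP_IOCTL_ALLOC` (from `<linux/dma-heap.h>`) to obtain a DMA-buf fd, then `mmap()` that fd. The sketch below shows only that kernel sequence; `heap_buf_alloc()`/`heap_buf_free()` are hypothetical names and do not reflect the internal API in `dmabuf_heap.c`. On a CoCo guest the PR uses `system_cc_shared` as the heap name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/dma-heap.h>

struct heap_buf {
	int fd;		/* DMA-buf fd returned by the heap */
	void *addr;	/* CPU mapping of the DMA-buf */
	size_t size;
};

/* Allocate size bytes from /dev/dma_heap/<name> and map them. Returns 0 or -errno. */
static int heap_buf_alloc(const char *name, size_t size, struct heap_buf *out)
{
	struct dma_heap_allocation_data data;
	char path[128];
	int heap_fd, ret;
	void *addr;

	assert(name != NULL && out != NULL);
	snprintf(path, sizeof(path), "/dev/dma_heap/%s", name);
	heap_fd = open(path, O_RDONLY | O_CLOEXEC);
	if (heap_fd < 0)
		return -errno;

	memset(&data, 0, sizeof(data));
	data.len = size;
	data.fd_flags = O_RDWR | O_CLOEXEC;	/* flags for the new DMA-buf fd */
	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data) < 0 ? -errno : 0;
	close(heap_fd);	/* the heap fd is no longer needed once allocated */
	if (ret)
		return ret;

	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, (int)data.fd, 0);
	if (addr == MAP_FAILED) {
		ret = -errno;
		close((int)data.fd);
		return ret;
	}
	out->fd = (int)data.fd;
	out->addr = addr;
	out->size = size;
	return 0;
}

static void heap_buf_free(struct heap_buf *buf)
{
	munmap(buf->addr, buf->size);
	close(buf->fd);
}
```

Keeping the fd alongside the mapping matters here: the CPU uses `addr`, while the fd is what later gets handed to the kernel for MR registration and UMEM descriptors.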
Replace the separate void *buf and size_t length fields in struct mlx5_buf with an embedded struct ibv_buf, aligning the provider's internal buffer representation with the common ibv_buf abstraction. Link struct mlx5_buf to the ibv_buf API (pd, addr, size) instead of maintaining duplicate fields. Add a pd argument to the internal allocation helpers and initialize the embedded buffer through ibv_buf_init(), so each buffer records its owning protection domain. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
When a parent domain is created with ALLOW_CC_UNPROTECTED_ALLOC and the device reports CC_DMA_BOUNCE, open a dmabuf heap and use it for all provider-internal buffer allocations (CQ, QP, SRQ, RWQ, doorbell records) through the existing preferred allocation path. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
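The gating condition is a conjunction: the application must opt in through the parent domain's `comp_mask`, and the device must report that it needs DMA bounce. In this sketch the macro names and bit values are hypothetical stand-ins for `IBV_PARENT_DOMAIN_INIT_ATTR_ALLOW_CC_UNPROTECTED_ALLOC` and `IBV_DEVICE_CC_DMA_BOUNCE`; which device-attribute field carries the capability bit is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real ones come from the patched headers. */
#define SKETCH_DEVICE_CC_DMA_BOUNCE			(1ULL << 0)
#define SKETCH_PD_INIT_ATTR_ALLOW_CC_UNPROTECTED_ALLOC	(1U << 0)

enum sketch_alloc_src {
	SKETCH_ALLOC_DEFAULT,		/* regular (encrypted) guest memory */
	SKETCH_ALLOC_DMABUF_HEAP,	/* system_cc_shared DMA-buf heap */
};

/* Use the shared heap only when both conditions hold; otherwise behave as before. */
static enum sketch_alloc_src sketch_pick_alloc_src(uint64_t dev_caps,
						   uint32_t pd_comp_mask)
{
	bool bounce = dev_caps & SKETCH_DEVICE_CC_DMA_BOUNCE;
	bool opted_in = pd_comp_mask & SKETCH_PD_INIT_ATTR_ALLOW_CC_UNPROTECTED_ALLOC;

	assert(SKETCH_DEVICE_CC_DMA_BOUNCE != 0);
	return (bounce && opted_in) ? SKETCH_ALLOC_DMABUF_HEAP : SKETCH_ALLOC_DEFAULT;
}
```

Requiring both bits keeps the feature strictly opt-in: an application that never sets the flag, or a device that never reports bounce, sees unchanged allocation behavior.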
Implement mlx5_alloc_buf_op()/mlx5_free_buf_op() using the existing struct mlx5_buf based allocation infrastructure. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Wire CQ and QP create paths to the new per-attribute UMEM UAPI: emit a struct ib_uverbs_buffer_desc for each dmabuf-backed buffer (CQ ring, QP main/SQ, doorbell record) on the driver_attrs chain via fill_attr_in_buf_umem(). Signed-off-by: Jiri Pirko <jiri@nvidia.com>
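Per-buffer emission can be sketched as appending one descriptor for each DMA-buf-backed buffer and skipping the others. `sketch_buffer_desc` is a hypothetical stand-in for `struct ib_uverbs_buffer_desc` (whose real layout is defined by the kernel UAPI), the attribute IDs in the usage are made up, and the assumption that plain buffers keep using the pre-existing path is not stated in the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ib_uverbs_buffer_desc. */
struct sketch_buffer_desc {
	uint16_t attr_id;	/* which buffer: CQ ring, QP SQ, doorbell record, ... */
	int32_t fd;		/* DMA-buf fd */
	uint64_t offset;	/* offset of the buffer inside the DMA-buf */
	uint64_t length;	/* buffer length in bytes */
};

struct sketch_buf {
	int dmabuf_fd;		/* -1 for plain memory (assumed) */
	uint64_t offset;
	uint64_t size;
};

/*
 * Append one descriptor if buf is DMA-buf backed.
 * Returns 1 if emitted, 0 if skipped (plain buffer), -1 if out is full.
 */
static int sketch_fill_buf_umem(struct sketch_buffer_desc *out, size_t max,
				size_t *n, uint16_t attr_id,
				const struct sketch_buf *buf)
{
	assert(out != NULL && n != NULL && buf != NULL);
	if (buf->dmabuf_fd < 0)
		return 0;
	if (*n >= max)
		return -1;
	out[*n] = (struct sketch_buffer_desc){
		.attr_id = attr_id,
		.fd = buf->dmabuf_fd,
		.offset = buf->offset,
		.length = buf->size,
	};
	(*n)++;
	return 1;
}
```

One attribute per buffer, as the UAPI commit describes, means a create call can mix DMA-buf-backed and plain buffers without a single all-or-nothing switch.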
In preparation for the follow-up patch, move ctx->buf allocation later in pp_init_ctx(). Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add a -U/--allow-cc-unprotected option for running in CoCo guests when the device's DMA requires unprotected/shared memory. When set, create a parent domain with ALLOW_CC_UNPROTECTED_ALLOC and use ibv_alloc_buf()/ibv_free_buf()/ibv_reg_buf_mr() for MR buffer allocation and registration. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
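Option handling for the new flag follows the usual `getopt_long()` pattern of the pingpong examples. This fragment handles only `-U`/`--allow-cc-unprotected`; `sketch_opts` and `sketch_parse()` are hypothetical, and the rest of `rc_pingpong`'s options are omitted.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_opts {
	bool allow_cc_unprotected;
};

/* Parse argv; returns 0 on success, -1 on an unknown option. */
static int sketch_parse(int argc, char **argv, struct sketch_opts *o)
{
	static const struct option long_opts[] = {
		{ "allow-cc-unprotected", no_argument, NULL, 'U' },
		{ NULL, 0, NULL, 0 },
	};
	int c;

	assert(o != NULL);
	o->allow_cc_unprotected = false;
	optind = 1;	/* restart scanning so the function can be called repeatedly */
	while ((c = getopt_long(argc, argv, "U", long_opts, NULL)) != -1) {
		switch (c) {
		case 'U':
			o->allow_cc_unprotected = true;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

When the flag is set, the example then creates its parent domain with `ALLOW_CC_UNPROTECTED_ALLOC` and routes the data buffer through `ibv_alloc_buf()`/`ibv_reg_buf_mr()`, as the commit message describes.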
Add a comp_mask argument to ParentDomainInitAttr so callers can request IBV_PARENT_DOMAIN_INIT_ATTR_ALLOW_CC_UNPROTECTED_ALLOC, the opt-in used by DMA-bounce devices on Confidential Computing (CoCo) guests. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Wrap the provider-aware buffer API (ibv_alloc_buf, ibv_free_buf, ibv_reg_buf_mr) with new Buf and BufMR classes. Buf owns a buffer allocated through a PD and is tracked in a per-PD weakset, so it is torn down before the PD it belongs to. BufMR registers an MR over a (sub)range of a Buf and deregisters it on close without freeing the underlying buffer. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add test_buf.py exercising the ibv_buf API (ibv_alloc_buf, ibv_reg_buf_mr, ibv_free_buf) over both a plain PD and a parent domain created with ALLOW_CC_UNPROTECTED_ALLOC, in API-only and RC/UD traffic variants. Add an is_cq_ex option to the rdma_traffic and atomic_traffic helpers so the tests can drive an extended CQ. Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Summary
In a Confidential Computing (CoCo) guest, guest memory is encrypted, and a device that cannot access encrypted memory requires DMA bounce buffering. For RDMA to work in such guests, the DMA buffers (both provider-internal buffers and user data buffers) must live in unprotected/shared (decrypted) memory.
This PR adds the infrastructure to allocate such buffers from a `system_cc_shared` DMA-buf heap and wires it into the mlx5 provider. Using shared memory is strictly opt-in and only takes effect when the device reports that it needs DMA bounce; otherwise normal memory is used and behavior is unchanged.
libibverbs core
- `struct ibv_buf`: a common buffer descriptor (owning pd, addr, size, optional dmabuf fd) that providers embed in their internal buffer structures, with `ibv_buf_init()` helpers.
- An internal DMA-buf heap allocator (`libibverbs/dmabuf_heap.c`) backed by the `system_cc_shared` heap.
- `IBV_DEVICE_CC_DMA_BOUNCE` device capability flag, reported by the kernel when the device is in a CoCo guest and requires DMA bounce.
- `IBV_PARENT_DOMAIN_INIT_ATTR_ALLOW_CC_UNPROTECTED_ALLOC` parent-domain flag for applications to opt in to unprotected/shared allocation.
- New provider-aware buffer verbs `ibv_alloc_buf()`, `ibv_free_buf()` and `ibv_reg_buf_mr()`, plus the backing `alloc_buf`/`free_buf` provider ops.
- Per-buffer dmabuf UMEM plumbing: `ibv_cmd_create_qp_ex3()` with a driver attribute chain and a per-buffer UMEM helper.
- Man pages and pyverbs enum bindings.
mlx5 provider
- Adopt `struct ibv_buf` for internal buffer allocations.
- When a parent domain is created with `ALLOW_CC_UNPROTECTED_ALLOC` and the device reports `CC_DMA_BOUNCE`, allocate all provider-internal buffers (CQ, QP, SRQ, RWQ, doorbell records) from the dmabuf heap through the existing preferred-allocation path.
- Pass per-buffer dmabuf UMEM descriptors to the kernel on QP/CQ creation.
- Implement the `ibv_alloc_buf`/`ibv_free_buf` ops.
Example
- `rc_pingpong`: add `-U`/`--allow-cc-unprotected` to allocate the data buffer and MR from unprotected/shared memory via a parent domain and the new `ibv_alloc_buf()`/`ibv_free_buf()`/`ibv_reg_buf_mr()` helpers.
Notes
- Kernel headers are refreshed in the first commit; this depends on the matching kernel UAPI support.
- Adds new public symbols (`ibv_alloc_buf`, `ibv_free_buf`, `ibv_reg_buf_mr`).