Add buffer allocation support for Confidential Computing (CoCo) guests #1748

Open
jpirko wants to merge 21 commits into linux-rdma:master from jpirko:wip_umem_attrs_and_cc

@jpirko jpirko commented Jun 2, 2026

Summary

In a Confidential Computing (CoCo) guest, guest memory is encrypted, and a
device that cannot access encrypted memory requires DMA bounce buffering. For
RDMA to work in such guests, the DMA buffers (both provider-internal buffers
and user data buffers) must live in unprotected/shared (decrypted) memory.

This PR adds the infrastructure to allocate such buffers from a system_cc_shared
DMA-buf heap and wires it into the mlx5 provider. Using shared memory is
strictly opt-in and only takes effect when the device reports that it needs
DMA bounce; otherwise normal memory is used and behavior is unchanged.
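
The opt-in rule above can be read as a simple two-condition gate. The following is only a minimal sketch of that decision, not the PR's code: the `SKETCH_*` bit values and the `sketch_use_shared_memory()` helper are hypothetical names invented for illustration, and the real flag values come from the PR's headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. The real values of
 * IBV_DEVICE_CC_DMA_BOUNCE and the parent-domain flag are defined by
 * the PR's headers and are not reproduced here. */
#define SKETCH_DEVICE_CC_DMA_BOUNCE     (1ULL << 0)
#define SKETCH_PD_ALLOW_CC_UNPROTECTED  (1U << 0)

/* Shared (decrypted) memory is used only when the device reports that
 * it needs DMA bounce AND the application opted in. In every other
 * case normal memory is used. */
static bool sketch_use_shared_memory(uint64_t device_caps,
				     uint32_t pd_comp_mask)
{
	bool device_needs_bounce =
		(device_caps & SKETCH_DEVICE_CC_DMA_BOUNCE) != 0;
	bool app_opted_in =
		(pd_comp_mask & SKETCH_PD_ALLOW_CC_UNPROTECTED) != 0;

	return device_needs_bounce && app_opted_in;
}
```

Under this reading, an application that sets the opt-in flag on a device that does not report DMA bounce sees no change in behavior.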

libibverbs core

  • struct ibv_buf: a common buffer descriptor (owning pd, addr, size, optional
    dmabuf fd) that providers embed in their internal buffer structures, with
    ibv_buf_init() helpers.
  • An internal DMA-buf heap allocator (libibverbs/dmabuf_heap.c) backed by the
    system_cc_shared heap.
  • IBV_DEVICE_CC_DMA_BOUNCE device capability flag, reported by the kernel when
    the device is in a CoCo guest and requires DMA bounce.
  • IBV_PARENT_DOMAIN_INIT_ATTR_ALLOW_CC_UNPROTECTED_ALLOC parent-domain flag for
    applications to opt in to unprotected/shared allocation.
  • New provider-aware buffer verbs ibv_alloc_buf(), ibv_free_buf() and
    ibv_reg_buf_mr(), plus the backing alloc_buf/free_buf provider ops.
  • Per-buffer dmabuf UMEM plumbing: ibv_cmd_create_qp_ex3() with a driver
    attribute chain and a per-buffer UMEM helper.
  • man pages and pyverbs enum bindings.

mlx5 provider

  • Adopt struct ibv_buf for internal buffer allocations.
  • When a parent domain is created with ALLOW_CC_UNPROTECTED_ALLOC and the
    device reports CC_DMA_BOUNCE, allocate all provider-internal buffers
    (CQ, QP, SRQ, RWQ, doorbell records) from the dmabuf heap through the existing
    preferred-allocation path.
  • Pass per-buffer dmabuf UMEM descriptors to the kernel on QP/CQ creation.
  • Implement the ibv_alloc_buf/ibv_free_buf ops.

Example

  • rc_pingpong: add -U/--allow-cc-unprotected to allocate the data buffer
    and MR from unprotected/shared memory via a parent domain and the new
    ibv_alloc_buf()/ibv_free_buf()/ibv_reg_buf_mr() helpers.

Notes

  • Kernel headers are refreshed in the first commit; this depends on the matching
    kernel UAPI support.
  • Adds new public symbols (ibv_alloc_buf, ibv_free_buf, ibv_reg_buf_mr).

@jpirko jpirko force-pushed the wip_umem_attrs_and_cc branch 8 times, most recently from 96e3e82 to 80243ff Compare June 5, 2026 18:50
Jiri Pirko added 21 commits June 8, 2026 13:00
To commit: d7a40b519497 ("RDMA/uverbs: Expose CoCo DMA bounce
requirement to userspace").

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add struct ibv_buf with addr and size fields to provide a common
abstraction for buffer metadata that providers can embed in their
internal buffer structures. Introduce an init helper alongside.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add a flag the kernel sets when the device is in a CoCo guest
and requires DMA bounce buffering.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add a dmabuf sub-struct with an fd field to struct ibv_buf so that
providers can store the DMA-buf file descriptor directly in the
common buffer abstraction.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add alloc_buf/free_buf provider ops and corresponding helpers to
allow applications to allocate buffers using the provider's configured
allocation method.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add ibv_reg_buf_mr() to register memory returned by ibv_alloc_buf().

Register plain buffers with ibv_reg_mr(). Register DMA-buf backed
buffers with ibv_reg_dmabuf_mr() after validating the offset against
the stored buffer metadata.

Reject PD mismatches so a buffer is registered only with the protection
domain that allocated it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
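
The checks described for ibv_reg_buf_mr() (PD match, range validation, then a choice between the plain and DMA-buf registration paths) can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: `struct sketch_buf`, `struct sketch_pd`, `sketch_check_reg()` and the use of `-1` to mean "no dmabuf fd" are all hypothetical, and the sketch validates the range for both paths for simplicity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PD and the common buffer descriptor. */
struct sketch_pd { int id; };

struct sketch_buf {
	struct sketch_pd *pd;	/* owning protection domain */
	void *addr;
	size_t size;
	int dmabuf_fd;		/* assumption: -1 means plain memory */
};

enum sketch_reg_path { SKETCH_REG_PLAIN, SKETCH_REG_DMABUF };

/* Returns 0 and sets *path on success, -EINVAL on a PD mismatch or an
 * out-of-range (offset, length) pair. */
static int sketch_check_reg(const struct sketch_buf *buf,
			    const struct sketch_pd *pd,
			    size_t offset, size_t length,
			    enum sketch_reg_path *path)
{
	/* A buffer may only be registered with the PD that allocated it. */
	if (buf->pd != pd)
		return -EINVAL;

	/* Overflow-safe range check against the stored buffer size. */
	if (offset > buf->size || length > buf->size - offset)
		return -EINVAL;

	*path = buf->dmabuf_fd >= 0 ? SKETCH_REG_DMABUF : SKETCH_REG_PLAIN;
	return 0;
}
```

In a real caller, `SKETCH_REG_PLAIN` would lead to ibv_reg_mr() and `SKETCH_REG_DMABUF` to ibv_reg_dmabuf_mr() with the stored fd and the validated offset.
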
Add a CoCo unprotected-allocation flag that lets applications opt in to
unprotected/shared memory allocation at parent domain creation.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add a man page for ibv_alloc_buf(), ibv_free_buf() and ibv_reg_buf_mr().

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
The new umem UAPI gives every dmabuf-backed buffer (CQ ring, QP
main/RQ/SQ, mlx5 doorbell record) its own ioctl attribute of type
UVERBS_ATTR_UMEM whose payload is a single struct ib_uverbs_buffer_desc.

Add a per-attribute helper, fill_attr_in_buf_umem(), that providers
call once per buffer they want to register through the kernel.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Mirror ibv_cmd_create_cq_ex2() by introducing a new private QP create
helper that accepts a driver command-buffer chain. Providers use the
chain to attach driver-namespace and per-buffer UMEM attributes
(UVERBS_ATTR_CREATE_QP_*_BUF_UMEM) to the create-QP ioctl.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add dmabuf_heap.c/h, a set of internal helpers that providers use to
allocate memory from Linux DMA-buf heaps (/dev/dma_heap/<name>).
Add a helper to initialize the "system_cc_shared" heap allocator.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Replace the separate void *buf and size_t length fields in struct
mlx5_buf with an embedded struct ibv_buf, aligning the provider's
internal buffer representation with the common ibv_buf abstraction.

Link struct mlx5_buf to the ibv_buf API (pd, addr, size) instead of
maintaining duplicate fields. Add a pd argument to the internal
allocation helpers and initialize the embedded buffer through
ibv_buf_init(), so each buffer records its owning protection domain.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
When a parent domain is created with ALLOW_CC_UNPROTECTED_ALLOC and the
device reports CC_DMA_BOUNCE, open a dmabuf heap and use it for all
provider-internal buffer allocations (CQ, QP, SRQ, RWQ, doorbell
records) through the existing preferred allocation path.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Implement mlx5_alloc_buf_op()/mlx5_free_buf_op() using the existing
struct mlx5_buf based allocation infrastructure.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Wire CQ and QP create paths to the new per-attribute UMEM UAPI: emit
a struct ib_uverbs_buffer_desc for each dmabuf-backed buffer (CQ ring,
QP main/SQ, doorbell record) on the driver_attrs chain via
fill_attr_in_buf_umem().

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
In preparation for the follow-up patch, move ctx->buf allocation later
in pp_init_ctx().

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add a -U/--allow-cc-unprotected option for running in CoCo guests when
device DMA requires unprotected/shared memory. When set, create a parent
domain with ALLOW_CC_UNPROTECTED_ALLOC and use
ibv_alloc_buf()/ibv_free_buf()/ibv_reg_buf_mr() for MR buffer allocation
and registration.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add a comp_mask argument to ParentDomainInitAttr so callers can request
IBV_PARENT_DOMAIN_INIT_ATTR_ALLOW_CC_UNPROTECTED_ALLOC, the opt-in used
by DMA-bounce devices on Confidential Computing (CoCo) guests.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Wrap the provider-aware buffer API (ibv_alloc_buf, ibv_free_buf,
ibv_reg_buf_mr) with new Buf and BufMR classes. Buf owns a buffer
allocated through a PD and is tracked in a per-PD weakset, so it is
torn down before the PD it belongs to. BufMR registers an MR over a
(sub)range of a Buf and deregisters it on close without freeing the
underlying buffer.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Add test_buf.py exercising the ibv_buf API (ibv_alloc_buf,
ibv_reg_buf_mr, ibv_free_buf) over both a plain PD and a parent domain
created with ALLOW_CC_UNPROTECTED_ALLOC, in API-only and RC/UD traffic
variants. Add an is_cq_ex option to the rdma_traffic and atomic_traffic
helpers so the tests can drive an extended CQ.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
@jpirko jpirko force-pushed the wip_umem_attrs_and_cc branch from 80243ff to 4f7eff7 Compare June 8, 2026 11:00