Skip to content

cp.async ergonomics: missing address-space-aware intrinsic forces inline PTX #378

@Y-jiji

Description

@Y-jiji

Summary

Using cp.async from Rust-CUDA currently requires handwritten inline PTX plus manual shared-address conversion in user kernels.

In practice, the missing piece is an address-space-aware API/intrinsic for cp.async.cg.shared.global.

Problem

For an async copy path like:

  • global source pointer
  • shared-memory destination
  • 16-byte copy (4 x f32)

users currently end up writing something like:

unsafe fn shared_addr(ptr: *mut f32) -> u32 {
    let mut addr: u64;
    asm!(
        "cvta.to.shared.u64 {dst}, {src};",
        dst = out(reg64) addr,
        src = in(reg64) ptr,
    );
    addr as u32
}

unsafe fn cp_async4(dst_shared: u32, src_global: *const f32) {
    asm!(
        "cp.async.cg.shared.global [{dst}], [{src}], 16;",
        dst = in(reg32) dst_shared,
        src = in(reg64) src_global,
    );
}

and then precompute shared base addresses in the kernel and pass integer offsets into the helper.

If we instead pass generic/raw pointers through a helper, PTX tends to contain extra address-space conversion glue around the cp.async sites.

Why this matters

This is exactly the kind of operation where users want:

  • a small, explicit intrinsic
  • correct shared/global address-space handling
  • no repeated manual inline PTX in every project

Right now, getting good code requires low-level PTX knowledge and manual control over shared address conversion.

Requested improvement

A Rust-CUDA intrinsic/helper for cp.async.cg.shared.global (or equivalent family) that:

  • represents the destination as shared-memory address space explicitly
  • avoids forcing users to manually convert *mut T into a u32 shared address
  • maps cleanly to the PTX async-copy instructions used in modern CUDA kernels

Even a low-level unsafe API would be useful if it preserves the right address-space semantics and avoids generic-pointer friction.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions