cuda.core: bump tensor bridge PyTorch upper-bound to 2.12 by aryanputta · Pull Request #2099 · NVIDIA/cuda-python

aryanputta · 2026-05-16T21:00:30Z

Closes #2089.

Problem

PyTorch 2.12 was released around May 14, 2026. The version guard in _torch_version_check() (cuda_core/cuda/core/_memoryview.pyx) capped the AOTI tensor bridge at (2, 11), so _is_torch_tensor() returned False for every torch tensor when running under PyTorch 2.12.

As a result, torch tensors fell through to the generic CAI/DLPack paths and raised:

BufferError: only CUDA Array Interface v3 or above is supported
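The failure mode can be illustrated with a minimal sketch. It assumes the guard parses torch.__version__ into a (major, minor) tuple and that dispatch tries the tensor bridge first; the names parse_major_minor, torch_version_ok, and pick_path are hypothetical and are not cuda.core's actual code.

```python
from __future__ import annotations

import re

# Validated range before this PR (inclusive), as in the diff.
LOWER = (2, 3)
UPPER = (2, 11)


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.12.0+cu128' or '2.12.0a0+git1234'."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def torch_version_ok(version: str, upper: tuple[int, int] = UPPER) -> bool:
    # Tuple comparison orders (2, 9) < (2, 11) correctly, unlike string comparison.
    return LOWER <= parse_major_minor(version) <= upper


def pick_path(version: str, upper: tuple[int, int] = UPPER) -> str:
    # Hypothetical dispatch: the bridge is used only inside the validated range;
    # anything else falls through to the generic CAI/DLPack handling.
    return "aoti_bridge" if torch_version_ok(version, upper) else "generic_cai_dlpack"


print(pick_path("2.11.0+cu128"))           # aoti_bridge
print(pick_path("2.12.0+cu128"))           # generic_cai_dlpack
print(pick_path("2.12.0+cu128", (2, 12)))  # aoti_bridge
```

Under these assumptions, a tensor from an unvalidated release is never rejected outright. It is routed to a path that was not designed for it, which is why the user-facing error mentions the CUDA Array Interface rather than the version cap.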

Fix

Extend the upper bound from (2, 11) to (2, 12). The bridge relies on two assumptions: the THPVariable struct layout (PyObject_HEAD followed by at::Tensor cdata) and the identity AtenTensorHandle == at::Tensor*. Both have held across PyTorch 2.3–2.11 and are expected to hold for 2.12 as well.

-    _torch_version_ok = (2, 3) <= (major, minor) <= (2, 11)
+    _torch_version_ok = (2, 3) <= (major, minor) <= (2, 12)
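To make the layout assumption concrete, here is a ctypes sketch of the pointer arithmetic it implies. It assumes a standard 64-bit, non-debug, non-free-threaded CPython build (where PyObject_HEAD is ob_refcnt plus ob_type) and models at::Tensor as a single pointer-sized field (its intrusive_ptr to TensorImpl). PyObjectHead, THPVariableModel, and aten_handle_address are hypothetical names, and nothing here dereferences real memory.

```python
import ctypes


class PyObjectHead(ctypes.Structure):
    # PyObject_HEAD in a default (non-debug, GIL-enabled) CPython build.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]


class THPVariableModel(ctypes.Structure):
    # Mirrors "PyObject_HEAD followed by at::Tensor cdata"; at::Tensor is
    # modeled as one pointer (its intrusive_ptr<TensorImpl>), an assumption.
    _fields_ = [
        ("ob_base", PyObjectHead),
        ("cdata", ctypes.c_void_p),
    ]


# Byte offset of cdata from the start of the Python object.
CDATA_OFFSET = THPVariableModel.cdata.offset


def aten_handle_address(pyobj_address: int) -> int:
    """Address that would be passed as an AtenTensorHandle (== at::Tensor*).

    Only meaningful if the layout assumption holds for the installed PyTorch.
    """
    return pyobj_address + CDATA_OFFSET


print(CDATA_OFFSET)  # 16 on a typical 64-bit build
```

If a future PyTorch release inserted a field before cdata, this offset would silently point at the wrong data instead of failing loudly, which is the risk the version cap is meant to contain.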

Testing

This change cannot be exercised until a PyTorch 2.12 build is available in CI. The existing tensor bridge tests (cuda_core/tests/) cover this code path and will validate correctness once a 2.12 runner is available. The nightly CI matrix already includes a "latest" PyTorch entry (see nightly.yml), which should pick up 2.12.

PyTorch 2.12 was released on ~May 14 2026. The version guard in _torch_version_check() capped the AOTI tensor bridge at (2, 11), causing _is_torch_tensor() to return False for torch tensors under 2.12.

As a result, torch tensors fell through to the generic CAI/DLPack paths and raised:

BufferError: only CUDA Array Interface v3 or above is supported

The THPVariable struct layout and AtenTensorHandle aliasing are stable across PyTorch 2.x minor releases. Extend the upper bound to (2, 12).

Closes NVIDIA#2089

copy-pr-bot · 2026-05-16T21:00:33Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

rwgk

I think this immediate bump makes sense as a short-term unblock for PyTorch 2.12, but the hard cap may be causing more user pain than the underlying ABI risk warrants.

Right now, every new PyTorch minor release effectively requires a new cuda-core release, even if the AOTI ABI and THPVariable layout have not changed. When the cap trips, users also do not get a very direct explanation of what happened; they fall back to the generic path and can end up seeing a confusing downstream error instead.

Would it make sense to keep the conservative default behavior, but make it explicit and overridable? For example:

export CUDA_PYTHON_TORCH_TENSOR_BRIDGE_COMPATIBILITY_CHECK=ERROR  # or WARN / OFF

With ERROR as the default, we could raise a clear message explaining that the detected PyTorch version is outside the validated range and point users to the WARN and OFF options as escape hatches when they are willing to assume ABI compatibility.

That would preserve the safety-first default while avoiding the need to cut a new cuda-core release for every PyTorch minor version unless there is evidence that compatibility actually broke.
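A minimal sketch of the proposed policy follows. The environment variable name is the one suggested in this comment and is a proposal, not an existing cuda.core setting; check_torch_bridge_compat is a hypothetical helper, and raising RuntimeError in ERROR mode and the message wording are illustrative choices.

```python
import os
import warnings

ENV_VAR = "CUDA_PYTHON_TORCH_TENSOR_BRIDGE_COMPATIBILITY_CHECK"  # proposed, not shipped
VALIDATED = ((2, 3), (2, 12))  # inclusive (major, minor) range


def check_torch_bridge_compat(major: int, minor: int, environ=os.environ) -> bool:
    """Return True if the tensor bridge may be used for this PyTorch version."""
    lo, hi = VALIDATED
    if lo <= (major, minor) <= hi:
        return True
    mode = environ.get(ENV_VAR, "ERROR").upper()
    msg = (
        f"PyTorch {major}.{minor} is outside the validated range "
        f"{lo[0]}.{lo[1]}-{hi[0]}.{hi[1]} for the tensor bridge. "
        f"Set {ENV_VAR}=WARN or OFF to use it anyway, assuming ABI compatibility."
    )
    if mode == "OFF":
        return True
    if mode == "WARN":
        warnings.warn(msg, RuntimeWarning, stacklevel=2)
        return True
    if mode == "ERROR":
        raise RuntimeError(msg)
    raise ValueError(f"{ENV_VAR} must be ERROR, WARN, or OFF, got {mode!r}")
```

Taking environ as a parameter keeps the helper testable without mutating the real process environment, and the ERROR default keeps today's conservative behavior while replacing the confusing downstream BufferError with a message that names the cause.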

rwgk · 2026-05-17T17:21:47Z

I assume the right validation for this PR is a quick source-level check against PyTorch 2.12. Has that already been done?

In particular, I think the key files to inspect would be:

torch/csrc/autograd/python_variable.h, to confirm that THPVariable still has the expected layout (PyObject_HEAD followed by at::Tensor cdata)
torch/csrc/inductor/aoti_torch/c/macros.h, to confirm the AtenTensorHandle contract is unchanged
torch/csrc/inductor/aoti_torch/c/shim.h, to confirm the AOTI entry points we use are still present and compatible

If someone already compared the relevant 2.11 vs 2.12 sources, it would be great to call that out in the PR description or comments.

rwgk · 2026-05-17T17:38:25Z

While I was at it, I asked Cursor (GPT-5.4 272K Medium) to do the source-level validation against a local git clone of PyTorch. Below is what it found. I'll go ahead and merge this PR.

I did a source-level comparison locally against the PyTorch v2.11.0 and v2.12.0 tags to validate the assumptions behind this PR.

What I checked:

torch/csrc/autograd/python_variable.h between v2.11.0 and v2.12.0: no diff. THPVariable still has PyObject_HEAD followed by at::Tensor cdata, which is the core assumption behind the pyobj_to_aten_handle trick.
torch/csrc/inductor/aoti_torch/c/macros.h: no diff. PyTorch still documents AtenTensorHandle as being represented under the hood as at::Tensor*.
torch/csrc/inductor/aoti_torch/utils.h: no diff. The helper still does reinterpret_cast<at::Tensor*>(handle).
torch/csrc/inductor/aoti_torch/c/shim.h: there is a diff, but it appears additive and unrelated to the APIs used here. The specific metadata getters used by cuda-core are still present with the same signatures.
torch/csrc/inductor/aoti_torch/shim_common.cpp: the diff also appears additive; I only saw new dtype helper implementations. The getter implementations used by this bridge are unchanged and still go through the same tensor_handle_to_tensor_pointer(...) path before reading tensor metadata.

My conclusion is that, for the specific 2.11 -> 2.12 bump in this PR, the source-based validation looks good. I do not see evidence of a break in either of the two assumptions the PR description relies on:

the THPVariable layout stayed the same
AtenTensorHandle still maps to at::Tensor*, and the exact AOTI getter path used by cuda-core is unchanged

The caveat is that this is still not a formal compatibility guarantee from PyTorch for the THPVariable trick itself, so the cap remains a conservative policy choice rather than pure paranoia. But for this specific release bump, the source comparison does support the PR.
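The source comparison described in this comment can be scripted roughly as follows. This is a sketch, not the procedure that was actually run: it assumes a local PyTorch clone with the v2.11.0 and v2.12.0 tags fetched, and relies on git diff --quiet exiting 0 when there is no difference and 1 when there is one. diff_cmd and changed_files are hypothetical helpers.

```python
from __future__ import annotations

import subprocess

PATHS = [
    "torch/csrc/autograd/python_variable.h",
    "torch/csrc/inductor/aoti_torch/c/macros.h",
    "torch/csrc/inductor/aoti_torch/utils.h",
    "torch/csrc/inductor/aoti_torch/c/shim.h",
    "torch/csrc/inductor/aoti_torch/shim_common.cpp",
]


def diff_cmd(old_tag: str, new_tag: str, path: str) -> list[str]:
    # --quiet suppresses output and implies --exit-code.
    return ["git", "diff", "--quiet", old_tag, new_tag, "--", path]


def changed_files(repo: str, old_tag: str = "v2.11.0", new_tag: str = "v2.12.0") -> dict[str, bool]:
    """Map each path to True if it differs between the two tags."""
    result = {}
    for path in PATHS:
        proc = subprocess.run(diff_cmd(old_tag, new_tag, path), cwd=repo)
        if proc.returncode not in (0, 1):
            raise RuntimeError(f"git diff failed for {path} (exit {proc.returncode})")
        result[path] = proc.returncode == 1
    return result


# Example (requires a PyTorch clone at ./pytorch):
#   for path, changed in changed_files("./pytorch").items():
#       print("DIFF" if changed else "same", path)
```

An unchanged file is a strong signal, but a reported difference, as for shim.h and shim_common.cpp above, still needs a manual read to decide whether it touches the getters the bridge actually uses.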

rwgk · 2026-05-17T17:39:55Z

/ok to test 3c77f31

github-actions · 2026-05-17T19:27:28Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

github-actions added the cuda.core label (Everything related to the cuda.core module) on May 16, 2026

rwgk approved these changes May 17, 2026

rwgk assigned aryanputta May 17, 2026

rwgk added this to the cuda.core next milestone May 17, 2026

rwgk added the enhancement (Any code-related improvements) and P0 (High priority - Must do!) labels on May 17, 2026

Merge branch 'main' into fix/pytorch-2-12-version-cap (3c77f31)

rwgk enabled auto-merge (squash) May 17, 2026 17:40

rwgk merged commit ffac267 into NVIDIA:main May 17, 2026
96 checks passed

rwgk merged 2 commits into NVIDIA:main from aryanputta:fix/pytorch-2-12-version-cap
