cuda.core: bump tensor bridge PyTorch upper-bound to 2.12 #2099

Merged
rwgk merged 2 commits into NVIDIA:main from aryanputta:fix/pytorch-2-12-version-cap
May 17, 2026

@aryanputta
Contributor

Closes #2089.

Problem

PyTorch 2.12 was released around May 14, 2026. The version guard in _torch_version_check() (cuda_core/cuda/core/_memoryview.pyx) capped the AOTI tensor bridge at (2, 11), so _is_torch_tensor() returned False for every torch tensor under 2.12.

As a result, torch tensors fell through to the generic CAI/DLPack paths and raised:

BufferError: only CUDA Array Interface v3 or above is supported

Fix

Extend the upper bound from (2, 11) to (2, 12). The THPVariable struct layout (PyObject_HEAD followed by at::Tensor cdata) and the AtenTensorHandle == at::Tensor* identity have been stable across PyTorch 2.3–2.11 and are expected to hold for 2.12 as well.

-    _torch_version_ok = (2, 3) <= (major, minor) <= (2, 11)
+    _torch_version_ok = (2, 3) <= (major, minor) <= (2, 12)
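
For illustration, the guard amounts to a tuple comparison on the parsed (major, minor) version. The following is a minimal sketch, not the code in _memoryview.pyx; the helper names parse_major_minor and torch_version_ok and the regex-based parsing are assumptions made for this example.

```python
import re

# Validated range after this PR: PyTorch 2.3 through 2.12, inclusive.
MIN_VALIDATED = (2, 3)
MAX_VALIDATED = (2, 12)


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '2.12.0+cu128'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_version_ok(version, upper=MAX_VALIDATED):
    """Return True if the version falls inside the validated range.

    `upper` is a hypothetical parameter, added only so the old and new
    caps can be compared side by side.
    """
    return MIN_VALIDATED <= parse_major_minor(version) <= upper
```

With the old cap, torch_version_ok("2.12.0", upper=(2, 11)) is False, which corresponds to the condition that disabled the bridge; with the new cap the same version passes.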

Testing

This change cannot be exercised without a PyTorch 2.12 build in CI. The existing tensor bridge tests (cuda_core/tests/) cover the code path; they will validate correctness once a 2.12 runner is available. The nightly CI matrix already references a latest PyTorch entry (see nightly.yml) which should pick up 2.12.

@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the cuda.core Everything related to the cuda.core module label May 16, 2026
Contributor

@rwgk rwgk left a comment

I think this immediate bump makes sense as a short-term unblock for PyTorch 2.12, but the hard cap may be causing more user pain than the underlying ABI risk warrants.

Right now, every new PyTorch minor release effectively requires a new cuda-core release, even if the AOTI ABI and THPVariable layout have not changed. When the cap trips, users also do not get a very direct explanation of what happened; they fall back to the generic path and can end up seeing a confusing downstream error instead.

Would it make sense to keep the conservative default behavior, but make it explicit and overridable? For example:

export CUDA_PYTHON_TORCH_TENSOR_BRIDGE_COMPATIBILITY_CHECK=ERROR  # or WARN / OFF

With ERROR as the default, we could raise a clear message explaining that the detected PyTorch version is outside the validated range and point users to the WARN and OFF options as escape hatches when they are willing to assume ABI compatibility.

That would preserve the safety-first default while avoiding the need to cut a new cuda-core release for every PyTorch minor version unless there is evidence that compatibility actually broke.
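
One possible shape for such an overridable check is sketched below. Only the environment variable name and the ERROR / WARN / OFF values come from the suggestion above; the function name check_torch_bridge_compat, the environ parameter, the exception and warning types, and the message wording are all assumptions made for illustration, not an existing cuda.core API.

```python
import os
import warnings

ENV_VAR = "CUDA_PYTHON_TORCH_TENSOR_BRIDGE_COMPATIBILITY_CHECK"
VALIDATED_RANGE = ((2, 3), (2, 12))  # inclusive (major, minor) bounds


def check_torch_bridge_compat(major_minor, environ=None):
    """Return True if the tensor bridge may be used for this PyTorch version.

    Hypothetical policy: ERROR (default) raises outside the validated range,
    WARN emits a warning and proceeds, OFF skips the check entirely.
    """
    env = os.environ if environ is None else environ
    mode = env.get(ENV_VAR, "ERROR").strip().upper()
    if mode not in {"ERROR", "WARN", "OFF"}:
        raise ValueError(f"{ENV_VAR} must be ERROR, WARN, or OFF, got {mode!r}")

    low, high = VALIDATED_RANGE
    if mode == "OFF" or low <= major_minor <= high:
        return True

    message = (
        f"PyTorch {major_minor[0]}.{major_minor[1]} is outside the validated "
        f"range {low[0]}.{low[1]} to {high[0]}.{high[1]} for the tensor bridge. "
        f"Set {ENV_VAR}=WARN or {ENV_VAR}=OFF to assume ABI compatibility."
    )
    if mode == "WARN":
        warnings.warn(message, RuntimeWarning, stacklevel=2)
        return True
    raise RuntimeError(message)
```

Raising in the default mode replaces the confusing downstream BufferError with a message that names the actual cause and the available escape hatches.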

@rwgk
Contributor

rwgk commented May 17, 2026

I assume the right validation for this PR is a quick source-level check against PyTorch 2.12. Has that already been done?

In particular, I think the key files to inspect would be:

  • torch/csrc/autograd/python_variable.h, to confirm that THPVariable still has the expected layout (PyObject_HEAD followed by at::Tensor cdata)
  • torch/csrc/inductor/aoti_torch/c/macros.h, to confirm the AtenTensorHandle contract is unchanged
  • torch/csrc/inductor/aoti_torch/c/shim.h, to confirm the AOTI entry points we use are still present and compatible

If someone already compared the relevant 2.11 vs 2.12 sources, it would be great to call that out in the PR description or comments.
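
A check along these lines could be scripted against a local PyTorch clone. This is a rough sketch under stated assumptions: the helper names diff_command and changed_files are made up for this example, and it assumes the release tags are named v2.11.0 and v2.12.0 in the clone. It relies on `git diff --quiet`, which exits with status 1 when the two revisions differ for the given path and 0 when they do not.

```python
import subprocess

# Files named in the comment above; extend the list as needed.
FILES_TO_CHECK = [
    "torch/csrc/autograd/python_variable.h",
    "torch/csrc/inductor/aoti_torch/c/macros.h",
    "torch/csrc/inductor/aoti_torch/c/shim.h",
]


def diff_command(old_tag, new_tag, path):
    """Build a git command whose exit status reports whether `path` changed."""
    return ["git", "diff", "--quiet", old_tag, new_tag, "--", path]


def changed_files(repo_dir, old_tag, new_tag, paths=FILES_TO_CHECK):
    """Return the subset of `paths` that differ between the two tags."""
    changed = []
    for path in paths:
        result = subprocess.run(diff_command(old_tag, new_tag, path), cwd=repo_dir)
        if result.returncode == 1:
            changed.append(path)
        elif result.returncode != 0:
            raise RuntimeError(f"git diff failed for {path}")
    return changed
```

Calling changed_files("/path/to/pytorch", "v2.11.0", "v2.12.0") only narrows down which files need a manual read; a reported change is not by itself evidence of an incompatibility.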

@rwgk
Contributor

rwgk commented May 17, 2026

While I was at it, I asked Cursor (GPT-5.4 272K Medium) to go ahead and do the source-based validation based on a local git clone of pytorch. Below is what it found. I'll go ahead and merge this PR.


I did a source-level comparison locally against the PyTorch v2.11.0 and v2.12.0 tags to validate the assumptions behind this PR.

What I checked:

  • torch/csrc/autograd/python_variable.h between v2.11.0 and v2.12.0: no diff. THPVariable still has PyObject_HEAD followed by at::Tensor cdata, which is the core assumption behind the pyobj_to_aten_handle trick.
  • torch/csrc/inductor/aoti_torch/c/macros.h: no diff. PyTorch still documents AtenTensorHandle as being represented under the hood as at::Tensor*.
  • torch/csrc/inductor/aoti_torch/utils.h: no diff. The helper still does reinterpret_cast<at::Tensor*>(handle).
  • torch/csrc/inductor/aoti_torch/c/shim.h: there is a diff, but it appears additive and unrelated to the APIs used here. The specific metadata getters used by cuda-core are still present with the same signatures.
  • torch/csrc/inductor/aoti_torch/shim_common.cpp: the diff also appears additive; I only saw new dtype helper implementations. The getter implementations used by this bridge are unchanged and still go through the same tensor_handle_to_tensor_pointer(...) path before reading tensor metadata.

My conclusion is that, for the specific 2.11 -> 2.12 bump in this PR, the source-based validation looks good. I do not see evidence of a break in either of the two assumptions the PR description relies on:

  • the THPVariable layout stayed the same
  • AtenTensorHandle still maps to at::Tensor*, and the exact AOTI getter path used by cuda-core is unchanged

The caveat is that this is still not a formal compatibility guarantee from PyTorch for the THPVariable trick itself, so the cap remains a conservative policy choice rather than pure paranoia. But for this specific release bump, the source comparison does support the PR.
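
To make the layout assumption concrete, here is an analogy that uses a plain CPython float instead of a torch tensor. A CPython float object is laid out as PyObject_HEAD followed by a C double, much as THPVariable is described above as PyObject_HEAD followed by at::Tensor cdata. The sketch reads the payload at the offset just past the header. It is not the cuda.core implementation, the helper names are invented for this example, and it depends on CPython implementation details (id() returning the object address, and object.__basicsize__ matching the header size).

```python
import ctypes

# Size of PyObject_HEAD for this interpreter build.
HEAD_SIZE = object.__basicsize__


def address_after_head(obj):
    """Address of the first struct member that follows PyObject_HEAD (CPython only)."""
    return id(obj) + HEAD_SIZE


def read_float_payload(value):
    """Read the C double stored right after PyObject_HEAD in a CPython float."""
    if type(value) is not float:
        raise TypeError("expected an exact float instance")
    return ctypes.c_double.from_address(address_after_head(value)).value
```

If the header size or the member order ever changed, the same pointer arithmetic would silently read the wrong bytes, which is why a version cap on an undocumented layout is a defensible default.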

@rwgk rwgk added this to the cuda.core next milestone May 17, 2026
@rwgk rwgk added enhancement Any code-related improvements P0 High priority - Must do! labels May 17, 2026
@rwgk
Contributor

rwgk commented May 17, 2026

/ok to test 3c77f31

@rwgk rwgk enabled auto-merge (squash) May 17, 2026 17:40
@rwgk rwgk merged commit ffac267 into NVIDIA:main May 17, 2026
96 checks passed
@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.

Labels

cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P0 High priority - Must do!

Development

Successfully merging this pull request may close these issues.

Bump tensor bridge version cap for PyTorch 2.12

2 participants