
Preserve buffer mutation tracking for index_put_ in convert_pt2e #4241

Open
MagellaX wants to merge 1 commit into pytorch:main from MagellaX:feat/4138-preserve-pt2e-buffer-mutation

Conversation

Contributor

@MagellaX MagellaX commented Apr 5, 2026

What changed

This fixes PT2E convert for modules that mutate a registered buffer via index_put_.

Previously, the convert path was lowering the observed destination buffer to quantize -> dequantize before index_put_. That changed the mutation from the real buffer to a temporary tensor, so after re-export:

  • buffers_to_mutate disappeared
  • and with fold_quantize=True, the buffer input could also get folded into _frozen_param0

This PR preserves observed inputs when they feed a write-alias argument, so the converted graph still mutates the original buffer. It also adds a regression test covering both fold_quantize=False and fold_quantize=True using the exact repro from #4138.
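For context, a module of the shape affected by this bug looks roughly like the following (a hypothetical minimal sketch, not the exact repro from #4138):

```python
import torch


class BufferMutator(torch.nn.Module):
    """Hypothetical example: mutates a registered buffer via index_put_."""

    def __init__(self):
        super().__init__()
        self.register_buffer("cache", torch.zeros(4, 8))

    def forward(self, idx, value):
        # In-place write: `self` (args[0]) of index_put_ is a write alias,
        # so a quantize -> dequantize pair must not be inserted in front of
        # the buffer input, or the write lands on a temporary tensor.
        self.cache.index_put_((idx,), value)
        return self.cache + 1
```

If convert_pt2e rewires `cache` through a dequantize node, index_put_ writes into that temporary instead of the buffer, which is how buffers_to_mutate gets lost on re-export.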

Why

This issue is really a mutation-preservation bug in convert_pt2e, not just a constant-folding issue.

By keeping the original mutable buffer input intact at the write site, re-export preserves both:

  • inputs_to_buffers
  • buffers_to_mutate

Validation

  • ruff check on touched files
  • python -m compileall torchao/quantization/pt2e/convert.py test/quantization/pt2e/test_quantize_pt2e.py
  • exact repro script validated both:
    • fold_quantize=False
    • fold_quantize=True
  • isolated pytest regression:
    • python -m pytest %TEMP%/test_issue4138_repro.py -q
    • 1 passed

Addresses #4138.


pytorch-bot bot commented Apr 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4241

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures

As of commit 2ce4c2e with merge base 6e7a6e9:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 5, 2026
Contributor Author

MagellaX commented Apr 7, 2026

Any takes, @jerryzh168?

@andrewor14
Contributor

@claude review please


claude bot commented Apr 7, 2026

Claude finished @andrewor14's task in 5m 20s.


Code Review for PR #4241

  • Read the PR diff and changed files
  • Understand the core logic changes in convert.py
  • Review the test changes
  • Check for correctness, edge cases, and style issues
  • Post review feedback

Summary

This PR addresses a real bug: convert_pt2e was replacing observers on mutable buffer inputs with quantize/dequantize nodes, which broke the mutation semantics. After re-export, buffers_to_mutate would disappear because index_put_ was writing to a temporary dequantized tensor instead of the original buffer.

The fix uses op schema introspection (alias_info.is_write) to detect when an observer's output feeds into a write-target argument of an in-place op, and skips the quantize/dequantize conversion in that case—just removing the observer and rewiring to the original input.

Correctness

The core approach is sound. The _is_write_target_arg function correctly inspects the schema's alias_info.is_write property for both positional and keyword arguments. For index_put_, only self (args[0]) has is_write=True, so:

  • The buffer's observer is preserved (correctly — mutating a dequantized temp would break buffer tracking)
  • The value's observer is NOT preserved (correctly — it can still be quantized/dequantized since it's a read-only input)
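The schema-based check described above can be sketched roughly as follows (an illustrative reimplementation based on this description, not the PR's actual helper):

```python
import torch


def is_write_target_arg(op_overload, arg_index: int) -> bool:
    """Illustrative sketch: does positional arg `arg_index` of an ATen
    overload alias a write target (i.e. the op mutates it in place)?"""
    schema = getattr(op_overload, "_schema", None)
    if schema is None:
        # Custom/higher-order ops without a schema: conservatively say no.
        return False
    if arg_index >= len(schema.arguments):
        return False
    alias_info = schema.arguments[arg_index].alias_info
    return alias_info is not None and alias_info.is_write
```

For `index_put_(Tensor(a!) self, Tensor?[] indices, Tensor values, ...)`, only `self` carries the `(a!)` write annotation, so args[0] is detected and args[2] is not.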

Issues

1. Style inconsistency in the test (test/quantization/pt2e/test_quantize_pt2e.py:65)

The test imports and uses Q_ANNOTATION_KEY, but every other quantizer in the test file (30+ occurrences) uses the literal string "quantization_annotation" directly. For consistency with the rest of the file, the test should use the string literal rather than adding a new import.


2. Comment is now misleading (torchao/quantization/pt2e/convert.py:202-203)

The comment says "didn't find corresponding quantize op and info for the activation_post_process" but now this branch also handles the _should_preserve_observed_input case, which has nothing to do with missing qconfig info. The comment should be updated to reflect the new condition.

3. Naming: _should_preserve_observed_input could be more precise (torchao/quantization/pt2e/convert.py:158)

The function name _should_preserve_observed_input is slightly ambiguous—"preserve" could be interpreted multiple ways. Something like _feeds_write_alias_arg or _is_mutation_target_observer would more directly convey that we're detecting observers whose outputs are consumed as write-alias arguments. Minor naming nit, though.

Edge Cases to Consider

4. Interaction with _find_mutable_buffers in constant_fold.py

The constant folding code at torchao/quantization/pt2e/constant_fold.py:97-110 has a similar but narrower check: it only looks for copy_ operations by name. This PR's approach using alias_info.is_write from the schema is more general and robust. Worth noting that these two systems now have overlapping but different mutation detection mechanisms — future work could potentially unify them.

5. Non-ATen ops without _schema

The _is_write_target_arg function correctly returns False when a node doesn't have _schema (line 132). This means custom ops or higher-order ops that mutate inputs but lack schema information won't be detected. This is a reasonable limitation for now since the PT2E export path primarily produces ATen ops.

Test Review

The test (test_convert_pt2e_preserves_index_put_buffer_mutation_on_reexport) is well-structured:

  • Tests both fold_quantize=False and fold_quantize=True via subTest
  • Verifies the graph structure (buffer node directly feeds index_put_)
  • Verifies re-export preserves inputs_to_buffers and buffers_to_mutate in the signature
  • The IndexPutQuantizer is a reasonable minimal quantizer for the test

One minor suggestion: the test could also verify that the value argument (args[2]) of index_put_ is still quantized/dequantized (i.e., that we're not over-preserving).

Verdict

The fix is correct and well-targeted. The main feedback items are (1) the style inconsistency with Q_ANNOTATION_KEY and (2) the now-misleading comment. Neither is blocking.


@andrewor14 andrewor14 added the module: pt2e_quant pt2 export quantization (prepare_pt2e, convert_pt2e, quantizer) label Apr 7, 2026