
Preserve buffer mutation tracking for index_put_ in convert_pt2e #4241

Open
MagellaX wants to merge 1 commit into pytorch:main from MagellaX:feat/4138-preserve-pt2e-buffer-mutation

Conversation

Contributor

@MagellaX MagellaX commented Apr 5, 2026

What changed

This fixes PT2E convert for modules that mutate a registered buffer via index_put_.

Previously, the convert path was lowering the observed destination buffer to quantize -> dequantize before index_put_. That changed the mutation from the real buffer to a temporary tensor, so after re-export:

  • buffers_to_mutate disappeared
  • and with fold_quantize=True, the buffer input could also get folded into _frozen_param0

This PR preserves observed inputs when they feed a write-alias argument, so the converted graph still mutates the original buffer. It also adds a regression test covering both fold_quantize=False and fold_quantize=True using the exact repro from #4138.
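For context, a module of the shape affected by this bug looks roughly like the following (a hypothetical minimal sketch, not the exact repro from #4138):

```python
import torch


class BufferMutator(torch.nn.Module):
    """Hypothetical example: mutates a registered buffer via index_put_."""

    def __init__(self):
        super().__init__()
        self.register_buffer("cache", torch.zeros(4, 8))

    def forward(self, idx, value):
        # In-place write: `self` (args[0]) of index_put_ is a write alias,
        # so a quantize -> dequantize pair must not be inserted in front of
        # the buffer input, or the write lands on a temporary tensor.
        self.cache.index_put_((idx,), value)
        return self.cache + 1
```

If convert_pt2e rewires `cache` through a dequantize node, index_put_ writes into that temporary instead of the buffer, which is how buffers_to_mutate gets lost on re-export.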

Why

This issue is really a mutation-preservation bug in convert_pt2e, not just a constant-folding issue.

By keeping the original mutable buffer input intact at the write site, re-export preserves both:

  • inputs_to_buffers
  • buffers_to_mutate

Validation

  • ruff check on touched files
  • python -m compileall torchao/quantization/pt2e/convert.py test/quantization/pt2e/test_quantize_pt2e.py
  • exact repro script validated both:
    • fold_quantize=False
    • fold_quantize=True
  • isolated pytest regression:
    • python -m pytest %TEMP%/test_issue4138_repro.py -q
    • 1 passed

Addresses #4138.


pytorch-bot bot commented Apr 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4241

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures

As of commit 2ce4c2e with merge base 6e7a6e9:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 5, 2026
Contributor Author

MagellaX commented Apr 7, 2026

Any takes, @jerryzh168?

@andrewor14
Contributor

@claude review please


claude bot commented Apr 7, 2026

Claude finished @andrewor14's task in 5m 20s.


Code Review for PR #4241

  • Read the PR diff and changed files
  • Understand the core logic changes in convert.py
  • Review the test changes
  • Check for correctness, edge cases, and style issues
  • Post review feedback

Summary

This PR addresses a real bug: convert_pt2e was replacing observers on mutable buffer inputs with quantize/dequantize nodes, which broke the mutation semantics. After re-export, buffers_to_mutate would disappear because index_put_ was writing to a temporary dequantized tensor instead of the original buffer.

The fix uses op schema introspection (alias_info.is_write) to detect when an observer's output feeds into a write-target argument of an in-place op, and skips the quantize/dequantize conversion in that case—just removing the observer and rewiring to the original input.

Correctness

The core approach is sound. The _is_write_target_arg function correctly inspects the schema's alias_info.is_write property for both positional and keyword arguments. For index_put_, only self (args[0]) has is_write=True, so:

  • The buffer's observer is preserved (correctly — mutating a dequantized temp would break buffer tracking)
  • The value's observer is NOT preserved (correctly — it can still be quantized/dequantized since it's a read-only input)
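The schema-based check described above can be sketched roughly as follows (an illustrative reimplementation based on this description, not the PR's actual helper):

```python
import torch


def is_write_target_arg(op_overload, arg_index: int) -> bool:
    """Illustrative sketch: does positional arg `arg_index` of an ATen
    overload alias a write target (i.e. the op mutates it in place)?"""
    schema = getattr(op_overload, "_schema", None)
    if schema is None:
        # Custom/higher-order ops without a schema: conservatively say no.
        return False
    if arg_index >= len(schema.arguments):
        return False
    alias_info = schema.arguments[arg_index].alias_info
    return alias_info is not None and alias_info.is_write
```

For `index_put_(Tensor(a!) self, Tensor?[] indices, Tensor values, ...)`, only `self` carries the `(a!)` write annotation, so args[0] is detected and args[2] is not.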

Issues

1. Style inconsistency in the test (test/quantization/pt2e/test_quantize_pt2e.py:65)

The test imports and uses Q_ANNOTATION_KEY, but every other quantizer in the test file (30+ occurrences) uses the literal string "quantization_annotation" directly. For consistency with the rest of the file, the test should use the string literal rather than adding a new import.


2. Comment is now misleading (torchao/quantization/pt2e/convert.py:202-203)

The comment says "didn't find corresponding quantize op and info for the activation_post_process" but now this branch also handles the _should_preserve_observed_input case, which has nothing to do with missing qconfig info. The comment should be updated to reflect the new condition.

3. Naming: _should_preserve_observed_input could be more precise (torchao/quantization/pt2e/convert.py:158)

The function name _should_preserve_observed_input is slightly ambiguous—"preserve" could be interpreted multiple ways. Something like _feeds_write_alias_arg or _is_mutation_target_observer would more directly convey that we're detecting observers whose outputs are consumed as write-alias arguments. Minor naming nit, though.

Edge Cases to Consider

4. Interaction with _find_mutable_buffers in constant_fold.py

The constant folding code at torchao/quantization/pt2e/constant_fold.py:97-110 has a similar but narrower check: it only looks for copy_ operations by name. This PR's approach using alias_info.is_write from the schema is more general and robust. Worth noting that these two systems now have overlapping but different mutation detection mechanisms — future work could potentially unify them.

5. Non-ATen ops without _schema

The _is_write_target_arg function correctly returns False when a node doesn't have _schema (line 132). This means custom ops or higher-order ops that mutate inputs but lack schema information won't be detected. This is a reasonable limitation for now since the PT2E export path primarily produces ATen ops.

Test Review

The test (test_convert_pt2e_preserves_index_put_buffer_mutation_on_reexport) is well-structured:

  • Tests both fold_quantize=False and fold_quantize=True via subTest
  • Verifies the graph structure (buffer node directly feeds index_put_)
  • Verifies re-export preserves inputs_to_buffers and buffers_to_mutate in the signature
  • The IndexPutQuantizer is a reasonable minimal quantizer for the test

One minor suggestion: the test could also verify that the value argument (args[2]) of index_put_ is still quantized/dequantized (i.e., that we're not over-preserving).

Verdict

The fix is correct and well-targeted. The main feedback items are (1) the style inconsistency with Q_ANNOTATION_KEY and (2) the now-misleading comment. Neither is blocking.


@andrewor14 andrewor14 added the module: pt2e_quant pt2 export quantization (prepare_pt2e, convert_pt2e, quantizer) label Apr 7, 2026