Build GPU profile benchmark by namgyu-youn · Pull Request #4244 · pytorch/ao

namgyu-youn · 2026-04-06T16:14:50Z

Summary:
Quantization is fundamentally about reducing memory footprint. However, torchao has no end-to-end GPU memory benchmark that runs a real model. For example, measure_accuracy_and_performance.sh focuses on accuracy and throughput, not memory. Without an explicit memory benchmark, it is hard to quantify allocator behavior and memory fragmentation, a problem many quantization frameworks struggle to solve.

To address this, this PR introduces a GPU memory benchmark that runs a real model. It covers three variants: (1) BF16 (original), (2) W8A8-INT, and (3) W8A8-INT + torch.compile.
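Allocated and reserved figures like the ones below are typically read from PyTorch's CUDA caching allocator (`torch.cuda.memory_allocated`, `torch.cuda.memory_reserved`, `torch.cuda.max_memory_allocated`). The following is a minimal sketch of measuring one variant that way, not the PR's script: `measure_variant` and `MemoryReport` are hypothetical names, and because the PR does not say whether it reports current or peak values, the sketch records both. The `cuda` parameter defaults to `torch.cuda` and exists only so the helper can be exercised without a GPU.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

MIB = 1024 ** 2  # bytes per MiB


@dataclass
class MemoryReport:
    label: str
    allocated_mib: float       # memory occupied by live tensors after the run
    reserved_mib: float        # memory held by the caching allocator (live + cached)
    peak_allocated_mib: float  # high-water mark of allocated memory during the run


def measure_variant(
    label: str, run: Callable[[], None], cuda: Optional[Any] = None
) -> MemoryReport:
    """Run one benchmark variant and read the CUDA caching-allocator counters.

    Hypothetical helper: `cuda` defaults to `torch.cuda` and can be replaced
    by a stand-in object when testing without a GPU.
    """
    if cuda is None:
        import torch  # deferred so this module imports without torch installed
        cuda = torch.cuda

    cuda.empty_cache()              # release unused cached blocks before measuring
    cuda.reset_peak_memory_stats()  # start a fresh high-water mark for this run
    run()                           # e.g. load model, quantize, generate
    cuda.synchronize()              # wait for queued kernels before reading counters

    return MemoryReport(
        label=label,
        allocated_mib=cuda.memory_allocated() / MIB,
        reserved_mib=cuda.memory_reserved() / MIB,
        peak_allocated_mib=cuda.max_memory_allocated() / MIB,
    )
```

For the W8A8-INT variant, `run` would load the model, call torchao's `quantize_(model, Int8DynamicActivationInt8WeightConfig())`, and generate; the compiled variant would additionally wrap the model with `torch.compile` first. Running each variant in a fresh process likely gives cleaner numbers, since `empty_cache` cannot release blocks still referenced by a previous variant.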

Results on an NVIDIA RTX 5090 with Qwen3-8B:

Metric	BF16	W8A8-INT	W8A8-INT + compile
Allocated (MiB)	15643.9	11396.1	8430.6
Reserved (MiB)	15658.0	21334.0	15312.0
Fragmentation (%)	0.1	46.6	44.9
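The fragmentation row appears to be (Reserved − Allocated) / Reserved × 100: recomputing it from the two memory rows reproduces all three values to one decimal place. The short check below does that and also derives the allocated-memory saving relative to BF16. The formula is inferred from the numbers, not taken from the PR's code, and the helper names are illustrative.

```python
# Figures copied from the results table (MiB): (allocated, reserved).
RESULTS = {
    "BF16": (15643.9, 15658.0),
    "W8A8-INT": (11396.1, 21334.0),
    "W8A8-INT + torch.compile": (8430.6, 15312.0),
}


def fragmentation_pct(allocated_mib: float, reserved_mib: float) -> float:
    """Share of reserved memory that is not backing live tensors, in percent."""
    return 100.0 * (reserved_mib - allocated_mib) / reserved_mib


def allocated_saving_pct(allocated_mib: float, baseline_mib: float) -> float:
    """Reduction in allocated memory relative to a baseline, in percent."""
    return 100.0 * (1.0 - allocated_mib / baseline_mib)


baseline_alloc, _ = RESULTS["BF16"]
for name, (alloc, reserved) in RESULTS.items():
    print(
        f"{name:<26} fragmentation {fragmentation_pct(alloc, reserved):5.1f}%  "
        f"allocated saving vs BF16 {allocated_saving_pct(alloc, baseline_alloc):5.1f}%"
    )
```

Read this way, W8A8-INT + torch.compile allocates about 46% less than BF16 but reserves only slightly less (15312.0 vs 15658.0 MiB), and the eager W8A8-INT run reserves more than BF16 despite allocating less.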

Future plan:
This PR only adds Int8DynamicActivationInt8WeightConfig, to keep the change minimal and confirm the design first. Support for the remaining torchao configs is planned:

Int4WeightOnlyConfig
Float8DynamicActivationInt4WeightConfig
Int8WeightOnlyConfig
Float8WeightOnlyConfig
Float8DynamicActivationFloat8WeightConfig
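One way the planned coverage could be organized is as a variant matrix: the BF16 baseline plus each config in eager and compiled form, which reduces to this PR's three variants when only Int8DynamicActivationInt8WeightConfig is listed. This sketch is hypothetical (`Variant`, `build_variants` and `config_class` are not from the PR) and assumes each config class can be looked up on `torchao.quantization`; it keeps class names as strings so the matrix can be built without torchao installed.

```python
from dataclasses import dataclass
from typing import List, Optional

# torchao config class names: the one this PR covers, then the planned ones.
CONFIG_NAMES = [
    "Int8DynamicActivationInt8WeightConfig",
    "Int4WeightOnlyConfig",
    "Float8DynamicActivationInt4WeightConfig",
    "Int8WeightOnlyConfig",
    "Float8WeightOnlyConfig",
    "Float8DynamicActivationFloat8WeightConfig",
]


@dataclass(frozen=True)
class Variant:
    config_name: Optional[str]  # None means the unquantized BF16 baseline
    compiled: bool

    @property
    def label(self) -> str:
        base = self.config_name or "BF16"
        return f"{base} + torch.compile" if self.compiled else base


def build_variants(config_names: List[str], with_compile: bool = True) -> List[Variant]:
    """BF16 baseline first, then each config in eager mode and, optionally, compiled."""
    variants = [Variant(config_name=None, compiled=False)]
    for name in config_names:
        variants.append(Variant(config_name=name, compiled=False))
        if with_compile:
            variants.append(Variant(config_name=name, compiled=True))
    return variants


def config_class(name: str):
    """Look up a torchao config class by name (assumes torchao is installed at call time)."""
    import torchao.quantization as tq  # deferred import
    return getattr(tq, name)
```

Returning the class rather than an instance leaves per-config arguments (for example a group size for int4) to the caller, since default arguments may not suit every config.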

pytorch-bot · 2026-04-06T16:14:55Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4244

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

namgyu-youn · 2026-04-06T16:15:53Z

@pytorchbot label "module: inference" "topic: for developers"

jerryzh168 · 2026-04-07T20:07:58Z

cc @jainapurva maybe you can take a look?

build w8a8-int profile pipeline

3696fb3

namgyu-youn requested review from jerryzh168 and vkuzo as code owners April 6, 2026 16:14

meta-cla bot added the CLA Signed label Apr 6, 2026

pytorch-bot bot added the module: inference and topic: for developers labels Apr 6, 2026

namgyu-youn mentioned this pull request Apr 8, 2026 in Matmul kernel preference support for Int8Tensor #3558 (Open)

namgyu-youn wants to merge 1 commit into pytorch:main from namgyu-youn:int8-profile-benchmark
