Build GPU profile benchmark#4244

Open
namgyu-youn wants to merge 1 commit into pytorch:main from namgyu-youn:int8-profile-benchmark

Conversation

@namgyu-youn
Contributor

@namgyu-youn namgyu-youn commented Apr 6, 2026

Summary:
Quantization is fundamentally designed to reduce memory footprint. However, torchao doesn't have an end-to-end GPU memory benchmark with a real model run. For example, measure_accuracy_and_performance.sh focuses on accuracy and throughput, not memory. Without an explicit memory benchmark, it's difficult to quantify allocator behavior and memory fragmentation — problems that many quantization frameworks struggle to resolve.

To address this, this PR introduces a GPU memory benchmark with a real model run. It covers three variants: (1) BF16 (original), (2) W8A8-INT, (3) W8A8-INT + torch.compile.
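As a rough illustration (not the PR's actual script), each variant could be measured by resetting the CUDA allocator's peak statistics, running inference, and reading back peak allocated vs. reserved memory. The helper names `to_mib` and `measure_peak_memory` below are hypothetical; the quantized variants would be prepared via torchao's `quantize_` with `Int8DynamicActivationInt8WeightConfig`, optionally followed by `torch.compile`.

```python
def to_mib(num_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return num_bytes / (1024 ** 2)


def measure_peak_memory(run_fn):
    """Run `run_fn` once on GPU and report allocator stats (requires CUDA).

    Variant setup happens before calling this, e.g. (sketch):
        # from torchao.quantization import quantize_, Int8DynamicActivationInt8WeightConfig
        # quantize_(model, Int8DynamicActivationInt8WeightConfig())  # W8A8-INT
        # model = torch.compile(model)                               # + compile
    """
    import torch  # imported lazily so the helpers above work without a GPU

    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    run_fn()
    torch.cuda.synchronize()

    allocated = to_mib(torch.cuda.max_memory_allocated())
    reserved = to_mib(torch.cuda.max_memory_reserved())
    frag = 100.0 * (reserved - allocated) / reserved if reserved else 0.0
    return allocated, reserved, frag
```

The gap between allocated and reserved memory is what the fragmentation column in the results below tries to capture.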

Results on an NVIDIA RTX 5090 with the Qwen3-8B model:

| Metric            | BF16    | W8A8-INT | INT8+compile |
|-------------------|---------|----------|--------------|
| Allocated (MiB)   | 15643.9 | 11396.1  | 8430.6       |
| Reserved (MiB)    | 15658.0 | 21334.0  | 15312.0      |
| Fragmentation %   | 0.1     | 46.6     | 44.9         |
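The fragmentation column follows directly from the other two rows: it is the share of reserved memory that is not actually allocated. A minimal check against the table's numbers:

```python
def fragmentation_pct(allocated_mib: float, reserved_mib: float) -> float:
    """Percent of reserved allocator memory not backing live allocations."""
    return 100.0 * (reserved_mib - allocated_mib) / reserved_mib


# (allocated MiB, reserved MiB) from the table above
results = {
    "BF16": (15643.9, 15658.0),
    "W8A8-INT": (11396.1, 21334.0),
    "INT8+compile": (8430.6, 15312.0),
}
for name, (alloc, res) in results.items():
    print(f"{name}: {fragmentation_pct(alloc, res):.1f}%")
# → BF16: 0.1%, W8A8-INT: 46.6%, INT8+compile: 44.9%
```

Note that INT8 roughly matches BF16 in reserved memory despite much lower allocated memory, which is exactly the allocator behavior this benchmark is meant to surface.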

Future plan:
This PR only introduces Int8DynamicActivationInt8WeightConfig, to keep the change minimal and confirm the design; support for all torchao subclasses is planned:

  • Int4WeightOnlyConfig
  • Float8DynamicActivationInt4WeightConfig
  • Int8WeightOnlyConfig
  • Float8WeightOnlyConfig
  • Float8DynamicActivationFloat8WeightConfig

@pytorch-bot

pytorch-bot bot commented Apr 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4244

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 6, 2026
@namgyu-youn
Contributor Author

@pytorchbot label "module: inference" "topic: for developers"

@pytorch-bot pytorch-bot bot added module: inference quantize_ api inference flow topic: for developers Use this tag if this PR is mainly developer facing labels Apr 6, 2026
@jerryzh168
Contributor

cc @jainapurva maybe you can take a look?

