
Adding V-only Hadamard support to low precision attention API #4249

Open

howardzhang-cv wants to merge 1 commit into gh/howardzhang-cv/40/base from gh/howardzhang-cv/40/head

Conversation

Contributor

@howardzhang-cv howardzhang-cv commented Apr 7, 2026

Stack from ghstack (oldest at bottom):

Summary

  • Added V-only Hadamard support to the low precision attention API, passed through from apply_low_precision_attention to select the Hadamard-fused kernel
  • Added new Triton functions to the kernel files (triton_hadamard_qkv_quantization.py and triton_hadamard_rope_qkv_quantization.py) for fused V-only Hadamard and QKV quantization (with RoPE fusion as well)
  • Added a V-only Hadamard option to the benchmarks
  • Added additional V-only Hadamard tests
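To illustrate the idea behind the fused kernel, here is a minimal pure-PyTorch sketch (not the PR's Triton implementation; function names and the per-tensor int8 scheme are assumptions for illustration) of rotating only V with a normalized Hadamard transform before quantization. The rotation is orthonormal, so the attention output can be un-rotated afterwards, while the rotated values quantize with less outlier-driven error:

```python
import torch

def hadamard_matrix(n: int) -> torch.Tensor:
    # Sylvester construction: n must be a power of two.
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H

def hadamard_v_then_quantize(v: torch.Tensor, num_bits: int = 8):
    # Hypothetical helper: rotate V along the head dimension with an
    # orthonormal Hadamard transform, then symmetric per-tensor quantize.
    # The rotation spreads outliers across channels, tightening the range.
    d = v.shape[-1]
    H = hadamard_matrix(d) / d ** 0.5          # orthonormal rotation
    v_rot = v @ H
    qmax = 2 ** (num_bits - 1) - 1
    scale = v_rot.abs().amax() / qmax
    v_q = torch.clamp(torch.round(v_rot / scale), -qmax - 1, qmax).to(torch.int8)
    return v_q, scale, H

# Since H is orthonormal, the attention output (P @ V_rot) can be
# un-rotated by multiplying with H.T after the matmul.
```

In the actual kernels this rotation and the quantization are fused into one Triton pass (optionally together with RoPE), avoiding an extra round-trip through memory.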

Results

Single Attention Layer

[image: single attention layer benchmark] Results show the speedup for each option (1.29x for QKV Hadamard, 1.32x for V-only, 1.38x for no Hadamard). Notably, the mid-sequence-length cases sped up much more with V-only (1.02x for QKV vs. 1.15x for V-only).

LLaMA3 Prefill

[image: LLaMA3 prefill results] Perplexity goes from 7.54 to 7.57 with QKV Hadamard; with V-only Hadamard it goes from 7.54 to 7.61, very close to the 7.62 without Hadamard. The speedup improves from 1.15x with QKV Hadamard to 1.19x with V-only. V-only Hadamard seems to help less for LLM workloads.

Flux.1-schnell 2048x2048

[image: Flux.1-schnell 2048x2048 results] LPIPS is about the same as with QKV Hadamard, and much better than the no-Hadamard option (0.44 LPIPS). Speedup is slightly better too, from 0.96x with QKV to 1.0x with V-only (speedups scale with sequence length, so larger image sizes or video generation will see better numbers). V-only Hadamard is therefore the better option for image and video generation.

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Apr 7, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4249

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f5bf649 with merge base 2a8fa55:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 7, 2026
howardzhang-cv added a commit that referenced this pull request Apr 7, 2026
@howardzhang-cv howardzhang-cv added the module: inference quantize_ api inference flow label Apr 7, 2026
Contributor

@jerryzh168 jerryzh168 left a comment


LGTM
