add gptq algorithm diagram by vkuzo · Pull Request #4237 · pytorch/ao

vkuzo · 2026-04-03T18:23:51Z

Summary:

Adding a diagram of the GPTQ update loop to simplify future work on this
code.

Also, var renames to simplify:

W_t_quantize_block -> B_cur
k_block_start -> B_cur_k_start
k_block_end -> B_cur_k_end
Err1 -> B_cur_Err1
Hinv_quantize_block -> Hinv_cur
k_group_start -> G_k_start
k_group_end -> G_k_end

Test Plan: CI, GPTQ output did not change locally before+after

[ghstack-poisoned]

vkuzo · 2026-04-03T18:23:52Z

Stack from ghstack (oldest at bottom):

-> add gptq algorithm diagram #4237

Summary: Adding a diagram of the GPTQ update loop to simplify future work on this code. Also, var renames to simplify: * W_t_quantize_block -> B_cur * k_block_start -> B_cur_k_start * k_block_end -> B_cur_k_end * Err1 -> B_cur_Err1 * Hinv_quantize_block -> Hinv_cur * k_group_start -> G_k_start * k_group_end -> G_k_end Test Plan: CI, this is a comment-only diff ghstack-source-id: 24d0909 ghstack-comment-id: 4184643773 Pull-Request: #4237

pytorch-bot · 2026-04-03T18:23:56Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4237

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4807cac with merge base 6e7a6e9 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo · 2026-04-03T18:24:49Z

@claude verify that all var renames are correct, that the diagram matches the code, and that this is a pure refactor (no logic change)

claude · 2026-04-03T18:25:07Z

Claude finished @vkuzo's task in 8m 58s —— View job

Verification Results

Read the changed file and diff
Verify all variable renames are correct and consistent
Verify the diagram matches the code
Confirm this is a pure refactor (no logic changes)

1. Variable Renames: PASS

All 7 renames were applied completely and consistently. Grep confirms zero occurrences of any old variable name remain in the file:

Old Name	New Name	Status
`W_t_quantize_block`	`B_cur`	All replaced
`k_block_start`	`B_cur_k_start`	All replaced
`k_block_end`	`B_cur_k_end`	All replaced
`Err1`	`B_cur_Err1`	All replaced
`Hinv_quantize_block`	`Hinv_cur`	All replaced
`k_group_start`	`G_k_start`	All replaced
`k_group_end`	`G_k_end`	All replaced

2. Diagram vs. Code: Minor inaccuracy in pseudocode

The diagram structure (triple for loop, shapes, batch propagation) correctly describes the code at a high level. However, the inner loop pseudocode on lines 294-301 has an indexing issue:

#            for k in range(group_size):
#                k2 = B_cur_k_start + k
#                w_t = B_cur[:, G_cur_start + k2]

In the actual code (api.py:342-343), k iterates over range(G_k_start - B_cur_k_start, G_k_end - B_cur_k_start) — i.e., k is already a block-local index. The code then indexes directly with B_cur[:, k]. The diagram's k2 = B_cur_k_start + k would produce a global index, and G_cur_start + k2 doesn't match any indexing in the code. Also, G_cur_start is referenced but never defined.

The same issue propagates to the error computation lines, where the diagram uses H_inv_cur[k2, k2] but the code uses Hinv_cur[k, k] (block-local).

Suggested fix for the pseudocode — something like:

#            for k in range(G_start_local, G_end_local):
#                w_t = B_cur[:, k]
#                w_t_qdq = quant_dequant(w_t, base_config, qparams)
#                err1 = (w_t - w_t_qdq) / Hinv_cur[k, k]
#                B_cur[:, k:] -= err1.matmul(Hinv_cur[k, k:])
#                B_cur_Err1[:, k] = err1.flatten()

This is documentation-only; the executable code is correct.

3. Pure Refactor: PASS

Every changed line of executable code is a pure variable rename. Specifically verified:

No operators or operations changed
No indexing logic changed (all slice boundaries use the same expressions with renamed variables)
No statements reordered
No executable lines added or removed
The only non-rename change is a whitespace reformatting of the err1.matmul(...) call from three lines to one line (semantically identical)

Summary

The variable renames and refactor are correct — this is a clean, logic-preserving change. The one item to consider fixing is the inner loop pseudocode in the diagram comment, where the indexing doesn't quite match the actual code's block-local k variable.

[ghstack-poisoned]

Summary: Adding a diagram of the GPTQ update loop to simplify future work on this code. Also, var renames to simplify: * W_t_quantize_block -> B_cur * k_block_start -> B_cur_k_start * k_block_end -> B_cur_k_end * Err1 -> B_cur_Err1 * Hinv_quantize_block -> Hinv_cur * k_group_start -> G_k_start * k_group_end -> G_k_end Test Plan: CI, this is a comment-only diff ghstack-source-id: 5ae51c8 ghstack-comment-id: 4184643773 Pull-Request: #4237

jerryzh168 · 2026-04-03T19:57:47Z

torchao/prototype/gptq/api.py

+    #    * B1, ..., BN are chunks of size [N, B], where B is a hyperparameter of GPTQ
+    #    * G1, ..., GN are chunks of size [N, G], where G is group_size of the quantization recipe
+    #
+    # 2. triple for loop, with every loop chunking along the K dimension:


is this the same thing the code is doing? if so, would it better to just do inline comment for code (including refactor code itself to be easier to read by adding helper functions etc.)?

vkuzo · 2026-04-03T20:23:34Z

I think the diagram is really useful. What you suggested is good as well, I think this PR should land and we could implement your suggestion in a future PR.

…

On Fri, Apr 3, 2026 at 3:58 PM Jerry Zhang ***@***.***> wrote: @ jerryzh168 commented on this pull request. In torchao/prototype/gptq/api. py: > + # + # W_t (below) + # + # |------------------ K --------------------| + # |---B1----|---B2----| .. . + # |-G1-|-G2-|-G1-|-G2-| .. . + # |-----------------------------------------| ***@***.**** commented on this pull request. ------------------------------ In torchao/prototype/gptq/api.py <#4237?email_source=notifications&email_token=AAMMEIMONABDDB2AQP4CM2T4UAJVFA5CNFSNUABKM5UWIORPF5TWS5BNNB2WEL2QOVWGYUTFOF2WK43UKJSXM2LFO4XTIMBVG4YTAMBZG44KM4TFMFZW63VHNVSW45DJN5XKKZLWMVXHJL3QOJPXEZLWNFSXOX3DNRUWG2Y#discussion_r3034189810> : > + # + # W_t (below) + # + # |------------------ K --------------------| + # |---B1----|---B2----| ... + # |-G1-|-G2-|-G1-|-G2-| ... + # |-----------------------------------------| + # N | 0 1 2 3 | 4 5 6 7 | ... + # | ... + # |-----------------------------------------| + # + # 1. start with W_t, with shape [N, K] + # * B1, ..., BN are chunks of size [N, B], where B is a hyperparameter of GPTQ + # * G1, ..., GN are chunks of size [N, G], where G is group_size of the quantization recipe + # + # 2. triple for loop, with every loop chunking along the K dimension: is this the same thing the code is doing? if so, would it better to just do inline comment for code (including refactor code itself to be easier to read by adding helper functions etc.)? — Reply to this email directly, view it on GitHub <#4237?email_source=notifications&email_token=AAMMEIKPP6KAHESWMWBI4T34UAJVFA5CNFSNUABKM5UWIORPF5TWS5BNNB2WEL2QOVWGYUTFOF2WK43UKJSXM2LFO4XTIMBVG4YTAMBZG44KM4TFMFZW63VHNVSW45DJN5XKKZLWMVXHJPLQOJPXEZLWNFSXOX3ON52GSZTJMNQXI2LPNZZV6Y3MNFRWW#pullrequestreview-4057100978>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAMMEIONVMSEW6DKBXH2S2D4UAJVFAVCNFSM6AAAAACXMA74Y6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHM2DANJXGEYDAOJXHA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

Update

14db1be

[ghstack-poisoned]

vkuzo requested a review from jerryzh168 as a code owner April 3, 2026 18:23

meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 3, 2026

vkuzo added the module: not user facing Use this tag if you don't want this PR to show up in release notes label Apr 3, 2026

Update

4807cac

[ghstack-poisoned]

jerryzh168 reviewed Apr 3, 2026

View reviewed changes

jerryzh168 approved these changes Apr 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add gptq algorithm diagram#4237

add gptq algorithm diagram#4237
vkuzo wants to merge 2 commits intomainfrom
gh/vkuzo/246/head

vkuzo commented Apr 3, 2026 •

edited

Loading

Uh oh!

vkuzo commented Apr 3, 2026 •

edited

Loading

Uh oh!

pytorch-bot bot commented Apr 3, 2026 •

edited

Loading

Uh oh!

vkuzo commented Apr 3, 2026

Uh oh!

claude bot commented Apr 3, 2026 •

edited

Loading

Uh oh!

jerryzh168 Apr 3, 2026

Uh oh!

vkuzo commented Apr 3, 2026 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

vkuzo commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vkuzo commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4237

✅ No Failures

Uh oh!

vkuzo commented Apr 3, 2026

Uh oh!

claude bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Verification Results

1. Variable Renames: PASS

2. Diagram vs. Code: Minor inaccuracy in pseudocode

3. Pure Refactor: PASS

Summary

Uh oh!

jerryzh168 Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

vkuzo commented Apr 3, 2026 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vkuzo commented Apr 3, 2026 •

edited

Loading

vkuzo commented Apr 3, 2026 •

edited

Loading

pytorch-bot bot commented Apr 3, 2026 •

edited

Loading

claude bot commented Apr 3, 2026 •

edited

Loading