Skip to content

add gptq algorithm diagram#4237

Open
vkuzo wants to merge 2 commits intomainfrom
gh/vkuzo/246/head
Open

add gptq algorithm diagram#4237
vkuzo wants to merge 2 commits intomainfrom
gh/vkuzo/246/head

Conversation

@vkuzo
Copy link
Copy Markdown
Contributor

@vkuzo vkuzo commented Apr 3, 2026

Summary:

Adding a diagram of the GPTQ update loop to simplify future work on this
code.

Also, var renames to simplify:

  • W_t_quantize_block -> B_cur
  • k_block_start -> B_cur_k_start
  • k_block_end -> B_cur_k_end
  • Err1 -> B_cur_Err1
  • Hinv_quantize_block -> Hinv_cur
  • k_group_start -> G_k_start
  • k_group_end -> G_k_end

Test Plan: CI, GPTQ output did not change locally before+after

[ghstack-poisoned]
@vkuzo
Copy link
Copy Markdown
Contributor Author

vkuzo commented Apr 3, 2026

Stack from ghstack (oldest at bottom):

@vkuzo vkuzo requested a review from jerryzh168 as a code owner April 3, 2026 18:23
vkuzo added a commit that referenced this pull request Apr 3, 2026
Summary:

Adding a diagram of the GPTQ update loop to simplify future work on this
code.

Also, var renames to simplify:
* W_t_quantize_block -> B_cur
* k_block_start -> B_cur_k_start
* k_block_end -> B_cur_k_end
* Err1 -> B_cur_Err1
* Hinv_quantize_block -> Hinv_cur
* k_group_start -> G_k_start
* k_group_end -> G_k_end

Test Plan: CI, this is a comment-only diff
ghstack-source-id: 24d0909
ghstack-comment-id: 4184643773
Pull-Request: #4237
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot bot commented Apr 3, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4237

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4807cac with merge base 6e7a6e9 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 3, 2026
@vkuzo
Copy link
Copy Markdown
Contributor Author

vkuzo commented Apr 3, 2026

@claude verify that all var renames are correct, that the diagram matches the code, and that this is a pure refactor (no logic change)

@vkuzo vkuzo added the module: not user facing Use this tag if you don't want this PR to show up in release notes label Apr 3, 2026
@claude
Copy link
Copy Markdown

claude bot commented Apr 3, 2026

Claude finished @vkuzo's task in 8m 58s —— View job


Verification Results

  • Read the changed file and diff
  • Verify all variable renames are correct and consistent
  • Verify the diagram matches the code
  • Confirm this is a pure refactor (no logic changes)

1. Variable Renames: PASS

All 7 renames were applied completely and consistently. Grep confirms zero occurrences of any old variable name remain in the file:

Old Name New Name Status
W_t_quantize_block B_cur All replaced
k_block_start B_cur_k_start All replaced
k_block_end B_cur_k_end All replaced
Err1 B_cur_Err1 All replaced
Hinv_quantize_block Hinv_cur All replaced
k_group_start G_k_start All replaced
k_group_end G_k_end All replaced

2. Diagram vs. Code: Minor inaccuracy in pseudocode

The diagram structure (triple for loop, shapes, batch propagation) correctly describes the code at a high level. However, the inner loop pseudocode on lines 294-301 has an indexing issue:

#            for k in range(group_size):
#                k2 = B_cur_k_start + k
#                w_t = B_cur[:, G_cur_start + k2]

In the actual code (api.py:342-343), k iterates over range(G_k_start - B_cur_k_start, G_k_end - B_cur_k_start) — i.e., k is already a block-local index. The code then indexes directly with B_cur[:, k]. The diagram's k2 = B_cur_k_start + k would produce a global index, and G_cur_start + k2 doesn't match any indexing in the code. Also, G_cur_start is referenced but never defined.

The same issue propagates to the error computation lines, where the diagram uses H_inv_cur[k2, k2] but the code uses Hinv_cur[k, k] (block-local).

Suggested fix for the pseudocode — something like:

#            for k in range(G_start_local, G_end_local):
#                w_t = B_cur[:, k]
#                w_t_qdq = quant_dequant(w_t, base_config, qparams)
#                err1 = (w_t - w_t_qdq) / Hinv_cur[k, k]
#                B_cur[:, k:] -= err1.matmul(Hinv_cur[k, k:])
#                B_cur_Err1[:, k] = err1.flatten()

This is documentation-only; the executable code is correct.

3. Pure Refactor: PASS

Every changed line of executable code is a pure variable rename. Specifically verified:

  • No operators or operations changed
  • No indexing logic changed (all slice boundaries use the same expressions with renamed variables)
  • No statements reordered
  • No executable lines added or removed
  • The only non-rename change is a whitespace reformatting of the err1.matmul(...) call from three lines to one line (semantically identical)

Summary

The variable renames and refactor are correct — this is a clean, logic-preserving change. The one item to consider fixing is the inner loop pseudocode in the diagram comment, where the indexing doesn't quite match the actual code's block-local k variable.

[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Apr 3, 2026
Summary:

Adding a diagram of the GPTQ update loop to simplify future work on this
code.

Also, var renames to simplify:
* W_t_quantize_block -> B_cur
* k_block_start -> B_cur_k_start
* k_block_end -> B_cur_k_end
* Err1 -> B_cur_Err1
* Hinv_quantize_block -> Hinv_cur
* k_group_start -> G_k_start
* k_group_end -> G_k_end

Test Plan: CI, this is a comment-only diff
ghstack-source-id: 5ae51c8
ghstack-comment-id: 4184643773
Pull-Request: #4237
# * B1, ..., BN are chunks of size [N, B], where B is a hyperparameter of GPTQ
# * G1, ..., GN are chunks of size [N, G], where G is group_size of the quantization recipe
#
# 2. triple for loop, with every loop chunking along the K dimension:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the same thing the code is doing? if so, would it better to just do inline comment for code (including refactor code itself to be easier to read by adding helper functions etc.)?

@vkuzo
Copy link
Copy Markdown
Contributor Author

vkuzo commented Apr 3, 2026 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: not user facing Use this tag if you don't want this PR to show up in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants