Add 2-bit quantization support to WebGPU GatherBlockQuantized operator by Shivani767 · Pull Request #29074 · microsoft/onnxruntime

Shivani767 · 2026-06-16T11:44:45Z

Description

Adds 2‑bit quantization support to two WebGPU operators:

GatherBlockQuantized: Handles signed 2‑bit quantized data (2's complement, range [-2, 1]) and 2‑bit zero points (including when the packed zero point dimension isn't a multiple of 4). Follows the same patterns as the existing CPU implementation and includes WebGPU‑specific test coverage.
QMoE: Extends the constructor to accept expert_weight_bits_ == 2.
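To make the GatherBlockQuantized item concrete, here is a minimal host-side sketch of reading one signed 2-bit value from a packed buffer. It is illustrative only and is not the PR's shader code. The names `PackedBytes2Bit` and `UnpackSigned2Bit` are hypothetical, and the bit layout (element `i` in bits `2*(i%4)` and `2*(i%4)+1` of byte `i/4`) is an assumption. The round-up in `PackedBytes2Bit` corresponds to the case where the element count is not a multiple of 4.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed to hold `count` 2-bit values; rounds up when count % 4 != 0.
constexpr std::size_t PackedBytes2Bit(std::size_t count) { return (count + 3) / 4; }

// Extracts element `index` and sign-extends it to the range [-2, 1].
// ASSUMED layout: element i occupies bits 2*(i%4) and 2*(i%4)+1 of byte i/4.
inline int32_t UnpackSigned2Bit(const std::vector<uint8_t>& packed, std::size_t index) {
  const uint32_t byte = packed[index / 4];
  const uint32_t shift = static_cast<uint32_t>(index % 4) * 2;
  int32_t value = static_cast<int32_t>((byte >> shift) & 0x3u);
  // Same rule as the shader snippet in the diff: bit 1 is the sign bit.
  if ((value & 0x2) != 0) value -= 4;
  return value;
}
```

Under the assumed layout, the byte `0xE4` (binary `11 10 01 00`) holds the four elements 0, 1, -2, -1 in index order.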

Motivation and Context

Resolves #28895. INT2 quantization is an active research and industry topic for LLM serving, and this change enables 2‑bit quantized weights with both WebGPU's GatherBlockQuantized and QMoE operators.

Copilot

Pull request overview

This PR extends WebGPU-side quantized operator support to include 2-bit weights, primarily by updating the WebGPU GatherBlockQuantized shader generation and relaxing QMoE’s constructor validation to accept 2-bit expert weights.

Changes:

Add 2-bit extraction and (attempted) signed handling branches in GatherBlockQuantized WGSL generation (including 2-bit zero-point sign handling).
Allow expert_weight_bits == 2 in the WebGPU QMoE kernel constructor.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| onnxruntime/contrib_ops/webgpu/quantization/gather_block_quantized.cc | Adds 2-bit-specific shader logic for reading packed 2-bit values and adjusts signed zero-point handling logic. |
| onnxruntime/contrib_ops/webgpu/moe/qmoe.h | Expands accepted `expert_weight_bits` values to include 2-bit in the constructor validation. |

+    if (is_signed_) {
+      shader.MainFunctionBody()
+          << "  if((quantized_data & 0x2) != 0) { quantized_data = quantized_data - 4 ;};\n";
+    }


+      if (is_2bit) {
+        shader.MainFunctionBody()
+            << "  if((zero_point & 0x2) != 0) { zero_point = zero_point - 4 ;};\n";
+      } else if (is_4bit) {
+        shader.MainFunctionBody()
+            << "  if((zero_point & 0x8) != 0) { zero_point = zero_point - 16 ;};\n";
+      }
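The 2-bit and 4-bit branches above follow one two's-complement rule: if the top bit of an n-bit field is set, subtract 2^n. The sketch below states that rule once in plain C++ so the two constants pairs (`0x2`/`4` and `0x8`/`16`) can be checked against it. `SignExtend` is a hypothetical helper written for illustration and is not part of the PR.

```cpp
#include <cstdint>

// Sign-extends the low `bits` bits of `raw`, interpreted as two's complement.
// Intended for small widths (1 <= bits <= 16).
// bits == 2: tests 0x2 and subtracts 4. bits == 4: tests 0x8 and subtracts 16.
inline int32_t SignExtend(uint32_t raw, uint32_t bits) {
  const int32_t sign_bit = static_cast<int32_t>(1u << (bits - 1));
  const int32_t modulus = static_cast<int32_t>(1u << bits);
  const int32_t value = static_cast<int32_t>(raw & ((1u << bits) - 1u));
  return (value & sign_bit) != 0 ? value - modulus : value;
}
```

For example, `SignExtend(3, 2)` gives -1 and `SignExtend(8, 4)` gives -8, matching what the generated WGSL branches compute for the same raw values.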


+    ORT_ENFORCE(expert_weight_bits_ == 8 || expert_weight_bits_ == 4 || expert_weight_bits_ == 2,
+                "expert_weight_bits must be 2, 4, or 8, but got ", expert_weight_bits_);
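As a rough standalone illustration of what the relaxed check permits, the sketch below mirrors the validation without ORT macros and derives how many weights fit in one byte for each accepted width. `ValuesPerByte` is a hypothetical name, and throwing `std::invalid_argument` is a stand-in for `ORT_ENFORCE`, not the kernel's actual error path.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the ORT_ENFORCE check in the QMoE constructor.
// Returns the number of weights packed into one byte for an accepted width.
inline int64_t ValuesPerByte(int64_t expert_weight_bits) {
  if (expert_weight_bits != 2 && expert_weight_bits != 4 && expert_weight_bits != 8) {
    throw std::invalid_argument("expert_weight_bits must be 2, 4, or 8, but got " +
                                std::to_string(expert_weight_bits));
  }
  return 8 / expert_weight_bits;  // 2 -> 4, 4 -> 2, 8 -> 1
}
```

Accepting a width in the constructor only removes the rejection. Whether the rest of the kernel unpacks that width correctly is a separate question that this sketch does not address.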


Shivani767 and others added 2 commits June 15, 2026 21:10

Add 2-bit quantization support to WebGPU GatherBlockQuantized operator

3c6f22f

Add 2-bit support to WebGPU QMoE

1a71cab

guschmue added the ep:WebGPU (ort-web webgpu provider) label Jun 16, 2026

guschmue requested a review from Copilot June 16, 2026 16:56

Copilot started reviewing on behalf of guschmue June 16, 2026 16:57

Copilot AI reviewed Jun 16, 2026

Shivani767 wants to merge 2 commits into microsoft:main from Shivani767:webgpu-gather-2bit-support
