Commit 3326269

Delete AffineQuantizedTensor, AQTTensorImpl, and Layout
**Summary:** AffineQuantizedTensor was the v1 quantized tensor system, now fully superseded by v2 tensor types (Int8Tensor, Int4Tensor, Float8Tensor, IntxUnpackedToInt8Tensor, etc.) that inherit from TorchAOBaseTensor.

**BC-Breaking notes:**

Before (AQT):

```python
from torchao.dtypes import to_affine_quantized_intx
from torchao.quantization import quantize_, Int4WeightOnlyConfig

# Low-level AQT API
weight = to_affine_quantized_intx(
    weight,
    mapping_type,
    block_size,
    target_dtype,
    quant_min,
    quant_max,
    eps,
    _layout=Layout(),
)

# High-level API (unchanged)
quantize_(model, Int4WeightOnlyConfig())
```

After (v2 tensors):

```python
from torchao.quantization import quantize_, Int4WeightOnlyConfig

# High-level API (unchanged, recommended)
quantize_(model, Int4WeightOnlyConfig())

# Low-level v2 API (if needed)
from torchao.quantization import Int4Tensor, IntxUnpackedToInt8Tensor

weight = Int4Tensor.from_hp(weight, block_size)
weight = IntxUnpackedToInt8Tensor.from_hp(weight, block_size, torch.int4)
```

**Detailed changes:**

Core deletions:
- torchao/dtypes/affine_quantized_tensor.py (class definition)
- torchao/dtypes/affine_quantized_tensor_ops.py (aten dispatch)
- torchao/dtypes/floatx/, torchao/dtypes/uintx/ (empty subpackages)
- torchao/dtypes/README.md (stale AQT-centric docs)
- torchao/dtypes/utils.py: removed Layout class and AQTTensorImpl class
- torchao/dtypes/__init__.py: removed all AQT and Layout exports
- torchao/utils.py: removed _register_layout, _get_tensor_impl_constructor, and their classmethod registrations on TorchAOBaseTensor
- test/dtypes/test_affine_quantized.py
- test/dtypes/test_affine_quantized_tensor_parallel.py

Core updates:
- quant_api.py: removed AQT from _is_linear check, removed 5 dead activation quant helpers
- testing/utils.py: switched defaults from AQT to Int8Tensor
- Updated test assertions, docstrings, and docs to remove AQT references

Prototype updates:
- prototype/autoround/: removed broken AQT imports, updated isinstance checks to TorchAOBaseTensor. Everything works except apply_auto_round(), which was already broken before this PR (issue #1690).
- prototype/dtypes/uintx/uintx_utils.py: removed UintxLayout, UintxAQTTensorImpl, and AQT imports (fixes codebook import breakage)
- prototype/quantization/mixed_precision/: added assertion error since feature was already broken by PlainLayout deletion (#4151)

Still broken (tracked with TODOs):
- tutorials/calibration_flow/ (uses to_affine_quantized_intx_static)
- tutorials/developer_api_guide/ (uses Layout)

Docs/comments only (not broken, just stale references):
- prototype/quantization/module_swap/ (README)
- prototype/parq/ (README)
- prototype/quantized_training/ (comments)
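The v1-to-v2 migration changes the API surface, not the underlying math. As background, here is a minimal pure-Python sketch of the asymmetric int8 affine quantization arithmetic that both AQT and the v2 tensor types implement; all names here are illustrative, not torchao APIs:

```python
# Minimal pure-Python sketch of affine quantization (illustrative only,
# not the torchao API). Per-tensor, asymmetric, int8.

def qparams(values, qmin=-128, qmax=127):
    """Derive scale and zero_point so the observed range maps onto [qmin, qmax]."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    # q = clamp(round(x / scale) + zero_point)
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    # x ~= (q - zero_point) * scale
    return [(q - zero_point) * scale for q in qvalues]

w = [0.1, -0.5, 0.25, 0.9]
s, zp = qparams(w)
w_q = quantize(w, s, zp)
w_dq = dequantize(w_q, s, zp)
# round-trip error is bounded by one quantization step
assert all(abs(a - b) <= s for a, b in zip(w, w_dq))
```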
1 parent 6e7a6e9 commit 3326269

45 files changed

Lines changed: 124 additions & 1928 deletions


CLAUDE.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -25,13 +25,13 @@ These render at https://docs.pytorch.org/ao/main/
 ## Deprecated APIs
 
 Do not use or recommend these:
-- `AffineQuantizedTensor` (AQT) in `torchao/dtypes/` - old v1 system, being removed
+- `AffineQuantizedTensor` (AQT) - deleted
 - `autoquant()` - deleted
 - Layout registration system (`PlainLayout`, `Float8Layout`, `TensorCoreTiledLayout`, etc.) - deleted
 - `TorchAODType` - deprecated
 - `change_linear_weights_to_int4_woqtensors` - deleted, use `quantize_(model, Int4WeightOnlyConfig())`
 
-New tensor types should inherit from `TorchAOBaseTensor` in `torchao/utils.py`, not AQT.
+New tensor types should inherit from `TorchAOBaseTensor` in `torchao/utils.py`.
 
 ## Development
```

docs/source/contributing/contributor_guide.rst

Lines changed: 0 additions & 3 deletions
```diff
@@ -30,9 +30,6 @@ We have utility base class: ``torchao.utils.TorchAOBaseTensor`` that can help de
 
 With the above, we'll have multiple methods and functions available to use for this Tensor, for more details please check the docs for `TorchAOBaseTensor <https://docs.pytorch.org/ao/main/generated/torchao.utils.TorchAOBaseTensor.html#torchao.utils.TorchAOBaseTensor>`__
 
-.. note::
-   Many of the existing use cases in torchao still uses AffineQuantizedTensor, but we plan to move away from it to reduce the abstractions and make it easier for people to contribute to torchao.
-
 Adding Efficient Kernels
 ~~~~~~~~~~~~~~~~~~~~~~~~
```

docs/source/contributing/quantization_overview.rst

Lines changed: 1 addition & 1 deletion
```diff
@@ -58,7 +58,7 @@ We'll also have efficient kernels that works with the low precision tensors, for
 * `int_scaled_matmul <https://github.com/pytorch/ao/blob/3e9746cf636e39e3c1ec0de6e0ef2e31f75c4c02/torchao/kernel/intmm.py#L107>`__ that does matmul and also applies a scale to the result.
 
 .. note::
-   We can also rely on torch.compile to generate kernels (through triton), for example the current int8 weight only quantization `kernel <https://github.com/pytorch/ao/blob/e283743b3cc4612bb641b88dca3670231724d396/torchao/dtypes/affine_quantized_tensor.py#L1292-L1309>`__ just relies on torch.compile to get speedup. In this case there is no custom handwritten "efficient kernel" that's corresponding to the type of quantization.
+   We can also rely on torch.compile to generate kernels (through triton), for example the int8 weight only quantization kernel just relies on torch.compile to get speedup. In this case there is no custom handwritten "efficient kernel" that's corresponding to the type of quantization.
 
 Quantized Tensors (derived dtypes and packing format)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
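To make the note above concrete: int8 weight-only linear stores the weights as int8 with a per-row scale and applies the scale during the matmul; torch.compile can fuse this pattern into a single triton kernel. A pure-Python sketch of the arithmetic (illustrative only, not torchao code):

```python
# Pure-Python sketch (not torchao code) of what int8 weight-only quantized
# linear computes: int8 weight rows with one float scale per row, the scale
# applied once per output element. torch.compile fuses this dequant+matmul
# pattern into one generated kernel; here it is spelled out explicitly.

def int8_weight_only_linear(x, w_q, w_scale):
    """x: [n] floats, w_q: [m][n] int8 rows, w_scale: [m] per-row scales."""
    out = []
    for row, scale in zip(w_q, w_scale):
        acc = sum(xi * qi for xi, qi in zip(x, row))  # dot product in int domain
        out.append(acc * scale)                        # dequantize the accumulator
    return out

x = [1.0, 2.0]
w_q = [[10, -20], [5, 5]]   # int8 weight values
w_scale = [0.01, 0.02]      # per-row scales
y = int8_weight_only_linear(x, w_q, w_scale)
```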

docs/source/eager_tutorials/serialization.rst

Lines changed: 2 additions & 2 deletions
```diff
@@ -91,9 +91,9 @@ To deserialize an optimized model, we can initialize the floating point model in
 
 The reason we initialize the model in ``meta`` device is to avoid initializing the original floating point model since original floating point model may not fit into the device that we want to use for inference.
 
-What happens in ``m_loaded.load_state_dict(state_dict, assign=True)`` is that the corresponding weights (e.g. m_loaded.linear1.weight) are updated with the Tensors in ``state_dict``, which is an optimized tensor subclass instance (e.g. int4 ``AffineQuantizedTensor``). No dependency on torchao is needed for this to work.
+What happens in ``m_loaded.load_state_dict(state_dict, assign=True)`` is that the corresponding weights (e.g. m_loaded.linear1.weight) are updated with the Tensors in ``state_dict``, which is an optimized tensor subclass instance (e.g. ``Int4Tensor``). No dependency on torchao is needed for this to work.
 
 We can also verify that the weight is properly loaded by checking the type of weight tensor::
 
     type of weight before loading: (<class 'torch.Tensor'>, <class 'torch.Tensor'>)
-    type of weight after loading: (<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>, <class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>)
+    type of weight after loading: (<class 'torchao.quantization.Int4Tensor'>, <class 'torchao.quantization.Int4Tensor'>)
```

docs/source/eager_tutorials/static_quantization.rst

Lines changed: 10 additions & 14 deletions
```diff
@@ -139,11 +139,12 @@ Now we are ready to calibrate the model, which populates the observers we insert
 Quantization Phase
 ~~~~~~~~~~~~~~~~~~
 
-There are multiple ways to actually quantize the model. Here we walk through the simpler alternative, which is to define a `QuantizedLinear` class that we will swap our `ObservedLinear` to. Defining this new class isn't strictly necessary. For an alternative method that simply uses the existing `torch.nn.Linear`, please see the full `example script <https://github.com/pytorch/ao/tree/main/tutorials/calibration_flow/static_quant.py>`__.
+There are multiple ways to actually quantize the model. Here we walk through the simpler alternative, which is to define a `QuantizedLinear` class that we will swap our `ObservedLinear` to.
 
 .. code:: py
 
-  from torchao.dtypes import to_affine_quantized_intx_static
+  from torchao.quantization import Int8Tensor
+  from torchao.quantization import PerRow, PerTensor
 
   class QuantizedLinear(torch.nn.Module):
       def __init__(
@@ -154,27 +155,22 @@ There are multiple ways to actually quantize the model. Here we walk through the
           weight_obs: torch.nn.Module,
           weight: torch.Tensor,
           bias: torch.Tensor,
-          target_dtype: torch.dtype,
       ):
           super().__init__()
           self.act_scale, self.act_zero_point = act_obs.calculate_qparams()
           weight_scale, weight_zero_point = weight_obs.calculate_qparams()
-          assert weight.dim() == 2
-          block_size = (1, weight.shape[1])
-          self.target_dtype = target_dtype
           self.bias = bias
-          self.qweight = to_affine_quantized_intx_static(
-              weight, weight_scale, weight_zero_point, block_size, self.target_dtype
+          self.qweight = Int8Tensor.from_hp(
+              weight, granularity=PerRow(),
+              scale=weight_scale, zero_point=weight_zero_point,
           )
 
       def forward(self, input: torch.Tensor):
-          block_size = input.shape
-          qinput = to_affine_quantized_intx_static(
+          qinput = Int8Tensor.from_hp(
               input,
-              self.act_scale,
-              self.act_zero_point,
-              block_size,
-              self.target_dtype,
+              granularity=PerTensor(),
+              scale=self.act_scale,
+              zero_point=self.act_zero_point,
           )
           return F.linear(qinput, self.qweight, self.bias)
```
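The `calculate_qparams()` calls in the tutorial diff come from calibration observers that record the value range seen during calibration. A minimal pure-Python sketch of the min/max observer pattern (illustrative names only, not the torchao implementation):

```python
# Illustrative sketch (not torchao code) of the min/max observer pattern:
# record the range seen across calibration batches, then derive int8
# quantization parameters from it.

class MinMaxObserver:
    def __init__(self, qmin=-128, qmax=127):
        self.lo, self.hi = 0.0, 0.0  # tracked range always includes 0
        self.qmin, self.qmax = qmin, qmax

    def observe(self, values):
        # widen the tracked range with each calibration batch
        self.lo = min(self.lo, min(values))
        self.hi = max(self.hi, max(values))

    def calculate_qparams(self):
        scale = (self.hi - self.lo) / (self.qmax - self.qmin)
        zero_point = round(self.qmin - self.lo / scale)
        return scale, zero_point

obs = MinMaxObserver()
obs.observe([0.2, -1.0, 0.7])   # first calibration batch
obs.observe([1.5, 0.3, -0.4])   # second batch widens the range
scale, zero_point = obs.calculate_qparams()
```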

docs/source/eager_tutorials/subclass_basic.rst

Lines changed: 2 additions & 2 deletions
```diff
@@ -193,7 +193,7 @@ purposes as an extension point:
    (`DTensor <https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/_api.py#L217>`__,
    `docs <https://pytorch.org/docs/stable/distributed.tensor.html#pytorch-dtensor-distributed-tensor>`__)
 2) [quantization] scale/zero_point metadata
-   (`AffineQuantizedTensor <https://github.com/pytorch/ao/blob/v0.8.0/torchao/dtypes/affine_quantized_tensor.py#L46>`__)
+   (e.g. `Int8Tensor <https://github.com/pytorch/ao/blob/main/torchao/quantization/quantize_/workflows/int8/int8_tensor.py>`__)
 3) [raggedness] metadata on ragged structure
    (`NestedTensor <https://github.com/pytorch/pytorch/blob/main/torch/nested/_internal/nested_tensor.py#L53>`__,
    `docs <https://pytorch.org/tutorials/prototype/nestedtensor.html#getting-started-with-nested-tensors>`__)
@@ -455,7 +455,7 @@ subclass. This is part one of two tutorials in this series. The
 `next post <subclass_advanced.html>`__ will discuss how to add more advanced
 features to your tensor subclass, such as making it trainable, composing
 with DTensors, and adding tensor parallelism support. For a more detailed
-example of how `AffineQuantizedTensor` in torchao was built using tensor
+example of how quantized tensors in torchao are built using tensor
 subclasses, also check out `this example <https://github.com/pytorch/ao/blob/main/tutorials/developer_api_guide/my_dtype_tensor_subclass.py>`__.
 
 If you have any questions while implementing your subclass, feel free to
```

docs/source/llms.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -29,7 +29,7 @@
 ## Deprecated APIs
 
 Do not use or recommend these:
-- `AffineQuantizedTensor` (AQT) in `torchao/dtypes/` - old v1 system, being removed. New tensor types inherit from `TorchAOBaseTensor`
+- `AffineQuantizedTensor` (AQT) - deleted. New tensor types inherit from `TorchAOBaseTensor`
 - `autoquant()` - deleted
 - Layout registration system (`PlainLayout`, `Float8Layout`, `TensorCoreTiledLayout`, etc.) - deleted
 - `TorchAODType` - deprecated
```

test/dtypes/test_affine_quantized.py

Lines changed: 0 additions & 140 deletions
This file was deleted.

0 commit comments