Commit 3326269

Delete AffineQuantizedTensor, AQTTensorImpl, and Layout
**Summary:** AffineQuantizedTensor was the v1 quantized tensor system, now fully superseded by v2 tensor types (Int8Tensor, Int4Tensor, Float8Tensor, IntxUnpackedToInt8Tensor, etc.) that inherit from TorchAOBaseTensor.

**BC-Breaking notes:**

Before (AQT):

```python
from torchao.dtypes import to_affine_quantized_intx
from torchao.quantization import quantize_, Int4WeightOnlyConfig

# Low-level AQT API
weight = to_affine_quantized_intx(
    weight,
    mapping_type,
    block_size,
    target_dtype,
    quant_min,
    quant_max,
    eps,
    _layout=Layout(),
)

# High-level API (unchanged)
quantize_(model, Int4WeightOnlyConfig())
```

After (v2 tensors):

```python
from torchao.quantization import quantize_, Int4WeightOnlyConfig

# High-level API (unchanged, recommended)
quantize_(model, Int4WeightOnlyConfig())

# Low-level v2 API (if needed)
from torchao.quantization import Int4Tensor, IntxUnpackedToInt8Tensor

weight = Int4Tensor.from_hp(weight, block_size)
weight = IntxUnpackedToInt8Tensor.from_hp(weight, block_size, torch.int4)
```

**Detailed changes:**

Core deletions:
- torchao/dtypes/affine_quantized_tensor.py (class definition)
- torchao/dtypes/affine_quantized_tensor_ops.py (aten dispatch)
- torchao/dtypes/floatx/, torchao/dtypes/uintx/ (empty subpackages)
- torchao/dtypes/README.md (stale AQT-centric docs)
- torchao/dtypes/utils.py: removed Layout class and AQTTensorImpl class
- torchao/dtypes/__init__.py: removed all AQT and Layout exports
- torchao/utils.py: removed _register_layout, _get_tensor_impl_constructor, and their classmethod registrations on TorchAOBaseTensor
- test/dtypes/test_affine_quantized.py
- test/dtypes/test_affine_quantized_tensor_parallel.py

Core updates:
- quant_api.py: removed AQT from _is_linear check, removed 5 dead activation quant helpers
- testing/utils.py: switched defaults from AQT to Int8Tensor
- Updated test assertions, docstrings, and docs to remove AQT references

Prototype updates:
- prototype/autoround/: removed broken AQT imports, updated isinstance checks to TorchAOBaseTensor. Everything works except apply_auto_round(), which was already broken before this PR (issue #1690).
- prototype/dtypes/uintx/uintx_utils.py: removed UintxLayout, UintxAQTTensorImpl, and AQT imports (fixes codebook import breakage)
- prototype/quantization/mixed_precision/: added assertion error since feature was already broken by PlainLayout deletion (#4151)

Still broken (tracked with TODOs):
- tutorials/calibration_flow/ (uses to_affine_quantized_intx_static)
- tutorials/developer_api_guide/ (uses Layout)

Docs/comments only (not broken, just stale references):
- prototype/quantization/module_swap/ (README)
- prototype/parq/ (README)
- prototype/quantized_training/ (comments)
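The v1-to-v2 migration changes the API surface, not the underlying math. As background, here is a minimal pure-Python sketch of the asymmetric int8 affine quantization arithmetic that both AQT and the v2 tensor types implement; all names here are illustrative, not torchao APIs:

```python
# Minimal pure-Python sketch of affine quantization (illustrative only,
# not the torchao API). Per-tensor, asymmetric, int8.

def qparams(values, qmin=-128, qmax=127):
    """Derive scale and zero_point so the observed range maps onto [qmin, qmax]."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    # q = clamp(round(x / scale) + zero_point)
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    # x ~= (q - zero_point) * scale
    return [(q - zero_point) * scale for q in qvalues]

w = [0.1, -0.5, 0.25, 0.9]
s, zp = qparams(w)
w_q = quantize(w, s, zp)
w_dq = dequantize(w_q, s, zp)
# round-trip error is bounded by one quantization step
assert all(abs(a - b) <= s for a, b in zip(w, w_dq))
```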
1 parent 6e7a6e9 commit 3326269

45 files changed

Lines changed: 124 additions & 1928 deletions


CLAUDE.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -25,13 +25,13 @@ These render at https://docs.pytorch.org/ao/main/
 ## Deprecated APIs
 
 Do not use or recommend these:
-- `AffineQuantizedTensor` (AQT) in `torchao/dtypes/` - old v1 system, being removed
+- `AffineQuantizedTensor` (AQT) - deleted
 - `autoquant()` - deleted
 - Layout registration system (`PlainLayout`, `Float8Layout`, `TensorCoreTiledLayout`, etc.) - deleted
 - `TorchAODType` - deprecated
 - `change_linear_weights_to_int4_woqtensors` - deleted, use `quantize_(model, Int4WeightOnlyConfig())`
 
-New tensor types should inherit from `TorchAOBaseTensor` in `torchao/utils.py`, not AQT.
+New tensor types should inherit from `TorchAOBaseTensor` in `torchao/utils.py`.
 
 ## Development
```

docs/source/contributing/contributor_guide.rst

Lines changed: 0 additions & 3 deletions
```diff
@@ -30,9 +30,6 @@ We have utility base class: ``torchao.utils.TorchAOBaseTensor`` that can help de
 
 With the above, we'll have multiple methods and functions available to use for this Tensor, for more details please check the docs for `TorchAOBaseTensor <https://docs.pytorch.org/ao/main/generated/torchao.utils.TorchAOBaseTensor.html#torchao.utils.TorchAOBaseTensor>`__
 
-.. note::
-   Many of the existing use cases in torchao still uses AffineQuantizedTensor, but we plan to move away from it to reduce the abstractions and make it easier for people to contribute to torchao.
-
 Adding Efficient Kernels
 ~~~~~~~~~~~~~~~~~~~~~~~~
```

docs/source/contributing/quantization_overview.rst

Lines changed: 1 addition & 1 deletion
```diff
@@ -58,7 +58,7 @@ We'll also have efficient kernels that works with the low precision tensors, for
 * `int_scaled_matmul <https://github.com/pytorch/ao/blob/3e9746cf636e39e3c1ec0de6e0ef2e31f75c4c02/torchao/kernel/intmm.py#L107>`__ that does matmul and also applies a scale to the result.
 
 .. note::
-   We can also rely on torch.compile to generate kernels (through triton), for example the current int8 weight only quantization `kernel <https://github.com/pytorch/ao/blob/e283743b3cc4612bb641b88dca3670231724d396/torchao/dtypes/affine_quantized_tensor.py#L1292-L1309>`__ just relies on torch.compile to get speedup. In this case there is no custom handwritten "efficient kernel" that's corresponding to the type of quantization.
+   We can also rely on torch.compile to generate kernels (through triton), for example the int8 weight only quantization kernel just relies on torch.compile to get speedup. In this case there is no custom handwritten "efficient kernel" that's corresponding to the type of quantization.
 
 Quantized Tensors (derived dtypes and packing format)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
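To make the note above concrete: int8 weight-only linear stores the weights as int8 with a per-row scale and applies the scale during the matmul; torch.compile can fuse this pattern into a single triton kernel. A pure-Python sketch of the arithmetic (illustrative only, not torchao code):

```python
# Pure-Python sketch (not torchao code) of what int8 weight-only quantized
# linear computes: int8 weight rows with one float scale per row, the scale
# applied once per output element. torch.compile fuses this dequant+matmul
# pattern into one generated kernel; here it is spelled out explicitly.

def int8_weight_only_linear(x, w_q, w_scale):
    """x: [n] floats, w_q: [m][n] int8 rows, w_scale: [m] per-row scales."""
    out = []
    for row, scale in zip(w_q, w_scale):
        acc = sum(xi * qi for xi, qi in zip(x, row))  # dot product in int domain
        out.append(acc * scale)                        # dequantize the accumulator
    return out

x = [1.0, 2.0]
w_q = [[10, -20], [5, 5]]   # int8 weight values
w_scale = [0.01, 0.02]      # per-row scales
y = int8_weight_only_linear(x, w_q, w_scale)
```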

docs/source/eager_tutorials/serialization.rst

Lines changed: 2 additions & 2 deletions
```diff
@@ -91,9 +91,9 @@ To deserialize an optimized model, we can initialize the floating point model in
 
 The reason we initialize the model in ``meta`` device is to avoid initializing the original floating point model since original floating point model may not fit into the device that we want to use for inference.
 
-What happens in ``m_loaded.load_state_dict(state_dict, assign=True)`` is that the corresponding weights (e.g. m_loaded.linear1.weight) are updated with the Tensors in ``state_dict``, which is an optimized tensor subclass instance (e.g. int4 ``AffineQuantizedTensor``). No dependency on torchao is needed for this to work.
+What happens in ``m_loaded.load_state_dict(state_dict, assign=True)`` is that the corresponding weights (e.g. m_loaded.linear1.weight) are updated with the Tensors in ``state_dict``, which is an optimized tensor subclass instance (e.g. ``Int4Tensor``). No dependency on torchao is needed for this to work.
 
 We can also verify that the weight is properly loaded by checking the type of weight tensor::
 
     type of weight before loading: (<class 'torch.Tensor'>, <class 'torch.Tensor'>)
-    type of weight after loading: (<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>, <class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>)
+    type of weight after loading: (<class 'torchao.quantization.Int4Tensor'>, <class 'torchao.quantization.Int4Tensor'>)
```

docs/source/eager_tutorials/static_quantization.rst

Lines changed: 10 additions & 14 deletions
```diff
@@ -139,11 +139,12 @@ Now we are ready to calibrate the model, which populates the observers we insert
 Quantization Phase
 ~~~~~~~~~~~~~~~~~~
 
-There are multiple ways to actually quantize the model. Here we walk through the simpler alternative, which is to define a `QuantizedLinear` class that we will swap our `ObservedLinear` to. Defining this new class isn't strictly necessary. For an alternative method that simply uses the existing `torch.nn.Linear`, please see the full `example script <https://github.com/pytorch/ao/tree/main/tutorials/calibration_flow/static_quant.py>`__.
+There are multiple ways to actually quantize the model. Here we walk through the simpler alternative, which is to define a `QuantizedLinear` class that we will swap our `ObservedLinear` to.
 
 .. code:: py
 
-  from torchao.dtypes import to_affine_quantized_intx_static
+  from torchao.quantization import Int8Tensor
+  from torchao.quantization import PerRow, PerTensor
 
   class QuantizedLinear(torch.nn.Module):
       def __init__(
@@ -154,27 +155,22 @@ There are multiple ways to actually quantize the model. Here we walk through the
           weight_obs: torch.nn.Module,
           weight: torch.Tensor,
           bias: torch.Tensor,
-          target_dtype: torch.dtype,
       ):
           super().__init__()
           self.act_scale, self.act_zero_point = act_obs.calculate_qparams()
           weight_scale, weight_zero_point = weight_obs.calculate_qparams()
-          assert weight.dim() == 2
-          block_size = (1, weight.shape[1])
-          self.target_dtype = target_dtype
           self.bias = bias
-          self.qweight = to_affine_quantized_intx_static(
-              weight, weight_scale, weight_zero_point, block_size, self.target_dtype
+          self.qweight = Int8Tensor.from_hp(
+              weight, granularity=PerRow(),
+              scale=weight_scale, zero_point=weight_zero_point,
           )
 
       def forward(self, input: torch.Tensor):
-          block_size = input.shape
-          qinput = to_affine_quantized_intx_static(
+          qinput = Int8Tensor.from_hp(
               input,
-              self.act_scale,
-              self.act_zero_point,
-              block_size,
-              self.target_dtype,
+              granularity=PerTensor(),
+              scale=self.act_scale,
+              zero_point=self.act_zero_point,
           )
           return F.linear(qinput, self.qweight, self.bias)
```
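The `calculate_qparams()` calls in the tutorial diff come from calibration observers that record the value range seen during calibration. A minimal pure-Python sketch of the min/max observer pattern (illustrative names only, not the torchao implementation):

```python
# Illustrative sketch (not torchao code) of the min/max observer pattern:
# record the range seen across calibration batches, then derive int8
# quantization parameters from it.

class MinMaxObserver:
    def __init__(self, qmin=-128, qmax=127):
        self.lo, self.hi = 0.0, 0.0  # tracked range always includes 0
        self.qmin, self.qmax = qmin, qmax

    def observe(self, values):
        # widen the tracked range with each calibration batch
        self.lo = min(self.lo, min(values))
        self.hi = max(self.hi, max(values))

    def calculate_qparams(self):
        scale = (self.hi - self.lo) / (self.qmax - self.qmin)
        zero_point = round(self.qmin - self.lo / scale)
        return scale, zero_point

obs = MinMaxObserver()
obs.observe([0.2, -1.0, 0.7])   # first calibration batch
obs.observe([1.5, 0.3, -0.4])   # second batch widens the range
scale, zero_point = obs.calculate_qparams()
```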

docs/source/eager_tutorials/subclass_basic.rst

Lines changed: 2 additions & 2 deletions
```diff
@@ -193,7 +193,7 @@ purposes as an extension point:
    (`DTensor <https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/_api.py#L217>`__,
    `docs <https://pytorch.org/docs/stable/distributed.tensor.html#pytorch-dtensor-distributed-tensor>`__)
 2) [quantization] scale/zero_point metadata
-   (`AffineQuantizedTensor <https://github.com/pytorch/ao/blob/v0.8.0/torchao/dtypes/affine_quantized_tensor.py#L46>`__)
+   (e.g. `Int8Tensor <https://github.com/pytorch/ao/blob/main/torchao/quantization/quantize_/workflows/int8/int8_tensor.py>`__)
 3) [raggedness] metadata on ragged structure
    (`NestedTensor <https://github.com/pytorch/pytorch/blob/main/torch/nested/_internal/nested_tensor.py#L53>`__,
    `docs <https://pytorch.org/tutorials/prototype/nestedtensor.html#getting-started-with-nested-tensors>`__)
@@ -455,7 +455,7 @@ subclass. This is part one of two tutorials in this series. The
 `next post <subclass_advanced.html>`__ will discuss how to add more advanced
 features to your tensor subclass, such as making it trainable, composing
 with DTensors, and adding tensor parallelism support. For a more detailed
-example of how `AffineQuantizedTensor` in torchao was built using tensor
+example of how quantized tensors in torchao are built using tensor
 subclasses, also check out `this example <https://github.com/pytorch/ao/blob/main/tutorials/developer_api_guide/my_dtype_tensor_subclass.py>`__.
 
 If you have any questions while implementing your subclass, feel free to
```

docs/source/llms.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -29,7 +29,7 @@
 ## Deprecated APIs
 
 Do not use or recommend these:
-- `AffineQuantizedTensor` (AQT) in `torchao/dtypes/` - old v1 system, being removed. New tensor types inherit from `TorchAOBaseTensor`
+- `AffineQuantizedTensor` (AQT) - deleted. New tensor types inherit from `TorchAOBaseTensor`
 - `autoquant()` - deleted
 - Layout registration system (`PlainLayout`, `Float8Layout`, `TensorCoreTiledLayout`, etc.) - deleted
 - `TorchAODType` - deprecated
```

test/dtypes/test_affine_quantized.py

Lines changed: 0 additions & 140 deletions
This file was deleted.

0 commit comments