
Commit e0babea

[Docs] Clean up architecture docs: remove duplicates, fix stale content (#19399)
- Remove duplicate `tvm/s_tir/meta_schedule` and `tvm/s_tir/dlight` sections from `arch/index.rst` (already covered in the `tvm/s_tir` section with cross-reference to TensorIR Deep Dive)
- Remove duplicate `device_target_interactions` toctree entry (was listed under both `tvm/runtime` and `tvm/target`; keep only under `tvm/target`)
- Remove duplicate CUDA pipeline listing in `arch/fusion.rst` "How Backends Use Fusion" section (already shown in Overview); add cross-reference to BYOC doc
- Remove duplicated intro sentences in `arch/relax_vm.rst` that were identical to `arch/index.rst`
- Fix `R.call_dps` → `R.call_dps_packed` (the former does not exist)
- Replace outdated GraphExecutor example (`set_input`/`run`/`get_output`) with Relax VM example (GraphExecutor has been removed from the codebase)
- Replace broken external `mlc.ai` image link (returns 404) with local image in `deep_dive/relax/learning.rst`
- Fix stale `use pass instrument` link in `arch/pass_infra.rst` that pointed to an unrelated page
1 parent b465646 commit e0babea

6 files changed

Lines changed: 18 additions & 43 deletions

docs/_static/img/e2e_fashionmnist_mlp_model.png

104 KB (binary image added; referenced from docs/deep_dive/relax/learning.rst below)

docs/arch/fusion.rst

Lines changed: 2 additions & 5 deletions
@@ -345,10 +345,7 @@ How Backends Use Fusion
 -----------------------

 The default backend pipelines (CUDA, ROCm, CPU, etc.) all include ``FuseOps`` + ``FuseTIR``
-in their ``legalize_passes`` phase for automatic fusion. For example, the CUDA pipeline
-(``python/tvm/relax/backend/cuda/pipeline.py``) runs::
-
-    LegalizeOps → AnnotateTIROpPattern → FoldConstant → FuseOps → FuseTIR → DLight
+in their ``legalize_passes`` phase for automatic fusion, as shown in the `Overview`_ above.

 For external library dispatch (cuBLAS, CUTLASS, cuDNN, DNNL), ``FuseOpsByPattern`` is used
 separately. These are **not** included in the default pipeline — users add them explicitly
@@ -358,7 +355,7 @@ when building a custom compilation flow. The typical sequence is:
    offloaded to external libraries. For example, CUTLASS patterns match
    matmul+bias+activation combinations (``python/tvm/relax/backend/cuda/cutlass.py``).
    Functions marked by patterns are annotated with ``Composite`` and optionally ``Codegen``
-   attributes.
+   attributes. See :ref:`external-library-dispatch` for the full BYOC pipeline.

 2. **Automatic fusion** (``FuseOps`` + ``FuseTIR``): remaining operators that were not
    matched by backend patterns are fused automatically based on their pattern kinds.
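
Note: as a point of reference for the fusion changes above, the automatic-fusion sequence that the default pipelines include can be written out with the public pass API roughly as follows. This is an illustrative sketch only (not part of the commit); the real pipelines live under ``python/tvm/relax/backend/`` and may differ in detail.

    import tvm
    from tvm import relax

    # The FuseOps + FuseTIR portion of a default backend pipeline.
    fusion_passes = tvm.transform.Sequential(
        [
            relax.transform.LegalizeOps(),           # lower relax ops into TIR PrimFuncs
            relax.transform.AnnotateTIROpPattern(),  # tag each PrimFunc with its pattern kind
            relax.transform.FoldConstant(),
            relax.transform.FuseOps(),               # group compatible calls at the graph level
            relax.transform.FuseTIR(),               # merge the grouped PrimFuncs into single kernels
        ]
    )
    # mod = fusion_passes(mod)  # apply to any relax IRModule; FuseOpsByPattern for
    # library offload would run before FuseOps in a custom BYOC flow.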

docs/arch/index.rst

Lines changed: 11 additions & 30 deletions
@@ -68,7 +68,7 @@ contains a collection of functions. Currently, we support two primary variants o
 threading, and vector/tensor instructions. It is usually used to represent an operator program that executes a (possibly-fused) layer in a model.

 During the compilation and transformation, all relax operators are lowered to ``tir::PrimFunc`` or ``TVM PackedFunc``, which can be executed directly
-on the target device, while the calls to relax operators are lowered to calls to low-level functions (e.g. ``R.call_tir`` or ``R.call_dps``).
+on the target device, while the calls to relax operators are lowered to calls to low-level functions (e.g. ``R.call_tir`` or ``R.call_dps_packed``).

 Transformations
 ~~~~~~~~~~~~~~~
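
Note: to illustrate the ``R.call_tir`` / ``R.call_dps_packed`` lowering mentioned in the hunk above, here is a minimal TVMScript sketch (illustrative only, not part of the commit) of a Relax function calling a TIR PrimFunc in destination-passing style; the shapes and the ``add_one`` kernel are made up for the example.

    import tvm
    from tvm.script import ir as I, relax as R, tir as T

    @I.ir_module
    class Module:
        @T.prim_func
        def add_one(A: T.Buffer((8,), "float32"), B: T.Buffer((8,), "float32")):
            for i in range(8):
                with T.block("add"):
                    vi = T.axis.spatial(8, i)
                    B[vi] = A[vi] + T.float32(1)

        @R.function
        def main(x: R.Tensor((8,), "float32")) -> R.Tensor((8,), "float32"):
            cls = Module
            # Destination-passing style: add_one writes into an output tensor
            # that the VM allocates according to out_sinfo.
            y = R.call_tir(cls.add_one, (x,), out_sinfo=R.Tensor((8,), "float32"))
            return y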
@@ -160,22 +160,19 @@ following types: POD types(int, float), string, runtime.PackedFunc, runtime.Modu

 :py:class:`tvm.runtime.Module` and :py:class:`tvm.runtime.PackedFunc` are powerful mechanisms to modularize the runtime. For example, to get the above `addone` function on CUDA, we can use LLVM to generate the host-side code to compute the launching parameters(e.g. size of the thread groups) and then call into another PackedFunc from a CUDAModule that is backed by the CUDA driver API. The same mechanism can be used for OpenCL kernels.

-The above example only deals with a simple `addone` function. The code snippet below gives an example of an end-to-end model execution using the same interface:
+The above example only deals with a simple `addone` function. The code snippet below gives an example of an end-to-end model execution using the Relax Virtual Machine, which is built on the same runtime.Module and runtime.PackedFunc interface:

 .. code-block:: python

    import tvm
-   # Example runtime execution program in python, with types annotated
-   factory: tvm.runtime.Module = tvm.runtime.load_module("resnet18.so")
-   # Create a stateful graph execution module for resnet18 on cuda(0)
-   gmod: tvm.runtime.Module = factory["resnet18"](tvm.cuda(0))
+   from tvm import relax
+   # Load the compiled artifact
+   mod: tvm.runtime.Module = tvm.runtime.load_module("resnet18.so")
+   # Create a VM instance on cuda(0)
+   vm = relax.VirtualMachine(mod, tvm.cuda(0))
    data: tvm.runtime.Tensor = get_input_data()
-   # set input
-   gmod["set_input"](0, data)
-   # execute the model
-   gmod["run"]()
-   # get the output
-   result = gmod["get_output"](0).numpy()
+   # Run the model — vm["main"] returns a PackedFunc
+   result = vm["main"](data).numpy()

 The main take away is that runtime.Module and runtime.PackedFunc are sufficient to encapsulate both operator level programs (such as addone), as well as the end-to-end models.

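Note: a hedged sketch of how an artifact like ``resnet18.so`` is typically produced before being loaded as in the new example above; the exact build entry point may differ across TVM versions, and ``mod`` stands for an already-optimized relax IRModule.

    import tvm
    from tvm import relax

    ex = relax.build(mod, target="cuda")  # compile the IRModule into a VM executable
    ex.export_library("resnet18.so")      # export it as a loadable runtime.Module
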
@@ -236,10 +233,9 @@ for learning-based optimizations.
    :maxdepth: 1

    introduction_to_module_serialization
-   device_target_interactions

 Relax Virtual Machine
-^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~

 Relax defines *what* to compute — it is a graph-level IR that describes the operators and dataflow
 of a model. The Relax Virtual Machine (VM) handles *how* to run it — it is the runtime component
@@ -257,7 +253,7 @@ pipeline, instruction set details, execution model, and Python interface.
    relax_vm

 Disco: Distributed Runtime
-^^^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~

 Disco is TVM's distributed runtime for executing models across multiple devices. When a model is
 too large to fit on a single GPU, the ``relax.distributed`` module annotates how tensors should be
@@ -416,18 +412,3 @@ and then integrate it into the IRModule.
 While possible to construct operators directly via TensorIR or tensor expressions (TE) for each use case, it is tedious to do so.
 `topi` (Tensor operator inventory) provides a set of pre-defined operators defined by numpy and found in common deep learning workloads.

-tvm/s_tir/meta_schedule
------------------------
-
-MetaSchedule is a system for automated search-based program optimization,
-and can be used to optimize TensorIR schedules. Note that MetaSchedule only works with static-shape workloads.
-
-tvm/s_tir/dlight
-----------------
-
-DLight is a set of pre-defined, easy-to-use, and performant s_tir schedules. DLight aims:
-
-- Fully support **dynamic shape workloads**.
-- **Light weight**. DLight schedules provides tuning-free schedule with reasonable performance.
-- **Robust**. DLight schedules are designed to be robust and general-purpose for a single rule. And if the rule is not applicable,
-  DLight not raise any error and switch to the next rule automatically.

docs/arch/pass_infra.rst

Lines changed: 1 addition & 2 deletions
@@ -617,7 +617,7 @@ Note that it is recommended to use the ``pass_instrument`` decorator to implemen
 ``PassInstrument`` instances can be registered through ``instruments`` argument in
 :py:class:`tvm.transform.PassContext`.

-`use pass instrument`_ tutorial provides examples for how to implement ``PassInstrument`` with Python APIs.
+See ``python/tvm/ir/instrument.py`` for examples of how to implement ``PassInstrument`` with Python APIs.

 .. _pass_instrument_overriden:

@@ -668,4 +668,3 @@ new ``PassInstrument`` are called.

 .. _use pass infra: https://github.com/apache/tvm/blob/main/docs/how_to/tutorials/customize_opt.py

-.. _use pass instrument: https://github.com/apache/tvm/blob/main/docs/how_to/dev/index.rst
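
Note: a minimal sketch of the decorator-based ``PassInstrument`` usage that the surrounding text describes, registered through ``PassContext(instruments=...)``; the empty IRModule and the pass applied here are only for demonstration.

    import tvm
    from tvm import relax

    @tvm.instrument.pass_instrument
    class PrintPassNames:
        """Report each pass as the PassContext runs it."""

        def run_before_pass(self, mod, info):
            print("running:", info.name)

        def run_after_pass(self, mod, info):
            print("finished:", info.name)

    # Every pass applied inside the context triggers the callbacks above.
    mod = tvm.IRModule()  # an empty module is enough to demonstrate instrumentation
    with tvm.transform.PassContext(instruments=[PrintPassNames()]):
        mod = relax.transform.LegalizeOps()(mod)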

docs/arch/relax_vm.rst

Lines changed: 3 additions & 5 deletions
@@ -20,11 +20,9 @@
 Relax Virtual Machine
 =====================

-Relax defines *what* to compute — it is a graph-level IR that describes the operators and dataflow
-of a model. The Relax Virtual Machine (VM) handles *how* to run it — it is the runtime component
-that executes the compiled result. This document explains the VM architecture in detail, covering
-the compilation pipeline from Relax IR to bytecode, the instruction set, the execution model, and
-the Python-level user interface.
+This document explains the Relax VM architecture in detail, covering the compilation pipeline
+from Relax IR to bytecode, the instruction set, the execution model, and the Python-level user
+interface.

 Overview
 --------

docs/deep_dive/relax/learning.rst

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ In this chapter, we will use the following model as an example. This is
 a two-layer neural network that consists of two linear operations with
 relu activation.

-.. image:: https://mlc.ai/_images/e2e_fashionmnist_mlp_model.png
+.. image:: /_static/img/e2e_fashionmnist_mlp_model.png
    :width: 85%
    :align: center
