38 commits
56af575
research: add autoconfig POC with QNN NPU catalog sweep results
github-actions[bot] Jun 15, 2026
76bb07b
research: add winml-cli agent layer design doc
github-actions[bot] Jun 15, 2026
4a6ef5b
research: add WinML CLI Skills Design Doc
github-actions[bot] Jun 15, 2026
6de0e6b
research: reorganize skills doc into user/contributor + merge overlap…
github-actions[bot] Jun 15, 2026
ea3911e
research: rigorous review and correction of ep_knowledge findings
github-actions[bot] Jun 16, 2026
5c9cea7
research: validation sweep confirms npu-001 is DINOv2-specific, not g…
github-actions[bot] Jun 16, 2026
28031c7
research(autoconfig): correct npu KB based on review -- npu-001 mecha…
github-actions[bot] Jun 16, 2026
0976d39
research(autoconfig): Transpose analysis — npu-001 mechanism confirme…
github-actions[bot] Jun 16, 2026
967ddcc
research(autoconfig): extended model sweep — npu-001 scope confirmed …
github-actions[bot] Jun 16, 2026
b3c0856
research(autoconfig): correct cpu/dml/qnn_gpu KB -- remove invalid fi…
github-actions[bot] Jun 17, 2026
9f31db7
research(autoconfig): update skills-design with validated KB findings
github-actions[bot] Jun 17, 2026
21dda6a
research(autoconfig): add operational constraints to autoconfig skill…
github-actions[bot] Jun 17, 2026
e903f67
research(autoconfig): add npu-006/npu-007 constraints to catalog swee…
github-actions[bot] Jun 17, 2026
0cd43d9
research(autoconfig): fix 5 bugs found in code review + add bench_utils
github-actions[bot] Jun 17, 2026
59e7329
research(autoconfig): add VerdictPolicy, screen early exit, crash-res…
github-actions[bot] Jun 17, 2026
da32a88
research(autoconfig): update diagram + agent-design, add ep-findings-…
github-actions[bot] Jun 17, 2026
ef0b48b
research(autoconfig): hide single-model findings by default in ep-fin…
github-actions[bot] Jun 17, 2026
da6b29c
research(autoconfig): fix missing .finding CSS selector in ep-finding…
github-actions[bot] Jun 17, 2026
2c38721
research(autoconfig): rename --report to --json, drop dml-004 FR row
github-actions[bot] Jun 17, 2026
5c5684a
research(autoconfig): trim autoconfig_diagram.html — shorter bullets,…
github-actions[bot] Jun 17, 2026
e18e066
research(autoconfig): add Pending Features badge to Phase 3 in diagram
github-actions[bot] Jun 17, 2026
526fcd0
research(autoconfig): add local PyTorch reference FR; clarify correct…
github-actions[bot] Jun 17, 2026
9d5148e
research(autoconfig): fix Phase 0 layout — nowrap, 3 equal-width boxes
github-actions[bot] Jun 17, 2026
408c647
research(autoconfig): pending features badge — outcome-focused descri…
github-actions[bot] Jun 17, 2026
d761741
research(autoconfig): Phase 3 -> Outcome; Feature Requirements badge …
github-actions[bot] Jun 17, 2026
e9f1cf4
research(autoconfig): align Feature Requirements badge style with oth…
github-actions[bot] Jun 17, 2026
d242fdd
research(autoconfig): simplify Feature Requirements badge to issue nu…
github-actions[bot] Jun 17, 2026
0940ef5
research(autoconfig): feature requirements badge — issue numbers in c…
github-actions[bot] Jun 17, 2026
eeef0b9
research(autoconfig): feature requirements badge — show issue templat…
github-actions[bot] Jun 17, 2026
75bea5a
research(autoconfig): simplify Insight Engine boxes to concept + exam…
github-actions[bot] Jun 17, 2026
2dd7f05
research(autoconfig): implement Phase 1 Insight Engine + Phase 3 repo…
github-actions[bot] Jun 17, 2026
0ef818c
research(autoconfig): expand graph analysis + hypothesis matrix from …
github-actions[bot] Jun 17, 2026
7765775
research(autoconfig): add --only-hypotheses + --reuse-h0-config flags…
github-actions[bot] Jun 17, 2026
c3acb85
research(autoconfig): add QNN GPU sweep script (catalog_gpu_sweep.py)
github-actions[bot] Jun 17, 2026
bb34c9d
research(autoconfig): add Phase C confirmation pass to GPU + NPU sweeps
github-actions[bot] Jun 17, 2026
cc25e31
fix(autoconfig): fix GPU sweep perf flag and JSON parsing
github-actions[bot] Jun 17, 2026
7b30db9
fix(autoconfig): add --rebuild to GPU sweep build step
github-actions[bot] Jun 18, 2026
292d378
research(autoconfig): add CPU EP sweep script (catalog_cpu_sweep.py)
github-actions[bot] Jun 18, 2026
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -298,6 +298,8 @@ lint.per-file-ignores."tests/**" = [ "ANN", "D", "PLR2004", "PT", "S101", "T20"
lint.per-file-ignores."tests/**/generate_patterns.py" = [ "PERF401" ]
# Generated opset code: Allow long lines
lint.per-file-ignores."src/winml/modelkit/analyze/onnx_opset/**" = [ "D", "E501", "N802", "N803", "N806", "TC001", "TC002", "TC003" ]
# Research scripts: POC code, not production — exempt from all style/type/security rules
lint.per-file-ignores."research/**" = [ "ANN", "D", "E", "N", "S", "T20", "UP", "W", "B", "C4", "FA", "I", "PERF", "PIE", "PT", "PTH", "RET", "RSE", "RUF", "SIM", "TCH", "TID", "TRY", "G", "ICN", "E402", "E501", "F401", "F403", "F811" ]
# === Import Conventions ===
lint.flake8-bandit.check-typed-exception = true
lint.flake8-bandit.hardcoded-tmp-directory = [ "/tmp", "/var/tmp", "C:\\Temp" ]
220 changes: 220 additions & 0 deletions research/autoconfig/README.md
@@ -0,0 +1,220 @@
# autoconfig — Automated Config Search POC

**Status: Research POC — not production code.**

This directory contains an experimental automated search system that finds the optimal
`winml-cli` build configuration (execution provider, opset version, graph optimizations)
for a given model on Windows hardware — without requiring the user to understand the
underlying ORT/EP optimizer mechanics.

---

## What This Is

`autoconfig.py` implements an Explorer/Optimizer/Reviewer loop:

1. **Explorer** — proposes the next hypothesis (opset, EP flags, graph passes) by reading
`ep_knowledge/` to prune already-refuted configurations
2. **Optimizer** — runs `winml build` + `winml perf` (two-phase: 200-iter CV screen → 3×500-iter full bench)
3. **Reviewer** — evaluates the result, updates the knowledge base, and decides keep/discard

The loop terminates after 30 consecutive discards (plateau detection) or when the time budget is exhausted.
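
The termination rule can be sketched as follows. This is a minimal illustration, not the code in `autoconfig.py`: the function name `search_loop`, the `propose`/`evaluate` callables, and the default one-hour budget are assumptions made for the example. Only the 30-discard plateau limit comes from the text above.

```python
import time
from typing import Callable, Optional


def search_loop(
    propose: Callable[[], dict],
    evaluate: Callable[[dict], float],
    max_discards: int = 30,       # plateau limit stated in this README
    budget_s: float = 3600.0,     # hypothetical default time budget
) -> tuple[Optional[dict], float]:
    """Keep the lowest-latency config; stop on plateau or budget exhaustion."""
    best_cfg, best_ms = None, float("inf")
    discards = 0
    start = time.monotonic()
    while discards < max_discards and time.monotonic() - start < budget_s:
        cfg = propose()               # Explorer: next hypothesis
        latency_ms = evaluate(cfg)    # Optimizer: build + bench (stubbed here)
        if latency_ms < best_ms:      # Reviewer: keep and reset the plateau counter
            best_cfg, best_ms = cfg, latency_ms
            discards = 0
        else:                         # Reviewer: discard
            discards += 1
    return best_cfg, best_ms
```

Resetting the counter on every improvement is what makes the limit apply to consecutive discards rather than to the total number of discards.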

`catalog_qnn_sweep.py` is a generalized multi-model sweep that tests a fixed hypothesis
matrix (h0–h5: baseline, opset 17–21, conv fusions) across a catalog of models on the
QNN NPU, collecting structured results in `catalog-qnn-sweep/<model-slug>/results.json`.

`analyze_graph.py` is an ONNX graph analysis helper that identifies architectural
patterns relevant to EP optimization (Transpose sandwiches, residual branches, GELU
variants, depthwise Conv) and surfaces gaps in `winml analyze` output.

`gen_report_v3.py` generates an HTML sweep report from `results.json` files.

`autoconfig_diagram.html` is an interactive architecture diagram of the Explorer/Optimizer/
Reviewer loop.

---

## Key Findings — 8-Model QNN NPU Catalog Sweep (2026-06-13)

### npu-001: opset 21 NHWC bypass is real — but architecture-specific

Opset ≥ 21 bypasses ORT's NHWC layout transformer for QNN EP, giving a large speedup
on **Conv + residual** models but no benefit (or slight regression) on pure transformers:

| Architecture | Models | opset 21 vs opset 17 |
|---|---|---|
| Conv + residual | MobileViT-small, DINOv2-small | **+26–31% speedup** |
| Pure transformer | ViT-base, YOLOS-small | neutral / slight regression |
| BERT-family NLP | DistilBERT, MiniLM, RoBERTa | neutral (within DVFS noise) |
| Plain Conv (ResNet) | ResNet-18 | ~+20% (h1→h3), but DVFS-dominated |

Root cause: the NHWC layout transform, gated by ORT's `IsSupportedOpset()` check in
`layout_transformation.cc`, inserts Transpose nodes around Conv ops. For Conv+residual
models these Transposes cannot be cancelled, so bypassing the transform (opset 21) gives
a cleaner HTP graph. Pure attention models have no Conv ops for the transform to wrap,
so the bypass has no effect.

### npu-006: Conv fusions cause ~4800% regression on QNN NPU for Conv-dominant models

`conv_bn_fusion`, `conv_add_fusion`, `conv_activation_fusion` produce fused op nodes
that QNN EP cannot execute natively — falling back to CPU for every fused Conv:

| Model | h4 (conv fusions) vs h1 (baseline) |
|---|---|
| ResNet-18 | **132.3 ms vs 2.72 ms (+4764% regression)** |
| MobileViT-small | 11.36 ms vs 11.72 ms (neutral) |
| DistilBERT | 19.59 ms vs 19.5 ms (neutral — no Conv to fuse) |

This is a critical correctness/performance hazard. `winml` should detect when the target
EP would CPU-fallback fused Conv ops and suppress incompatible fusions automatically
(see [Feature Gaps Identified](#feature-gaps-identified)).
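
For reference, the regression percentages in the table follow from the two latencies in each row. The helper below is a hypothetical illustration (the name `regression_pct` is not part of any script here); the inputs are the figures from the table.

```python
def regression_pct(candidate_ms: float, baseline_ms: float) -> float:
    """Percent latency increase of candidate over baseline (positive = slower)."""
    return (candidate_ms / baseline_ms - 1.0) * 100.0


# h4 (conv fusions) vs h1 (baseline), latencies taken from the table above
print(round(regression_pct(132.3, 2.72)))       # ResNet-18: 4764
print(round(regression_pct(11.36, 11.72), 1))   # MobileViT-small: -3.1
print(round(regression_pct(19.59, 19.5), 1))    # DistilBERT: 0.5
```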

### npu-007: DVFS thermal noise requires session-level averaging for reliable results

QNN NPU exhibits extreme DVFS thermal throttling. The coefficient of variation (CV) is
consistently 0.10–2.0+ (10% to over 200%) across all models. Practical implications:

- The CV < 15% Phase-A gate must be **disabled** for QNN NPU (blocks all models)
- Differences < 10% between configs are **unreliable** without ≥ 1500 total iterations
- Recommended protocol: **3 × 500-iter sessions** with 30 s cool-down; report median of
session p50 values
- 30 s cool-down reduces but does not eliminate DVFS spikes
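
A minimal sketch of the statistics behind this protocol, using only the standard library. The function names are illustrative, and the choice of sample standard deviation for CV is an assumption; the sweep scripts may compute it differently.

```python
import statistics


def coefficient_of_variation(samples_ms: list[float]) -> float:
    """Sample standard deviation divided by the mean (0.15 means 15%)."""
    return statistics.stdev(samples_ms) / statistics.mean(samples_ms)


def reported_latency(sessions: list[list[float]]) -> float:
    """Median of the per-session p50 (median) latencies."""
    session_p50s = [statistics.median(s) for s in sessions]
    return statistics.median(session_p50s)


# Three toy sessions; the second contains one DVFS-style spike.
sessions = [[1.0, 2.0, 3.0], [2.0, 3.0, 100.0], [4.0, 5.0, 6.0]]
print(reported_latency(sessions))  # 3.0
```

Taking the median twice keeps a single throttled iteration, or a single throttled session, from dominating the reported number.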

---

## How to Run

### Prerequisites

- `winml` CLI installed and on PATH
- Python 3.11+ with `onnx` package (`pip install onnx`)
- For QNN experiments: Snapdragon X Elite device with QNN SDK (Hexagon HTP driver)

### autoconfig.py — single-model adaptive search

Configuration lives at the top of the file (edit `MODEL_ID`, `TASK`, `EP`, `DEVICE`, `WORK_DIR`):

```bash
# Default: facebook/convnext-tiny-224 on CPU
python autoconfig.py
```

Results are written to `WORK_DIR/results.tsv` and per-hypothesis subdirectories.
The script reads `ep_knowledge/<ep>.json` to prune already-refuted configurations.

### catalog_qnn_sweep.py — multi-model QNN NPU sweep

```bash
# Full catalog sweep (all 8 models, ~6-8 hours on X Elite)
python catalog_qnn_sweep.py

# Single model
python catalog_qnn_sweep.py --model microsoft/resnet-18

# Show available models
python catalog_qnn_sweep.py --list
```

Results land in `catalog-qnn-sweep/<model-slug>/results.json` and a `SUMMARY.md` is
regenerated at the end of each sweep.

### analyze_graph.py — ONNX graph analysis

```bash
# Edit the onnx path at the top of the file, then:
python analyze_graph.py
```

Prints Transpose patterns, residual branch structure, GELU variants, and op domain
breakdown to stdout.

---

## ep_knowledge/ — Empirical Knowledge Base

Each JSON file stores empirical findings for one EP/device combination:

| File | EP/device |
|---|---|
| `cpu.json` | CPU EP (Snapdragon X Elite Oryon) |
| `dml.json` | DirectML EP |
| `qnn_gpu.json` | QNN Adreno GPU |
| `qnn_npu.json` | QNN HTP (Hexagon NPU) — most findings here |

### Schema overview

Each file has a `findings` array. Each finding has:

```json
{
  "id": "npu-001",
  "title": "...",
  "mechanism_confirmed": true,
  "architecture_requirement": ["has_conv_ops", "has_residual_connections"],
  "status": "confirmed",
  "confidence": "high"
}
```

Each file also has a `search_space_rules` object that `autoconfig.py` reads to prune
configurations. Only findings with `"mechanism_confirmed": true` are applied as pruning rules.
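
The filtering step might look roughly like the sketch below. It is an illustration under assumptions, not the logic in `autoconfig.py`: the function `applicable_findings`, the idea of matching `architecture_requirement` against a set of model traits, the `draft-example` entry, and the `has_attention` trait are all hypothetical. Only the field names and the `mechanism_confirmed` rule come from the schema above.

```python
import json

# Inline stand-in for an ep_knowledge/<ep>.json file (second entry is made up).
KB_JSON = """
{
  "findings": [
    {"id": "npu-001", "mechanism_confirmed": true,
     "architecture_requirement": ["has_conv_ops", "has_residual_connections"]},
    {"id": "draft-example", "mechanism_confirmed": false,
     "architecture_requirement": []}
  ]
}
"""


def applicable_findings(kb: dict, model_traits: set[str]) -> list[str]:
    """Return ids of confirmed findings whose requirements the model satisfies."""
    ids = []
    for finding in kb.get("findings", []):
        if not finding.get("mechanism_confirmed", False):
            continue  # unconfirmed findings never prune the search space
        required = set(finding.get("architecture_requirement", []))
        if required <= model_traits:  # assumed matching rule: all traits present
            ids.append(finding["id"])
    return ids


kb = json.loads(KB_JSON)
print(applicable_findings(kb, {"has_conv_ops", "has_residual_connections"}))  # ['npu-001']
print(applicable_findings(kb, {"has_attention"}))                             # []
```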

### Adding a new finding

1. Run the experiment and collect bench data
2. Add an entry to the appropriate `ep_knowledge/<ep>.json` under `findings`
3. Set `"mechanism_confirmed": false` and `"confidence": "draft"` until the mechanism
is understood from ORT/EP source code
4. If the finding prunes a search dimension, add a rule under `search_space_rules`
5. Set `"mechanism_confirmed": true` only after source code investigation confirms
the root cause — do NOT promote to confirmed based on benchmark numbers alone
6. See `ep_knowledge/README.md` for the epistemics guidelines

---

## Feature Gaps Identified

Three actionable gaps in `winml-cli` surfaced by this research:

1. **FusedConv detection in `winml analyze`** — `analyze` should detect Conv ops that
would CPU-fallback on QNN NPU after fusion (npu-006), and either warn or suppress
incompatible fusions in the generated build config.

2. **DVFS-aware perf** — `winml perf` should support a `--thermal-stabilization` mode
that waits for the device temperature to stabilize before measuring, and should report
confidence intervals rather than a single p50.

3. **Budget-aware sweep** — `catalog_qnn_sweep.py` exhausts the 20-min budget after just
2 hypotheses on models with a baseline above 50 ms (YOLOS: 78 ms × 3×500 iters ≈ 117 s of
measurement, or 207 s/hypothesis once three 30 s cool-downs are included).
A `--quick` flag that reduces to 1×200-iter for large models is needed.
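
A rough estimate of bench time per hypothesis, as a sketch. It assumes one 30 s cool-down per session and ignores build time; that assumption is inferred from the 207 s figure rather than taken from the sweep script, and the function name is hypothetical.

```python
def seconds_per_hypothesis(
    p50_ms: float,
    sessions: int = 3,
    iters: int = 500,
    cooldown_s: float = 30.0,   # assumed: one cool-down per session
) -> float:
    """Measurement time plus cool-downs for one hypothesis, excluding build time."""
    return sessions * (iters * p50_ms / 1000.0 + cooldown_s)


# YOLOS-small at a 78 ms baseline with the full 3 x 500-iter protocol
print(seconds_per_hypothesis(78.0))  # 207.0
```

Under the same assumptions, a 1×200-iter quick mode would take `seconds_per_hypothesis(78.0, sessions=1, iters=200)`, roughly 46 s per hypothesis.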

---

## Directory Layout

```
research/autoconfig/
├── README.md ← this file
├── autoconfig.py ← adaptive single-model config search loop
├── catalog_qnn_sweep.py ← fixed-hypothesis multi-model QNN sweep
├── analyze_graph.py ← ONNX graph pattern analysis helper
├── autoconfig_diagram.html ← Explorer/Optimizer/Reviewer architecture diagram
├── gen_report_v3.py ← HTML report generator for sweep results
├── ep_knowledge/
│   ├── README.md ← epistemics guidelines and KB format
│   ├── cpu.json ← CPU EP findings (ConvNext, 6 findings)
│   ├── dml.json ← DirectML EP findings
│   ├── qnn_gpu.json ← QNN Adreno GPU findings
│   └── qnn_npu.json ← QNN HTP NPU findings (npu-001 through npu-007)
└── catalog-qnn-sweep/
    ├── SUMMARY.md ← 8-model sweep results and cross-model analysis
    ├── apple--mobilevit-small/results.json
    ├── facebook--dinov2-small/results.json
    ├── microsoft--resnet-18/results.json
    ├── google--vit-base-patch16-224/results.json
    ├── deepset--roberta-base-squad2/results.json
    ├── distilbert--distilbert-base-uncased-finetuned-sst-2-english/results.json
    ├── sentence-transformers--all-MiniLM-L6-v2/results.json
    └── hustvl--yolos-small/results.json
```