diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 025085e..55238c9 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -7,6 +7,19 @@ on:
     branches: [main]
 
 jobs:
+  lint-benchmark:
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: benchmarks/snapshot-efficiency
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - run: pip install -r requirements-dev.txt
+      - run: make check
+
   build-and-test:
     runs-on: ubuntu-latest
     steps:
diff --git a/README.md b/README.md
index ad29c73..886ec94 100644
--- a/README.md
+++ b/README.md
@@ -328,6 +328,10 @@ export OPERA_CLI_MCP_BIN=opera-devtools-mcp
 export OPERA_CLI_HEADED=1
 ```
 
+## Benchmarks
+
+See [`benchmarks/snapshot-efficiency/`](benchmarks/snapshot-efficiency/README.md) — measures token cost and task-completion quality of compact snapshot output vs raw MCP and `chrome-devtools-axi`.
+
 ## Development
 
 ```sh
diff --git a/benchmarks/snapshot-efficiency/.flake8 b/benchmarks/snapshot-efficiency/.flake8
new file mode 100644
index 0000000..65cb60e
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/.flake8
@@ -0,0 +1,5 @@
+[flake8]
+max-line-length = 120
+# E203: whitespace before ':' — conflicts with black's slice formatting
+# W503: line break before binary operator — conflicts with black
+extend-ignore = E203, W503
diff --git a/benchmarks/snapshot-efficiency/CLAUDE.md b/benchmarks/snapshot-efficiency/CLAUDE.md
new file mode 100644
index 0000000..83deec9
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/CLAUDE.md
@@ -0,0 +1,65 @@
+# snapshot-efficiency benchmark — Claude guidance
+
+## File roles
+
+| File | Role |
+|---|---|
+| `src/run_benchmark.py` | Entry point. Loads all three config files, resolves CLI overrides, runs the outer condition × task × repeat loop, writes artifacts and JSONL. |
+| `src/agent.py` | Browser agent loop. `run_agent()` drives the LLM turn loop; `AgentState` owns all mutable state accumulation; `AgentResult` is the immutable output. |
+| `src/judge.py` | LLM-as-judge grading. `grade()` takes a trajectory and returns `{"pass": bool, "reason": str}`. |
+| `src/tools.py` | `ToolSet` base class + `CLIToolSet` (subprocess) and `BridgeToolSet` (HTTP) subclasses. `make_tool_set(condition)` is the factory. |
+| `src/llm.py` | Thin OpenAI Responses API wrapper. `Client.call()` returns a `Turn` dataclass. |
+| `src/report.py` | Reads `results/*.jsonl`, prints and writes `results/report.md`. No external deps beyond stdlib + the results files. |
+| `src/utils.py` | `snapshot_chars(text)` — counts characters in a snapshot result, returns 0 for empty/None. |
+| `config/conditions.yaml` | Benchmark conditions: tool mode (`cli` or `bridge`), CLI binary path, bridge URL. |
+| `config/tasks.yaml` | Task prompts and grading hints. |
+| `config/models.yaml` | Agent and judge model names and reasoning effort. **The only place to change model defaults.** |
+
+## Data flow
+
+```
+run_benchmark.py
+  └── run_once()
+        ├── make_tool_set(condition)      → ToolSet (CLIToolSet or BridgeToolSet)
+        ├── run_agent(prompt, tool_set, model, reasoning_effort)
+        │     └── loop:
+        │           client.call()         → Turn
+        │           tool_set.dispatch()   → result str (side effect: browser action)
+        │           state.update(turn, turn_index, tool_results)
+        │     └── state.to_result()       → AgentResult
+        └── grade(prompt, trajectory, model, reasoning_effort, grading_hint)
+              └── Client.call()           → {"pass": bool, "reason": str}
+```
+
+## Running checks
+
+```sh
+# Install dev dependencies (once)
+pip install -r requirements-dev.txt
+
+make format      # apply black + isort (modifies files)
+make lint        # ruff + flake8 (read-only)
+make typecheck   # mypy (read-only)
+make check       # format-check + lint + typecheck — no modifications, matches CI
+```
+
+Config: `pyproject.toml` for black/isort/ruff/mypy; `.flake8` for flake8 (120-char line length throughout).
+
+## Key design decisions
+
+### No hardcoded model defaults
+`run_agent()` and `grade()` require `model` and `reasoning_effort` as positional parameters — there are no defaults in the function signatures. All defaults live in `config/models.yaml`. CLI flags `--model`, `--reasoning-effort`, `--judge-model`, `--judge-reasoning-effort` override them for a single run.
+
+### AgentState owns all state mutations
+`AgentState.update(turn, turn_index, tool_results=None)` is the single place that mutates benchmark state:
+- Always: accumulates `input_tokens` and `output_tokens` from the turn
+- `tool_results=None` (final turn): sets `answer`, appends to `trajectory`
+- `tool_results` provided (tool-call turn): increments `tool_call_count`, appends to `snapshot_chars` for snapshot tools, appends to `trajectory`
+
+`run_agent()` only handles control flow and I/O (LLM calls, tool dispatch, `inputs` buffer).
+
+### SNAPSHOT_TOOLS
+`SNAPSHOT_TOOLS: frozenset[str]` in `agent.py` defines which tool names produce page snapshots worth measuring. Add a tool name here if it returns a snapshot.
+
+### ToolSet dispatch
+Both `CLIToolSet` and `BridgeToolSet` use `match/case` in `dispatch()`. The shared tool schema lives in `_CLI_SCHEMA` (module-level constant in `tools.py`), evaluated once at import time.
diff --git a/benchmarks/snapshot-efficiency/Makefile b/benchmarks/snapshot-efficiency/Makefile
new file mode 100644
index 0000000..f78ced6
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/Makefile
@@ -0,0 +1,23 @@
+SRC = src
+
+.PHONY: format format-check lint typecheck check
+
+# Apply formatting (local dev)
+format:
+	black $(SRC)/
+	isort $(SRC)/
+
+# Check formatting without modifying (CI)
+format-check:
+	black --check $(SRC)/
+	isort --check-only $(SRC)/
+
+lint:
+	ruff check $(SRC)/
+	flake8 $(SRC)/
+
+typecheck:
+	mypy $(SRC)/
+
+# Full validation suite — no file modifications (used in CI)
+check: format-check lint typecheck
diff --git a/benchmarks/snapshot-efficiency/README.md b/benchmarks/snapshot-efficiency/README.md
new file mode 100644
index 0000000..4daf87c
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/README.md
@@ -0,0 +1,185 @@
+# Snapshot Efficiency Benchmark
+
+Measures the token cost and task-completion quality of `opera-browser-cli`'s compact snapshot output against raw MCP output and alternative browser CLI tools.
+
+## What it measures
+
+Every browser agent task requires sending the current page as context to the LLM. This benchmark answers:
+
+- **Token savings** — how much does compact snapshot output reduce input token usage vs raw MCP output?
+- **Quality** — does compression affect task-completion rate?
+- **vs AXI** — how does `opera-browser-cli` compare to `chrome-devtools-axi`, an established browser CLI tool?
+
+### Conditions
+
+| ID              | Description                                                                             |
+|-----------------|-----------------------------------------------------------------------------------------|
+| `opera-compact` | `opera-browser-cli` default — compact snapshots with URL compression (our tool)         |
+| `opera-raw`     | `opera-browser-cli --raw` — uncompressed MCP output piped through our CLI               |
+| `mcp-raw`       | Raw `take_snapshot` via bridge HTTP API — no compression at all (chrome-mcp equivalent) |
+| `axi`           | `chrome-devtools-axi` CLI — external comparison baseline                                |
+
+### Tasks
+
+7 browser tasks adapted from the [axi bench-browser benchmark](https://github.com/kunchenguid/axi/tree/main/bench-browser), covering single-step reads, multi-step navigation, and complex multi-page extraction:
+
+| ID                           | Category      | Target                                   |
+|------------------------------|---------------|------------------------------------------|
+| `read_static_page`           | single-step   | example.com                              |
+| `wikipedia_fact_lookup`      | single-step   | Wikipedia — Moon infobox                 |
+| `github_repo_stars`          | single-step   | github.com/torvalds/linux                |
+| `wikipedia_table_read`       | single-step   | Wikipedia — population table             |
+| `wikipedia_link_follow`      | multi-step    | Wikipedia Ada Lovelace → Charles Babbage |
+| `wikipedia_deep_extraction`  | investigation | Wikipedia Nobel Physics laureates        |
+| `github_issue_investigation` | investigation | github.com/facebook/react/issues         |
+
+### Model
+
+Model defaults are set in [`config/models.yaml`](config/models.yaml):
+
+```yaml
+agent:
+  model: gpt-5.5
+  reasoning_effort: medium
+
+judge:
+  model: gpt-5.5
+  reasoning_effort: low
+```
+
+Both use the OpenAI Responses API (`/v1/responses`). The judge runs at lower effort since pass/fail grading is simpler than browser navigation. To use a different model for a run, pass CLI flags (see [CLI reference](#cli-reference)) — these override the config file without changing it.
+
+## Setup
+
+Requirements: Python 3.11+, `opera-browser-cli` in PATH, Opera/Chrome browser open.
+
+```sh
+cd benchmarks/snapshot-efficiency
+python -m venv .venv
+source .venv/bin/activate   # Windows: .venv\Scripts\activate
+pip install -r requirements.txt
+```
+
+For the `axi` condition, also install:
+
+```sh
+npm install -g chrome-devtools-axi
+```
+
+## Running
+
+All commands run from `benchmarks/snapshot-efficiency/` with the venv active.
+
+### Sanity check (1 run, 1 task)
+
+```sh
+OPENAI_API_KEY=<key> python src/run_benchmark.py \
+  --conditions opera-compact \
+  --tasks read_static_page \
+  --repeats 1
+```
+
+### Single condition
+
+```sh
+OPENAI_API_KEY=<key> python src/run_benchmark.py --conditions opera-compact --repeats 5
+```
+
+### All conditions (skipping axi if not installed)
+
+```sh
+OPENAI_API_KEY=<key> python src/run_benchmark.py \
+  --conditions opera-compact,opera-raw,mcp-raw \
+  --repeats 5
+```
+
+### Full matrix (requires chrome-devtools-axi)
+
+```sh
+OPENAI_API_KEY=<key> python src/run_benchmark.py --repeats 5
+```
+
+### Generate report
+
+```sh
+python src/report.py
+# → results/report.md
+```
+
+## Linting & formatting
+
+Install dev tools (separate from benchmark runtime deps):
+
+```sh
+pip install -r requirements-dev.txt
+```
+
+| Command | What it does |
+|---|---|
+| `make format` | Apply black + isort (local dev) |
+| `make lint` | ruff + flake8 |
+| `make typecheck` | mypy |
+| `make check` | format-check + lint + typecheck, read-only (same as CI) |
+
+Config lives in `pyproject.toml` (black, isort, ruff, mypy) and `.flake8`.
+All tools are configured for 120-char line length.
+
+## Source layout
+
+```
+src/
+├── run_benchmark.py   # entry point — CLI arg parsing, outer loop, artifact writing
+├── agent.py           # browser agent loop (AgentState, AgentResult, run_agent)
+├── judge.py           # LLM-as-judge pass/fail grading (grade)
+├── tools.py           # ToolSet subclasses (CLIToolSet, BridgeToolSet) + factory
+├── llm.py             # thin OpenAI Responses API wrapper (Client, Turn)
+├── report.py          # reads results/*.jsonl and writes results/report.md
+└── utils.py           # shared utilities (snapshot_chars)
+
+config/
+├── conditions.yaml    # benchmark conditions (tool mode, CLI binary, bridge URL)
+├── tasks.yaml         # task prompts and grading hints
+└── models.yaml        # agent and judge model + reasoning_effort defaults
+```
+
+## CLI reference
+
+```
+python src/run_benchmark.py [options]
+
+  --conditions             Comma-separated condition IDs (default: all four)
+  --tasks                  Comma-separated task IDs (default: all seven)
+  --repeats                Runs per condition × task (default: 5)
+  --model                  Agent model — overrides config/models.yaml
+  --reasoning-effort       Agent reasoning effort: low / medium / high — overrides config/models.yaml
+  --judge-model            Judge model — overrides config/models.yaml
+  --judge-reasoning-effort Judge reasoning effort: low / medium / high — overrides config/models.yaml
+```
+
+To permanently change the defaults, edit [`config/models.yaml`](config/models.yaml).
+
+## Results layout
+
+```
+results/
+├── opera-compact.jsonl      # one record per run
+├── opera-raw.jsonl
+├── mcp-raw.jsonl
+├── axi.jsonl
+├── report.md                # generated by report.py
+└── {condition}/{task}/run{N}/
+    ├── agent_output.json    # full trajectory + per-turn token usage
+    ├── grade.json           # pass/fail verdict + reason
+    └── result.json          # merged record (same shape as the .jsonl row)
+```
+
+## Attribution
+
+This benchmark is based on the [axi browser benchmark](https://github.com/kunchenguid/axi/tree/main/bench-browser) by [@kunchenguid](https://github.com/kunchenguid):
+
+- **Task definitions** (`config/tasks.yaml`) — adapted directly from [`bench-browser/config/tasks.yaml`](https://github.com/kunchenguid/axi/blob/main/bench-browser/config/tasks.yaml)
+- **LLM-as-judge grading approach** — adapted from [`bench-browser/src/grader.ts`](https://github.com/kunchenguid/axi/blob/main/bench-browser/src/grader.ts)
+- **Benchmark methodology** (per-condition JSONL results, trajectory capture, usage metrics) — adapted from [`bench-browser/src/runner.ts`](https://github.com/kunchenguid/axi/blob/main/bench-browser/src/runner.ts)
+- **`axi` condition** — uses [`chrome-devtools-axi`](https://github.com/kunchenguid/axi), the browser CLI tool the axi project benchmarks
+
+The original benchmark uses TypeScript + Claude Sonnet. This port uses Python + OpenAI GPT-5.5 with the Responses API.
diff --git a/benchmarks/snapshot-efficiency/config/conditions.yaml b/benchmarks/snapshot-efficiency/config/conditions.yaml
new file mode 100644
index 0000000..b7f21d7
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/config/conditions.yaml
@@ -0,0 +1,25 @@
+conditions:
+  - id: opera-compact
+    description: opera-browser-cli default (compact snapshots, URL compression)
+    tool_mode: cli
+    cli_bin: opera-browser-cli
+    raw: false
+
+  - id: opera-raw
+    description: opera-browser-cli with --raw flag (uncompressed MCP output)
+    tool_mode: cli
+    cli_bin: opera-browser-cli
+    raw: true
+
+  - id: mcp-raw
+    description: Raw take_snapshot via bridge HTTP API, no compression layer
+    tool_mode: bridge
+    bridge_url: "http://localhost:9224"
+
+  - id: axi
+    description: chrome-devtools-axi CLI (external comparison baseline)
+    tool_mode: cli
+    cli_bin: chrome-devtools-axi
+    raw: false
+    start: "chrome-devtools-axi start"
+    stop: "chrome-devtools-axi stop"
diff --git a/benchmarks/snapshot-efficiency/config/models.yaml b/benchmarks/snapshot-efficiency/config/models.yaml
new file mode 100644
index 0000000..271829d
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/config/models.yaml
@@ -0,0 +1,7 @@
+agent:
+  model: gpt-5.5
+  reasoning_effort: medium
+
+judge:
+  model: gpt-5.5
+  reasoning_effort: low
diff --git a/benchmarks/snapshot-efficiency/config/tasks.yaml b/benchmarks/snapshot-efficiency/config/tasks.yaml
new file mode 100644
index 0000000..bb3ca67
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/config/tasks.yaml
@@ -0,0 +1,56 @@
+tasks:
+  read_static_page:
+    category: single_step
+    prompt: >
+      Navigate to https://example.com and report the main heading of the page.
+    grading:
+      grading_hint: "The main heading on example.com is 'Example Domain'."
+
+  wikipedia_fact_lookup:
+    category: single_step
+    prompt: >
+      Navigate to the Wikipedia article for the Moon
+      (https://en.wikipedia.org/wiki/Moon) and report the Moon's average
+      orbital speed from the infobox.
+    grading:
+      grading_hint: "The Moon's average orbital speed is 1.022 km/s (approximately 2,286 mph)."
+
+  github_repo_stars:
+    category: single_step
+    prompt: >
+      Navigate to https://github.com/torvalds/linux and report the
+      approximate star count and the primary programming language.
+    grading:
+      grading_hint: "torvalds/linux has 190k+ stars and the primary language is C."
+
+  wikipedia_table_read:
+    category: single_step
+    prompt: >
+      Navigate to the Wikipedia article 'List of countries and dependencies
+      by population' and report the top 3 countries by population.
+    grading:
+      grading_hint: "The top 3 countries by population are India, China, and the United States."
+
+  wikipedia_link_follow:
+    category: multi_step
+    prompt: >
+      Navigate to the Wikipedia article for Ada Lovelace, click the link
+      to Charles Babbage, and report his birth date.
+    grading:
+      grading_hint: "Charles Babbage was born on 26 December 1791."
+
+  wikipedia_deep_extraction:
+    category: investigation
+    prompt: >
+      Navigate to 'List of Nobel laureates in Physics' on Wikipedia and
+      report the winners for the 3 most recent years listed.
+    grading:
+      grading_hint: "2024: John Hopfield and Geoffrey Hinton; 2023: Pierre Agostini, Ferenc Krausz, Anne L'Huillier; 2022: Alain Aspect, John Clauser, Anton Zeilinger."
+
+  github_issue_investigation:
+    category: investigation
+    prompt: >
+      Navigate to https://github.com/facebook/react/issues and report
+      the titles of the 5 most recent issues and the total open issue count.
+    grading:
+      grading_hint: "The agent must report 5 specific issue titles. The open issue count should be in the hundreds."
diff --git a/benchmarks/snapshot-efficiency/pyproject.toml b/benchmarks/snapshot-efficiency/pyproject.toml
new file mode 100644
index 0000000..ab7de2d
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/pyproject.toml
@@ -0,0 +1,17 @@
+[tool.black]
+line-length = 120
+
+[tool.isort]
+profile = "black"
+line_length = 120
+
+[tool.ruff]
+line-length = 120
+
+[tool.ruff.lint]
+select = ["E", "F"]
+
+[tool.mypy]
+python_version = "3.11"
+ignore_missing_imports = true
+explicit_package_bases = true
diff --git a/benchmarks/snapshot-efficiency/requirements-dev.txt b/benchmarks/snapshot-efficiency/requirements-dev.txt
new file mode 100644
index 0000000..3a1089f
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/requirements-dev.txt
@@ -0,0 +1,5 @@
+black>=24.0
+flake8>=7.0
+isort>=5.13
+mypy>=1.10
+ruff>=0.4
diff --git a/benchmarks/snapshot-efficiency/requirements.txt b/benchmarks/snapshot-efficiency/requirements.txt
new file mode 100644
index 0000000..32ebcb2
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/requirements.txt
@@ -0,0 +1,3 @@
+openai>=1.30
+pyyaml>=6.0
+requests>=2.31
diff --git a/benchmarks/snapshot-efficiency/src/agent.py b/benchmarks/snapshot-efficiency/src/agent.py
new file mode 100644
index 0000000..ddf5819
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/src/agent.py
@@ -0,0 +1,128 @@
+import json
+import time
+from dataclasses import dataclass, field
+
+from llm import Client, Turn
+from tools import ToolSet
+
+SYSTEM_PROMPT = """You are a browser automation agent. Use the provided tools to navigate the web and answer questions.
+
+Guidelines:
+- Use `navigate` to open URLs
+- Use `snapshot` to re-read the current page if needed
+- Use `click` on element refs (e.g. @1.5) shown in snapshots to follow links
+- Use `go_back` to return to the previous page
+- When you have enough information, reply with your final answer directly (no tool call)
+- Be concise and factual — only report what you observed in the page
+"""
+
+MAX_TURNS = 20
+SNAPSHOT_TOOLS: frozenset[str] = frozenset({"navigate", "snapshot", "click", "go_back"})
+
+
+@dataclass
+class AgentResult:
+    answer: str
+    input_tokens: int
+    output_tokens: int
+    trajectory: list[dict]
+    snapshot_chars: list[int]
+    tool_call_count: int
+    wall_clock_seconds: float
+    error: str | None = None
+
+    @property
+    def total_tokens(self) -> int:
+        return self.input_tokens + self.output_tokens
+
+
+@dataclass
+class AgentState:
+    input_tokens: int = 0
+    output_tokens: int = 0
+    trajectory: list[dict] = field(default_factory=list)
+    snapshot_chars: list[int] = field(default_factory=list)
+    tool_call_count: int = 0
+    start: float = field(default_factory=time.monotonic)
+    error: str | None = None
+    answer: str = ""
+
+    def update(self, turn: Turn, turn_index: int, tool_results: dict | None = None) -> None:
+        self.input_tokens += turn.input_tokens
+        self.output_tokens += turn.output_tokens
+
+        if tool_results is None:
+            self.answer = turn.text
+            self.trajectory.append({"turn": turn_index, "tool_calls": [], "text": turn.text})
+            return
+
+        self.tool_call_count += len(turn.tool_calls)
+        for tc in turn.tool_calls:
+            if tc.name in SNAPSHOT_TOOLS:
+                self.snapshot_chars.append(len(tool_results[tc.call_id]))
+        for tc in turn.tool_calls:
+            self.trajectory.append(
+                {
+                    "turn": turn_index,
+                    "tool_calls": [{"name": tc.name, "args": tc.arguments}],
+                    "tool_result": tool_results.get(tc.call_id, ""),
+                    "text": turn.text,
+                }
+            )
+
+    def to_result(self) -> AgentResult:
+        return AgentResult(
+            answer=self.answer,
+            input_tokens=self.input_tokens,
+            output_tokens=self.output_tokens,
+            trajectory=self.trajectory,
+            snapshot_chars=self.snapshot_chars,
+            tool_call_count=self.tool_call_count,
+            wall_clock_seconds=round(time.monotonic() - self.start, 1),
+            error=self.error,
+        )
+
+
+def run_agent(
+    task_prompt: str,
+    tool_set: ToolSet,
+    model: str,
+    reasoning_effort: str,
+) -> AgentResult:
+    client = Client(model, reasoning_effort)
+    inputs: list = [
+        {"role": "system", "content": SYSTEM_PROMPT},
+        {"role": "user", "content": task_prompt},
+    ]
+
+    state = AgentState()
+
+    try:
+        for _turn in range(MAX_TURNS):
+            turn = client.call(inputs, tools=tool_set.definitions)
+            inputs.extend(turn.output_items)
+
+            if not turn.tool_calls:
+                state.update(turn, _turn)
+                break
+
+            tool_results = {}
+            for tc in turn.tool_calls:
+                args = json.loads(tc.arguments)
+                tool_results[tc.call_id] = tool_set.dispatch(tc.name, args)
+                inputs.append(
+                    {
+                        "type": "function_call_output",
+                        "call_id": tc.call_id,
+                        "output": tool_results[tc.call_id],
+                    }
+                )
+
+            state.update(turn, _turn, tool_results)
+        else:
+            state.error = f"Reached max turns ({MAX_TURNS}) without final answer"
+
+    except Exception as e:
+        state.error = str(e)
+
+    return state.to_result()
diff --git a/benchmarks/snapshot-efficiency/src/judge.py b/benchmarks/snapshot-efficiency/src/judge.py
new file mode 100644
index 0000000..6f4d7a1
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/src/judge.py
@@ -0,0 +1,75 @@
+import json
+
+from llm import Client
+
+SYSTEM_PROMPT = "You are a benchmark grader evaluating whether an AI agent completed a browser automation task."
+
+RULES = """
+- PASS if the agent navigated to the correct pages AND produced a correct, complete answer
+- FAIL if the agent hallucinated data without actually browsing to the page
+- FAIL if the agent browsed but misinterpreted the page content
+- FAIL if the agent gave a partial answer when a complete one was requested
+- For error recovery tasks, PASS if the agent correctly identified the error and then recovered
+- For multi-step tasks, PASS only if all steps were completed
+
+Respond with exactly: {"pass": true, "reason": "..."} or {"pass": false, "reason": "..."}
+"""
+
+TOOL_OUTPUT_CAP = 30_000
+
+
+def _format_trajectory(trajectory: list[dict]) -> str:
+    lines = []
+    for turn in trajectory:
+        for tc in turn.get("tool_calls", []):
+            args = tc.get("args", "")
+            if isinstance(args, str):
+                try:
+                    args = json.loads(args)
+                except json.JSONDecodeError:
+                    pass
+            lines.append(f"[tool] {tc['name']}({json.dumps(args)})")
+        result = turn.get("tool_result", "")
+        if result:
+            if len(result) > TOOL_OUTPUT_CAP:
+                result = result[:TOOL_OUTPUT_CAP] + f"\n... (truncated, {len(result)} chars total)"
+            lines.append(f"[result] {result}")
+        if turn.get("text"):
+            lines.append(f"[agent] {turn['text']}")
+    return "\n".join(lines)
+
+
+def _build_prompt(task_prompt: str, trajectory: list[dict], grading_hint: str | None) -> str:
+    parts = [f"TASK:\n{task_prompt.strip()}"]
+    if trajectory:
+        parts.append(f"AGENT TRAJECTORY:\n{_format_trajectory(trajectory)}")
+    if grading_hint:
+        parts.append(f"KNOWN FACTS:\n{grading_hint}")
+    parts.append(f"GRADING RULES:{RULES}")
+    return "\n\n".join(parts)
+
+
+def grade(
+    task_prompt: str,
+    trajectory: list[dict],
+    model: str,
+    reasoning_effort: str,
+    grading_hint: str | None = None,
+) -> dict:
+    prompt = _build_prompt(task_prompt, trajectory, grading_hint)
+    client = Client(model, reasoning_effort=reasoning_effort)
+    try:
+        turn = client.call(
+            [
+                {"role": "system", "content": SYSTEM_PROMPT},
+                {"role": "user", "content": prompt},
+            ]
+        )
+        raw = turn.text.strip()
+        if raw.startswith("```"):
+            raw = raw.split("```")[1].removeprefix("json")
+        return json.loads(raw)
+    except json.JSONDecodeError as e:
+        return {"pass": False, "reason": f"judge parse error: {e}"}
+    except Exception as e:
+        return {"pass": False, "reason": f"judge error: {e}"}
diff --git a/benchmarks/snapshot-efficiency/src/llm.py b/benchmarks/snapshot-efficiency/src/llm.py
new file mode 100644
index 0000000..2d86559
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/src/llm.py
@@ -0,0 +1,52 @@
+from dataclasses import dataclass
+
+import openai
+
+
+@dataclass
+class Turn:
+    text: str
+    tool_calls: list  # raw function_call items from response.output
+    output_items: list  # model_dump'd, ready to extend next input
+    input_tokens: int
+    output_tokens: int
+
+
+def _to_input_item(item) -> dict:
+    # status is an output-only field; the API rejects it when fed back as input
+    d = item.model_dump()
+    d.pop("status", None)
+    return d
+
+
+class Client:
+    def __init__(self, model: str, reasoning_effort: str = "medium"):
+        self._api = openai.OpenAI()
+        self._model = model
+        self._reasoning_effort = reasoning_effort
+
+    def call(self, input_items: list, tools: list | None = None) -> Turn:
+        response = self._api.responses.create(  # type: ignore[call-overload]
+            model=self._model,
+            reasoning={"effort": self._reasoning_effort},
+            input=input_items,
+            tools=tools or [],
+        )
+
+        text_parts: list[str] = []
+        tool_calls: list = []
+        for item in response.output:
+            if item.type == "function_call":
+                tool_calls.append(item)
+            elif item.type == "message":
+                for block in item.content:
+                    if hasattr(block, "text"):
+                        text_parts.append(block.text)
+
+        return Turn(
+            text=" ".join(text_parts),
+            tool_calls=tool_calls,
+            output_items=[_to_input_item(item) for item in response.output],
+            input_tokens=response.usage.input_tokens,
+            output_tokens=response.usage.output_tokens,
+        )
diff --git a/benchmarks/snapshot-efficiency/src/report.py b/benchmarks/snapshot-efficiency/src/report.py
new file mode 100644
index 0000000..87bd980
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/src/report.py
@@ -0,0 +1,184 @@
+import json
+import statistics
+from pathlib import Path
+
+ROOT = Path(__file__).parent.parent  # benchmarks/snapshot-efficiency/
+RESULTS_DIR = ROOT / "results"
+REPORT_PATH = RESULTS_DIR / "report.md"
+
+CONDITION_ORDER = ["opera-compact", "opera-raw", "mcp-raw", "axi"]
+
+
+def load_results() -> dict[str, list[dict]]:
+    results: dict[str, list[dict]] = {}
+    for f in sorted(RESULTS_DIR.glob("*.jsonl")):
+        cid = f.stem
+        records = []
+        for line in f.read_text().splitlines():
+            line = line.strip()
+            if line:
+                records.append(json.loads(line))
+        results[cid] = records
+    return results
+
+
+def summarize(records: list[dict]) -> dict:
+    if not records:
+        return {}
+    tasks = set(r["task"] for r in records)
+    passes = [r for r in records if r.get("pass")]
+    pass_rate = len(passes) / len(records) * 100  # records is non-empty here (early return above)
+    input_tokens = [r["input_tokens"] for r in records]
+    output_tokens = [r["output_tokens"] for r in records]
+    total_tokens = [r["total_tokens"] for r in records]
+    snap_avg = [r["snapshot"]["avg_chars"] for r in records if r.get("snapshot", {}).get("avg_chars")]
+    snap_total = [r["snapshot"]["total_chars"] for r in records if r.get("snapshot", {}).get("total_chars")]
+    wall = [r["wall_clock_seconds"] for r in records]
+    tool_calls = [r["tool_call_count"] for r in records]
+
+    def avg(xs: list) -> float:
+        return statistics.mean(xs) if xs else 0.0
+
+    return {
+        "runs": len(records),
+        "tasks": len(tasks),
+        "pass_rate": pass_rate,
+        "avg_input_tokens": avg(input_tokens),
+        "avg_output_tokens": avg(output_tokens),
+        "avg_total_tokens": avg(total_tokens),
+        "avg_snap_chars": avg(snap_avg),
+        "avg_snap_total_chars": avg(snap_total),
+        "avg_wall_seconds": avg(wall),
+        "avg_tool_calls": avg(tool_calls),
+    }
+
+
+def per_task_summary(records: list[dict]) -> dict[str, dict]:
+    by_task: dict[str, list[dict]] = {}
+    for r in records:
+        by_task.setdefault(r["task"], []).append(r)
+    return {tid: summarize(recs) for tid, recs in sorted(by_task.items())}
+
+
+def fmt_int(x: float) -> str:
+    return f"{int(x):,}"
+
+
+def fmt_pct(x: float) -> str:
+    return f"{x:.0f}%"
+
+
+def fmt_chars(x: float) -> str:
+    if x >= 1000:
+        return f"{x/1000:.1f}k"
+    return str(int(x))
+
+
+def main() -> None:
+    results = load_results()
+    if not results:
+        print(f"No results found in {RESULTS_DIR}/ — run run_benchmark.py first")
+        return
+
+    lines: list[str] = ["# Snapshot Token Efficiency Benchmark\n"]
+
+    # --- Summary table ---
+    lines.append("## Summary\n")
+    header = (
+        "| Condition | Runs | Pass% | Avg input tok | Avg total tok | Avg snap chars | Avg wall (s) | Avg tool calls |"
+    )
+    sep = (
+        "|-----------|------|-------|---------------|---------------|----------------|--------------|----------------|"
+    )
+    lines += [header, sep]
+
+    ordered_cids = [c for c in CONDITION_ORDER if c in results] + [c for c in results if c not in CONDITION_ORDER]
+    summaries: dict[str, dict] = {}
+    for cid in ordered_cids:
+        s = summarize(results[cid])
+        summaries[cid] = s
+        row = (
+            f"| {cid} "
+            f"| {s['runs']} "
+            f"| {fmt_pct(s['pass_rate'])} "
+            f"| {fmt_int(s['avg_input_tokens'])} "
+            f"| {fmt_int(s['avg_total_tokens'])} "
+            f"| {fmt_chars(s['avg_snap_chars'])} "
+            f"| {s['avg_wall_seconds']:.1f} "
+            f"| {s['avg_tool_calls']:.1f} |"
+        )
+        lines.append(row)
+    lines.append("")
+
+    # --- Token savings vs mcp-raw ---
+    if "mcp-raw" in summaries and "opera-compact" in summaries:
+        baseline = summaries["mcp-raw"]["avg_total_tokens"]
+        compact = summaries["opera-compact"]["avg_total_tokens"]
+        if baseline > 0:
+            pct_saved = (baseline - compact) / baseline * 100
+            lines.append(f"> opera-compact saves **{pct_saved:.0f}%** total tokens vs mcp-raw baseline.\n")
+
+    # --- Per-task breakdown ---
+    all_tasks = sorted({r["task"] for records in results.values() for r in records})
+    lines.append("## Per-task breakdown\n")
+
+    for tid in all_tasks:
+        lines.append(f"### {tid}\n")
+        th = "| Condition | Pass% | Avg input tok | Avg snap chars |"
+        ts = "|-----------|-------|---------------|----------------|"
+        lines += [th, ts]
+        for cid in ordered_cids:
+            task_recs = [r for r in results[cid] if r["task"] == tid]
+            if not task_recs:
+                continue
+            s = summarize(task_recs)
+            row = (
+                f"| {cid} "
+                f"| {fmt_pct(s['pass_rate'])} "
+                f"| {fmt_int(s['avg_input_tokens'])} "
+                f"| {fmt_chars(s['avg_snap_chars'])} |"
+            )
+            lines.append(row)
+        lines.append("")
+
+    # --- Snapshot size distribution ---
+    lines.append("## Snapshot size distribution (avg chars per snapshot call)\n")
+    dist_header = "| Condition | Min | Median | Max |"
+    dist_sep = "|-----------|-----|--------|-----|"
+    lines += [dist_header, dist_sep]
+    for cid in ordered_cids:
+        all_snap = []
+        for r in results[cid]:
+            snap = r.get("snapshot", {})
+            # per-run average, not per-call sizes (exact per-call values are in agent_output.json)
+            if snap.get("avg_chars") and snap.get("count"):
+                all_snap.append(snap["avg_chars"])
+        if all_snap:
+            row = (
+                f"| {cid} "
+                f"| {fmt_chars(min(all_snap))} "
+                f"| {fmt_chars(statistics.median(all_snap))} "
+                f"| {fmt_chars(max(all_snap))} |"
+            )
+            lines.append(row)
+    lines.append("")
+
+    # --- Failures ---
+    lines.append("## Failures\n")
+    for cid in ordered_cids:
+        fails = [r for r in results[cid] if not r.get("pass")]
+        if fails:
+            lines.append(f"### {cid} ({len(fails)} failures)\n")
+            for r in fails:
+                lines.append(f"- **{r['task']}** run{r['run']}: {r.get('grade_reason', '')}")
+            lines.append("")
+
+    report = "\n".join(lines)
+    RESULTS_DIR.mkdir(parents=True, exist_ok=True)
+    REPORT_PATH.write_text(report)
+    print(report)
+    print(f"\nReport written to {REPORT_PATH}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/benchmarks/snapshot-efficiency/src/run_benchmark.py b/benchmarks/snapshot-efficiency/src/run_benchmark.py
new file mode 100644
index 0000000..f9f2a63
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/src/run_benchmark.py
@@ -0,0 +1,247 @@
+import argparse
+import json
+import os
+import shlex
+import subprocess
+import sys
+import time
+from pathlib import Path
+
+import yaml
+
+from agent import run_agent
+from judge import grade
+from tools import make_tool_set
+
+ROOT = Path(__file__).parent.parent  # benchmarks/snapshot-efficiency/
+RESULTS_DIR = ROOT / "results"
+
+
+def load_config() -> tuple[dict, dict, dict]:
+    config = ROOT / "config"
+    with open(config / "tasks.yaml") as f:
+        tasks = yaml.safe_load(f)["tasks"]
+    with open(config / "conditions.yaml") as f:
+        conditions = {c["id"]: c for c in yaml.safe_load(f)["conditions"]}
+    with open(config / "models.yaml") as f:
+        models = yaml.safe_load(f)
+    return tasks, conditions, models
+
+
+def artifact_dir(condition_id: str, task_id: str, run_n: int) -> Path:
+    d = RESULTS_DIR / condition_id / task_id / f"run{run_n}"
+    d.mkdir(parents=True, exist_ok=True)
+    return d
+
+
+def next_run_index(condition_id: str, task_id: str) -> int:
+    base = RESULTS_DIR / condition_id / task_id
+    if not base.exists():
+        return 0
+    existing = [d for d in base.iterdir() if d.is_dir() and d.name.startswith("run")]
+    return len(existing)
+
+
+def upsert_jsonl(condition_id: str, record: dict) -> None:
+    path = RESULTS_DIR / f"{condition_id}.jsonl"
+    with open(path, "a") as f:
+        f.write(json.dumps(record) + "\n")
+
+
+def start_daemon(condition: dict) -> subprocess.Popen | None:
+    start_cmd = condition.get("start")
+    if not start_cmd:
+        return None
+    print(f"  Starting daemon: {start_cmd}")
+    proc = subprocess.Popen(shlex.split(start_cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
+    time.sleep(2)
+    return proc
+
+
+def stop_daemon(condition: dict, proc: subprocess.Popen | None) -> None:
+    stop_cmd = condition.get("stop")
+    if stop_cmd:
+        subprocess.run(shlex.split(stop_cmd), capture_output=True)
+    if proc:
+        proc.terminate()
+
+
+def run_once(
+    condition: dict,
+    task_id: str,
+    task: dict,
+    run_n: int,
+    model: str,
+    reasoning_effort: str,
+    judge_model: str,
+    judge_reasoning_effort: str,
+) -> dict:
+    tool_set = make_tool_set(condition)
+    result = run_agent(
+        task_prompt=task["prompt"],
+        tool_set=tool_set,
+        model=model,
+        reasoning_effort=reasoning_effort,
+    )
+    grading_hint = task.get("grading", {}).get("grading_hint")
+    if tool_set.all_errored:
+        verdict = {
+            "pass": False,
+            "reason": "all tool calls errored — tool not installed or not running",
+        }
+    else:
+        verdict = grade(
+            task["prompt"],
+            result.trajectory,
+            judge_model,
+            judge_reasoning_effort,
+            grading_hint=grading_hint,
+        )
+
+    # per-snapshot stats
+    sc = result.snapshot_chars
+    snapshot_stats = {
+        "count": len(sc),
+        "total_chars": sum(sc),
+        "avg_chars": int(sum(sc) / len(sc)) if sc else 0,
+        "max_chars": max(sc) if sc else 0,
+    }
+
+    record = {
+        "condition": condition["id"],
+        "task": task_id,
+        "run": run_n,
+        "pass": verdict.get("pass", False),
+        "grade_reason": verdict.get("reason", ""),
+        "answer": result.answer,
+        "input_tokens": result.input_tokens,
+        "output_tokens": result.output_tokens,
+        "total_tokens": result.total_tokens,
+        "tool_call_count": result.tool_call_count,
+        "wall_clock_seconds": round(result.wall_clock_seconds, 1),
+        "snapshot": snapshot_stats,
+        "error": result.error,
+    }
+
+    adir = artifact_dir(condition["id"], task_id, run_n)
+    (adir / "agent_output.json").write_text(
+        json.dumps(
+            {
+                "trajectory": result.trajectory,
+                "input_tokens": result.input_tokens,
+                "output_tokens": result.output_tokens,
+                "snapshot_chars": result.snapshot_chars,
+            },
+            indent=2,
+        )
+    )
+    (adir / "grade.json").write_text(json.dumps(verdict, indent=2))
+    (adir / "result.json").write_text(json.dumps(record, indent=2))
+
+    return record
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run snapshot benchmark")
+    parser.add_argument(
+        "--conditions",
+        default=None,
+        help="Comma-separated condition IDs (default: all)",
+    )
+    parser.add_argument("--tasks", default=None, help="Comma-separated task IDs (default: all)")
+    parser.add_argument("--repeats", type=int, default=5, help="Runs per condition×task")
+    parser.add_argument("--model", default=None, help="Agent model (overrides config/models.yaml)")
+    parser.add_argument(
+        "--reasoning-effort",
+        default=None,
+        dest="reasoning_effort",
+        help="Agent reasoning effort low/medium/high (overrides config/models.yaml)",
+    )
+    parser.add_argument(
+        "--judge-model",
+        default=None,
+        dest="judge_model",
+        help="Judge model (overrides config/models.yaml)",
+    )
+    parser.add_argument(
+        "--judge-reasoning-effort",
+        default=None,
+        dest="judge_reasoning_effort",
+        help="Judge reasoning effort low/medium/high (overrides config/models.yaml)",
+    )
+    args = parser.parse_args()
+
+    if not os.environ.get("OPENAI_API_KEY"):
+        sys.exit("Error: OPENAI_API_KEY environment variable not set")
+
+    all_tasks, all_conditions, models_cfg = load_config()
+
+    agent_model = args.model or models_cfg["agent"]["model"]
+    agent_effort = args.reasoning_effort or models_cfg["agent"]["reasoning_effort"]
+    judge_model = args.judge_model or models_cfg["judge"]["model"]
+    judge_effort = args.judge_reasoning_effort or models_cfg["judge"]["reasoning_effort"]
+
+    selected_conditions = args.conditions.split(",") if args.conditions else list(all_conditions.keys())
+    selected_tasks = args.tasks.split(",") if args.tasks else list(all_tasks.keys())
+
+    # validate
+    for cid in selected_conditions:
+        if cid not in all_conditions:
+            sys.exit(f"Unknown condition: {cid}. Available: {', '.join(all_conditions)}")
+    for tid in selected_tasks:
+        if tid not in all_tasks:
+            sys.exit(f"Unknown task: {tid}. Available: {', '.join(all_tasks)}")
+
+    RESULTS_DIR.mkdir(parents=True, exist_ok=True)
+
+    total = len(selected_conditions) * len(selected_tasks) * args.repeats
+    done = 0
+
+    for cid in selected_conditions:
+        condition = all_conditions[cid]
+        print(f"\n{'='*60}")
+        print(f"Condition: {cid}")
+        print(f"{'='*60}")
+
+        daemon = start_daemon(condition)
+        try:
+            for tid in selected_tasks:
+                task = all_tasks[tid]
+                for _ in range(args.repeats):
+                    run_n = next_run_index(cid, tid)
+                    done += 1
+                    print(f"\n[{done}/{total}] {cid} / {tid} / run{run_n}")
+                    try:
+                        record = run_once(
+                            condition=condition,
+                            task_id=tid,
+                            task=task,
+                            run_n=run_n,
+                            model=agent_model,
+                            reasoning_effort=agent_effort,
+                            judge_model=judge_model,
+                            judge_reasoning_effort=judge_effort,
+                        )
+                        status = "PASS" if record["pass"] else "FAIL"
+                        tokens = record["total_tokens"]
+                        avg_snap = record["snapshot"]["avg_chars"]
+                        elapsed = record["wall_clock_seconds"]
+                        print(f"  {status} | {tokens} tokens | {avg_snap} avg snap chars | {elapsed}s")
+                        if record["error"]:
+                            print(f"  Error: {record['error']}")
+                        upsert_jsonl(cid, record)
+                    except KeyboardInterrupt:
+                        print("\nInterrupted.")
+                        # Daemon cleanup happens in the enclosing finally block.
+                        sys.exit(0)
+                    except Exception as e:
+                        print(f"  Run failed: {e}")
+        finally:
+            stop_daemon(condition, daemon)
+
+    print(f"\nDone. Results in {RESULTS_DIR}/")
+    print("Run: python report.py")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/benchmarks/snapshot-efficiency/src/tools.py b/benchmarks/snapshot-efficiency/src/tools.py
new file mode 100644
index 0000000..6b9758c
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/src/tools.py
@@ -0,0 +1,222 @@
+import json
+import subprocess
+from dataclasses import dataclass, field
+
+import requests
+
+from utils import snapshot_chars
+
+
+@dataclass
+class ToolCallRecord:
+    tool_name: str
+    args: dict
+    result: str
+    snapshot_chars: int = 0
+    error: str | None = None
+
+
+@dataclass
+class ToolSet:
+    condition_id: str
+    definitions: list[dict]  # OpenAI tool schemas
+    records: list[ToolCallRecord] = field(default_factory=list)
+
+    def dispatch(self, name: str, args: dict) -> str:
+        raise NotImplementedError
+
+    @property
+    def all_errored(self) -> bool:
+        """True if every tool call returned an error — indicates the tool is not installed/running."""
+        return bool(self.records) and all(r.result.startswith("[error:") for r in self.records)
+
+
+# ---------------------------------------------------------------------------
+# CLI-mode tool set (opera-compact, opera-raw, axi)
+# ---------------------------------------------------------------------------
+
+
+class CLIToolSet(ToolSet):
+    def __init__(self, condition_id: str, cli_bin: str, raw: bool = False):
+        self.cli_bin = cli_bin
+        self.raw = raw
+        super().__init__(condition_id=condition_id, definitions=_CLI_SCHEMA)
+
+    def _run(self, *args: str, timeout: int = 60) -> str:
+        cmd = [self.cli_bin, *args]
+        try:
+            result = subprocess.run(
+                cmd,
+                capture_output=True,
+                text=True,
+                timeout=timeout,
+            )
+            output = result.stdout
+            if result.returncode != 0 and not output:
+                output = result.stderr or f"[exit {result.returncode}]"
+            return output.strip()
+        except subprocess.TimeoutExpired:
+            return f"[timeout after {timeout}s]"
+        except FileNotFoundError:
+            return f"[error: {self.cli_bin} not found in PATH]"
+
+    def dispatch(self, name: str, args: dict) -> str:
+        extra = ["--raw"] if self.raw and name in ("navigate", "snapshot", "click", "go_back") else []
+
+        match name:
+            case "navigate":
+                result = self._run("open", args.get("url", ""), *extra)
+            case "snapshot":
+                result = self._run("snapshot", *extra)
+            case "click":
+                result = self._run("click", args.get("ref", ""), *extra)
+            case "go_back":
+                result = self._run("back", *extra)
+            case _:
+                result = f"[unknown tool: {name}]"
+
+        record = ToolCallRecord(
+            tool_name=name,
+            args=args,
+            result=result,
+            snapshot_chars=(snapshot_chars(result) if name in ("navigate", "snapshot", "click", "go_back") else 0),
+        )
+        self.records.append(record)
+        return result
+
+
+# ---------------------------------------------------------------------------
+# Bridge-mode tool set (mcp-raw)
+# ---------------------------------------------------------------------------
+
+# Default bridge URL — matches opera-browser-cli's default port (OPERA_CLI_PORT).
+# Override via bridge_url in conditions.yaml.
+DEFAULT_BRIDGE_URL = "http://localhost:9224"
+
+
+class BridgeToolSet(ToolSet):
+    def __init__(self, condition_id: str, bridge_url: str = DEFAULT_BRIDGE_URL):
+        self.bridge_url = bridge_url.rstrip("/")
+        self.session = requests.Session()
+        super().__init__(condition_id=condition_id, definitions=_CLI_SCHEMA)
+
+    def _call(self, tool_name: str, tool_args: dict) -> str:
+        try:
+            resp = self.session.post(
+                f"{self.bridge_url}/call",
+                json={"name": tool_name, "args": tool_args},
+                timeout=60,
+            )
+            resp.raise_for_status()
+            data = resp.json()
+            # Bridge response: {"result": <text>}; tolerate a list of MCP content items too
+            result = data.get("result", data)
+            if isinstance(result, list):
+                parts = []
+                for item in result:
+                    if isinstance(item, dict) and item.get("type") == "text":
+                        parts.append(item["text"])
+                    elif isinstance(item, dict):
+                        parts.append(json.dumps(item))
+                    else:
+                        parts.append(str(item))
+                return "\n".join(parts)
+            return result if isinstance(result, str) else json.dumps(result)
+        except requests.exceptions.ConnectionError:
+            return "[error: bridge not running — start with: opera-browser-cli start]"
+        except Exception as e:
+            return f"[error: {e}]"
+
+    def dispatch(self, name: str, args: dict) -> str:
+        match name:
+            case "navigate":
+                result = self._call(
+                    "navigate_page",
+                    {"url": args.get("url", ""), "includeSnapshot": True},
+                )
+            case "snapshot":
+                result = self._call("take_snapshot", {})
+            case "click":
+                result = self._call("click", {"uid": args.get("ref", ""), "includeSnapshot": True})
+            case "go_back":
+                result = self._call("navigate_page", {"type": "back", "includeSnapshot": True})
+            case _:
+                result = f"[unknown tool: {name}]"
+
+        record = ToolCallRecord(
+            tool_name=name,
+            args=args,
+            result=result,
+            snapshot_chars=snapshot_chars(result),
+        )
+        self.records.append(record)
+        return result
+
+
+# ---------------------------------------------------------------------------
+# Factory
+# ---------------------------------------------------------------------------
+
+
+def make_tool_set(condition: dict) -> ToolSet:
+    mode = condition["tool_mode"]
+    cid = condition["id"]
+    if mode == "cli":
+        return CLIToolSet(
+            condition_id=cid,
+            cli_bin=condition["cli_bin"],
+            raw=condition.get("raw", False),
+        )
+    elif mode == "bridge":
+        return BridgeToolSet(
+            condition_id=cid,
+            bridge_url=condition.get("bridge_url", DEFAULT_BRIDGE_URL),
+        )
+    else:
+        raise ValueError(f"Unknown tool_mode: {mode}")
+
+
+# ---------------------------------------------------------------------------
+# OpenAI tool schemas (same for all conditions)
+# Responses API (/v1/responses) uses flat tool format — no nested "function" key
+# ---------------------------------------------------------------------------
+
+_CLI_SCHEMA: list[dict] = [
+    {
+        "type": "function",
+        "name": "navigate",
+        "description": "Navigate the browser to a URL and return the page snapshot.",
+        "parameters": {
+            "type": "object",
+            "properties": {"url": {"type": "string", "description": "Full URL to navigate to."}},
+            "required": ["url"],
+        },
+    },
+    {
+        "type": "function",
+        "name": "snapshot",
+        "description": "Return the current page's accessibility snapshot without navigating.",
+        "parameters": {"type": "object", "properties": {}, "required": []},
+    },
+    {
+        "type": "function",
+        "name": "click",
+        "description": "Click an element on the current page by its reference ID (e.g. @1.5) and return the updated snapshot.",  # noqa: E501
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "ref": {
+                    "type": "string",
+                    "description": "Element reference such as @1.5",
+                }
+            },
+            "required": ["ref"],
+        },
+    },
+    {
+        "type": "function",
+        "name": "go_back",
+        "description": "Navigate back to the previous page and return the snapshot.",
+        "parameters": {"type": "object", "properties": {}, "required": []},
+    },
+]
diff --git a/benchmarks/snapshot-efficiency/src/utils.py b/benchmarks/snapshot-efficiency/src/utils.py
new file mode 100644
index 0000000..6633501
--- /dev/null
+++ b/benchmarks/snapshot-efficiency/src/utils.py
@@ -0,0 +1,2 @@
+def snapshot_chars(text: str) -> int:
+    return len(text) if text else 0
diff --git a/src/bridge.ts b/src/bridge.ts
index e24d8de..19c5040 100644
--- a/src/bridge.ts
+++ b/src/bridge.ts
@@ -3,9 +3,10 @@
  *
  * Spawns opera-devtools-mcp as a child process and maintains a single
  * persistent MCP session. Exposes a simple HTTP API:
- *   POST /call  { name, args }  → { result }
- *   GET  /tools                 → [{ name, description }]
- *   GET  /health                → { status: "ok" }
+ *   POST /call           { name, args }  → { result }
+ *   GET  /tools                          → [{ name, description }]
+ *   GET  /health                         → { status: "ok" }
+ *   GET  /last-snapshot                  → { raw, pageUrl, capturedAt } | 404
  *
  * Writes a PID file to ~/.opera-browser-cli/bridge.pid on startup.
  */
@@ -23,6 +24,7 @@ import {
   type ServerResponse,
 } from "node:http";
 import { existsSync, mkdirSync, unlinkSync, writeFileSync } from "node:fs";
+import { extractPageOrigin } from "./snapshot.js";
 import { createRequire } from "node:module";
 import { dirname, join, resolve } from "node:path";
 import { homedir } from "node:os";
@@ -42,6 +44,26 @@ const OPERA_AI_TOOLS = new Set([
   "opera_make",
 ]);
 
+export interface LastSnapshotCache {
+  raw: string;
+  pageUrl: string | null;
+  capturedAt: number;
+}
+
+// The most recent raw snapshot text returned by take_snapshot.
+// Shared across all concurrent HTTP requests; last write wins.
+// Survives navigation — callers use pageUrl to detect drift if needed.
+let lastSnapshot: LastSnapshotCache | null = null;
+
+export function getLastSnapshotCache(): LastSnapshotCache | null {
+  return lastSnapshot;
+}
+
+/** Reset the snapshot cache — for use in tests only. */
+export function resetLastSnapshotCache(): void {
+  lastSnapshot = null;
+}
+
 export interface BridgeContentBlock {
   type: string;
   text?: string;
@@ -233,13 +255,20 @@ async function handleCallRequest(
     return;
   }
 
-  // Non-streaming path (unchanged).
+  // Non-streaming path.
   try {
     const result = await client.callTool(
       { name: payload.name, arguments: payload.args },
       undefined,
     );
     const text = extractToolText(getToolContent(result));
+    if (payload.name === "take_snapshot") {
+      lastSnapshot = {
+        raw: text,
+        pageUrl: extractPageOrigin(text),
+        capturedAt: Date.now(),
+      };
+    }
     res.statusCode = 200;
     res.end(JSON.stringify({ result: text }));
   } catch (error) {
@@ -265,6 +294,15 @@ export async function handleBridgeRequest(
     return;
   }
 
+  if (req.method === "GET" && req.url === "/last-snapshot") {
+    if (lastSnapshot === null) {
+      writeJson(res, 404, { error: "no snapshot cached" });
+    } else {
+      writeJson(res, 200, lastSnapshot);
+    }
+    return;
+  }
+
   try {
     if (req.method === "GET" && req.url === "/tools") {
       await handleToolsRequest(client, res);
diff --git a/src/cli.ts b/src/cli.ts
index dbf2d99..a421c50 100644
--- a/src/cli.ts
+++ b/src/cli.ts
@@ -13,6 +13,7 @@ import {
   getConfigFile,
   getLogFile,
   getSessionSnapshotIfRunning,
+  getLastSnapshot,
   loadConfig,
   parseConfigValue,
   stopBridge,
@@ -23,6 +24,9 @@ import {
   extractTitle,
   truncateSnapshot,
   truncateText,
+  compactSnapshot,
+  applyUrlLut,
+  resolveUrl,
 } from "./snapshot.js";
 import { getSuggestions } from "./suggestions.js";
 
@@ -90,7 +94,7 @@ tips:
 `;
 
 const COMMAND_HELP: Record<string, string> = {
-  open: `usage: opera-browser-cli open <url> [--full]
+  open: `usage: opera-browser-cli open <url> [--full] [--raw]
 Navigate to a URL and capture an accessibility snapshot.
 
 args:
@@ -98,6 +102,7 @@ args:
 
 flags:
   --full  Show complete snapshot without truncation
+  --raw   Show unprocessed MCP output (disables compact format)
 
 examples:
   opera-browser-cli open https://example.com
@@ -119,15 +124,31 @@ examples:
   opera-browser-cli screenshot ./element.png --uid @3
   opera-browser-cli screenshot ./full.png --full-page --format jpeg`,
 
-  snapshot: `usage: opera-browser-cli snapshot [--full]
+  snapshot: `usage: opera-browser-cli snapshot [--full] [--raw]
 Capture the current page accessibility snapshot.
 
 flags:
   --full  Show complete snapshot without truncation
+  --raw   Show unprocessed MCP output (disables compact format)
 
 examples:
   opera-browser-cli snapshot
-  opera-browser-cli snapshot --full`,
+  opera-browser-cli snapshot --full
+  opera-browser-cli snapshot --raw`,
+
+  url: `usage: opera-browser-cli url <$uN | @ref>
+Resolve a URL token or element ref from the last snapshot.
+
+args:
+  $uN   URL token printed in the snapshot's urls: trailer (e.g. $u3)
+  @ref  Element ref from the snapshot (e.g. @11.57)
+
+Tokens ($uN) are scoped to the last snapshot. If no snapshot is cached,
+the bridge takes a fresh one automatically.
+
+examples:
+  opera-browser-cli url \\$u3
+  opera-browser-cli url @11.57`,
 
   click: `usage: opera-browser-cli click @<uid> [--full]
 Click an interactive element by its ref from the snapshot.
@@ -892,10 +913,11 @@ function readPackageVersion(): string {
   throw new Error("Could not determine opera-browser-cli package version");
 }
 
-function splitFullFlag(args: string[]): { args: string[]; full: boolean } {
+function splitFullFlag(args: string[]): { args: string[]; full: boolean; raw: boolean } {
   return {
-    args: args.filter((arg) => arg !== "--full"),
+    args: args.filter((arg) => arg !== "--full" && arg !== "--raw"),
     full: args.includes("--full"),
+    raw: args.includes("--raw"),
   };
 }
 
@@ -976,15 +998,18 @@ function parseSnapshotFromResponse(response: string): string | null {
     : trimmed.slice(0, nextHeading).trimEnd();
 }
 
-/** Format page metadata (TOON) + raw snapshot + suggestions. */
+/** Format page metadata (TOON) + snapshot + suggestions. */
 function formatPageOutput(
   snapshot: string,
   command: string,
   url?: string,
   full = false,
+  raw = false,
 ): string {
-  const title = extractTitle(snapshot);
-  const refs = countRefs(snapshot);
+  const tree = raw ? snapshot : compactSnapshot(snapshot);
+
+  const title = extractTitle(tree);
+  const refs = countRefs(tree);
 
   const blocks: string[] = [];
 
@@ -995,16 +1020,19 @@ function formatPageOutput(
   page.refs = refs;
   blocks.push(encode({ page }));
 
-  // Truncate snapshot
-  const tr = truncateSnapshot(snapshot, full);
-  let snapshotBlock = `snapshot:\n${tr.text.trimEnd()}`;
+  // Truncate snapshot, then apply URL LUT to the visible portion only.
+  // LUT runs after truncation so the trailer lists only URLs the agent can see.
+  const tr = truncateSnapshot(tree, full, raw ? 16000 : 12000);
+  const { body, trailer } = raw ? { body: tr.text, trailer: "" } : applyUrlLut(tr.text);
+  let snapshotBlock = `snapshot:\n${body.trimEnd()}`;
+  if (trailer) snapshotBlock += `\n${trailer}`;
   if (tr.truncated) {
     snapshotBlock += `\n    ... (truncated, ${tr.totalLength} chars total)`;
   }
   blocks.push(snapshotBlock);
 
   // Contextual suggestions
-  const suggestions = getSuggestions({ command, url, snapshot });
+  const suggestions = getSuggestions({ command, url, snapshot: tree });
   if (tr.truncated) {
     suggestions.push(
       `Run \`opera-browser-cli ${command}${url ? " " + url : ""} --full\` to see complete snapshot`,
@@ -1027,9 +1055,9 @@ function stripSnapshotHeader(text: string): string {
   return text.replace(/^[\s\S]*?##\s+Latest page snapshot\s*\n/, "");
 }
 
-/** Strip leading @ from uid ref. */
+/** Strip leading @ and normalise dot-form refs to underscore form for MCP ("@2.4" → "2_4"). */
 function parseUid(arg: string): string {
-  return arg.startsWith("@") ? arg.slice(1) : arg;
+  return arg.replace(/^@/, "").replace(/\./g, "_");
 }
 
 function isRecoverableOpenError(error: unknown): error is CdpError {
@@ -1062,7 +1090,7 @@ const SCROLL_FUNCTIONS: Record<string, string> = {
   bottom: "window.scrollTo(0, document.body.scrollHeight)",
 };
 
-async function handleOpen(args: string[], full: boolean): Promise<string> {
+async function handleOpen(args: string[], full: boolean, raw = false): Promise<string> {
   const url = args[0];
   if (!url) {
     throw new CdpError("Missing URL", "VALIDATION_ERROR", [
@@ -1079,12 +1107,12 @@ async function handleOpen(args: string[], full: boolean): Promise<string> {
     await callTool("new_page", { url });
   }
   const snapshot = stripSnapshotHeader(await callTool("take_snapshot"));
-  return formatPageOutput(snapshot, "open", url, full);
+  return formatPageOutput(snapshot, "open", url, full, raw);
 }
 
-async function handleSnapshot(full: boolean): Promise<string> {
+async function handleSnapshot(full: boolean, raw = false): Promise<string> {
   const snapshot = stripSnapshotHeader(await callTool("take_snapshot"));
-  return formatPageOutput(snapshot, "snapshot", undefined, full);
+  return formatPageOutput(snapshot, "snapshot", undefined, full, raw);
 }
 
 async function handleScreenshot(args: string[]): Promise<string> {
@@ -1120,7 +1148,7 @@ async function handleScreenshot(args: string[]): Promise<string> {
   return formatScreenshotOutput(parsed.filePath);
 }
 
-async function handleClick(args: string[], full: boolean): Promise<string> {
+async function handleClick(args: string[], full: boolean, raw = false): Promise<string> {
   const uid = args[0];
   if (!uid) {
     throw new CdpError("Missing element ref", "VALIDATION_ERROR", [
@@ -1129,10 +1157,10 @@ async function handleClick(args: string[], full: boolean): Promise<string> {
   }
 
   const snapshot = await callWithSnapshot("click", { uid: parseUid(uid) });
-  return formatPageOutput(snapshot, "click", undefined, full);
+  return formatPageOutput(snapshot, "click", undefined, full, raw);
 }
 
-async function handleFill(args: string[], full: boolean): Promise<string> {
+async function handleFill(args: string[], full: boolean, raw = false): Promise<string> {
   const uid = args[0];
   const value = args.slice(1).join(" ");
   if (!uid) {
@@ -1150,10 +1178,10 @@ async function handleFill(args: string[], full: boolean): Promise<string> {
     uid: parseUid(uid),
     value,
   });
-  return formatPageOutput(snapshot, "fill", undefined, full);
+  return formatPageOutput(snapshot, "fill", undefined, full, raw);
 }
 
-async function handlePress(args: string[], full: boolean): Promise<string> {
+async function handlePress(args: string[], full: boolean, raw = false): Promise<string> {
   const key = args[0];
   if (!key) {
     throw new CdpError("Missing key name", "VALIDATION_ERROR", [
@@ -1162,10 +1190,10 @@ async function handlePress(args: string[], full: boolean): Promise<string> {
   }
 
   const snapshot = await callWithSnapshot("press_key", { key });
-  return formatPageOutput(snapshot, "press", undefined, full);
+  return formatPageOutput(snapshot, "press", undefined, full, raw);
 }
 
-async function handleType(args: string[], full: boolean): Promise<string> {
+async function handleType(args: string[], full: boolean, raw = false): Promise<string> {
   const text = args.join(" ");
   if (!text) {
     throw new CdpError("Missing text", "VALIDATION_ERROR", [
@@ -1175,10 +1203,10 @@ async function handleType(args: string[], full: boolean): Promise<string> {
 
   await callTool("type_text", { text });
   const snapshot = stripSnapshotHeader(await callTool("take_snapshot"));
-  return formatPageOutput(snapshot, "type", undefined, full);
+  return formatPageOutput(snapshot, "type", undefined, full, raw);
 }
 
-async function handleScroll(args: string[], full: boolean): Promise<string> {
+async function handleScroll(args: string[], full: boolean, raw = false): Promise<string> {
   const dir = (args[0] ?? "down").toLowerCase();
   const fn = SCROLL_FUNCTIONS[dir];
   if (!fn) {
@@ -1189,13 +1217,13 @@ async function handleScroll(args: string[], full: boolean): Promise<string> {
 
   await callTool("evaluate_script", { function: fn });
   const snapshot = stripSnapshotHeader(await callTool("take_snapshot"));
-  return formatPageOutput(snapshot, "scroll", undefined, full);
+  return formatPageOutput(snapshot, "scroll", undefined, full, raw);
 }
 
-async function handleBack(full: boolean): Promise<string> {
+async function handleBack(full: boolean, raw = false): Promise<string> {
   await callTool("navigate_page", { type: "back" });
   const snapshot = stripSnapshotHeader(await callTool("take_snapshot"));
-  return formatPageOutput(snapshot, "back", undefined, full);
+  return formatPageOutput(snapshot, "back", undefined, full, raw);
 }
 
 async function handleWait(args: string[]): Promise<string> {
@@ -1317,7 +1345,7 @@ async function handlePages(): Promise<string> {
   return renderOutput(blocks);
 }
 
-async function handleNewPage(args: string[], full: boolean): Promise<string> {
+async function handleNewPage(args: string[], full: boolean, raw = false): Promise<string> {
   const url = args.filter((a) => !a.startsWith("--"))[0];
   if (!url) {
     throw new CdpError("Missing URL", "VALIDATION_ERROR", [
@@ -1329,12 +1357,13 @@ async function handleNewPage(args: string[], full: boolean): Promise<string> {
   if (background) toolArgs.background = true;
   await callTool("new_page", toolArgs);
   const snapshot = stripSnapshotHeader(await callTool("take_snapshot"));
-  return formatPageOutput(snapshot, "newpage", url, full);
+  return formatPageOutput(snapshot, "newpage", url, full, raw);
 }
 
 async function handleSelectPage(
   args: string[],
   full: boolean,
+  raw = false,
 ): Promise<string> {
   const id = args[0];
   if (!id) {
@@ -1350,7 +1379,7 @@ async function handleSelectPage(
   }
   await callTool("select_page", { pageId });
   const snapshot = stripSnapshotHeader(await callTool("take_snapshot"));
-  return formatPageOutput(snapshot, "selectpage", undefined, full);
+  return formatPageOutput(snapshot, "selectpage", undefined, full, raw);
 }
 
 async function handleClosePage(args: string[]): Promise<string> {
@@ -1412,7 +1441,7 @@ async function handleResize(args: string[]): Promise<string> {
 
 // --- Interaction handlers ---
 
-async function handleHover(args: string[], full: boolean): Promise<string> {
+async function handleHover(args: string[], full: boolean, raw = false): Promise<string> {
   const uid = args[0];
   if (!uid) {
     throw new CdpError("Missing element ref", "VALIDATION_ERROR", [
@@ -1420,10 +1449,10 @@ async function handleHover(args: string[], full: boolean): Promise<string> {
     ]);
   }
   const snapshot = await callWithSnapshot("hover", { uid: parseUid(uid) });
-  return formatPageOutput(snapshot, "hover", undefined, full);
+  return formatPageOutput(snapshot, "hover", undefined, full, raw);
 }
 
-async function handleDrag(args: string[], full: boolean): Promise<string> {
+async function handleDrag(args: string[], full: boolean, raw = false): Promise<string> {
   const from = args[0];
   const to = args[1];
   if (!from || !to) {
@@ -1435,10 +1464,10 @@ async function handleDrag(args: string[], full: boolean): Promise<string> {
     from_uid: parseUid(from),
     to_uid: parseUid(to),
   });
-  return formatPageOutput(snapshot, "drag", undefined, full);
+  return formatPageOutput(snapshot, "drag", undefined, full, raw);
 }
 
-async function handleFillForm(args: string[], full: boolean): Promise<string> {
+async function handleFillForm(args: string[], full: boolean, raw = false): Promise<string> {
   const { entries } = parseFillFormArgs(args);
   if (entries.length === 0) {
     throw new CdpError("No valid field entries", "VALIDATION_ERROR", [
@@ -1446,7 +1475,7 @@ async function handleFillForm(args: string[], full: boolean): Promise<string> {
     ]);
   }
   const snapshot = await callWithSnapshot("fill_form", { elements: entries });
-  return formatPageOutput(snapshot, "fillform", undefined, full);
+  return formatPageOutput(snapshot, "fillform", undefined, full, raw);
 }
 
 async function handleDialog(args: string[]): Promise<string> {
@@ -1463,7 +1492,7 @@ async function handleDialog(args: string[]): Promise<string> {
   return encode({ dialog: action });
 }
 
-async function handleUpload(args: string[], full: boolean): Promise<string> {
+async function handleUpload(args: string[], full: boolean, raw = false): Promise<string> {
   const uid = args[0];
   const filePath = args[1];
   if (!uid) {
@@ -1480,7 +1509,7 @@ async function handleUpload(args: string[], full: boolean): Promise<string> {
     uid: parseUid(uid),
     filePath,
   });
-  return formatPageOutput(snapshot, "upload", undefined, full);
+  return formatPageOutput(snapshot, "upload", undefined, full, raw);
 }
 
 // --- Emulation handler ---
@@ -2285,7 +2314,7 @@ async function handleHome(_full: boolean): Promise<string> {
       renderHelp(help),
     ]);
   }
-  const snapshot = stripSnapshotHeader(result);
+  const snapshot = compactSnapshot(stripSnapshotHeader(result));
   const title = extractTitle(snapshot);
   const refs = countRefs(snapshot);
   const page: Record<string, unknown> = {};
@@ -2299,14 +2328,48 @@ async function handleHome(_full: boolean): Promise<string> {
   return renderOutput([encode({ page }), renderHelp(help)]);
 }
 
+async function handleUrl(args: string[]): Promise<string> {
+  const target = args[0];
+  if (!target) {
+    throw new CdpError("Missing argument", "VALIDATION_ERROR", [
+      "Run `opera-browser-cli url \\$u3` to resolve a URL token",
+      "Run `opera-browser-cli url @11.57` to resolve an element ref",
+    ]);
+  }
+
+  // Prefer the bridge's cached snapshot to avoid an extra MCP round-trip.
+  // Fall back to a fresh snapshot if the cache is cold.
+  let raw: string;
+  const cached = await getLastSnapshot();
+  if (cached) {
+    raw = cached.raw;
+  } else {
+    await ensureBridge();
+    raw = stripSnapshotHeader(await callTool("take_snapshot"));
+  }
+
+  // Re-derive the full (non-truncated) URL map so tokens match what the agent
+  // saw, regardless of the truncation applied to the original output.
+  const compact = compactSnapshot(raw);
+  const { body, urlMap } = applyUrlLut(compact);
+
+  const resolved = resolveUrl(body, urlMap, target);
+  if (resolved === null) {
+    process.stderr.write(`url: "${target}" not found in last snapshot\n`);
+    process.exitCode = 1;
+    return "";
+  }
+  return resolved;
+}
+
 type CommandFn = (args: string[]) => Promise<string>;
 
 function withFullFlag(
-  handler: (args: string[], full: boolean) => Promise<string>,
+  handler: (args: string[], full: boolean, raw?: boolean) => Promise<string>,
 ): CommandFn {
   return (args) => {
     const parsed = splitFullFlag(args);
-    return handler(parsed.args, parsed.full);
+    return handler(parsed.args, parsed.full, parsed.raw);
   };
 }
 
@@ -2318,14 +2381,15 @@ function withoutFullFlag(
 
 const COMMANDS: Record<string, CommandFn> = {
   open: withFullFlag(handleOpen),
-  snapshot: async (args) => handleSnapshot(splitFullFlag(args).full),
+  snapshot: async (args) => { const f = splitFullFlag(args); return handleSnapshot(f.full, f.raw); },
+  url: withoutFullFlag(handleUrl),
   screenshot: withoutFullFlag(handleScreenshot),
   click: withFullFlag(handleClick),
   fill: withFullFlag(handleFill),
   type: withFullFlag(handleType),
   press: withFullFlag(handlePress),
   scroll: withFullFlag(handleScroll),
-  back: async (args) => handleBack(splitFullFlag(args).full),
+  back: async (args) => { const f = splitFullFlag(args); return handleBack(f.full, f.raw); },
   wait: withoutFullFlag(handleWait),
   eval: withFullFlag(handleEval),
   run: async () => handleRun(),
diff --git a/src/client.ts b/src/client.ts
index a9e9865..1cb21f9 100644
--- a/src/client.ts
+++ b/src/client.ts
@@ -409,6 +409,26 @@ export async function getBridgeStatus(): Promise<BridgeStatus> {
  * Get the current page snapshot without starting the bridge.
  * Returns null if the bridge is not running or healthy.
  */
+export interface CachedSnapshot {
+  raw: string;
+  pageUrl: string | null;
+  capturedAt: number;
+}
+
+/** Retrieve the most recent snapshot the bridge has cached, without triggering a new one. */
+export async function getLastSnapshot(): Promise<CachedSnapshot | null> {
+  const pidInfo = readPidFile();
+  if (!pidInfo || !isProcessAlive(pidInfo.pid)) return null;
+  try {
+    const resp = await httpGet(pidInfo.port, "/last-snapshot", 2000);
+    const data = JSON.parse(resp) as { error?: string } & Partial<CachedSnapshot>;
+    if (data.error || !data.raw) return null;
+    return { raw: data.raw, pageUrl: data.pageUrl ?? null, capturedAt: data.capturedAt ?? 0 };
+  } catch {
+    return null;
+  }
+}
+
 export async function getSessionSnapshotIfRunning(): Promise<string | null> {
   const pidInfo = readPidFile();
   if (!pidInfo || !isProcessAlive(pidInfo.pid)) {
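Reviewer note: `getLastSnapshot` treats three situations as a cache miss: an `error` field in the payload, a missing or empty `raw`, and a body that is not valid JSON. The sketch below isolates that parsing step so it can be read without the HTTP call. `parseCachedSnapshot` is a hypothetical helper name, and the sample payloads (including the error message text) are illustrative assumptions, not output captured from the bridge.

```typescript
// Sketch only: mirrors the CachedSnapshot shape added in src/client.ts.
interface CachedSnapshot {
  raw: string;
  pageUrl: string | null;
  capturedAt: number;
}

// Hypothetical helper (not in the source): parse a /last-snapshot response body.
// Returns null for an error payload, a missing/empty `raw`, or invalid JSON.
function parseCachedSnapshot(resp: string): CachedSnapshot | null {
  try {
    const data = JSON.parse(resp) as { error?: string } & Partial<CachedSnapshot>;
    if (data.error || !data.raw) return null;
    return {
      raw: data.raw,
      pageUrl: data.pageUrl ?? null, // absent pageUrl is normalised to null
      capturedAt: data.capturedAt ?? 0, // absent timestamp is normalised to 0
    };
  } catch {
    return null;
  }
}

// Illustrative payloads (assumed shapes, not real bridge output).
const warm = parseCachedSnapshot(
  '{"raw":"uid=1_0 RootWebArea \\"Page\\"","pageUrl":"https://example.com","capturedAt":1700000000000}',
);
const cold = parseCachedSnapshot('{"error":"no snapshot cached"}');
const garbage = parseCachedSnapshot("not json");

console.log(warm?.pageUrl, cold, garbage);
```

Because `!data.raw` is also true for an empty string, a cached but empty snapshot is handled like a cold cache, so `handleUrl` would fall back to a fresh `take_snapshot` in that case.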
diff --git a/src/run.ts b/src/run.ts
index 8e599fb..7adc678 100644
--- a/src/run.ts
+++ b/src/run.ts
@@ -9,6 +9,7 @@ import { mkdtempSync, writeFileSync, unlinkSync, rmdirSync } from "node:fs";
 import { join } from "node:path";
 import { tmpdir } from "node:os";
 import { CdpError } from "./client.js";
+import { compactSnapshot } from "./snapshot.js";
 
 type CallTool = (
   name: string,
@@ -47,9 +48,9 @@ function stripSnapshotHeader(text: string): string {
   return text.replace(/^[\s\S]*?##\s+Latest page snapshot\s*\n/, "");
 }
 
-/** Strip leading @ from uid ref string. */
+/** Strip leading @ and normalise dot-form refs to underscore form for MCP ("@2.4" → "2_4"). */
 function parseUid(ref: string): string {
-  return ref.startsWith("@") ? ref.slice(1) : ref;
+  return ref.replace(/^@/, "").replace(/\./g, "_");
 }
 
 /** Check if an open error is recoverable by falling back to new_page. */
@@ -63,7 +64,7 @@ function isRecoverableOpenError(error: unknown): boolean {
 
 // --- Selector detection ---
 
-const UID_RE = /^@?\d[\d_]*$/;
+const UID_RE = /^@?\d[\d_.]*$/;
 
 /** Returns true when the string looks like a @uid ref (e.g. "@12", "26_181"). */
 export function isUidRef(s: string): boolean {
@@ -163,7 +164,7 @@ export function createPageHelper(callTool: CallTool): PageHelper {
 
     async snapshot(): Promise<string> {
       const result = await callTool("take_snapshot");
-      return stripSnapshotHeader(result);
+      return compactSnapshot(stripSnapshotHeader(result));
     },
 
     async click(refOrSelector: string): Promise<void> {
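Reviewer note: the `parseUid` and `UID_RE` changes in this file let callers pass either the compact display form (`@2.4`) or the MCP wire form (`2_4`). A minimal sketch of the round trip follows. `parseUid` and `UID_RE` are copied from the hunks above; the body of `isUidRef` is not shown in this diff, so the one-line implementation here is an assumption.

```typescript
// Copied from the diff: accepts digits, underscores and dots after an optional "@".
const UID_RE = /^@?\d[\d_.]*$/;

// Assumed body: the diff only shows the signature of isUidRef.
function isUidRef(s: string): boolean {
  return UID_RE.test(s);
}

// Copied from the diff: strip a leading "@" and convert dots to underscores.
function parseUid(ref: string): string {
  return ref.replace(/^@/, "").replace(/\./g, "_");
}

const inputs = ["@2.4", "@2_4", "2.4", "26_181", "#submit", "button.primary"];

// For each input, record whether it looks like a ref and, if so, its wire form.
const report = inputs.map((s) => ({
  input: s,
  isRef: isUidRef(s),
  wire: isUidRef(s) ? parseUid(s) : null,
}));

console.log(report);
```

All four ref spellings normalise to an underscore form, while CSS-style selectors are rejected because they do not start with a digit. The pattern is deliberately loose: a string such as `1..2` would also match, so it is a shape check rather than a validity check.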
diff --git a/src/snapshot.ts b/src/snapshot.ts
index 311a9ab..fcd96db 100644
--- a/src/snapshot.ts
+++ b/src/snapshot.ts
@@ -4,9 +4,19 @@ export interface RefInfo {
   type: string;
 }
 
-/** Count interactive refs (uid=...) in snapshot text. */
+/** Convert a canonical MCP ref ("2_4") to display form ("2.4"). */
+export function refToDisplay(mcpRef: string): string {
+  return mcpRef.replace(/_/g, ".");
+}
+
+/** Convert any ref form — "@2.4", "@2_4", "2.4", "2_4" — to MCP wire form "2_4". */
+export function refToMcp(ref: string): string {
+  return ref.replace(/^@/, "").replace(/\./g, "_");
+}
+
+/** Count interactive refs in snapshot text (accepts both uid= and compact @X.Y form). */
 export function countRefs(snapshot: string): number {
-  const matches = snapshot.match(/\buid=\S+/g);
+  const matches = snapshot.match(/^\s*(?:uid=\S+|@\d[\d.]*)\b/gm);
   return matches ? matches.length : 0;
 }
 
@@ -14,22 +24,317 @@ export function countRefs(snapshot: string): number {
 export function extractRefs(snapshot: string): RefInfo[] {
   const refs: RefInfo[] = [];
   for (const line of snapshot.split("\n")) {
-    const m = line.match(/\buid=(\S+)\s+(\w+)\s+"([^"]*)"/);
+    // Accept both uid=X_Y (raw MCP) and @X.Y (compact) forms;
+    // avoid \b before @ since @ is a non-word character
+    const m = line.match(/(?:uid=(\S+)|(?:^|[ \t])@([\d.]+))\s+([\w]+)\s+"([^"]*)"/);
     if (!m) continue;
-    refs.push({ ref: m[1], type: m[2], label: m[3] });
+    const rawRef = m[1] ?? m[2];
+    // Always return in display form so suggestion strings emit @X.Y refs
+    const ref = m[1] ? refToDisplay(rawRef) : rawRef;
+    refs.push({ ref, type: m[3], label: m[4] });
   }
   return refs;
 }
 
-/** Extract page title from snapshot (RootWebArea or first heading). */
+/** Extract page title from snapshot (RootWebArea or compact `root` node, else first heading). */
 export function extractTitle(snapshot: string): string {
-  const rootMatch = snapshot.match(/RootWebArea\s+"([^"]+)"/);
+  const rootMatch = snapshot.match(/(?:RootWebArea|root)\s+"([^"]+)"/);
   if (rootMatch) return rootMatch[1];
+  // Compact markdown heading after compactSnapshot: `@X.Y ## Title`
+  const mdMatch = snapshot.match(/^(?:@\S+\s+)?#{1,6}\s+(.+)$/m);
+  if (mdMatch) return mdMatch[1].trim();
   const headingMatch = snapshot.match(/\bheading\s+"([^"]+)"/);
   if (headingMatch) return headingMatch[1];
   return "";
 }
 
+// Query-string keys issued by external ad/analytics platforms that carry no functional
+// meaning for the destination page — safe to drop on any site.
+const NOISE_PARAM_EXACT = new Set([
+  // Google Ads click IDs
+  "gclid", "gbraid", "wbraid", "dclid", "gad_source",
+  // Social / messaging platform click IDs
+  "fbclid",    // Meta/Facebook
+  "msclkid",   // Microsoft Ads
+  "yclid",     // Yandex
+  "igshid",    // Instagram
+  "ttclid",    // TikTok
+  "twclid",    // Twitter/X
+  "li_fat_id", // LinkedIn
+  "srsltid",   // Google Shopping
+  "_ke",       // Klaviyo
+]);
+// Prefix-matched families (all members are tracking-only)
+const NOISE_PARAM_PREFIXES = [
+  "utm_", // Google Analytics UTM parameters
+  "mc_",  // Mailchimp
+];
+
+function isNoiseParam(key: string): boolean {
+  if (NOISE_PARAM_EXACT.has(key)) return true;
+  return NOISE_PARAM_PREFIXES.some((p) => key.startsWith(p));
+}
+
+/**
+ * Clean a URL value to reduce token bloat without losing addressability:
+ *  - returns null for javascript: and data: URLs so the caller drops the attribute entirely
+ *  - strips a matching page origin → relative path
+ *  - removes cross-site tracking query params (utm_*, gclid, fbclid, etc.)
+ *
+ * Preserves fragment, parameter order, and percent-encoding of remaining values.
+ */
+export function cleanUrl(url: string, origin: string | null): string | null {
+  if (url.startsWith("javascript:") || url.startsWith("data:")) return null;
+
+  let working = url;
+  if (origin && working.startsWith(origin)) {
+    working = working.slice(origin.length) || "/";
+  }
+
+  // Pull the fragment off first so query-param parsing can't accidentally consume it
+  let fragment = "";
+  const hashIdx = working.indexOf("#");
+  if (hashIdx >= 0) {
+    fragment = working.slice(hashIdx);
+    working = working.slice(0, hashIdx);
+  }
+
+  const qIdx = working.indexOf("?");
+  if (qIdx < 0) return working + fragment;
+
+  const path = working.slice(0, qIdx);
+  const query = working.slice(qIdx + 1);
+  if (!query) return path + fragment;
+
+  const kept = query.split("&").filter((part) => {
+    if (!part) return false;
+    const eq = part.indexOf("=");
+    const key = eq < 0 ? part : part.slice(0, eq);
+    return !isNoiseParam(key);
+  });
+
+  if (kept.length === 0) return path + fragment;
+  return `${path}?${kept.join("&")}${fragment}`;
+}
+
+/** Extract scheme://host from the root node's url= attribute, if present. */
+export function extractPageOrigin(tree: string): string | null {
+  const m = tree.match(
+    /^\s*(?:uid=\S+|@\S+)\s+(?:RootWebArea|root)\b[^\n]*\burl="([^"]+)"/m,
+  );
+  if (!m) return null;
+  try {
+    const u = new URL(m[1]);
+    return `${u.protocol}//${u.host}`;
+  } catch {
+    return null;
+  }
+}
+
+// A description value must occur at least this many times before it is treated as boilerplate and deduped.
+// Below this, the bytes saved by dropping repeats don't justify the risk of hiding meaningful copy.
+const DESCRIPTION_DEDUP_THRESHOLD = 3;
+
+// Chrome a11y tree uses PascalCase for some internal role names; map them to compact lowercase.
+const ROLE_RENAMES: Record<string, string> = {
+  RootWebArea: "root",
+  StaticText: "text",
+  DisclosureTriangle: "disclosure",
+  ColorWell: "color",
+  InputTime: "time",
+  Date: "date",
+};
+
+/**
+ * Compact an accessibility snapshot tree to reduce token usage (~30% fewer tokens).
+ * Removes noise nodes, strips ARIA default attributes, normalises role names,
+ * de-quotes numeric attributes, converts headings to markdown, and rewrites
+ * refs to the @PAGE.ELEM display format.
+ *
+ * Operates on the raw tree text (after MCP preamble has been stripped).
+ */
+export function compactSnapshot(tree: string): string {
+  const lines = tree.split("\n");
+  const out: string[] = [];
+  let dropDanglingQuote = false;
+
+  // Pre-pass: find page origin (for relative-URL rewriting) and count description values
+  // so we know which ones cross the dedup threshold.
+  const origin = extractPageOrigin(tree);
+  const descriptionCounts = new Map<string, number>();
+  for (const line of lines) {
+    const re = / description="([^"]*)"/g;
+    let m: RegExpExecArray | null;
+    while ((m = re.exec(line)) !== null) {
+      descriptionCounts.set(m[1], (descriptionCounts.get(m[1]) ?? 0) + 1);
+    }
+  }
+  const seenDescription = new Set<string>();
+
+  for (const raw of lines) {
+    let line = raw;
+
+    // <br> elements appear as LineBreak nodes; they're never useful in the a11y tree.
+    // Their label is a literal newline, so splitting on \n leaves a dangling `"` on the
+    // next line — skip that too.
+    if (/^\s*uid=\S+ LineBreak "/.test(line)) {
+      dropDanglingQuote = true;
+      continue;
+    }
+    if (dropDanglingQuote) {
+      dropDanglingQuote = false;
+      if (/^\s*"\s*$/.test(line)) continue;
+    }
+
+    // Whitespace-only text nodes between elements are structural artifacts, not content
+    if (/^\s*uid=\S+ StaticText "\s*"\s*$/.test(line)) continue;
+
+    // StaticText children that just echo the parent's accessible name are redundant —
+    // links, headings, buttons etc. already carry the label on their own line
+    {
+      const m = line.match(/^(\s*)uid=\S+ StaticText "([^"]+)"\s*$/);
+      if (m) {
+        const childIndent = m[1].length;
+        const label = m[2];
+        let drop = false;
+        for (let i = out.length - 1; i >= 0; i--) {
+          if (!out[i].trim()) continue;
+          // Previous lines may already be in compact @X.Y form (the ref rewrite runs per line before push)
+          const pm = out[i].match(/^(\s*)(?:uid=\S+|@\S+) \w+ "([^"]+)"/);
+          if (pm && pm[1].length === childIndent - 2 && pm[2] === label) drop = true;
+          break;
+        }
+        if (drop) continue;
+      }
+    }
+
+    // Empty valuetext is the same as having no valuetext
+    line = line.replace(/ valuetext=""/g, "");
+
+    // `disableable` is redundant when `disabled` is already present
+    if (/ disabled\b/.test(line)) line = line.replace(/ disableable\b/g, "");
+
+    // Every option and tab is selectable by definition; the attribute adds nothing
+    if (/ (?:option|tab) "/.test(line)) line = line.replace(/ selectable\b/g, "");
+
+    // `relevant="additions text"` is the ARIA default for live regions; omit it
+    line = line.replace(/ relevant="additions text"/g, "");
+
+    // `atomic` is implicit for alert and status by the ARIA spec
+    if (/ (?:alert|status) /.test(line)) line = line.replace(/ atomic\b/g, "");
+
+    // `live=` defaults are mandated by ARIA for these roles; no need to repeat them
+    if (/ status /.test(line)) line = line.replace(/ live="polite"/g, "");
+    if (/ alert /.test(line)) line = line.replace(/ live="assertive"/g, "");
+
+    // combobox is always expandable with a popup; both attributes are implied by the role
+    if (/ combobox /.test(line)) {
+      line = line.replace(/ haspopup="(?:menu|listbox)"/g, "");
+      line = line.replace(/ expandable\b/g, "");
+    }
+
+    // Horizontal is the ARIA default for sliders (listboxes default to vertical); it is dropped on every role here
+    line = line.replace(/ orientation="horizontal"/g, "");
+
+    // Autocomplete mode is an implementation detail rarely useful for navigation
+    line = line.replace(/ autocomplete="(?:both|list)"/g, "");
+
+    // Drop javascript: and data: URLs entirely (no agent-actionable info), strip the page origin
+    // from same-origin links, and remove tracking query params
+    line = line.replace(/ url="([^"]+)"/g, (_full, rawUrl) => {
+      const cleaned = cleanUrl(rawUrl, origin);
+      return cleaned == null ? "" : ` url="${cleaned}"`;
+    });
+
+    // Boilerplate descriptions (e.g. "use arrow keys to navigate" repeated on every link)
+    // are recoverable from the first occurrence; drop the rest
+    line = line.replace(/ description="([^"]*)"/g, (full, value) => {
+      if ((descriptionCounts.get(value) ?? 0) < DESCRIPTION_DEDUP_THRESHOLD) return full;
+      if (seenDescription.has(value)) return "";
+      seenDescription.add(value);
+      return full;
+    });
+
+    // Normalise known PascalCase Chrome-internal role names to short lowercase forms.
+    // The uid= or @X.Y prefix is optional to handle simplified test fixtures.
+    line = line.replace(
+      /^(\s*(?:(?:uid=|@)\S+\s+)?)([A-Za-z][a-zA-Z]*)( )/,
+      (_, pre, role, post) => pre + (ROLE_RENAMES[role] ?? role) + post,
+    );
+
+    // Integer attribute values don't need quotes; dropping them saves two characters per attribute
+    line = line.replace(/(\w+)="(-?\d+)"/g, "$1=$2");
+
+    // `heading "Label" level=N` → `## Label` — markdown is shorter and familiar to models
+    {
+      const m = line.match(/^(\s*uid=\S+) heading "([^"]+)" level=(\d+)(.*)/);
+      if (m) {
+        const hashes = "#".repeat(parseInt(m[3], 10));
+        const extra = m[4].trim();
+        line = `${m[1]} ${hashes} ${m[2]}${extra ? " " + extra : ""}`;
+      }
+    }
+
+    // Rewrite refs last so all earlier transforms still match the uid= form;
+    // dot separator tokenises better than underscore in BPE encodings
+    line = line.replace(/\buid=(\d+)_(\d+)\b/g, (_, page, elem) => `@${page}.${elem}`);
+
+    out.push(line);
+  }
+
+  return collapseTextRuns(out).join("\n");
+}
+
+/**
+ * Merge consecutive text nodes at the same indent into one, then re-apply
+ * the echo-dedup: if the merged label exactly matches the parent's label,
+ * the collapsed line is dropped entirely (parent already carries the content).
+ *
+ * Only runs when 2+ text nodes were actually merged; single text nodes that
+ * already survived the per-line echo-dedup are passed through unchanged.
+ */
+function collapseTextRuns(lines: string[]): string[] {
+  const result: string[] = [];
+
+  for (let i = 0; i < lines.length; i++) {
+    const m = lines[i].match(/^(\s*)(@\S+) text "([^"]*)"\s*$/);
+    if (!m) {
+      result.push(lines[i]);
+      continue;
+    }
+
+    const [, indent, ref, firstLabel] = m;
+    let j = i + 1;
+    let merged = firstLabel;
+    while (j < lines.length) {
+      const next = lines[j].match(/^(\s*)@\S+ text "([^"]*)"\s*$/);
+      if (!next || next[1] !== indent) break;
+      merged += next[2];
+      j++;
+    }
+
+    if (j === i + 1) {
+      // Only one text node — pass through (already echo-deduped in main loop)
+      result.push(lines[i]);
+      continue;
+    }
+
+    // Multiple nodes merged — advance past consumed lines and echo-dedup the result
+    i = j - 1;
+    const childIndent = indent.length;
+    let drop = false;
+    for (let k = result.length - 1; k >= 0; k--) {
+      if (!result[k].trim()) continue;
+      const pm = result[k].match(/^(\s*)(?:uid=\S+|@\S+) \w+ "([^"]+)"/);
+      if (pm && pm[1].length === childIndent - 2 && pm[2] === merged) drop = true;
+      break;
+    }
+    if (!drop) result.push(`${indent}${ref} text "${merged}"`);
+  }
+
+  return result;
+}
+
 export interface TruncationResult {
   text: string;
   truncated: boolean;
@@ -87,3 +392,105 @@ const INPUT_TYPES = ["textbox", "searchbox", "input", "combobox", "textarea"];
 export function isInputType(type: string): boolean {
   return INPUT_TYPES.includes(type);
 }
+
+// --- URL LUT (Layer 2) ---
+
+const MIN_DEDUP_LEN = 15;
+const WHALE_THRESHOLD = 200;
+const WHALE_PREVIEW_CAP = 60;
+
+export interface UrlLutResult {
+  body: string;
+  trailer: string; // empty string when no tokens were assigned
+  urlMap: Map<string, string>; // token ($u1) → full cleaned URL
+}
+
+// Produce a short human-readable hint for a whale URL (no full value echoed).
+// Relative paths are already concise; absolute URLs strip the scheme first.
+function whalePreview(url: string): string {
+  const target = url.startsWith("/") ? url : url.replace(/^https?:\/\//, "");
+  return target.length <= WHALE_PREVIEW_CAP
+    ? target
+    : target.slice(0, WHALE_PREVIEW_CAP - 1) + "…";
+}
+
+/**
+ * Apply a URL lookup table to a compacted, already-truncated snapshot.
+ *
+ * Two classes of URL are replaced with short $uN tokens:
+ *   dedup  — appears ≥2× and length ≥ MIN_DEDUP_LEN → full URL printed in trailer
+ *   whale  — length ≥ WHALE_THRESHOLD and not already a dedup URL
+ *            → hidden in trailer with byte-size + path-stem preview only
+ *
+ * Must run AFTER truncation so the trailer only references URLs the agent can
+ * actually see in the body.  Token IDs are assigned in tree-walk (top-down)
+ * order and are therefore deterministic for identical input.
+ */
+export function applyUrlLut(text: string): UrlLutResult {
+  // Count occurrences of each URL value (Layer 1 has already cleaned them)
+  const urlCounts = new Map<string, number>();
+  const scanRe = / url="([^"]+)"/g;
+  let m: RegExpExecArray | null;
+  while ((m = scanRe.exec(text)) !== null) {
+    urlCounts.set(m[1], (urlCounts.get(m[1]) ?? 0) + 1);
+  }
+
+  const isDedup = (u: string) => (urlCounts.get(u) ?? 0) >= 2 && u.length >= MIN_DEDUP_LEN;
+  // Dedup wins when both conditions hold — URL gets full entry in trailer, not hidden.
+  const isWhale = (u: string) => u.length >= WHALE_THRESHOLD && !isDedup(u);
+
+  const urlToToken = new Map<string, string>();
+  const urlMap = new Map<string, string>();
+  let counter = 0;
+
+  const body = text.replace(/ url="([^"]+)"/g, (_full, url: string) => {
+    if (!isDedup(url) && !isWhale(url)) return _full;
+    if (!urlToToken.has(url)) {
+      const token = `$u${++counter}`;
+      urlToToken.set(url, token);
+      urlMap.set(token, url);
+    }
+    return ` url=${urlToToken.get(url)!}`;
+  });
+
+  if (urlMap.size === 0) return { body, trailer: "", urlMap };
+
+  const trailerLines = ["urls:"];
+  for (const [token, url] of urlMap) {
+    if (isWhale(url)) {
+      trailerLines.push(`  ${token} [hidden ${url.length}b → ${whalePreview(url)}]`);
+    } else {
+      trailerLines.push(`  ${token} ${url}`);
+    }
+  }
+
+  return { body, trailer: trailerLines.join("\n"), urlMap };
+}
+
+/**
+ * Resolve a URL from a LUT-applied snapshot body.
+ *
+ * target is either "$u3" (a LUT token) or "11.57" / "@11.57" (an element ref).
+ * For ref resolution the body is searched for the element's url= attribute;
+ * if it was tokenised, the token is further resolved via urlMap.
+ *
+ * Returns the full URL string, or null if not found.
+ */
+export function resolveUrl(
+  body: string,
+  urlMap: Map<string, string>,
+  target: string,
+): string | null {
+  const normalised = target.replace(/^@/, "");
+  if (normalised.startsWith("$u")) {
+    return urlMap.get(normalised) ?? null;
+  }
+  // ref → find line and extract url= (quoted plain value or unquoted token)
+  const escaped = normalised.replace(/\./g, "\\.");
+  const re = new RegExp(`@${escaped}\\b[^\\n]*? url=(?:"([^"]+)"|(\\$u\\d+))`);
+  const hit = body.match(re);
+  if (!hit) return null;
+  if (hit[1] !== undefined) return hit[1];
+  if (hit[2] !== undefined) return urlMap.get(hit[2]) ?? null;
+  return null;
+}
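Reviewer note: the dedup half of `applyUrlLut` can be illustrated on a tiny snapshot. The sketch below is a simplified, self-contained variant that applies only the dedup rule (a URL seen at least twice and at least `MIN_DEDUP_LEN` characters long); it omits the whale rule and the trailer. `tokeniseRepeatedUrls` and the sample tree are hypothetical and exist only for this example.

```typescript
// Same threshold value as MIN_DEDUP_LEN in src/snapshot.ts.
const MIN_DEDUP_LEN = 15;

// Hypothetical, simplified variant of applyUrlLut: dedup rule only, no trailer.
function tokeniseRepeatedUrls(text: string): { body: string; urlMap: Map<string, string> } {
  // Pass 1: count how often each url="..." value occurs.
  const counts = new Map<string, number>();
  const scanRe = / url="([^"]+)"/g;
  let m: RegExpExecArray | null;
  while ((m = scanRe.exec(text)) !== null) {
    counts.set(m[1], (counts.get(m[1]) ?? 0) + 1);
  }

  // Pass 2: replace qualifying URLs with $uN tokens, numbered by first appearance.
  const urlToToken = new Map<string, string>();
  const urlMap = new Map<string, string>();
  const body = text.replace(/ url="([^"]+)"/g, (full: string, url: string) => {
    if ((counts.get(url) ?? 0) < 2 || url.length < MIN_DEDUP_LEN) return full;
    let token = urlToToken.get(url);
    if (token === undefined) {
      token = `$u${urlToToken.size + 1}`;
      urlToToken.set(url, token);
      urlMap.set(token, url);
    }
    return ` url=${token}`;
  });

  return { body, urlMap };
}

// Illustrative compact snapshot (made up for this example).
const sample = [
  '@1.0 root "Shop" url="https://shop.example/"',
  '  @1.1 link "Cart" url="/cart/view?step=1"',
  '  @1.2 link "Cart (2 items)" url="/cart/view?step=1"',
  '  @1.3 link "Help" url="/help"',
].join("\n");

const lut = tokeniseRepeatedUrls(sample);
console.log(lut.body);
console.log(lut.urlMap.get("$u1"));
```

Only the cart URL is tokenised: it occurs twice and is 17 characters long. The root URL and `/help` each occur once, so they stay inline and quoted, which is what lets `resolveUrl` tell a plain value (`url="..."`) from a token (`url=$u1`).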
diff --git a/test/bridge.test.ts b/test/bridge.test.ts
index fdeb7ef..e68471b 100644
--- a/test/bridge.test.ts
+++ b/test/bridge.test.ts
@@ -6,10 +6,12 @@ import {
   buildTransportArgs,
   extractToolText,
   getErrorMessage,
+  getLastSnapshotCache,
   handleBridgeRequest,
   isBridgeClientConnected,
   parseBridgeCallPayload,
   resolveBridgeScript,
+  resetLastSnapshotCache,
   wrapTransportForIdCapture,
   type BridgeClient,
 } from "../src/bridge.js";
@@ -385,3 +387,99 @@ describe("handleBridgeRequest streaming", () => {
     expect(JSON.parse(mockB.endPayload)).toEqual({ result: "result-B" });
   });
 });
+
+// ---------------------------------------------------------------------------
+// Snapshot cache — lastSnapshot state + /last-snapshot endpoint
+// ---------------------------------------------------------------------------
+
+describe("snapshot cache", () => {
+  beforeEach(() => resetLastSnapshotCache());
+
+  const snapshotClient: BridgeClient = {
+    listTools: async () => ({ tools: [] }),
+    callTool: async () => ({
+      content: [{ type: "text", text: 'uid=1_0 RootWebArea "Page" url="https://example.com/"\n  link "Home"' }],
+    }),
+    close: async () => {},
+  };
+
+  it("cache is empty before any snapshot call", () => {
+    expect(getLastSnapshotCache()).toBeNull();
+  });
+
+  it("GET /last-snapshot returns 404 when cache is cold", async () => {
+    const req = makeMockRequest("GET", "/last-snapshot");
+    const mock = makeMockResponse();
+    await handleBridgeRequest(snapshotClient, req, mock.res);
+    expect(mock.res.statusCode).toBe(404);
+    expect(JSON.parse(mock.endPayload)).toHaveProperty("error");
+  });
+
+  it("take_snapshot call populates the cache", async () => {
+    const req = makeMockRequest("POST", "/call", JSON.stringify({ name: "take_snapshot", args: {} }));
+    const mock = makeMockResponse();
+    await handleBridgeRequest(snapshotClient, req, mock.res);
+
+    const cached = getLastSnapshotCache();
+    expect(cached).not.toBeNull();
+    expect(cached!.raw).toContain('RootWebArea "Page"');
+    expect(cached!.pageUrl).toBe("https://example.com");
+    expect(cached!.capturedAt).toBeGreaterThan(0);
+  });
+
+  it("GET /last-snapshot returns 200 with cached data after a snapshot", async () => {
+    // Populate cache
+    const postReq = makeMockRequest("POST", "/call", JSON.stringify({ name: "take_snapshot", args: {} }));
+    await handleBridgeRequest(snapshotClient, postReq, makeMockResponse().res);
+
+    // Now fetch it
+    const getReq = makeMockRequest("GET", "/last-snapshot");
+    const mock = makeMockResponse();
+    await handleBridgeRequest(snapshotClient, getReq, mock.res);
+
+    expect(mock.res.statusCode).toBe(200);
+    const data = JSON.parse(mock.endPayload);
+    expect(data.raw).toContain("RootWebArea");
+    expect(data.pageUrl).toBe("https://example.com");
+    expect(typeof data.capturedAt).toBe("number");
+  });
+
+  it("a non-snapshot tool call does not overwrite the cache", async () => {
+    // Populate with snapshot
+    const snapReq = makeMockRequest("POST", "/call", JSON.stringify({ name: "take_snapshot", args: {} }));
+    await handleBridgeRequest(snapshotClient, snapReq, makeMockResponse().res);
+    const first = getLastSnapshotCache()!.raw;
+
+    // Call a different tool
+    const clickClient: BridgeClient = {
+      listTools: async () => ({ tools: [] }),
+      callTool: async () => ({ content: [{ type: "text", text: "clicked" }] }),
+      close: async () => {},
+    };
+    const clickReq = makeMockRequest("POST", "/call", JSON.stringify({ name: "click", args: { uid: "1_1" } }));
+    await handleBridgeRequest(clickClient, clickReq, makeMockResponse().res);
+
+    expect(getLastSnapshotCache()!.raw).toBe(first);
+  });
+
+  it("second take_snapshot overwrites the cache (last write wins)", async () => {
+    let callCount = 0;
+    const twoSnapshotClient: BridgeClient = {
+      listTools: async () => ({ tools: [] }),
+      callTool: async () => {
+        callCount++;
+        return {
+          content: [{ type: "text", text: `RootWebArea "Page ${callCount}"` }],
+        };
+      },
+      close: async () => {},
+    };
+
+    const req1 = makeMockRequest("POST", "/call", JSON.stringify({ name: "take_snapshot", args: {} }));
+    const req2 = makeMockRequest("POST", "/call", JSON.stringify({ name: "take_snapshot", args: {} }));
+    await handleBridgeRequest(twoSnapshotClient, req1, makeMockResponse().res);
+    await handleBridgeRequest(twoSnapshotClient, req2, makeMockResponse().res);
+
+    expect(getLastSnapshotCache()!.raw).toContain("Page 2");
+  });
+});
diff --git a/test/fixtures/elements.html b/test/fixtures/elements.html
new file mode 100644
index 0000000..60f66c3
--- /dev/null
+++ b/test/fixtures/elements.html
@@ -0,0 +1,288 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <title>Element Snapshot Test</title>
+  <style>
+    body { font-family: sans-serif; max-width: 800px; margin: 2rem auto; padding: 0 1rem; }
+    section { margin-bottom: 2rem; border: 1px solid #ccc; padding: 1rem; border-radius: 4px; }
+    h2 { margin-top: 0; }
+    fieldset { margin-bottom: 1rem; }
+    details { margin-bottom: 0.5rem; }
+    table { border-collapse: collapse; width: 100%; }
+    td, th { border: 1px solid #ccc; padding: 0.4rem 0.8rem; }
+    dialog { padding: 1.5rem; border-radius: 4px; }
+  </style>
+</head>
+<body>
+
+<h1>Element Snapshot Test Page</h1>
+
+<!-- BUTTONS -->
+<section aria-label="Buttons">
+  <h2>Buttons</h2>
+  <button type="button">Default button</button>
+  <button type="submit">Submit button</button>
+  <button type="button" disabled>Disabled button</button>
+  <button type="button" aria-pressed="false">Toggle button</button>
+  <button type="button" aria-expanded="false" aria-controls="dropdown-menu">Dropdown trigger</button>
+  <ul id="dropdown-menu" hidden role="menu">
+    <li role="menuitem">Option A</li>
+    <li role="menuitem">Option B</li>
+    <li role="menuitem">Option C</li>
+  </ul>
+</section>
+
+<!-- LINKS -->
+<section aria-label="Links">
+  <h2>Links</h2>
+  <a href="https://example.com">External link</a>
+  <a href="/internal">Internal link</a>
+  <a href="mailto:test@example.com">Email link</a>
+  <a href="#anchor">Anchor link</a>
+</section>
+
+<!-- TEXT INPUTS -->
+<section aria-label="Text Inputs">
+  <h2>Text Inputs</h2>
+  <form>
+    <fieldset>
+      <legend>Basic inputs</legend>
+      <label>Text: <input type="text" name="text" placeholder="Enter text" /></label><br>
+      <label>Email: <input type="email" name="email" placeholder="user@example.com" /></label><br>
+      <label>Password: <input type="password" name="password" /></label><br>
+      <label>Number: <input type="number" name="qty" min="0" max="99" value="1" /></label><br>
+      <label>Tel: <input type="tel" name="phone" placeholder="+1 555 000" /></label><br>
+      <label>URL: <input type="url" name="website" placeholder="https://" /></label><br>
+      <label>Search: <input type="search" name="q" placeholder="Search..." /></label><br>
+      <label>Date: <input type="date" name="dob" /></label><br>
+      <label>Time: <input type="time" name="appt" /></label><br>
+      <label>Color: <input type="color" name="color" value="#ff6600" /></label><br>
+      <label>Range: <input type="range" name="volume" min="0" max="100" value="50" /></label><br>
+      <label>File: <input type="file" name="upload" /></label><br>
+      <label>Hidden: <input type="hidden" name="token" value="abc123" /></label>
+    </fieldset>
+
+    <fieldset>
+      <legend>Textarea</legend>
+      <label>Message:
+        <textarea name="message" rows="4" cols="40" placeholder="Write something..."></textarea>
+      </label>
+    </fieldset>
+  </form>
+</section>
+
+<!-- CHECKBOXES & RADIOS -->
+<section aria-label="Checkboxes and Radios">
+  <h2>Checkboxes &amp; Radios</h2>
+  <form>
+    <fieldset>
+      <legend>Checkboxes</legend>
+      <label><input type="checkbox" name="opt" value="a" checked> Option A (checked)</label><br>
+      <label><input type="checkbox" name="opt" value="b"> Option B</label><br>
+      <label><input type="checkbox" name="opt" value="c" disabled> Option C (disabled)</label><br>
+      <label><input type="checkbox" name="opt" value="d" id="opt-d"> Option D (indeterminate)</label><script>document.getElementById("opt-d").indeterminate = true;</script>
+    </fieldset>
+
+    <fieldset>
+      <legend>Radio buttons</legend>
+      <label><input type="radio" name="plan" value="free" checked> Free</label><br>
+      <label><input type="radio" name="plan" value="pro"> Pro</label><br>
+      <label><input type="radio" name="plan" value="enterprise"> Enterprise</label>
+    </fieldset>
+  </form>
+</section>
+
+<!-- SELECT / DROPDOWNS -->
+<section aria-label="Selects">
+  <h2>Select / Dropdowns</h2>
+  <form>
+    <label>Single select:
+      <select name="country">
+        <option value="">-- choose --</option>
+        <optgroup label="Europe">
+          <option value="pl">Poland</option>
+          <option value="de">Germany</option>
+          <option value="fr">France</option>
+        </optgroup>
+        <optgroup label="Americas">
+          <option value="us">United States</option>
+          <option value="ca">Canada</option>
+        </optgroup>
+      </select>
+    </label>
+    <br><br>
+    <label>Multi-select:
+      <select name="tags" multiple size="4">
+        <option value="js">JavaScript</option>
+        <option value="ts" selected>TypeScript</option>
+        <option value="py">Python</option>
+        <option value="go" selected>Go</option>
+      </select>
+    </label>
+    <br><br>
+    <label>Datalist:
+      <input list="browsers" name="browser" placeholder="Pick a browser">
+      <datalist id="browsers">
+        <option value="Opera">
+        <option value="Chrome">
+        <option value="Firefox">
+        <option value="Safari">
+      </datalist>
+    </label>
+  </form>
+</section>
+
+<!-- LISTS -->
+<section aria-label="Lists">
+  <h2>Lists</h2>
+  <ul>
+    <li>Unordered item 1</li>
+    <li>Unordered item 2
+      <ul>
+        <li>Nested item A</li>
+        <li>Nested item B</li>
+      </ul>
+    </li>
+    <li>Unordered item 3</li>
+  </ul>
+
+  <ol>
+    <li>Ordered item 1</li>
+    <li>Ordered item 2</li>
+    <li>Ordered item 3</li>
+  </ol>
+
+  <dl>
+    <dt>Term A</dt><dd>Definition of term A</dd>
+    <dt>Term B</dt><dd>Definition of term B</dd>
+  </dl>
+</section>
+
+<!-- ARIA ROLES -->
+<section aria-label="ARIA Widgets">
+  <h2>ARIA Widgets</h2>
+
+  <div role="tablist" aria-label="Sample tabs">
+    <button role="tab" aria-selected="true" aria-controls="tab1-panel" id="tab1">Tab One</button>
+    <button role="tab" aria-selected="false" aria-controls="tab2-panel" id="tab2">Tab Two</button>
+    <button role="tab" aria-selected="false" aria-controls="tab3-panel" id="tab3">Tab Three</button>
+  </div>
+  <div role="tabpanel" id="tab1-panel" aria-labelledby="tab1">Content for Tab One.</div>
+  <div role="tabpanel" id="tab2-panel" aria-labelledby="tab2" hidden>Content for Tab Two.</div>
+  <div role="tabpanel" id="tab3-panel" aria-labelledby="tab3" hidden>Content for Tab Three.</div>
+
+  <br>
+  <div role="slider" aria-label="Brightness" aria-valuenow="60" aria-valuemin="0" aria-valuemax="100" tabindex="0">Brightness: 60%</div>
+
+  <br>
+  <div role="progressbar" aria-label="Upload progress" aria-valuenow="45" aria-valuemin="0" aria-valuemax="100">45%</div>
+
+  <br>
+  <div role="switch" aria-checked="true" tabindex="0">Dark mode: ON</div>
+
+  <br>
+  <div role="spinbutton" aria-label="Quantity" aria-valuenow="3" aria-valuemin="1" aria-valuemax="10" tabindex="0">3</div>
+
+  <br>
+  <nav aria-label="Breadcrumb">
+    <ol>
+      <li><a href="/">Home</a></li>
+      <li><a href="/products">Products</a></li>
+      <li aria-current="page">Widget</li>
+    </ol>
+  </nav>
+</section>
+
+<!-- DETAILS / SUMMARY -->
+<section aria-label="Disclosure">
+  <h2>Details / Summary</h2>
+  <details>
+    <summary>Section One</summary>
+    <p>Content inside section one.</p>
+  </details>
+  <details open>
+    <summary>Section Two (open)</summary>
+    <p>Content inside section two, visible by default.</p>
+  </details>
+  <details>
+    <summary>Section Three</summary>
+    <p>Content inside section three.</p>
+  </details>
+</section>
+
+<!-- TABLE -->
+<section aria-label="Table">
+  <h2>Table</h2>
+  <table>
+    <caption>Sample Data Table</caption>
+    <thead>
+      <tr><th scope="col">Name</th><th scope="col">Role</th><th scope="col">Status</th></tr>
+    </thead>
+    <tbody>
+      <tr><td>Alice</td><td>Engineer</td><td>Active</td></tr>
+      <tr><td>Bob</td><td>Designer</td><td>Away</td></tr>
+      <tr><td>Carol</td><td>Manager</td><td>Active</td></tr>
+    </tbody>
+    <tfoot>
+      <tr><td colspan="3">3 members total</td></tr>
+    </tfoot>
+  </table>
+</section>
+
+<!-- DIALOG -->
+<section aria-label="Dialog">
+  <h2>Dialog</h2>
+  <button type="button" onclick="document.getElementById('demo-dialog').showModal()">Open Dialog</button>
+  <dialog id="demo-dialog" aria-label="Demo dialog">
+    <h3>Modal Dialog</h3>
+    <p>This is a native dialog element.</p>
+    <button type="button" onclick="document.getElementById('demo-dialog').close()">Close</button>
+    <button type="button">Confirm</button>
+  </dialog>
+</section>
+
+<!-- MEDIA -->
+<section aria-label="Media">
+  <h2>Media</h2>
+  <img src="https://via.placeholder.com/200x100" alt="Placeholder image" width="200" height="100">
+  <br><br>
+  <figure>
+    <img src="https://via.placeholder.com/150x150" alt="Square placeholder">
+    <figcaption>Figure with caption</figcaption>
+  </figure>
+</section>
+
+<!-- LIVE REGIONS -->
+<section aria-label="Live Regions">
+  <h2>Live Regions</h2>
+  <div aria-live="polite" aria-label="Status">Ready.</div>
+  <div aria-live="assertive" role="alert">No errors.</div>
+  <div role="status">All systems operational.</div>
+  <button type="button" onclick="document.querySelector('[aria-live=polite]').textContent='Updated at ' + new Date().toLocaleTimeString()">Trigger update</button>
+</section>
+
+<!-- NAVIGATION -->
+<section aria-label="Navigation examples">
+  <h2>Navigation</h2>
+  <nav aria-label="Main navigation">
+    <ul>
+      <li><a href="/">Home</a></li>
+      <li><a href="/about">About</a></li>
+      <li><a href="/contact">Contact</a></li>
+    </ul>
+  </nav>
+
+  <nav aria-label="Pagination">
+    <a href="?page=1" aria-label="Previous page">&#8592; Prev</a>
+    <a href="?page=1" aria-current="page">1</a>
+    <a href="?page=2">2</a>
+    <a href="?page=3">3</a>
+    <a href="?page=3" aria-label="Next page">Next &#8594;</a>
+  </nav>
+</section>
+
+<p id="anchor">Anchor target at bottom of page.</p>
+
+</body>
+</html>
diff --git a/test/run.test.ts b/test/run.test.ts
index 5518f22..faf6657 100644
--- a/test/run.test.ts
+++ b/test/run.test.ts
@@ -179,16 +179,19 @@ describe("createPageHelper", () => {
     expect(result).toBe(3);
   });
 
-  it("page.snapshot strips header", async () => {
+  it("page.snapshot strips header and compacts output", async () => {
     callTool.mockResolvedValueOnce(
-      '## Latest page snapshot\nRootWebArea "Title"\n  uid=1 link "Home"',
+      '## Latest page snapshot\nuid=1_0 RootWebArea "Title"\n  uid=1_1 link "Home" url="/"',
     );
 
     const page = createPageHelper(callTool);
     const snap = await page.snapshot();
 
     expect(callTool).toHaveBeenCalledWith("take_snapshot");
-    expect(snap).toContain("RootWebArea");
+    // RootWebArea is renamed to `root` and uid= refs become @X.Y by compactSnapshot
+    expect(snap).toContain("root");
+    expect(snap).not.toContain("RootWebArea");
+    expect(snap).not.toContain("uid=");
     expect(snap).not.toContain("## Latest");
   });
 
@@ -434,15 +437,18 @@ describe("page.open fallback", () => {
 // --- 8. page.snapshot strips wrapper headers ---
 
 describe("page.snapshot header stripping", () => {
-  it("strips MCP preamble from snapshot", async () => {
+  it("strips MCP preamble from snapshot and compacts output", async () => {
     callTool.mockResolvedValueOnce(
-      'Page snapshot captured.\n\n## Latest page snapshot\n\nRootWebArea "Hi"\n  uid=1 button "OK"',
+      'Page snapshot captured.\n\n## Latest page snapshot\n\nuid=1_0 RootWebArea "Hi"\n  uid=1_1 button "OK"',
     );
 
     const page = createPageHelper(callTool);
     const snap = await page.snapshot();
 
-    expect(snap).toMatch(/^RootWebArea/);
+    // RootWebArea is renamed to `root` and uid= refs become @X.Y by compactSnapshot
+    expect(snap).toMatch(/^@1\.0 root/);
+    expect(snap).not.toContain("RootWebArea");
+    expect(snap).not.toContain("uid=");
     expect(snap).not.toContain("Latest page snapshot");
     expect(snap).not.toContain("Page snapshot captured");
   });
@@ -480,15 +486,22 @@ describe("page.eval variants", () => {
 // --- 10. isUidRef detection ---
 
 describe("isUidRef", () => {
-  it("recognizes @-prefixed numeric refs", () => {
+  it("recognizes @-prefixed numeric refs (underscore form)", () => {
     expect(isUidRef("@12")).toBe(true);
     expect(isUidRef("@1_3")).toBe(true);
     expect(isUidRef("@26_181")).toBe(true);
   });
 
+  it("recognizes @-prefixed numeric refs (compact dot form)", () => {
+    expect(isUidRef("@2.4")).toBe(true);
+    expect(isUidRef("@12.181")).toBe(true);
+    expect(isUidRef("@1.0")).toBe(true);
+  });
+
   it("recognizes bare numeric refs", () => {
     expect(isUidRef("5")).toBe(true);
     expect(isUidRef("26_181")).toBe(true);
+    expect(isUidRef("2.4")).toBe(true);
   });
 
   it("rejects CSS selectors", () => {
diff --git a/test/snapshot.test.ts b/test/snapshot.test.ts
index 08e27e7..82ad41b 100644
--- a/test/snapshot.test.ts
+++ b/test/snapshot.test.ts
@@ -6,10 +6,17 @@ import {
   isInputType,
   truncateSnapshot,
   truncateText,
+  compactSnapshot,
+  refToDisplay,
+  refToMcp,
+  cleanUrl,
+  extractPageOrigin,
+  applyUrlLut,
+  resolveUrl,
 } from "../src/snapshot.js";
 
 describe("countRefs", () => {
-  it("counts uid= occurrences", () => {
+  it("counts uid= occurrences in raw form", () => {
     const snapshot = `RootWebArea "Example"
   uid=1 button "Submit"
   uid=2 textbox "Name"
@@ -17,13 +24,20 @@ describe("countRefs", () => {
     expect(countRefs(snapshot)).toBe(3);
   });
 
+  it("counts @X.Y refs in compact form", () => {
+    const snapshot = `@1.0 root "Example"
+  @1.1 button "Submit"
+  @1.2 textbox "Name"`;
+    expect(countRefs(snapshot)).toBe(3);
+  });
+
   it("returns 0 for no refs", () => {
     expect(countRefs('RootWebArea "Empty"')).toBe(0);
   });
 });
 
 describe("extractRefs", () => {
-  it("extracts ref info from snapshot lines", () => {
+  it("extracts ref info from raw uid= lines", () => {
     const snapshot = `  uid=1 button "Submit"
   uid=2 textbox "Name"`;
     const refs = extractRefs(snapshot);
@@ -32,6 +46,21 @@ describe("extractRefs", () => {
       { ref: "2", type: "textbox", label: "Name" },
     ]);
   });
+
+  it("extracts ref info from compact @X.Y lines and normalises to display form", () => {
+    const snapshot = `  @2.1 button "Submit"
+  @2.2 textbox "Name"`;
+    const refs = extractRefs(snapshot);
+    expect(refs).toEqual([
+      { ref: "2.1", type: "button", label: "Submit" },
+      { ref: "2.2", type: "textbox", label: "Name" },
+    ]);
+  });
+
+  it("normalises uid=X_Y refs to display form", () => {
+    const refs = extractRefs('  uid=2_4 button "Go"');
+    expect(refs[0].ref).toBe("2.4");
+  });
 });
 
 describe("extractTitle", () => {
@@ -39,6 +68,15 @@ describe("extractTitle", () => {
     expect(extractTitle('RootWebArea "My Page"')).toBe("My Page");
   });
 
+  it("extracts title from compact root", () => {
+    expect(extractTitle('@1.0 root "My Page" url="https://example.com"')).toBe("My Page");
+  });
+
+  it("extracts title from compact markdown heading", () => {
+    expect(extractTitle("@1.1 # Welcome")).toBe("Welcome");
+    expect(extractTitle("@1.2 ## Section")).toBe("Section");
+  });
+
   it("falls back to heading", () => {
     expect(extractTitle('  heading "Welcome"')).toBe("Welcome");
   });
@@ -151,3 +189,605 @@ describe("truncateText", () => {
     expect(result.totalLength).toBe(120);
   });
 });
+
+// --- refToDisplay / refToMcp ---
+
+describe("refToDisplay / refToMcp", () => {
+  it("converts MCP underscore refs to dot display form", () => {
+    expect(refToDisplay("2_4")).toBe("2.4");
+    expect(refToDisplay("12_181")).toBe("12.181");
+    expect(refToDisplay("1")).toBe("1");
+  });
+
+  it("converts display refs back to MCP underscore form", () => {
+    expect(refToMcp("2.4")).toBe("2_4");
+    expect(refToMcp("12.181")).toBe("12_181");
+    expect(refToMcp("@2.4")).toBe("2_4");
+    expect(refToMcp("@2_4")).toBe("2_4");
+    expect(refToMcp("2_4")).toBe("2_4");
+  });
+
+  it("round-trips correctly", () => {
+    expect(refToMcp(refToDisplay("2_4"))).toBe("2_4");
+    expect(refToMcp(refToDisplay("12_181"))).toBe("12_181");
+  });
+});
+
+// --- compactSnapshot ---
+
+describe("compactSnapshot", () => {
+  it("drops LineBreak nodes", () => {
+    const tree = `uid=1_0 root "Page"\n  uid=1_1 button "OK"\n  uid=1_2 LineBreak "\n"\n  uid=1_3 link "Home"`;
+    const result = compactSnapshot(tree);
+    expect(result).not.toContain("LineBreak");
+    expect(result).toContain("button");
+    expect(result).toContain("link");
+  });
+
+  it("drops whitespace-only StaticText nodes", () => {
+    const tree = `uid=1_0 root "Page"\n  uid=1_1 StaticText " "\n  uid=1_2 button "OK"`;
+    const result = compactSnapshot(tree);
+    expect(result).not.toMatch(/StaticText "\s+"/);
+    expect(result).toContain("button");
+  });
+
+  it("drops StaticText children that duplicate the parent label", () => {
+    const tree = [
+      `uid=1_0 root "Page"`,
+      `  uid=1_1 link "Home" url="/"`,
+      `    uid=1_2 StaticText "Home"`,
+      `  uid=1_3 button "Submit"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    // StaticText "Home" should be gone; the link and button should remain
+    expect(result).not.toMatch(/text "Home"/);
+    expect(result).toContain('link "Home"');
+    expect(result).toContain('button "Submit"');
+  });
+
+  it("keeps StaticText children whose label differs from the parent", () => {
+    const tree = [
+      `uid=1_0 root "Page"`,
+      `  uid=1_1 link "Click here" url="/"`,
+      `    uid=1_2 StaticText "go"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    expect(result).toMatch(/text "go"/);
+  });
+
+  it("collapses consecutive text siblings and drops when merged label echoes parent", () => {
+    const tree = [
+      `uid=1_0 root "Page"`,
+      `  uid=1_1 link "[13]" url="/wiki/cite-13"`,
+      `    uid=1_2 StaticText "["`,
+      `    uid=1_3 StaticText "13"`,
+      `    uid=1_4 StaticText "]"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    expect(result).toContain('link "[13]"');
+    expect(result).not.toMatch(/text "\[/);
+    expect(result).not.toMatch(/text "13"/);
+    expect(result).not.toMatch(/text "\]"/);
+    expect(result).not.toMatch(/text "\[13\]"/);
+  });
+
+  it("collapses consecutive text siblings and keeps when merged label differs from parent", () => {
+    const tree = [
+      `uid=1_0 root "Page"`,
+      `  uid=1_1 link "World" url="/"`,
+      `    uid=1_2 StaticText "Hel"`,
+      `    uid=1_3 StaticText "lo"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    expect(result).toContain('link "World"');
+    expect(result).toMatch(/text "Hello"/);
+    expect(result).not.toMatch(/@1\.3/);
+  });
+
+  it("does not collapse text nodes at different indent levels", () => {
+    const tree = [
+      `uid=1_0 root "Page"`,
+      `  uid=1_1 StaticText "A"`,
+      `    uid=1_2 StaticText "B"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    expect(result).toMatch(/text "A"/);
+    expect(result).toMatch(/text "B"/);
+  });
+
+  it("drops empty valuetext attribute", () => {
+    const tree = `uid=1_0 slider "Volume" value="50" valuemax="100" valuemin="0" valuetext=""`;
+    expect(compactSnapshot(tree)).not.toContain('valuetext=""');
+  });
+
+  it("drops disableable when disabled is present", () => {
+    const tree = `uid=1_0 button "Go" disableable disabled`;
+    expect(compactSnapshot(tree)).not.toContain("disableable");
+    expect(compactSnapshot(tree)).toContain("disabled");
+  });
+
+  it("drops selectable on option and tab roles", () => {
+    const tree = [
+      `uid=1_0 option "Alpha" selectable value="a"`,
+      `uid=1_1 tab "Home" selectable`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    expect(result).not.toContain("selectable");
+  });
+
+  it("drops relevant='additions text'", () => {
+    const tree = `uid=1_0 status live="polite" relevant="additions text"`;
+    expect(compactSnapshot(tree)).not.toContain('relevant="additions text"');
+  });
+
+  it("drops atomic and default live= on alert/status", () => {
+    const tree = [
+      `uid=1_0 status atomic live="polite" relevant="additions text"`,
+      `uid=1_1 alert atomic live="assertive" relevant="additions text"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    expect(result).not.toContain("atomic");
+    expect(result).not.toContain('live="polite"');
+    expect(result).not.toContain('live="assertive"');
+  });
+
+  it("drops implied combobox attributes", () => {
+    const tree = `uid=1_0 combobox "Country" expandable haspopup="menu" value="Poland"`;
+    const result = compactSnapshot(tree);
+    expect(result).not.toContain("haspopup");
+    expect(result).not.toContain("expandable");
+    expect(result).toContain('combobox "Country"');
+  });
+
+  it("drops orientation='horizontal'", () => {
+    const tree = `uid=1_0 slider "Volume" orientation="horizontal" value="50"`;
+    expect(compactSnapshot(tree)).not.toContain("orientation");
+  });
+
+  it("drops autocomplete attribute", () => {
+    const tree = `uid=1_0 combobox "Search" autocomplete="both"`;
+    expect(compactSnapshot(tree)).not.toContain("autocomplete");
+  });
+
+  it("renames PascalCase role names to compact lowercase forms", () => {
+    const tree = [
+      `uid=1_0 RootWebArea "Page"`,
+      `  uid=1_1 StaticText "Hello"`,
+      `  uid=1_2 DisclosureTriangle "Details" expandable`,
+      `  uid=1_3 ColorWell "Colour" value="#ff0000"`,
+      `  uid=1_4 InputTime "Appt"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    expect(result).toContain("root");
+    expect(result).toContain("text");
+    expect(result).toContain("disclosure");
+    expect(result).toContain("color");
+    expect(result).toContain("time");
+    expect(result).not.toContain("RootWebArea");
+    expect(result).not.toContain("StaticText");
+    expect(result).not.toContain("DisclosureTriangle");
+    expect(result).not.toContain("ColorWell");
+    expect(result).not.toContain("InputTime");
+  });
+
+  it("strips quotes from numeric attribute values", () => {
+    const tree = `uid=1_0 spinbutton "Qty" value="3" valuemin="1" valuemax="10"`;
+    const result = compactSnapshot(tree);
+    expect(result).toContain("value=3");
+    expect(result).toContain("valuemin=1");
+    expect(result).toContain("valuemax=10");
+    expect(result).not.toContain('value="3"');
+  });
+
+  it("converts headings to markdown style", () => {
+    const tree = [
+      `uid=1_0 root "Page"`,
+      `  uid=1_1 heading "Section One" level="1"`,
+      `  uid=1_2 heading "Subsection" level="2"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    expect(result).toContain("# Section One");
+    expect(result).toContain("## Subsection");
+    expect(result).not.toContain('heading "');
+    expect(result).not.toContain("level=");
+  });
+
+  it("rewrites uid=PAGE_ELEM refs to @PAGE.ELEM display form", () => {
+    const tree = `uid=2_4 button "Submit"`;
+    const result = compactSnapshot(tree);
+    expect(result).toContain("@2.4");
+    expect(result).not.toContain("uid=");
+  });
+
+  it("processes a realistic multi-element tree and is shorter than the original", () => {
+    const tree = [
+      `uid=2_0 RootWebArea "Test Page" url="file:///test.html"`,
+      `  uid=2_1 heading "Test Page" level="1"`,
+      `  uid=2_2 region "Links"`,
+      `    uid=2_3 link "Home" url="/"`,
+      `      uid=2_4 StaticText "Home"`,
+      `    uid=2_5 StaticText " "`,
+      `    uid=2_6 LineBreak "\n"`,
+      `  uid=2_7 region "Form"`,
+      `    uid=2_8 combobox "Country" expandable haspopup="menu" value="Poland"`,
+      `      uid=2_9 option "Poland" selectable selected value="Poland"`,
+      `      uid=2_10 option "Germany" selectable value="Germany"`,
+      `    uid=2_11 status atomic live="polite" relevant="additions text"`,
+      `      uid=2_12 StaticText "Ready."`,
+    ].join("\n");
+
+    const result = compactSnapshot(tree);
+
+    // Ref format
+    expect(result).not.toContain("uid=");
+    expect(result).toContain("@2.0");
+
+    // Role renames
+    expect(result).not.toContain("RootWebArea");
+    expect(result).not.toContain("StaticText");
+    expect(result).not.toContain("LineBreak");
+
+    // Noise removal
+    expect(result).not.toContain("selectable");
+    expect(result).not.toContain("atomic");
+    expect(result).not.toContain("expandable");
+    expect(result).not.toContain('live="polite"');
+    expect(result).not.toContain('relevant=');
+
+    // Markdown headings
+    expect(result).toContain("# Test Page");
+
+    // Shorter overall
+    expect(result.length).toBeLessThan(tree.length);
+  });
+});
+
+// --- cleanUrl ---
+
+describe("cleanUrl", () => {
+  it("returns null for javascript: URLs", () => {
+    expect(cleanUrl("javascript:void(0)", null)).toBeNull();
+    expect(cleanUrl("javascript:doStuff()", "https://x.com")).toBeNull();
+  });
+
+  it("returns null for data: URLs", () => {
+    expect(cleanUrl("data:image/png;base64,abc123", null)).toBeNull();
+    expect(cleanUrl("data:text/html,<h1>hi</h1>", "https://x.com")).toBeNull();
+  });
+
+  it("strips matching page origin", () => {
+    expect(cleanUrl("https://example.com/foo", "https://example.com")).toBe("/foo");
+  });
+
+  it("returns / when URL is exactly the origin", () => {
+    expect(cleanUrl("https://example.com", "https://example.com")).toBe("/");
+  });
+
+  it("does not strip a different origin", () => {
+    expect(cleanUrl("https://other.com/foo", "https://example.com")).toBe(
+      "https://other.com/foo",
+    );
+  });
+
+  it("leaves absolute URL unchanged when origin is null", () => {
+    expect(cleanUrl("https://example.com/foo?q=bar", null)).toBe(
+      "https://example.com/foo?q=bar",
+    );
+  });
+
+  it("drops Google Analytics UTM params", () => {
+    expect(cleanUrl("/p?id=1&utm_source=nl&utm_medium=email&utm_campaign=spring", null)).toBe(
+      "/p?id=1",
+    );
+  });
+
+  it("drops Google Ads click IDs (gclid, gbraid, wbraid, dclid, gad_source)", () => {
+    expect(cleanUrl("/p?q=x&gclid=abc&gbraid=def&wbraid=ghi&dclid=jkl&gad_source=1", null)).toBe(
+      "/p?q=x",
+    );
+  });
+
+  it("drops social platform click IDs (fbclid, msclkid, yclid, igshid, ttclid, twclid)", () => {
+    expect(
+      cleanUrl("/p?id=1&fbclid=a&msclkid=b&yclid=c&igshid=d&ttclid=e&twclid=f", null),
+    ).toBe("/p?id=1");
+  });
+
+  it("drops LinkedIn, Google Shopping, and Klaviyo click IDs", () => {
+    expect(cleanUrl("/p?id=1&li_fat_id=a&srsltid=b&_ke=c", null)).toBe("/p?id=1");
+  });
+
+  it("drops Mailchimp mc_ params", () => {
+    expect(cleanUrl("/p?id=1&mc_cid=abc&mc_eid=xyz", null)).toBe("/p?id=1");
+  });
+
+  it("preserves functional params (q, id, node, page, etc.)", () => {
+    expect(cleanUrl("/search?q=keyboard&page=2&node=42", null)).toBe(
+      "/search?q=keyboard&page=2&node=42",
+    );
+  });
+
+  it("preserves ie= and _encoding= (site-specific, not generic tracking)", () => {
+    expect(cleanUrl("/p?ie=UTF8&_encoding=UTF8&node=42", null)).toBe(
+      "/p?ie=UTF8&_encoding=UTF8&node=42",
+    );
+  });
+
+  it("drops the ? entirely when all params are noise", () => {
+    expect(cleanUrl("/p?gclid=abc&utm_source=google", null)).toBe("/p");
+  });
+
+  it("preserves the fragment", () => {
+    expect(cleanUrl("https://example.com/s?q=x&gclid=y#section", "https://example.com")).toBe(
+      "/s?q=x#section",
+    );
+  });
+
+  it("preserves the fragment when there is no query", () => {
+    expect(cleanUrl("https://example.com/foo#bar", "https://example.com")).toBe("/foo#bar");
+  });
+
+  it("preserves percent-encoded values in non-noise params", () => {
+    expect(cleanUrl("/p?q=hello%20world&gclid=x", null)).toBe("/p?q=hello%20world");
+  });
+});
+
+// --- extractPageOrigin ---
+
+describe("extractPageOrigin", () => {
+  it("returns origin from RootWebArea url= attribute", () => {
+    const tree = `uid=1_0 RootWebArea "Page" url="https://www.amazon.com/s?k=x"`;
+    expect(extractPageOrigin(tree)).toBe("https://www.amazon.com");
+  });
+
+  it("returns origin from compact root + @ref form", () => {
+    const tree = `@1.0 root "Page" url="https://example.com:8080/foo"`;
+    expect(extractPageOrigin(tree)).toBe("https://example.com:8080");
+  });
+
+  it("returns null when there is no root url=", () => {
+    expect(extractPageOrigin(`uid=1_0 RootWebArea "Page"`)).toBeNull();
+  });
+
+  it("returns null for a tree without a root node", () => {
+    expect(extractPageOrigin(`uid=1_1 button "Click"`)).toBeNull();
+  });
+
+  it("returns null for an unparseable URL", () => {
+    expect(
+      extractPageOrigin(`uid=1_0 RootWebArea "Page" url="not a url"`),
+    ).toBeNull();
+  });
+});
+
+// --- compactSnapshot Layer 1 (URL + description cleanup) ---
+
+describe("compactSnapshot URL and description cleanup", () => {
+  it("drops javascript: url= attributes but keeps the element", () => {
+    const tree = [
+      `uid=1_0 root "Page" url="https://x.com/"`,
+      `  uid=1_1 link "Search" url="javascript:void(0)"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    expect(result).not.toContain("javascript:");
+    // Link line should have no url= attribute at all (root keeps its url= for origin lookup)
+    const linkLine = result.split("\n").find((l) => l.includes('link "Search"'))!;
+    expect(linkLine).not.toContain("url=");
+    expect(linkLine).toContain('link "Search"');
+  });
+
+  it("strips the page origin from same-site URLs", () => {
+    const tree = [
+      `uid=1_0 root "Page" url="https://www.amazon.com/s?k=x"`,
+      `  uid=1_1 link "Logo" url="https://www.amazon.com/ref_logo"`,
+      `  uid=1_2 link "Other" url="https://other.com/foo"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    expect(result).toContain('url="/ref_logo"');
+    expect(result).toContain('url="https://other.com/foo"');
+  });
+
+  it("drops tracking query params from URLs", () => {
+    const tree = [
+      `uid=1_0 root "Page" url="https://example.com/"`,
+      `  uid=1_1 link "News" url="https://example.com/news?utm_source=nl&utm_medium=email&gclid=abc"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    expect(result).toContain('url="/news"');
+  });
+
+  it("dedups boilerplate description repeated >= threshold times", () => {
+    const boilerplate = "use arrow keys to navigate";
+    const tree = [
+      `uid=1_0 root "Page" url="https://x.com/"`,
+      `  uid=1_1 link "A" description="${boilerplate}"`,
+      `  uid=1_2 link "B" description="${boilerplate}"`,
+      `  uid=1_3 link "C" description="${boilerplate}"`,
+      `  uid=1_4 link "D" description="${boilerplate}"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    const matches = result.match(/description=/g) ?? [];
+    expect(matches.length).toBe(1);
+    expect(result).toContain(`description="${boilerplate}"`);
+    expect(result).toContain('link "D"');
+  });
+
+  it("keeps descriptions that occur fewer times than the threshold", () => {
+    const tree = [
+      `uid=1_0 root "Page" url="https://x.com/"`,
+      `  uid=1_1 link "A" description="hint one"`,
+      `  uid=1_2 link "B" description="hint one"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    const matches = result.match(/description=/g) ?? [];
+    expect(matches.length).toBe(2);
+  });
+
+  it("strips tracking params even when page origin is unknown", () => {
+    const tree = [
+      `uid=1_0 root "Page"`,
+      `  uid=1_1 link "News" url="https://example.com/news?utm_source=nl&gclid=abc"`,
+    ].join("\n");
+    const result = compactSnapshot(tree);
+    // No origin stripping (root has no url=), but tracking params are removed
+    expect(result).toContain('url="https://example.com/news"');
+  });
+});
+
+// --- applyUrlLut ---
+
+describe("applyUrlLut", () => {
+  it("returns body unchanged and empty trailer when no URLs are present", () => {
+    const text = `@1.0 root "Page"\n  @1.1 button "Click"`;
+    const { body, trailer, urlMap } = applyUrlLut(text);
+    expect(body).toBe(text);
+    expect(trailer).toBe("");
+    expect(urlMap.size).toBe(0);
+  });
+
+  it("leaves a short URL that appears once untouched", () => {
+    const text = `@1.0 root "Page"\n  @1.1 link "Home" url="/home"`;
+    const { body, trailer } = applyUrlLut(text);
+    expect(body).toContain('url="/home"');
+    expect(trailer).toBe("");
+  });
+
+  it("tokenises a URL that appears 2+ times (dedup)", () => {
+    const repeated = "/s?k=rgb+mechanical+keyboards&category=electronics";
+    const text = [
+      `@1.0 root "Page" url="${repeated}"`,
+      `  @1.1 link "A" url="${repeated}"`,
+      `  @1.2 link "B" url="${repeated}"`,
+    ].join("\n");
+    const { body, trailer, urlMap } = applyUrlLut(text);
+    expect(body).not.toContain(`url="${repeated}"`);
+    expect(body).toMatch(/url=\$u\d/);
+    expect(urlMap.size).toBe(1);
+    const [token, url] = [...urlMap.entries()][0];
+    expect(url).toBe(repeated);
+    expect(trailer).toContain(`${token} ${repeated}`);
+    // Full URL in trailer — not hidden form
+    expect(trailer).not.toContain("[hidden");
+  });
+
+  it("assigns tokens in tree-walk (first-occurrence) order", () => {
+    const urlA = "/page-a?x=1&y=2&z=3&lots=of&params=here";
+    const urlB = "/page-b?x=1&y=2&z=3&lots=of&params=here";
+    const text = [
+      `@1.0 root "Page"`,
+      `  @1.1 link "A" url="${urlA}"`,
+      `  @1.2 link "B" url="${urlB}"`,
+      `  @1.3 link "A2" url="${urlA}"`,
+      `  @1.4 link "B2" url="${urlB}"`,
+    ].join("\n");
+    const { urlMap } = applyUrlLut(text);
+    const tokens = [...urlMap.keys()];
+    expect(tokens[0]).toBe("$u1");
+    expect(urlMap.get("$u1")).toBe(urlA);
+    expect(tokens[1]).toBe("$u2");
+    expect(urlMap.get("$u2")).toBe(urlB);
+  });
+
+  it("tokenises a long URL appearing once as a whale (hidden in trailer)", () => {
+    const whale = "/sspa/click?" + "x".repeat(200);
+    const text = `@1.0 root "Page"\n  @1.1 link "Ad" url="${whale}"`;
+    const { body, trailer, urlMap } = applyUrlLut(text);
+    expect(body).toMatch(/url=\$u\d/);
+    expect(urlMap.get("$u1")).toBe(whale);
+    // Hidden form in trailer
+    expect(trailer).toContain("[hidden");
+    expect(trailer).toContain(`${whale.length}b`);
+    expect(trailer).not.toContain(whale);
+  });
+
+  it("whale trailer includes a path-stem preview", () => {
+    const whale = "/sspa/click?spc=" + "A".repeat(200);
+    const text = `@1.0 root "Page"\n  @1.1 link "Ad" url="${whale}"`;
+    const { trailer } = applyUrlLut(text);
+    expect(trailer).toContain("→ /sspa/click?spc=");
+    expect(trailer).toContain("…");
+  });
+
+  it("cross-host whale includes host in the preview (no scheme)", () => {
+    const whale = "https://aax-us-east.amazon.com/x/c/" + "B".repeat(200);
+    const text = `@1.0 root "Page"\n  @1.1 link "Ad" url="${whale}"`;
+    const { trailer } = applyUrlLut(text);
+    // Preview should start with host, not https://
+    expect(trailer).toMatch(/→ aax-us-east\.amazon\.com/);
+  });
+
+  it("dedup wins over whale when URL is both long and repeated", () => {
+    const url = "/long?" + "x".repeat(200);
+    const text = [
+      `@1.0 root "Page"`,
+      `  @1.1 link "A" url="${url}"`,
+      `  @1.2 link "B" url="${url}"`,
+    ].join("\n");
+    const { trailer } = applyUrlLut(text);
+    // Full URL printed in trailer — not the hidden form
+    expect(trailer).toContain(url);
+    expect(trailer).not.toContain("[hidden");
+  });
+
+  it("body + trailer length does not exceed input length", () => {
+    const repeated = "/s?k=rgb+mechanical+keyboards";
+    const text = [
+      `@1.0 root "Page" url="${repeated}"`,
+      `  @1.1 link "A" url="${repeated}"`,
+      `  @1.2 link "B" url="${repeated}"`,
+      `  @1.3 link "C" url="https://other.com/` + "x".repeat(200) + `"`,
+    ].join("\n");
+    const { body, trailer } = applyUrlLut(text);
+    expect(body.length + trailer.length).toBeLessThanOrEqual(text.length);
+  });
+
+  it("trailer only lists URLs visible in the supplied text (truncation interaction)", () => {
+    const urlInBody = "/visible?k=keyboard";
+    const urlTruncated = "/hidden?k=mouse";
+    // Simulate: body text was already truncated to contain only the first URL
+    const truncatedText = `@1.0 root "Page"\n  @1.1 link "A" url="${urlInBody}"\n  @1.2 link "A" url="${urlInBody}"`;
+    const { trailer } = applyUrlLut(truncatedText);
+    expect(trailer).toContain(urlInBody);
+    expect(trailer).not.toContain(urlTruncated);
+  });
+});
+
+// --- resolveUrl ---
+
+describe("resolveUrl", () => {
+  it("resolves a $uN token via urlMap", () => {
+    const urlMap = new Map([["$u1", "https://example.com/foo"]]);
+    expect(resolveUrl("", urlMap, "$u1")).toBe("https://example.com/foo");
+  });
+
+  it("resolves $uN with leading @ stripped", () => {
+    const urlMap = new Map([["$u2", "/bar"]]);
+    expect(resolveUrl("", urlMap, "@$u2")).toBe("/bar");
+  });
+
+  it("returns null for an unknown token", () => {
+    expect(resolveUrl("", new Map(), "$u99")).toBeNull();
+  });
+
+  it("resolves a plain ref to its url= attribute in the body", () => {
+    const body = `@1.0 root "Page"\n  @1.1 link "Home" url="/home"`;
+    expect(resolveUrl(body, new Map(), "1.1")).toBe("/home");
+    expect(resolveUrl(body, new Map(), "@1.1")).toBe("/home");
+  });
+
+  it("resolves a ref whose url= was tokenised, via urlMap", () => {
+    const urlMap = new Map([["$u1", "/the-real-url"]]);
+    const body = `@1.0 root "Page"\n  @1.1 link "Ad" url=$u1`;
+    expect(resolveUrl(body, urlMap, "@1.1")).toBe("/the-real-url");
+  });
+
+  it("returns null when the ref has no url= attribute", () => {
+    const body = `@1.0 root "Page"\n  @1.1 button "Click"`;
+    expect(resolveUrl(body, new Map(), "@1.1")).toBeNull();
+  });
+
+  it("returns null when the ref does not exist in the body", () => {
+    const body = `@1.0 root "Page"`;
+    expect(resolveUrl(body, new Map(), "@9.9")).toBeNull();
+  });
+});
diff --git a/test/tasks/README.md b/test/tasks/README.md
new file mode 100644
index 0000000..60e271c
--- /dev/null
+++ b/test/tasks/README.md
@@ -0,0 +1,31 @@
+# Agent Task Cost Benchmarks
+
+These tasks measure the real token cost of agent-driven browser automation
+under two snapshot formats — **compact** (default) and **raw** (MCP verbatim).
+
+## How to run a task
+
+1. Open a **fresh Claude Code session** (clear the context or start a new one).
+2. Paste the entire contents of a `*-compact.md` task file as your first message.
+3. Let Claude complete the task.
+4. Run `/cost` to record the session cost.
+5. Repeat steps 1-4 in another fresh session with the matching `*-raw.md` variant.
+
+Compare the `/cost` output between the two runs. The compact vs raw difference
+shows the real-world token saving an agent gets on that page type.
+
+## Tasks
+
+| File | Scenario | Expected saving |
+|---|---|---|
+| `amazon-search-compact.md` / `*-raw.md` | Amazon product search → top 5 results | High (many long tracking URLs) |
+| `hn-top-stories-compact.md` / `*-raw.md` | Hacker News front page → top 5 stories | Medium (link-heavy, clean URLs) |
+
+## Notes
+
+- Results will differ between runs (dynamic pages). That's fine — the goal is
+  cost comparison, not result comparison.
+- Both variants do the same task; only the snapshot format changes.
+- The raw variants explicitly pass `--raw` on every command so the agent sees
+  the uncompressed MCP output.
+- Record your `/cost` output next to the task file for tracking over time.
diff --git a/test/tasks/amazon-search-compact.md b/test/tasks/amazon-search-compact.md
new file mode 100644
index 0000000..595cc88
--- /dev/null
+++ b/test/tasks/amazon-search-compact.md
@@ -0,0 +1,28 @@
+# Task: Amazon product search — compact mode
+
+**Mode:** compact (default opera-browser-cli output)
+
+Use `opera-browser-cli` to search Amazon for "rgb mechanical keyboards" and
+return the top 5 results. For each result include:
+- Product title
+- Price (if visible)
+- Product URL (resolve via `opera-browser-cli url @<ref>` if the URL is
+  tokenised as `$uN` in the snapshot)
+
+## Rules
+
+- Use `opera-browser-cli` for all browser interaction (it is available in PATH).
+- Do NOT pass `--raw` to any command — use the default compact output.
+- Skip sponsored/ad results; only list organic results.
+- If you need to scroll to find more results, do so.
+- When you are done, output the 5 results in this exact format and nothing else:
+
+```
+1. <title> | <price or "n/a"> | <url>
+2. <title> | <price or "n/a"> | <url>
+3. <title> | <price or "n/a"> | <url>
+4. <title> | <price or "n/a"> | <url>
+5. <title> | <price or "n/a"> | <url>
+```
+
+Start now.
diff --git a/test/tasks/amazon-search-raw.md b/test/tasks/amazon-search-raw.md
new file mode 100644
index 0000000..054d278
--- /dev/null
+++ b/test/tasks/amazon-search-raw.md
@@ -0,0 +1,31 @@
+# Task: Amazon product search — raw mode
+
+**Mode:** raw (uncompressed MCP output)
+
+Use `opera-browser-cli` to search Amazon for "rgb mechanical keyboards" and
+return the top 5 results. For each result include:
+- Product title
+- Price (if visible)
+- Product URL
+
+## Rules
+
+- Use `opera-browser-cli` for all browser interaction (it is available in PATH).
+- Pass `--raw` on EVERY command that produces a snapshot, e.g.:
+  - `opera-browser-cli open <url> --raw`
+  - `opera-browser-cli snapshot --raw`
+  - `opera-browser-cli scroll down --raw`
+  - `opera-browser-cli click @<ref> --raw`
+- Skip sponsored/ad results; only list organic results.
+- If you need to scroll to find more results, do so.
+- When you are done, output the 5 results in this exact format and nothing else:
+
+```
+1. <title> | <price or "n/a"> | <url>
+2. <title> | <price or "n/a"> | <url>
+3. <title> | <price or "n/a"> | <url>
+4. <title> | <price or "n/a"> | <url>
+5. <title> | <price or "n/a"> | <url>
+```
+
+Start now.
diff --git a/test/tasks/hn-top-stories-compact.md b/test/tasks/hn-top-stories-compact.md
new file mode 100644
index 0000000..63e3c4a
--- /dev/null
+++ b/test/tasks/hn-top-stories-compact.md
@@ -0,0 +1,24 @@
+# Task: Hacker News top stories — compact mode
+
+**Mode:** compact (default opera-browser-cli output)
+
+Use `opera-browser-cli` to open Hacker News and return the top 5 stories.
+For each story include:
+- Story title
+- Domain/source (e.g. "github.com")
+- Points and comment count (if visible)
+- URL of the story itself (not the HN comments page)
+
+## Rules
+
+- Use `opera-browser-cli` for all browser interaction (it is available in PATH).
+- Do NOT pass `--raw` to any command — use the default compact output.
+- If a URL is tokenised as `$uN`, resolve it with `opera-browser-cli url '$uN'` (single-quoted so the shell does not expand it).
+- When you are done, output the 5 results in this exact format and nothing else:
+
+```
+1. <title> | <domain> | <points> pts <comments> comments | <url>
+2. ...
+```
+
+Start now.
diff --git a/test/tasks/hn-top-stories-raw.md b/test/tasks/hn-top-stories-raw.md
new file mode 100644
index 0000000..8774ce6
--- /dev/null
+++ b/test/tasks/hn-top-stories-raw.md
@@ -0,0 +1,27 @@
+# Task: Hacker News top stories — raw mode
+
+**Mode:** raw (uncompressed MCP output)
+
+Use `opera-browser-cli` to open Hacker News and return the top 5 stories.
+For each story include:
+- Story title
+- Domain/source (e.g. "github.com")
+- Points and comment count (if visible)
+- URL of the story itself (not the HN comments page)
+
+## Rules
+
+- Use `opera-browser-cli` for all browser interaction (it is available in PATH).
+- Pass `--raw` on EVERY command that produces a snapshot, e.g.:
+  - `opera-browser-cli open <url> --raw`
+  - `opera-browser-cli snapshot --raw`
+  - `opera-browser-cli scroll down --raw`
+  - `opera-browser-cli click @<ref> --raw`
+- When you are done, output the 5 results in this exact format and nothing else:
+
+```
+1. <title> | <domain> | <points> pts <comments> comments | <url>
+2. ...
+```
+
+Start now.