Computer Control Explorer

A self-contained, read-only web app for exploring a computer-control task dataset — browse tasks by category, read each task's files and metadata, and inspect parsed agent trajectories (the commands the agent ran), test results, and the oracle solve output. It parses Harbor's ATIF trajectory format (terminus-2 and codex), so it renders exactly what the eval harness produces.

It reads only the dataset — no database, no backend services — which makes it safe to hand to anyone who wants to explore a shipped dataset. The architecture and build plan live in PLAN.md; contributor guidance in AGENTS.md.

The data contract

Everything comes from one directory, given by the DATASET_DIR environment variable:

$DATASET_DIR/
  dataset/<slug>/        task.toml, instruction.md, README.md, rubric.txt,
                         solution/, tests/, environment/, …
  out/jobs/.../<trial>/  result.json, agent/trajectory.json (ATIF),
                         verifier/test-stdout.txt          (eval trials; optional)

Tasks come from dataset/; trials are discovered by walking out/jobs/ for any result.json that carries both a trial_name and a task_checksum. There are no other inputs.
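
The discovery rule can be sketched as a type guard over a parsed result.json. This is a minimal sketch, not the app's actual code: the names `TrialResult` and `isTrialResult` are hypothetical, and it assumes both fields are non-empty strings, which the contract above does not spell out.

```typescript
// Hypothetical shape: only the two fields named in the data contract are checked.
// Assumption: both are non-empty strings; a real result.json may carry more keys.
interface TrialResult {
  trial_name: string;
  task_checksum: string;
  [key: string]: unknown;
}

// Returns true when a parsed result.json looks like an eval trial.
function isTrialResult(value: unknown): value is TrialResult {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const isNonEmptyString = (v: unknown): boolean =>
    typeof v === "string" && v.length > 0;
  return (
    isNonEmptyString(record.trial_name) &&
    isNonEmptyString(record.task_checksum)
  );
}
```

A walker over out/jobs/ would parse each result.json it finds and keep only the directories for which this check passes; anything else under out/jobs/ is ignored.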

Running it

npm install

Dev server (reads the dataset at request time, hot reload):

DATASET_DIR=/path/to/bundle npm run dev

Static export — a self-contained out/ with no runtime dependencies:

DATASET_DIR=/path/to/bundle npm run export   # → out/

The export uses trailing-slash routes (/task/<slug>/index.html), so it serves on any static file server with no rewrite rules:

python3 -m http.server -d out 8080      # or: npx serve out
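
To illustrate why no rewrite rules are needed, here is a rough sketch of the path lookup a plain static file server effectively performs against out/. The function name `exportedFileFor` is hypothetical, and the sketch assumes every page route is exported as `<route>/index.html`, as the `/task/<slug>/index.html` example suggests.

```typescript
// Maps a request path to the file a plain static server would read from out/.
// Assumption: every page route is exported as <route>/index.html.
// Limitation: a route without a trailing slash whose last segment contains a
// dot is treated as an asset path and returned unchanged.
function exportedFileFor(requestPath: string): string {
  const lastSegment = requestPath.slice(requestPath.lastIndexOf("/") + 1);
  if (lastSegment.includes(".")) {
    return requestPath; // already names a file, e.g. /favicon.ico
  }
  const withSlash = requestPath.endsWith("/") ? requestPath : requestPath + "/";
  return withSlash + "index.html";
}
```

Because each route resolves to a directory containing its own index.html, the server only needs the usual "serve index.html for a directory" behavior.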

Docker — a dataset-agnostic server image; mount the bundle as a volume and it reads it at request time. Pull the published multi-arch (amd64/arm64) image from Docker Hub — CI builds & pushes vettoai/computer-control-explorer on each release tag:

docker run --rm -p 3000:3000 -v /path/to/bundle:/data:ro -e DATASET_DIR=/data \
  vettoai/computer-control-explorer

…or build it locally:

docker build -t computer-control-explorer .
docker run --rm -p 3000:3000 -v /path/to/bundle:/data:ro -e DATASET_DIR=/data \
  computer-control-explorer

What it shows

Per task: a collapsible file tree with syntax-highlighted file contents and metadata, and a Trials tab with pass rate by run (model × task version × job folder), the oracle solve, and every agent trial — each with its parsed + raw trajectory (ATIF; terminus-2 and codex), test output, and reward. Crashed runs surface their error inline.
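
The "pass rate by run" grouping can be sketched as follows. This is an illustration under stated assumptions rather than the app's implementation: the `TrialSummary` fields and the `passRateByRun` name are hypothetical, and a trial is assumed to pass when its reward is greater than zero.

```typescript
// Hypothetical trial summary; field names are illustrative, not the app's schema.
interface TrialSummary {
  model: string;
  taskVersion: string;
  jobFolder: string;
  reward: number;
}

interface RunStats {
  passed: number;
  total: number;
  passRate: number;
}

// Groups trials by run (model × task version × job folder) and computes the
// pass rate of each run.
// Assumption: a trial passes when its reward is greater than zero.
function passRateByRun(trials: TrialSummary[]): Map<string, RunStats> {
  const runs = new Map<string, RunStats>();
  for (const t of trials) {
    // JSON-encode the tuple so separators inside a field cannot collide.
    const key = JSON.stringify([t.model, t.taskVersion, t.jobFolder]);
    const stats = runs.get(key) ?? { passed: 0, total: 0, passRate: 0 };
    stats.total += 1;
    if (t.reward > 0) stats.passed += 1;
    stats.passRate = stats.passed / stats.total;
    runs.set(key, stats);
  }
  return runs;
}
```

Crashed runs may have no reward at all; how they count toward the total is a design choice this sketch leaves to the caller, who would need to decide before building the `TrialSummary` list.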

License

MIT. (Repository is currently private; intended to be open-sourced.)
