M3.1 — Build the public leaderboard: dual metrics + certificate drilldown + trajectory #6

## Background
> New here? Read **#9** first — it explains the project and defines every term below.

Issue #3 built a minimal page that shows real data. This issue grows it into the public product. Two ranking metrics matter: **bugs per 1000 tokens** (fair across models regardless of their pricing) and **bugs per dollar** (practical cost-efficiency). Visitors should be able to click a model to see its bugs, click a bug to see the full counterexample (source problem → target problem → solutions → verdict), and view the **trajectory**: the exact sequence of `pred` commands the AI ran to find that bug, shown for transparency. The product is a static site (just files, no server).
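As a rough illustration of the two metrics, here is a minimal sketch in Python. The record shape (`ModelResult` with `bugs`, `tokens`, `cost_usd`) and the function names are assumptions made for this example, not the results format from #3/#4. Returning 0.0 when a denominator is zero is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ModelResult:
    # Hypothetical per-model totals; the real field names come from #3/#4.
    model: str
    bugs: int
    tokens: int
    cost_usd: float


def bugs_per_1k_tokens(r: ModelResult) -> float:
    # Assumption: a model with no recorded tokens scores 0 rather than erroring.
    return 1000 * r.bugs / r.tokens if r.tokens else 0.0


def bugs_per_dollar(r: ModelResult) -> float:
    # Assumption: a model with no recorded cost scores 0 rather than erroring.
    return r.bugs / r.cost_usd if r.cost_usd else 0.0


def rank(results: List[ModelResult],
         metric: Callable[[ModelResult], float]) -> List[ModelResult]:
    # Highest metric value first; ties keep their input order.
    return sorted(results, key=metric, reverse=True)


# Made-up numbers, only to show that the two metrics can disagree.
results = [
    ModelResult("model-a", bugs=10, tokens=100_000, cost_usd=5.0),
    ModelResult("model-b", bugs=6, tokens=40_000, cost_usd=6.0),
]
print([r.model for r in rank(results, bugs_per_1k_tokens)])
print([r.model for r in rank(results, bugs_per_dollar)])
```

Passing the metric as a function keeps the ranking code identical for both views, which is one way the toggle could be wired up.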

## Objective
Grow the minimal page into the public product: a dual-metric ranked table, per-model breakdown, per-bug detail pages, and a link to each bug's discovery trajectory.

## Interface (Input → Output)
- **Input:** the results files and index from #3/#4, plus each session's saved trajectory (the log of commands the AI ran).
- **Output:** static pages — the leaderboard, a per-model page, and a per-bug detail page.

## Technical recommendations (suggestions)
- Reuse the existing chart; add a toggle between the two metrics.
- The per-bug page shows the certificate (source → target → solutions → verdict) and links to its trajectory.
- The trajectory view reads the agent's saved log and shows the `pred` commands in order.
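The trajectory bullet could be sketched roughly as follows. The log format is an assumption: this example treats the saved trajectory as a JSON Lines file where each entry has a `command` string, which may not match what the agent actually writes. The function names are hypothetical as well.

```python
import json
from pathlib import Path
from typing import List, Optional


def load_pred_commands(path: str) -> Optional[List[str]]:
    """Return the `pred` commands from a trajectory log, in file order.

    Returns None when the file does not exist, so the caller can show a
    fallback instead of failing.
    Assumed format (hypothetical): one JSON object per line with a
    "command" field holding the full command line.
    """
    log = Path(path)
    if not log.is_file():
        return None
    commands = []
    for line in log.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        command = json.loads(line).get("command", "")
        # Keep only entries whose first word is exactly `pred`.
        if command.split()[:1] == ["pred"]:
            commands.append(command)
    return commands


def render_trajectory(path: str) -> str:
    """Render a numbered list of commands, or a fallback if the log is missing."""
    commands = load_pred_commands(path)
    if commands is None:
        return "trajectory unavailable"
    return "\n".join(f"{i}. {cmd}" for i, cmd in enumerate(commands, start=1))
```

Returning `None` for a missing file, rather than raising, keeps the per-bug page renderable when a trajectory was never saved.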

## Verification (how a reviewer confirms this is done)
Use a small fixture with **2 models chosen so the two metrics rank them in opposite orders**:
1. Toggle the metric between "bugs / 1000 tokens" and "bugs / $" → the table **reorders**. *(Proves both metrics are really computed, not just labels.)*
2. Click a bug → the source/target/solutions/verdict shown **match the JSON file**, and the trajectory link shows the `pred` commands ending in exactly that bug.
3. A model that found **0 bugs** still appears, listed with 0.
4. A bug whose trajectory file is missing still renders ("trajectory unavailable") instead of breaking the page.
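One way to sanity-check the fixture before using it for steps 1 and 3 is sketched below. All numbers and field names (`bugs`, `tokens`, `cost_usd`) are made up for illustration, and the third zero-bug entry is an assumption about how step 3 might be covered.

```python
from typing import Callable, Dict, List

Model = Dict[str, object]

# Hypothetical fixture: values chosen so the two metrics disagree.
FIXTURE: List[Model] = [
    {"model": "fixture-a", "bugs": 10, "tokens": 100_000, "cost_usd": 5.0},
    {"model": "fixture-b", "bugs": 6, "tokens": 40_000, "cost_usd": 6.0},
    {"model": "fixture-zero", "bugs": 0, "tokens": 50_000, "cost_usd": 1.0},
]


def per_1k_tokens(m: Model) -> float:
    return 1000 * m["bugs"] / m["tokens"]


def per_dollar(m: Model) -> float:
    return m["bugs"] / m["cost_usd"]


def order(models: List[Model], metric: Callable[[Model], float]) -> List[str]:
    # Model names, best metric value first.
    return [m["model"] for m in sorted(models, key=metric, reverse=True)]


def fixture_problems(models: List[Model]) -> List[str]:
    """Return reasons the fixture cannot support the checks (empty if fine)."""
    problems = []
    with_bugs = [m for m in models if m["bugs"] > 0]
    by_tokens = order(with_bugs, per_1k_tokens)
    by_dollar = order(with_bugs, per_dollar)
    # Step 1 needs the models with bugs to rank in opposite orders.
    if by_tokens != list(reversed(by_dollar)):
        problems.append("metrics do not rank the models in opposite orders")
    # Step 3 needs at least one model that found no bugs.
    if not any(m["bugs"] == 0 for m in models):
        problems.append("no zero-bug model in the fixture")
    return problems


print(fixture_problems(FIXTURE))
```

With these numbers, fixture-a scores 0.1 bugs per 1000 tokens and 2 bugs per dollar, while fixture-b scores 0.15 and 1, so toggling the metric swaps their positions.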

## Dependencies
Depends on #3 (results format + rendering) and #4 (multiple models' results).

## Out of scope
A second agent track (opencode); the full offline-reproducibility archive (#8).

