M4.1 — [Tracking] Harden and release the benchmark #8

Description

@GiggleLiu

Background

New here? Read #9 first — it explains the project and defines every term below.

This is a tracking issue: a deliberately coarse placeholder for the final "make it solid and shippable" milestone. We keep far-off work coarse on purpose and split it into specific, individually checkable issues only when its turn comes (after the public site, #6–#7, lands).

Objective

Take the working multi-model public benchmark to a maintainable, reproducible, documented release. This issue will be split into specific issues once M3 is done.

Definition of done (what the split-out issues will cover)

Verification (how a reviewer confirms this is done — for now)

This issue lists the four areas above, each with a one-line "done" criterion. When #6–#7 land, re-run the decomposition (the /infinite-bullets skill) to split it into individual, separately verifiable issues. As a tracking issue, its job right now is to capture scope, not to be implemented.

Dependencies

Depends on M1–M3 (#1–#7).

Out of scope

Anything already covered by #1–#7.
