
M4.1d — Observability: per-session cost and failure report #13

Description

@Ferrari-72

Parent

Spawned from #8 (M4.1 tracking issue).

Objective

Surface per-session cost, failure reasons, and AI error rates so benchmark health can be monitored.

Definition of done

  • Per-result rows in results/*.json include an optional error field, set whenever the row's result value starts with the prefix "error:"
  • build_index.py aggregates error and skip counts into each index entry (error_count, skip_count)
  • make test-unit still passes
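The two data-format items above could be sketched roughly as follows. This is a minimal, hypothetical sketch, not the project's implementation. It assumes each results/*.json file holds an object with a "results" list of row dicts carrying a "result" string, and that the error field stores the reason text after the "error:" prefix. The issue does not say how skips are marked, so the "skip:" prefix is a placeholder assumption. The helper names annotate_row, summarize_rows and build_index_entry are illustrative and are not taken from build_index.py.

```python
import json
from pathlib import Path
from typing import Any

ERROR_PREFIX = "error:"
# Hypothetical placeholder: the issue does not define how skipped rows are marked.
SKIP_PREFIX = "skip:"


def annotate_row(row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of row with an optional "error" field.

    The field is added only when row["result"] is a string starting with
    "error:". Its value (an assumption) is the reason text after the prefix.
    """
    out = dict(row)  # copy so the caller's row is not mutated
    result = out.get("result")
    if isinstance(result, str) and result.startswith(ERROR_PREFIX):
        out["error"] = result[len(ERROR_PREFIX):].strip()
    return out


def summarize_rows(rows: list[dict[str, Any]]) -> dict[str, int]:
    """Count error and skip rows; rows without a string result are ignored."""
    error_count = 0
    skip_count = 0
    for row in rows:
        result = row.get("result")
        if not isinstance(result, str):
            continue
        if result.startswith(ERROR_PREFIX):
            error_count += 1
        elif result.startswith(SKIP_PREFIX):
            skip_count += 1
    return {"error_count": error_count, "skip_count": skip_count}


def build_index_entry(path: Path) -> dict[str, Any]:
    """Hypothetical index entry for one results file shaped like {"results": [...]}."""
    data = json.loads(path.read_text())
    rows = [annotate_row(r) for r in data.get("results", [])]
    entry: dict[str, Any] = {"file": path.name, "total": len(rows)}
    entry.update(summarize_rows(rows))
    return entry
```

Keeping the counting in a pure function over rows means make test-unit can cover it with in-memory fixtures, without reading real result files.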

Dependencies

Depends on #3, #4.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type


    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
