Parent
Spawned from #8 (M4.1 tracking issue).
Objective
Surface per-session cost, failure reasons, and AI error rates so benchmark health can be monitored.
Definition of done
results/*.json per-result rows include an optional error field when result starts with error:
build_index.py aggregates error counts into the index entry (error_count, skip_count)
make test-unit still passes
Dependencies
Depends on #3, #4.
Parent
Spawned from #8 (M4.1 tracking issue).
Objective
Surface per-session cost, failure reasons, and AI error rates so benchmark health can be monitored.
Definition of done
results/*.jsonper-result rows include an optionalerrorfield whenresultstarts witherror:build_index.pyaggregates error counts into the index entry (error_count,skip_count)make test-unitstill passesDependencies
Depends on #3, #4.