Track live agent benchmark validation for Claude Code.
Command target:
ctx leaderboard --hallucination --live --agent claude
Acceptance:
- Run in an external environment where Claude Code CLI auth/session is available.
- Capture skipped/error output if the CLI is unavailable or blocked.
- Do not use this result in launch copy until it is reproducible.
- Keep the offline deterministic benchmark as the primary launch claim.
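
One way to satisfy the "capture skipped/error output" item is a small wrapper around the command target. The following is a minimal sketch, not the project's actual tooling. The command itself is taken from this ticket; everything else is an assumption made for illustration: the helper name `run_and_capture`, the status labels `ok` / `skipped` / `error`, the convention that a non-zero exit code means failure, and the `claude` executable name used for the PATH check. It can only detect a missing executable, a timeout, or a non-zero exit; how `ctx` itself reports a blocked or unauthenticated session is not known here.

```python
import json
import shutil
import subprocess
import sys

# Command target, copied from this ticket.
LIVE_BENCHMARK_CMD = ["ctx", "leaderboard", "--hallucination", "--live", "--agent", "claude"]


def run_and_capture(cmd, required=None, timeout_s=600):
    """Run cmd and return a record whose status is 'ok', 'skipped', or 'error'.

    The status labels are hypothetical; adapt them to whatever the tracker expects.
    """
    record = {"command": cmd, "status": None, "returncode": None, "stdout": "", "stderr": ""}

    # Skip (rather than fail) when a required executable is not on PATH.
    for exe in (required if required is not None else [cmd[0]]):
        if shutil.which(exe) is None:
            record["status"] = "skipped"
            record["stderr"] = f"executable not found: {exe}"
            return record

    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        record["status"] = "error"
        record["stderr"] = f"timed out after {timeout_s} seconds"
        return record

    # Assumption: a non-zero exit code signals an error.
    record["returncode"] = proc.returncode
    record["stdout"] = proc.stdout
    record["stderr"] = proc.stderr
    record["status"] = "ok" if proc.returncode == 0 else "error"
    return record


if __name__ == "__main__":
    # Assumption: the Claude Code CLI is installed under the name "claude".
    result = run_and_capture(LIVE_BENCHMARK_CMD, required=["ctx", "claude"])
    json.dump(result, sys.stdout, indent=2)
    print()
```

Saving the printed JSON record from each run gives a concrete artifact to compare across runs, which helps with the reproducibility check before any result reaches launch copy.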