Track live agent benchmark validation for Gemini CLI.
Command target:
ctx leaderboard --hallucination --live --agent gemini
Acceptance:
- Run in an external environment where Gemini CLI auth/session is available.
- Capture the skipped or error output if the CLI is unavailable or blocked.
- Do not use this result in launch copy until it is reproducible.
- Keep the offline deterministic benchmark as the primary launch claim.
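
One way to satisfy the capture requirement is a small wrapper that records what happened whether or not the CLI could run. The sketch below is illustrative only: `RunRecord` and `run_and_capture` are hypothetical names, and it assumes that a non-zero exit code means the run failed and that a binary missing from PATH should be recorded as skipped. It does not parse the command's own output, so a skip reported by the tool itself would still need to be read from the captured text.

```python
import shutil
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunRecord:
    """Hypothetical record of one benchmark attempt."""
    status: str                # "ok", "error", or "skipped"
    returncode: Optional[int]  # None when the process never finished
    output: str                # combined stdout and stderr, or a short reason


def run_and_capture(cmd: List[str], timeout_s: float = 600.0) -> RunRecord:
    """Run cmd and keep its output even when it cannot run or fails."""
    # Assumption: an executable missing from PATH counts as "skipped".
    if shutil.which(cmd[0]) is None:
        return RunRecord("skipped", None, f"{cmd[0]} not found on PATH")
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return RunRecord("error", None, f"timed out after {timeout_s}s")
    # Assumption: exit code 0 means success, anything else is an error.
    status = "ok" if proc.returncode == 0 else "error"
    return RunRecord(status, proc.returncode, proc.stdout + proc.stderr)
```

For the command target above, the call would be `run_and_capture(["ctx", "leaderboard", "--hallucination", "--live", "--agent", "gemini"])`, and the returned record can be attached to this issue as the captured evidence.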