Second thoughts: hide the conviction panel, widen the backfill window #380
Merged
A day of production data showed the reviewer is reliable (100% write rate on mistral-small) and honest: it stays quiet on good answers and fires real, well-reasoned doubts on sloppy ones (confirmed while testing weaker main models). At a ~95%+ conviction base rate, a calm "stands by it" row on every fine answer is just chrome.

Suppress the panel on conviction: `AssistantBody` gates the mount on a reinstated `isDoubt(disposition)`, so only `hedge`/`reframe`/`correct` render. The reviewer still runs on every turn and the verdict still persists; this is purely a display gate. So an absent panel now means "reviewed, no doubt", and a visible panel always means something. Doubts stand out instead of being one more row among many, which is exactly what you want while model-testing.

Also widen `VERDICT_BACKFILL_DELAY_MS` from 8s to 20s: observed reviewer latency ran to 12s, past the old backfill window, so a slow verdict plus a dropped realtime echo could still need a manual refresh. The backfill only fires on the rare dropped-echo path, where a slightly longer wait costs nothing and covering the slow tail matters more.

vitest covers `isDoubt`; the user manual, dev doc, and QA walkthrough are updated to "conviction shows nothing".
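The display gate can be sketched as a plain predicate. This is a minimal sketch, assuming the four disposition strings named above are the complete set; `Verdict`, its `rationale` field, and `shouldMountReviewPanel` are hypothetical names for illustration, not the actual `AssistantBody` code. The exhaustive `switch` means the compiler flags any future disposition that has not been classified as doubt or conviction.

```typescript
// Dispositions named in this PR; assumed to be the complete set.
type Disposition = "conviction" | "hedge" | "reframe" | "correct";

// Hypothetical verdict shape; only `disposition` matters for the gate.
interface Verdict {
  disposition: Disposition;
  rationale: string;
}

// True for any disposition that expresses doubt about the answer.
// The exhaustive switch forces a decision if a new disposition is added.
function isDoubt(disposition: Disposition): boolean {
  switch (disposition) {
    case "hedge":
    case "reframe":
    case "correct":
      return true;
    case "conviction":
      return false;
  }
}

// Hypothetical display-only gate: the verdict is persisted either way;
// this only decides whether the review panel mounts.
// Assumption: a verdict that has not arrived yet mounts nothing.
function shouldMountReviewPanel(verdict: Verdict | undefined): boolean {
  return verdict !== undefined && isDoubt(verdict.disposition);
}
```

In a component this would read roughly `{shouldMountReviewPanel(verdict) && <ReviewPanel verdict={verdict} />}`, with `ReviewPanel` as a stand-in name. Keeping the predicate separate from the component lets a plain unit test cover it without rendering anything.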
SYNOPSIS
Two polish changes prompted by a day of production data. The reviewer is now reliable and calibrated, so conviction goes quiet (only doubts render), and the delivery backstop (the verdict backfill window) widens from 8s to 20s.
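The delivery backstop can be pictured as a one-shot timer per turn that the realtime echo cancels. This is a sketch under assumptions: only `VERDICT_BACKFILL_DELAY_MS` and its values come from this PR; `armVerdictBackfill`, `refetchVerdict`, and the injected timer functions are hypothetical. Injecting the timer keeps the sketch testable without waiting on a real clock.

```typescript
// From this PR: widened from 8s to 20s.
const VERDICT_BACKFILL_DELAY_MS = 20_000;

type TimerHandle = unknown;

// Hypothetical dependencies; timers are injected so tests can drive them.
interface BackfillDeps {
  delayMs: number;
  refetchVerdict: (turnId: string) => void; // assumed: re-reads the persisted verdict
  setTimer: (fn: () => void, ms: number) => TimerHandle;
  clearTimer: (handle: TimerHandle) => void;
}

// Real-clock wiring for production use.
const realTimers = {
  setTimer: (fn: () => void, ms: number): TimerHandle => setTimeout(fn, ms),
  clearTimer: (handle: TimerHandle): void =>
    clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Arms a one-shot backfill for one turn. If the realtime echo arrives first,
// the timer is cleared and the backfill never fires, so the delay only
// matters on the dropped-echo path.
function armVerdictBackfill(turnId: string, deps: BackfillDeps): { onEcho: () => void } {
  let settled = false;
  const handle = deps.setTimer(() => {
    if (settled) return;
    settled = true;
    deps.refetchVerdict(turnId);
  }, deps.delayMs);

  return {
    // Call when the realtime echo for this turn's verdict is received.
    onEcho(): void {
      if (settled) return;
      settled = true;
      deps.clearTimer(handle);
    },
  };
}
```

In production this would be armed as `armVerdictBackfill(turnId, { delayMs: VERDICT_BACKFILL_DELAY_MS, refetchVerdict, ...realTimers })`; tests swap `realTimers` for a fake that captures the callback.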
PURPOSE
A day of live use showed the reliability fixes worked: 100% verdict write rate on mistral-small, with average latency around 5s. The reviewer is also honest rather than a rubber stamp: it stayed near-all-conviction on a good-model, task-heavy day, and (per the user) raised many well-reasoned doubts when weaker models were driving. So a doubt is a trustworthy signal that tracks answer quality, while conviction just means "nothing to see here." At a ~95%+ conviction base rate, a calm "stands by it" row on every fine answer is chrome.

Separately, reviewer latency ran as high as 12s, past the 8s backfill window, so a slow verdict plus a dropped realtime echo could still need a manual refresh.
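The window argument reduces to a single comparison. This sketch assumes the backfill is one re-read and that the verdict latency and the backfill delay are measured from the same moment at the end of the turn; `verdictDelivery` and its outcome labels are hypothetical.

```typescript
type Delivery = "realtime-echo" | "backfill" | "manual-refresh";

// Hypothetical model of how a verdict reaches the client.
// verdictLatencyMs: time until the reviewer's verdict is persisted.
// echoDelivered: whether the realtime echo for that write arrived.
// backfillDelayMs: when the one-shot backfill re-reads the verdict.
function verdictDelivery(
  verdictLatencyMs: number,
  echoDelivered: boolean,
  backfillDelayMs: number,
): Delivery {
  if (echoDelivered) return "realtime-echo";
  // A single re-read only finds the verdict if it was already written.
  return verdictLatencyMs <= backfillDelayMs ? "backfill" : "manual-refresh";
}

// The 12s slow tail observed in production, against the old and new windows.
const slowTailMs = 12_000;
console.log(verdictDelivery(slowTailMs, false, 8_000)); // prints manual-refresh
console.log(verdictDelivery(slowTailMs, false, 20_000)); // prints backfill
```

When the echo arrives, the delay never comes into play; only the dropped-echo branch waits on it, which is why widening the window costs little.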
DESCRIPTION
`AssistantBody` gates the panel mount on a reinstated `isDoubt(disposition)` primitive; `conviction` shows nothing. The reviewer still runs on every turn and the verdict still persists; this is purely a display gate. So an absent panel now means "reviewed, no doubt", and a visible panel always means something. That is exactly what you want while model-testing: the sloppy-answer flags pop instead of hiding among calm rows.

Notes for reviewers
- The panel is hidden only for `conviction`. The DB still records every verdict; nothing about the reviewer changed.
- `isDoubt` returns (deleted twice before as premature; it now has a real production consumer, the mount gate). vitest covers it.
- User manual, dev doc, and QA walkthrough updated.
- `mise run check` and `mise run knip` green.

Not verified
Cloud session, no browser: the visible effect (conviction rows disappearing, doubts still rendering) still needs a quick check on a live turn.
Generated by Claude Code