
Second thoughts: hide the conviction panel, widen the backfill window #380

Merged
sysread merged 1 commit into main from claude/second-thoughts-feature-nca3sf on Jul 3, 2026

@sysread (Owner) commented Jul 3, 2026

SYNOPSIS

Two polish changes from a day of production data. The reviewer is now reliable and calibrated, so let conviction go quiet (only render doubts) and widen the delivery backstop.

PURPOSE

A day of live use showed the reliability fixes worked: 100% verdict write rate on mistral-small, latency avg ~5s. And the reviewer is honest, not a rubber stamp - it stayed near-all-conviction on a good-model, task-heavy day, and (per the user) fired a lot of well-reasoned doubts when weaker models were driving. So a doubt is a trustworthy signal that tracks answer quality; conviction is just "nothing to see here."

At a ~95%+ conviction base rate, a calm "stands by it" row on every fine answer is chrome. And the reviewer latency ran as high as 12s - past the 8s backfill window - so a slow verdict plus a dropped realtime echo could still need a manual refresh.

DESCRIPTION

  • Panel renders only for a doubt. AssistantBody gates the panel mount on a reinstated isDoubt(disposition) primitive; conviction shows nothing. The reviewer still runs on every turn and the verdict still persists - purely a display gate. So an absent panel now means "reviewed, no doubt" and a visible panel always means something (which is exactly what you want while model-testing - the sloppy-answer flags pop instead of hiding among calm rows).
  • Backfill window 8s -> 20s. Past the observed max latency with margin. It only fires on the rare dropped-echo path, where a slightly longer wait costs nothing and covering the slow tail beats being quick.
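
A minimal sketch of the display gate described above, not the actual implementation. The four disposition names come from this PR's commit message; the `Disposition` type, the `Verdict` shape, and the `shouldMountPanel` helper are hypothetical stand-ins for whatever AssistantBody really uses.

```typescript
// Hypothetical type: names taken from the commit message (conviction, plus
// the three doubts hedge/reframe/correct). The real type may differ.
type Disposition = "conviction" | "hedge" | "reframe" | "correct";

// Assumed shape of a persisted verdict; only the field used here is shown.
interface Verdict {
  disposition: Disposition;
}

// Assumption: a doubt is any disposition other than conviction.
function isDoubt(disposition: Disposition | null | undefined): boolean {
  return disposition != null && disposition !== "conviction";
}

// Display gate only. The reviewer still runs and the verdict is still stored;
// this just decides whether the panel is mounted for a given message.
function shouldMountPanel(verdict: Verdict | null): boolean {
  return verdict !== null && isDoubt(verdict.disposition);
}
```

Under these assumptions, a message with no verdict yet and a message with a conviction verdict both render no panel, while hedge, reframe, and correct all mount it.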

Notes for reviewers

  • Behavior change is intentional: users will stop seeing a per-message row on the ~95% of answers that get conviction. The DB still records every verdict; nothing about the reviewer changed.
  • isDoubt is reinstated. It was deleted twice before as premature; it now has a real production consumer, the mount gate. vitest covers it. User manual, dev doc, and QA walkthrough are updated. mise run check and mise run knip are green.

Not verified

Cloud session, no browser: the visible effect (conviction rows disappearing, doubts still rendering) still needs a visual check on a live turn.


Generated by Claude Code

A day of production data showed the reviewer is reliable (100% write
rate on mistral-small) and honest - it stays quiet on good answers and
fires real, well-reasoned doubts on sloppy ones (confirmed while
testing weaker main models). At a ~95%+ conviction base rate, a calm
"stands by it" row on every fine answer is just chrome.

Suppress the panel on conviction: AssistantBody gates the mount on a
reinstated isDoubt(disposition), so only hedge/reframe/correct render.
The reviewer still runs on every turn and the verdict still persists -
this is purely a display gate - so an absent panel now means "reviewed,
no doubt" and a visible panel always means something. Doubts stand out
instead of being one more row among many, which is exactly what you
want while model-testing.

Also widen VERDICT_BACKFILL_DELAY_MS from 8s to 20s: observed reviewer
latency ran to 12s, past the old backfill window, so a slow verdict
plus a dropped realtime echo could still need a manual refresh. The
backfill only fires on the rare dropped-echo path, where a slightly
longer wait costs nothing and covering the slow tail matters more.
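
The timing argument can be sketched as follows. Only VERDICT_BACKFILL_DELAY_MS and the 8s, 12s, and 20s figures come from this PR; `backfillFindsVerdict`, `scheduleVerdictBackfill`, and the assumption that the backfill is a single one-shot fetch are hypothetical and only illustrate the reasoning.

```typescript
// Value from this PR: widened from 8_000 to 20_000.
const VERDICT_BACKFILL_DELAY_MS = 20_000;

// A one-shot backfill can only pick up a verdict that has already been
// written by the time it fires (assumption: no retries).
function backfillFindsVerdict(
  reviewerLatencyMs: number,
  delayMs: number = VERDICT_BACKFILL_DELAY_MS,
): boolean {
  return reviewerLatencyMs <= delayMs;
}

// Hypothetical scheduler: fetch once after the delay, but only if the
// realtime echo has not already delivered the verdict.
// Returns a cancel function for unmount or cleanup.
function scheduleVerdictBackfill(
  hasVerdict: () => boolean,
  fetchVerdict: () => void,
  delayMs: number = VERDICT_BACKFILL_DELAY_MS,
): () => void {
  const timer = setTimeout(() => {
    if (!hasVerdict()) {
      fetchVerdict();
    }
  }, delayMs);
  return () => clearTimeout(timer);
}
```

With the observed 12s worst case, `backfillFindsVerdict(12_000, 8_000)` is false and `backfillFindsVerdict(12_000)` is true, which is the gap the wider window closes.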

vitest covers isDoubt; user manual, dev doc, and QA walkthrough updated
to "conviction shows nothing".
@sysread sysread merged commit 16e987e into main Jul 3, 2026
1 check passed
@sysread sysread deleted the claude/second-thoughts-feature-nca3sf branch July 3, 2026 16:32