feat: configurable tool-round cap and auto-compaction follow-ups#83
Merged
Conversation
A globally-configured auto_threshold_tokens sized for a 1M-context model would previously refuse a /model swap to a 200K-context model with "Config change failed: auto compaction threshold must be at most N tokens", forcing the user to edit config before every swap. Mirror the percent path's silent clamp: when an absolute threshold exceeds the new model's safe trigger, fall back to the model default rather than bail. The 50K floor still rejects too-small typos.
…iling The model-cap ceiling now clamps silently (96edff8), but the 50K floor still bailed with "auto compaction threshold must be at least 50000 tokens" on the token path, and bailed with a derived-tokens context line on the percent path. Two paths, two policies, opposite resolutions. Collapse to one shape: out-of-range thresholds snap to the nearest safe value in either direction. The percent-range check (1-100) stays as a bail because that's a domain check, not a clamp decision. Drops three rejecting tests in favor of clamp assertions covering the same surface, and rewrites the guide so the snap-into-range rule is described once rather than split across two sentences.
The derived defaults 167_000 (any 200K-window model with max_tokens saturating the reserve cap) and 967_000 ([1m]-tagged 1M window) were hardcoded across seven test sites in three files. Tweaking AUTO_COMPACTION_BUFFER_TOKENS or AUTO_COMPACTION_OUTPUT_RESERVE_CAP required touching them all. Introduce a single `test_thresholds` module next to `default_auto_threshold` with WINDOW_200K and WINDOW_1M, plus one anchor test that derives both through `default_auto_threshold` so the literals stay honest. Other tests reference the names; the only literals that survive are the rendered status-line messages, which now format! the threshold rather than embed its decimal repr.
…old_floor threshold_from_tokens and threshold_from_percent both open-coded the ceiling clamp with .map_or(_, |max| _.min(max)). Lift it into a named helper so the two paths read as floor(ceil(raw)) — one shape for both clamps and both inputs.
Half the send sites guarded with `_ = sink.send(X)` (silent on drop),
half with `if let Err(e) = sink.send(X) { tracing::error!(...) }` (logs
on drop). The mix meant Error / TurnComplete / Cancelled events could
vanish without a trace when the TUI channel filled, while
SessionResumed / SessionCompacted were already logged.
Add a default `AgentSink::emit(event, label)` that wraps send-and-log.
Sweep every call site whose loss would visibly desync the UI — turn
state, lifecycle, user-facing errors — onto it. Per-token streaming
(StreamToken, ThinkingToken) keeps raw `send` since dropping one token
is harmless and logging would spam.
display_auto_compaction prefixed the threshold with "on at", producing
"Auto compaction on at 167000 tokens." in status modals and config-
change messages. The leading "on" doubled with the noun ("compaction
on") and the resulting "on at" reads awkwardly.
Drop "on": "at {threshold} tokens" / "off" / "off (no threshold)".
The kv-row label and the format-config sentence both read cleanly.
The overlay rendered as a right-aligned Paragraph spanning the full chat width with theme.surface() as the widget style. Theme surface() in mocha (and other built-ins) inherits the terminal bg, so the row underneath wasn't masked and "4 new messages (ctrl+End) ↓" collided with the chat text it overlapped, e.g. "crates/oxide-code/sr4 new messages". Render the overlay as a pill — narrow Rect sized to label + 1-cell padding per side, fill it with a Block to guarantee opaque bg, then paint the spans inside. Chat content to the left of the pill stays untouched.
…user When compact_session errored mid-flight, the auto-compaction helper only `warn!`-logged and returned. The TUI had already shown Status::Compacting (via AutoCompactionStarted), so the user saw the spinner clear with no signal that compaction had not actually happened — the next prompt would clobber their token budget unexpectedly. After MAX_AUTO_COMPACT_FAILURES failures the early-return path also disabled auto-compaction silently for the rest of the session. Emit AgentEvent::Error with the failure detail on summarize-fail. Emit a single "Auto-compaction disabled for this session after 3 failures." notice on the bump that trips the breaker (either summarize-fail or persist-fail; persist already surfaced its own write error). Wrap the counter-bump in a record_auto_compact_failure helper so both call sites share the disablement check.
Run the CLAUDE.md prose rules over agent.rs comments: - Replace inappropriate `;` joiners with periods (or transitions where the two clauses are closely linked). Semicolons are now reserved for the cases where two independent clauses truly belong as one thought. - Tighten verbose docstrings (`TurnAbort`, `agent_turn`) to one-paragraph contracts instead of multi-paragraph mechanic restatements. - Add the missing `auto_compact_if_needed` docstring so its three return states (compacted / skipped / failed) are explicit. - Wrap the `#[expect(too_many_arguments)]` reason inside 100 columns. - Drop the inappropriate em-dash on `inert_user_rx`'s docstring.
Apply the same prose rules (no task-narration leads, fewer awkward semicolon joiners, contract-first docs) to the files the agent.rs sweep left untouched: welcome.rs regression lead, the multi-bullet stream_message wire-shape doc, SessionCompacted / ConfigChanged / AgentSink trait docs, and the redundant SUMMARY_PREFIX context line.
`paused_counter_saturates` described the mechanism. The scenario is "does not overflow at u32::MAX", which is what the test actually verifies. `jump_overlay_label_renders_idle_and_new_content_variants` bundled the idle and pluralization scenarios; splitting into one idle test and one pluralization test (two count values for the `s` suffix toggle) keeps the failure localized without fragmenting the suite.
`resolve_base` calls `is_dated_model_id` first, then `is_selectable_known_id`, then `candidates`. The helpers now follow that sequence so a top-down read tracks the resolver's branching, and `has_dated_suffix` (the only call from `is_dated_model_id`) sits with its caller. `empty_metadata_offenders_flags_a_synthetic_violator` named the mechanism. The scenario it covers is "reports a command with an empty description", so use that. In session/state.rs, the three `commit_compact_*` tests were nested under the `compact_entries` section despite covering a separate production function. Split them under `// ── commit_compact ──`.
The lead comments on `draw_frame_streaming_shows_spinner_in_status_bar`
and `parse_theme_inline_supports_every_modifier_flag` narrated the test
setup; the snapshot / assertions speak for themselves. The lead on
`reload_sets_load_error_and_clears_rows_when_list_paged_fails` was the
useful half — keep the WHY (footer signal) and drop the mechanism
("Pin the Err arm"). Rename `ctrl_d_pushes_confirm_modal_for_cursor_row`
to `ctrl_d_or_delete_pushes_...` so the name matches the test body, and
drop the comment that compensated for the gap.
In agent.rs, the `// ── agent_turn ──` test-section divider covered
only the shared fixtures (FakeClient, EchoTool, helpers). Rename it to
`// ── Fixtures ──` and add a real `// ── agent_turn ──` divider above
the first `agent_turn_*` test.
In tui/app.rs the `render_app` / `rendered_text` / `long_chat_block` fixtures lived under `// ── draw_frame ──` but were used by tests in several other sections, and they sat between the divider and the actual `draw_frame_*` tests. Hoist them into the top-level fixtures block at the head of the test module. The `jump_overlay_label` section was interleaved between draw_frame fixtures and tests; move it to after the draw_frame tests so each section reads as one contiguous group. In tui/components/chat.rs the `paused_counter_*` tests sit under `update_layout` but cover `bump_paused_counter`. Add the missing `// ── bump_paused_counter ──` divider. In agent/event.rs two adjacent `impl StdioSink` blocks held `new` and `render` with no semantic separation. Merge them. In slash/matcher.rs the `rank_by_prefix` helper (no internal callers, exported for sibling slash modules) sat between `matching_alias` and `on_name` / `on_alias`, splitting the `best_match` helper cluster. Move it after the cluster so the `best_match` helpers stay contiguous.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
… unbounded The hardcoded 25-round cap was tripping legitimate multi-hour agentic sessions. Default now runs unbounded; users opt into a cap via `[client] max_tool_rounds = N` or `OX_MAX_TOOL_ROUNDS=N`.
Folds the last inline `if let Err = sink.send()` / `tracing::error!` pattern into the canonical `emit` wrapper that the surrounding sweep introduced.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A follow-up sweep after the auto-compaction PR (#82). Fixes the model-swap clamp bug, surfaces auto-compaction failures to the user, tightens the jump-to-bottom overlay so it stops bleeding into chat, removes the hardcoded 25-round tool-loop cap in favor of a configurable opt-in, and rolls in a broader style pass over comments, test names, and section organization.
Design decisions
test_thresholdsconstants module shared by config, client, and welcome tests; the anchor test derives them fromdefault_auto_thresholdso a change to the reserve cap or buffer fails one assertion instead of drifting silently across files.AgentSink::emitlogs aterror!when the channel rejects a one-shot or user-facing event._ = sink.send(...)stays only for per-token streaming where a dropped event is harmless. Auto-compaction failures, the breaker trip, and theSessionRollednotification all route through the canonical wrapper as user-visibleErrorevents or one-shot signals.Paragraphform was leaking chat text through. ABlock-filledRectwith 1-cell padding gives the overlay an opaque surface bg, matching how the rest of the chrome paints.MAX_TOOL_ROUNDS = 25was tripping legitimate multi-hour agentic sessions. The new[client] max_tool_rounds(orOX_MAX_TOOL_ROUNDS) isOption<u32>:Noneruns unbounded;Some(N)bails the turn afterNrounds with the existing runaway-loop error. The cap stays available as a guard against tools stuck in a retry loop, but normal agent behavior no longer needs to fit inside an arbitrary 25-round budget.;joiners that read as run-on sentences, "X, not Y" framing that negates a strawman, multi-paragraph contracts that should be one tight paragraph, andRegression:/Pin the X arm of Y:task-narration leads on tests whose names already say what they cover.markdown/render.rs(Paragraphs, Headings, Lists, ...) or missingwrap_line_prefixes in single-function modules, the existing form is more useful and stays. Where a divider was actively misleading (agent_turncovering only fixtures,paused_counter_*sitting underupdate_layout,jump_overlay_labelinterleaved intodraw_frametests), the section is renamed or moved.Changes
agent.rsagent_turntakesmax_tool_rounds: Option<u32>(None= unbounded,Some(n)= bail afternrounds). Module doc, loop, and bail message updated. Replaces the 25-round const-test with pairedwith_some_cap_bails_*andwith_none_cap_runs_unbounded_*tests.agent.rs,agent/compact_boundary.rs,agent/event.rs,agent/compaction.rs,session/title_generator.rs,main.rsAgentSink::emit. Auto-compaction failures, breaker trip, andSessionRolledroute through it as user-visible signals. Comment sweep: drop;joiners that read as run-on sentences, tighten verbose docstrings, drop one task-narration lead.config.rsthreshold_from_tokensreturnsu32. Anchor test pins the 200K / 1M window thresholds throughdefault_auto_threshold.display_auto_compactionreads"at {n} tokens"instead of"on at {n} tokens". Newmax_tool_rounds: Option<u32>field onConfigandConfigSnapshot, loaded from[client] max_tool_roundsorOX_MAX_TOOL_ROUNDS. Newdisplay_max_tool_roundshelper for/config.config/file.rsClientConfigcarriesmax_tool_rounds: Option<u32>. Merge and parser tests extended.client/anthropic.rsstream_messagedoc dropped its multi-bullet "Wire shape" form for a tighter contract paragraph.set_modeltests reuse the newtest_thresholdsconstants. NewClient::max_tool_rounds()getter.main.rsagent_turncall sites (AgentLoopTask::handle_submit_prompt,bare_repl,headless) passclient.max_tool_rounds(). The remaining inlineif let Err = sink.send/tracing::error!onSessionRolledfolds intosink.emit.slash/config.rs/configmodal adds a "Max Tool Rounds" row showingunboundedor the configured value. Height anchor updated. Test assertions match the new"at {n} tokens"phrasing.tui/app.rsBlock+ 1-cell padding) so it no longer bleeds through chat.render_app/rendered_text/long_chat_blockhoisted to top-level fixtures since several sections use them.jump_overlay_labelsection moved out of thedraw_frameinterleave.paint_below_starters_*anddraw_frame_streaming_*drop their task-narration leads.tui/components/chat.rs// ── bump_paused_counter ──test divider. Renamepaused_counter_saturatestopaused_counter_does_not_overflow_at_u32_max.tui/components/welcome.rs,tui/theme/loader.rsslash/model.rsresolve_base's call sequence (is_dated_model_id→has_dated_suffix→is_selectable_known_id→candidates→ listing).slash/matcher.rsrank_by_prefixmoved after thebest_matchhelper cluster so the cluster stays contiguous.slash/registry.rsempty_metadata_offenders_flags_a_synthetic_violatorto a scenario-keyed name.slash/resume.rsctrl_d_pushes_*toctrl_d_or_delete_pushes_*since the body covers both; drop the now-redundant comment. Tighten thereload_*comment to the WHY.slash/status.rs"at {n} tokens"phrasing.session/state.rscommit_compact_*tests out of thecompact_entriessection.docs/guide/configuration.mdmax_tool_roundsrow in the[client]table, env-var row, and short prose section explaining the unbounded default and opt-in cap.Test plan
cargo fmt --all --checkcargo buildcargo clippy --all-targets -- -D warnings: zero warningscargo test: 2005 tests passcargo llvm-cov --ignore-filename-regex 'main\.rs': 98.63% line coveragepnpm lintpnpm spellcheck