Skip to content

feat: configurable tool-round cap and auto-compaction follow-ups#83

Merged
hakula139 merged 16 commits into
mainfrom
refactor/sweep
May 13, 2026
Merged

feat: configurable tool-round cap and auto-compaction follow-ups#83
hakula139 merged 16 commits into
mainfrom
refactor/sweep

Conversation

@hakula139
Copy link
Copy Markdown
Owner

@hakula139 hakula139 commented May 13, 2026

Summary

A follow-up sweep after the auto-compaction PR (#82). Fixes the model-swap clamp bug, surfaces auto-compaction failures to the user, tightens the jump-to-bottom overlay so it stops bleeding into chat, removes the hardcoded 25-round tool-loop cap in favor of a configurable opt-in, and rolls in a broader style pass over comments, test names, and section organization.

Design decisions

  • Clamp the auto-compaction threshold silently on both ends. A user-configured threshold above the active model's safe trigger now snaps down instead of erroring on swap, mirroring the existing snap-up at the 50K floor. The percent path already worked this way, so the two arms are now symmetric.
  • Centralize the threshold table on the context window, not the model id. The 200K / 1M window pins live in one test_thresholds constants module shared by config, client, and welcome tests; the anchor test derives them from default_auto_threshold so a change to the reserve cap or buffer fails one assertion instead of drifting silently across files.
  • Make sink event delivery loggable. New AgentSink::emit logs at error! when the channel rejects a one-shot or user-facing event. _ = sink.send(...) stays only for per-token streaming where a dropped event is harmless. Auto-compaction failures, the breaker trip, and the SessionRolled notification all route through the canonical wrapper as user-visible Error events or one-shot signals.
  • Render the jump-to-bottom overlay inside a sized pill. The transparent right-aligned Paragraph form was leaking chat text through. A Block-filled Rect with 1-cell padding gives the overlay an opaque surface bg, matching how the rest of the chrome paints.
  • Default the per-turn tool-round cap to unbounded. The hardcoded MAX_TOOL_ROUNDS = 25 was tripping legitimate multi-hour agentic sessions. The new [client] max_tool_rounds (or OX_MAX_TOOL_ROUNDS) is Option<u32>: None runs unbounded; Some(N) bails the turn after N rounds with the existing runaway-loop error. The cap stays available as a guard against tools stuck in a retry loop, but normal agent behavior no longer needs to fit inside an arbitrary 25-round budget.
  • Drop the antithesis / semicolon over-use in prose comments. Sweep agent.rs and peers for ; joiners that read as run-on sentences, "X, not Y" framing that negates a strawman, multi-paragraph contracts that should be one tight paragraph, and Regression: / Pin the X arm of Y: task-narration leads on tests whose names already say what they cover.
  • Keep the test suite lean over rigidly mirrored. Where the test reviewer flagged categorical sections in markdown/render.rs (Paragraphs, Headings, Lists, ...) or missing wrap_line_ prefixes in single-function modules, the existing form is more useful and stays. Where a divider was actively misleading (agent_turn covering only fixtures, paused_counter_* sitting under update_layout, jump_overlay_label interleaved into draw_frame tests), the section is renamed or moved.

Changes

File Description
agent.rs agent_turn takes max_tool_rounds: Option<u32> (None = unbounded, Some(n) = bail after n rounds). Module doc, loop, and bail message updated. Replaces the 25-round const-test with paired with_some_cap_bails_* and with_none_cap_runs_unbounded_* tests.
agent.rs, agent/compact_boundary.rs, agent/event.rs, agent/compaction.rs, session/title_generator.rs, main.rs New AgentSink::emit. Auto-compaction failures, breaker trip, and SessionRolled route through it as user-visible signals. Comment sweep: drop ; joiners that read as run-on sentences, tighten verbose docstrings, drop one task-narration lead.
config.rs Threshold floor and ceiling now both clamp silently. threshold_from_tokens returns u32. Anchor test pins the 200K / 1M window thresholds through default_auto_threshold. display_auto_compaction reads "at {n} tokens" instead of "on at {n} tokens". New max_tool_rounds: Option<u32> field on Config and ConfigSnapshot, loaded from [client] max_tool_rounds or OX_MAX_TOOL_ROUNDS. New display_max_tool_rounds helper for /config.
config/file.rs ClientConfig carries max_tool_rounds: Option<u32>. Merge and parser tests extended.
client/anthropic.rs stream_message doc dropped its multi-bullet "Wire shape" form for a tighter contract paragraph. set_model tests reuse the new test_thresholds constants. New Client::max_tool_rounds() getter.
main.rs Three agent_turn call sites (AgentLoopTask::handle_submit_prompt, bare_repl, headless) pass client.max_tool_rounds(). The remaining inline if let Err = sink.send / tracing::error! on SessionRolled folds into sink.emit.
slash/config.rs /config modal adds a "Max Tool Rounds" row showing unbounded or the configured value. Height anchor updated. Test assertions match the new "at {n} tokens" phrasing.
tui/app.rs Jump-to-bottom overlay paints inside a sized pill (Block + 1-cell padding) so it no longer bleeds through chat. render_app / rendered_text / long_chat_block hoisted to top-level fixtures since several sections use them. jump_overlay_label section moved out of the draw_frame interleave. paint_below_starters_* and draw_frame_streaming_* drop their task-narration leads.
tui/components/chat.rs Add the missing // ── bump_paused_counter ── test divider. Rename paused_counter_saturates to paused_counter_does_not_overflow_at_u32_max.
tui/components/welcome.rs, tui/theme/loader.rs Drop test-body narration leads.
slash/model.rs Helper order now follows resolve_base's call sequence (is_dated_model_idhas_dated_suffixis_selectable_known_idcandidates → listing).
slash/matcher.rs rank_by_prefix moved after the best_match helper cluster so the cluster stays contiguous.
slash/registry.rs Rename empty_metadata_offenders_flags_a_synthetic_violator to a scenario-keyed name.
slash/resume.rs Rename ctrl_d_pushes_* to ctrl_d_or_delete_pushes_* since the body covers both; drop the now-redundant comment. Tighten the reload_* comment to the WHY.
slash/status.rs Test assertions match the new "at {n} tokens" phrasing.
session/state.rs Split commit_compact_* tests out of the compact_entries section.
docs/guide/configuration.md Threshold guide now describes the silent clamping behavior on both ends. New max_tool_rounds row in the [client] table, env-var row, and short prose section explaining the unbounded default and opt-in cap.

Test plan

  • cargo fmt --all --check
  • cargo build
  • cargo clippy --all-targets -- -D warnings: zero warnings
  • cargo test: 2005 tests pass
  • cargo llvm-cov --ignore-filename-regex 'main\.rs': 98.63% line coverage
  • pnpm lint
  • pnpm spellcheck

hakula139 added 14 commits May 13, 2026 10:25
A globally-configured auto_threshold_tokens sized for a 1M-context model
would previously refuse a /model swap to a 200K-context model with
"Config change failed: auto compaction threshold must be at most N
tokens", forcing the user to edit config before every swap.

Mirror the percent path's silent clamp: when an absolute threshold
exceeds the new model's safe trigger, fall back to the model default
rather than bail. The 50K floor still rejects too-small typos.
…iling

The model-cap ceiling now clamps silently (96edff8), but the 50K floor
still bailed with "auto compaction threshold must be at least 50000
tokens" on the token path, and bailed with a derived-tokens context line
on the percent path. Two paths, two policies, opposite resolutions.

Collapse to one shape: out-of-range thresholds snap to the nearest safe
value in either direction. The percent-range check (1-100) stays as a
bail because that's a domain check, not a clamp decision.

Drops three rejecting tests in favor of clamp assertions covering the
same surface, and rewrites the guide so the snap-into-range rule is
described once rather than split across two sentences.
The derived defaults 167_000 (any 200K-window model with max_tokens
saturating the reserve cap) and 967_000 ([1m]-tagged 1M window) were
hardcoded across seven test sites in three files. Tweaking
AUTO_COMPACTION_BUFFER_TOKENS or AUTO_COMPACTION_OUTPUT_RESERVE_CAP
required touching them all.

Introduce a single `test_thresholds` module next to `default_auto_threshold`
with WINDOW_200K and WINDOW_1M, plus one anchor test that derives both
through `default_auto_threshold` so the literals stay honest. Other tests
reference the names; the only literals that survive are the rendered
status-line messages, which now format! the threshold rather than embed
its decimal repr.
…old_floor

threshold_from_tokens and threshold_from_percent both open-coded the
ceiling clamp with .map_or(_, |max| _.min(max)). Lift it into a named
helper so the two paths read as floor(ceil(raw)) — one shape for both
clamps and both inputs.
Half the send sites guarded with `_ = sink.send(X)` (silent on drop),
half with `if let Err(e) = sink.send(X) { tracing::error!(...) }` (logs
on drop). The mix meant Error / TurnComplete / Cancelled events could
vanish without a trace when the TUI channel filled, while
SessionResumed / SessionCompacted were already logged.

Add a default `AgentSink::emit(event, label)` that wraps send-and-log.
Sweep every call site whose loss would visibly desync the UI — turn
state, lifecycle, user-facing errors — onto it. Per-token streaming
(StreamToken, ThinkingToken) keeps raw `send` since dropping one token
is harmless and logging would spam.
display_auto_compaction prefixed the threshold with "on at", producing
"Auto compaction on at 167000 tokens." in status modals and config-
change messages. The leading "on" doubled with the noun ("compaction
on") and the resulting "on at" reads awkwardly.

Drop "on": "at {threshold} tokens" / "off" / "off (no threshold)".
The kv-row label and the format-config sentence both read cleanly.
The overlay rendered as a right-aligned Paragraph spanning the full chat
width with theme.surface() as the widget style. Theme surface() in
mocha (and other built-ins) inherits the terminal bg, so the row
underneath wasn't masked and "4 new messages (ctrl+End) ↓" collided
with the chat text it overlapped, e.g. "crates/oxide-code/sr4 new
messages".

Render the overlay as a pill — narrow Rect sized to label + 1-cell
padding per side, fill it with a Block to guarantee opaque bg, then
paint the spans inside. Chat content to the left of the pill stays
untouched.
…user

When compact_session errored mid-flight, the auto-compaction helper
only `warn!`-logged and returned. The TUI had already shown
Status::Compacting (via AutoCompactionStarted), so the user saw the
spinner clear with no signal that compaction had not actually happened
— the next prompt would clobber their token budget unexpectedly.

After MAX_AUTO_COMPACT_FAILURES failures the early-return path also
disabled auto-compaction silently for the rest of the session.

Emit AgentEvent::Error with the failure detail on summarize-fail. Emit
a single "Auto-compaction disabled for this session after 3 failures."
notice on the bump that trips the breaker (either summarize-fail or
persist-fail; persist already surfaced its own write error). Wrap the
counter-bump in a record_auto_compact_failure helper so both call sites
share the disablement check.
Run the CLAUDE.md prose rules over agent.rs comments:

- Replace inappropriate `;` joiners with periods (or transitions where
  the two clauses are closely linked). Semicolons are now reserved for
  the cases where two independent clauses truly belong as one thought.
- Tighten verbose docstrings (`TurnAbort`, `agent_turn`) to one-paragraph
  contracts instead of multi-paragraph mechanic restatements.
- Add the missing `auto_compact_if_needed` docstring so its three return
  states (compacted / skipped / failed) are explicit.
- Wrap the `#[expect(too_many_arguments)]` reason inside 100 columns.
- Drop the inappropriate em-dash on `inert_user_rx`'s docstring.
Apply the same prose rules (no task-narration leads, fewer awkward
semicolon joiners, contract-first docs) to the files the agent.rs sweep
left untouched: welcome.rs regression lead, the multi-bullet
stream_message wire-shape doc, SessionCompacted / ConfigChanged /
AgentSink trait docs, and the redundant SUMMARY_PREFIX context line.
`paused_counter_saturates` described the mechanism. The scenario is
"does not overflow at u32::MAX", which is what the test actually
verifies. `jump_overlay_label_renders_idle_and_new_content_variants`
bundled the idle and pluralization scenarios; splitting into one idle
test and one pluralization test (two count values for the `s` suffix
toggle) keeps the failure localized without fragmenting the suite.
`resolve_base` calls `is_dated_model_id` first, then
`is_selectable_known_id`, then `candidates`. The helpers now follow that
sequence so a top-down read tracks the resolver's branching, and
`has_dated_suffix` (the only call from `is_dated_model_id`) sits with
its caller.

`empty_metadata_offenders_flags_a_synthetic_violator` named the
mechanism. The scenario it covers is "reports a command with an empty
description", so use that.

In session/state.rs, the three `commit_compact_*` tests were nested
under the `compact_entries` section despite covering a separate
production function. Split them under `// ── commit_compact ──`.
The lead comments on `draw_frame_streaming_shows_spinner_in_status_bar`
and `parse_theme_inline_supports_every_modifier_flag` narrated the test
setup; the snapshot / assertions speak for themselves. The lead on
`reload_sets_load_error_and_clears_rows_when_list_paged_fails` was the
useful half — keep the WHY (footer signal) and drop the mechanism
("Pin the Err arm"). Rename `ctrl_d_pushes_confirm_modal_for_cursor_row`
to `ctrl_d_or_delete_pushes_...` so the name matches the test body, and
drop the comment that compensated for the gap.

In agent.rs, the `// ── agent_turn ──` test-section divider covered
only the shared fixtures (FakeClient, EchoTool, helpers). Rename it to
`// ── Fixtures ──` and add a real `// ── agent_turn ──` divider above
the first `agent_turn_*` test.
In tui/app.rs the `render_app` / `rendered_text` / `long_chat_block`
fixtures lived under `// ── draw_frame ──` but were used by tests in
several other sections, and they sat between the divider and the actual
`draw_frame_*` tests. Hoist them into the top-level fixtures block at
the head of the test module. The `jump_overlay_label` section was
interleaved between draw_frame fixtures and tests; move it to after the
draw_frame tests so each section reads as one contiguous group.

In tui/components/chat.rs the `paused_counter_*` tests sit under
`update_layout` but cover `bump_paused_counter`. Add the missing
`// ── bump_paused_counter ──` divider.

In agent/event.rs two adjacent `impl StdioSink` blocks held `new` and
`render` with no semantic separation. Merge them.

In slash/matcher.rs the `rank_by_prefix` helper (no internal callers,
exported for sibling slash modules) sat between `matching_alias` and
`on_name` / `on_alias`, splitting the `best_match` helper cluster.
Move it after the cluster so the `best_match` helpers stay contiguous.
@hakula139 hakula139 added the enhancement New feature or request label May 13, 2026
@hakula139 hakula139 self-assigned this May 13, 2026
@hakula139 hakula139 added the enhancement New feature or request label May 13, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 98.85714% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/oxide-code/src/client/anthropic.rs 75.00% 3 Missing ⚠️
crates/oxide-code/src/tui/app.rs 98.57% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@hakula139 hakula139 added refactor Reorganize the code to be cleaner and easier readable documentation Improvements or additions to documentation and removed enhancement New feature or request labels May 13, 2026
hakula139 added 2 commits May 13, 2026 14:44
… unbounded

The hardcoded 25-round cap was tripping legitimate multi-hour agentic
sessions. Default now runs unbounded; users opt into a cap via
`[client] max_tool_rounds = N` or `OX_MAX_TOOL_ROUNDS=N`.
Folds the last inline `if let Err = sink.send()` / `tracing::error!` pattern
into the canonical `emit` wrapper that the surrounding sweep introduced.
@hakula139 hakula139 changed the title refactor: sweep auto-compaction follow-ups and style debt feat: configurable tool-round cap and auto-compaction follow-ups May 13, 2026
@hakula139 hakula139 added bug Something isn't working enhancement New feature or request and removed bug Something isn't working labels May 13, 2026
@hakula139 hakula139 added bug Something isn't working refactor Reorganize the code to be cleaner and easier readable and removed refactor Reorganize the code to be cleaner and easier readable labels May 13, 2026
@hakula139 hakula139 merged commit 44529ea into main May 13, 2026
4 checks passed
@hakula139 hakula139 deleted the refactor/sweep branch May 13, 2026 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request refactor Reorganize the code to be cleaner and easier readable

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant