feat: configurable tool-round cap and auto-compaction follow-ups by hakula139 · Pull Request #83 · hakula139/oxide-code

hakula139 · 2026-05-13T05:32:37Z

Summary

A follow-up sweep after the auto-compaction PR (#82). Fixes the model-swap clamp bug, surfaces auto-compaction failures to the user, tightens the jump-to-bottom overlay so it stops bleeding into chat, removes the hardcoded 25-round tool-loop cap in favor of a configurable opt-in, and rolls in a broader style pass over comments, test names, and section organization.

Design decisions

Clamp the auto-compaction threshold silently on both ends. A user-configured threshold above the active model's safe trigger now snaps down instead of erroring on swap, mirroring the existing snap-up at the 50K floor. The percent path already worked this way, so the two arms are now symmetric.
Centralize the threshold table on the context window, not the model id. The 200K / 1M window pins live in one test_thresholds constants module shared by config, client, and welcome tests; the anchor test derives them from default_auto_threshold so a change to the reserve cap or buffer fails one assertion instead of drifting silently across files.
Make sink event delivery loggable. New AgentSink::emit logs at error! when the channel rejects a one-shot or user-facing event. _ = sink.send(...) stays only for per-token streaming where a dropped event is harmless. Auto-compaction failures, the breaker trip, and the SessionRolled notification all route through the canonical wrapper as user-visible Error events or one-shot signals.
Render the jump-to-bottom overlay inside a sized pill. The transparent right-aligned Paragraph form was leaking chat text through. A Block-filled Rect with 1-cell padding gives the overlay an opaque surface bg, matching how the rest of the chrome paints.
Default the per-turn tool-round cap to unbounded. The hardcoded MAX_TOOL_ROUNDS = 25 was tripping legitimate multi-hour agentic sessions. The new [client] max_tool_rounds (or OX_MAX_TOOL_ROUNDS) is Option<u32>: None runs unbounded; Some(N) bails the turn after N rounds with the existing runaway-loop error. The cap stays available as a guard against tools stuck in a retry loop, but normal agent behavior no longer needs to fit inside an arbitrary 25-round budget.
Drop the antithesis / semicolon over-use in prose comments. Sweep agent.rs and peers for ; joiners that read as run-on sentences, "X, not Y" framing that negates a strawman, multi-paragraph contracts that should be one tight paragraph, and Regression: / Pin the X arm of Y: task-narration leads on tests whose names already say what they cover.
Keep the test suite lean over rigidly mirrored. Where the test reviewer flagged categorical sections in markdown/render.rs (Paragraphs, Headings, Lists, ...) or missing wrap_line_ prefixes in single-function modules, the existing form is more useful and stays. Where a divider was actively misleading (agent_turn covering only fixtures, paused_counter_* sitting under update_layout, jump_overlay_label interleaved into draw_frame tests), the section is renamed or moved.

Changes

File	Description
`agent.rs`	`agent_turn` takes `max_tool_rounds: Option<u32>` (`None` = unbounded, `Some(n)` = bail after `n` rounds). Module doc, loop, and bail message updated. Replaces the 25-round const-test with paired `with_some_cap_bails_` and `with_none_cap_runs_unbounded_` tests.
`agent.rs`, `agent/compact_boundary.rs`, `agent/event.rs`, `agent/compaction.rs`, `session/title_generator.rs`, `main.rs`	New `AgentSink::emit`. Auto-compaction failures, breaker trip, and `SessionRolled` route through it as user-visible signals. Comment sweep: drop `;` joiners that read as run-on sentences, tighten verbose docstrings, drop one task-narration lead.
`config.rs`	Threshold floor and ceiling now both clamp silently. `threshold_from_tokens` returns `u32`. Anchor test pins the 200K / 1M window thresholds through `default_auto_threshold`. `display_auto_compaction` reads `"at {n} tokens"` instead of `"on at {n} tokens"`. New `max_tool_rounds: Option<u32>` field on `Config` and `ConfigSnapshot`, loaded from `[client] max_tool_rounds` or `OX_MAX_TOOL_ROUNDS`. New `display_max_tool_rounds` helper for `/config`.
`config/file.rs`	`ClientConfig` carries `max_tool_rounds: Option<u32>`. Merge and parser tests extended.
`client/anthropic.rs`	`stream_message` doc dropped its multi-bullet "Wire shape" form for a tighter contract paragraph. `set_model` tests reuse the new `test_thresholds` constants. New `Client::max_tool_rounds()` getter.
`main.rs`	Three `agent_turn` call sites (`AgentLoopTask::handle_submit_prompt`, `bare_repl`, `headless`) pass `client.max_tool_rounds()`. The remaining inline `if let Err = sink.send` / `tracing::error!` on `SessionRolled` folds into `sink.emit`.
`slash/config.rs`	`/config` modal adds a "Max Tool Rounds" row showing `unbounded` or the configured value. Height anchor updated. Test assertions match the new `"at {n} tokens"` phrasing.
`tui/app.rs`	Jump-to-bottom overlay paints inside a sized pill (`Block` + 1-cell padding) so it no longer bleeds through chat. `render_app` / `rendered_text` / `long_chat_block` hoisted to top-level fixtures since several sections use them. `jump_overlay_label` section moved out of the `draw_frame` interleave. `paint_below_starters_` and `draw_frame_streaming_` drop their task-narration leads.
`tui/components/chat.rs`	Add the missing `// ── bump_paused_counter ──` test divider. Rename `paused_counter_saturates` to `paused_counter_does_not_overflow_at_u32_max`.
`tui/components/welcome.rs`, `tui/theme/loader.rs`	Drop test-body narration leads.
`slash/model.rs`	Helper order now follows `resolve_base`'s call sequence (`is_dated_model_id` → `has_dated_suffix` → `is_selectable_known_id` → `candidates` → listing).
`slash/matcher.rs`	`rank_by_prefix` moved after the `best_match` helper cluster so the cluster stays contiguous.
`slash/registry.rs`	Rename `empty_metadata_offenders_flags_a_synthetic_violator` to a scenario-keyed name.
`slash/resume.rs`	Rename `ctrl_d_pushes_` to `ctrl_d_or_delete_pushes_` since the body covers both; drop the now-redundant comment. Tighten the `reload_*` comment to the WHY.
`slash/status.rs`	Test assertions match the new `"at {n} tokens"` phrasing.
`session/state.rs`	Split `commit_compact_*` tests out of the `compact_entries` section.
`docs/guide/configuration.md`	Threshold guide now describes the silent clamping behavior on both ends. New `max_tool_rounds` row in the `[client]` table, env-var row, and short prose section explaining the unbounded default and opt-in cap.

Test plan

cargo fmt --all --check
cargo build
cargo clippy --all-targets -- -D warnings: zero warnings
cargo test: 2005 tests pass
cargo llvm-cov --ignore-filename-regex 'main\.rs': 98.63% line coverage
pnpm lint
pnpm spellcheck

A globally-configured auto_threshold_tokens sized for a 1M-context model would previously refuse a /model swap to a 200K-context model with "Config change failed: auto compaction threshold must be at most N tokens", forcing the user to edit config before every swap. Mirror the percent path's silent clamp: when an absolute threshold exceeds the new model's safe trigger, fall back to the model default rather than bail. The 50K floor still rejects too-small typos.

…iling The model-cap ceiling now clamps silently (96edff8), but the 50K floor still bailed with "auto compaction threshold must be at least 50000 tokens" on the token path, and bailed with a derived-tokens context line on the percent path. Two paths, two policies, opposite resolutions. Collapse to one shape: out-of-range thresholds snap to the nearest safe value in either direction. The percent-range check (1-100) stays as a bail because that's a domain check, not a clamp decision. Drops three rejecting tests in favor of clamp assertions covering the same surface, and rewrites the guide so the snap-into-range rule is described once rather than split across two sentences.

The derived defaults 167_000 (any 200K-window model with max_tokens saturating the reserve cap) and 967_000 ([1m]-tagged 1M window) were hardcoded across seven test sites in three files. Tweaking AUTO_COMPACTION_BUFFER_TOKENS or AUTO_COMPACTION_OUTPUT_RESERVE_CAP required touching them all. Introduce a single `test_thresholds` module next to `default_auto_threshold` with WINDOW_200K and WINDOW_1M, plus one anchor test that derives both through `default_auto_threshold` so the literals stay honest. Other tests reference the names; the only literals that survive are the rendered status-line messages, which now format! the threshold rather than embed its decimal repr.

…old_floor threshold_from_tokens and threshold_from_percent both open-coded the ceiling clamp with .map_or(_, |max| _.min(max)). Lift it into a named helper so the two paths read as floor(ceil(raw)) — one shape for both clamps and both inputs.

Half the send sites guarded with `_ = sink.send(X)` (silent on drop), half with `if let Err(e) = sink.send(X) { tracing::error!(...) }` (logs on drop). The mix meant Error / TurnComplete / Cancelled events could vanish without a trace when the TUI channel filled, while SessionResumed / SessionCompacted were already logged. Add a default `AgentSink::emit(event, label)` that wraps send-and-log. Sweep every call site whose loss would visibly desync the UI — turn state, lifecycle, user-facing errors — onto it. Per-token streaming (StreamToken, ThinkingToken) keeps raw `send` since dropping one token is harmless and logging would spam.

display_auto_compaction prefixed the threshold with "on at", producing "Auto compaction on at 167000 tokens." in status modals and config- change messages. The leading "on" doubled with the noun ("compaction on") and the resulting "on at" reads awkwardly. Drop "on": "at {threshold} tokens" / "off" / "off (no threshold)". The kv-row label and the format-config sentence both read cleanly.

The overlay rendered as a right-aligned Paragraph spanning the full chat width with theme.surface() as the widget style. Theme surface() in mocha (and other built-ins) inherits the terminal bg, so the row underneath wasn't masked and "4 new messages (ctrl+End) ↓" collided with the chat text it overlapped, e.g. "crates/oxide-code/sr4 new messages". Render the overlay as a pill — narrow Rect sized to label + 1-cell padding per side, fill it with a Block to guarantee opaque bg, then paint the spans inside. Chat content to the left of the pill stays untouched.

…user When compact_session errored mid-flight, the auto-compaction helper only `warn!`-logged and returned. The TUI had already shown Status::Compacting (via AutoCompactionStarted), so the user saw the spinner clear with no signal that compaction had not actually happened — the next prompt would clobber their token budget unexpectedly. After MAX_AUTO_COMPACT_FAILURES failures the early-return path also disabled auto-compaction silently for the rest of the session. Emit AgentEvent::Error with the failure detail on summarize-fail. Emit a single "Auto-compaction disabled for this session after 3 failures." notice on the bump that trips the breaker (either summarize-fail or persist-fail; persist already surfaced its own write error). Wrap the counter-bump in a record_auto_compact_failure helper so both call sites share the disablement check.

Run the CLAUDE.md prose rules over agent.rs comments: - Replace inappropriate `;` joiners with periods (or transitions where the two clauses are closely linked). Semicolons are now reserved for the cases where two independent clauses truly belong as one thought. - Tighten verbose docstrings (`TurnAbort`, `agent_turn`) to one-paragraph contracts instead of multi-paragraph mechanic restatements. - Add the missing `auto_compact_if_needed` docstring so its three return states (compacted / skipped / failed) are explicit. - Wrap the `#[expect(too_many_arguments)]` reason inside 100 columns. - Drop the inappropriate em-dash on `inert_user_rx`'s docstring.

Apply the same prose rules (no task-narration leads, fewer awkward semicolon joiners, contract-first docs) to the files the agent.rs sweep left untouched: welcome.rs regression lead, the multi-bullet stream_message wire-shape doc, SessionCompacted / ConfigChanged / AgentSink trait docs, and the redundant SUMMARY_PREFIX context line.

`paused_counter_saturates` described the mechanism. The scenario is "does not overflow at u32::MAX", which is what the test actually verifies. `jump_overlay_label_renders_idle_and_new_content_variants` bundled the idle and pluralization scenarios; splitting into one idle test and one pluralization test (two count values for the `s` suffix toggle) keeps the failure localized without fragmenting the suite.

`resolve_base` calls `is_dated_model_id` first, then `is_selectable_known_id`, then `candidates`. The helpers now follow that sequence so a top-down read tracks the resolver's branching, and `has_dated_suffix` (the only call from `is_dated_model_id`) sits with its caller. `empty_metadata_offenders_flags_a_synthetic_violator` named the mechanism. The scenario it covers is "reports a command with an empty description", so use that. In session/state.rs, the three `commit_compact_*` tests were nested under the `compact_entries` section despite covering a separate production function. Split them under `// ── commit_compact ──`.

The lead comments on `draw_frame_streaming_shows_spinner_in_status_bar` and `parse_theme_inline_supports_every_modifier_flag` narrated the test setup; the snapshot / assertions speak for themselves. The lead on `reload_sets_load_error_and_clears_rows_when_list_paged_fails` was the useful half — keep the WHY (footer signal) and drop the mechanism ("Pin the Err arm"). Rename `ctrl_d_pushes_confirm_modal_for_cursor_row` to `ctrl_d_or_delete_pushes_...` so the name matches the test body, and drop the comment that compensated for the gap. In agent.rs, the `// ── agent_turn ──` test-section divider covered only the shared fixtures (FakeClient, EchoTool, helpers). Rename it to `// ── Fixtures ──` and add a real `// ── agent_turn ──` divider above the first `agent_turn_*` test.

In tui/app.rs the `render_app` / `rendered_text` / `long_chat_block` fixtures lived under `// ── draw_frame ──` but were used by tests in several other sections, and they sat between the divider and the actual `draw_frame_*` tests. Hoist them into the top-level fixtures block at the head of the test module. The `jump_overlay_label` section was interleaved between draw_frame fixtures and tests; move it to after the draw_frame tests so each section reads as one contiguous group. In tui/components/chat.rs the `paused_counter_*` tests sit under `update_layout` but cover `bump_paused_counter`. Add the missing `// ── bump_paused_counter ──` divider. In agent/event.rs two adjacent `impl StdioSink` blocks held `new` and `render` with no semantic separation. Merge them. In slash/matcher.rs the `rank_by_prefix` helper (no internal callers, exported for sibling slash modules) sat between `matching_alias` and `on_name` / `on_alias`, splitting the `best_match` helper cluster. Move it after the cluster so the `best_match` helpers stay contiguous.

codecov · 2026-05-13T05:33:51Z

Codecov Report

❌ Patch coverage is 98.85714% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
crates/oxide-code/src/client/anthropic.rs	75.00%	3 Missing ⚠️
crates/oxide-code/src/tui/app.rs	98.57%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

… unbounded The hardcoded 25-round cap was tripping legitimate multi-hour agentic sessions. Default now runs unbounded; users opt into a cap via `[client] max_tool_rounds = N` or `OX_MAX_TOOL_ROUNDS=N`.

Folds the last inline `if let Err = sink.send()` / `tracing::error!` pattern into the canonical `emit` wrapper that the surrounding sweep introduced.

hakula139 added 14 commits May 13, 2026 10:25

hakula139 added the enhancement New feature or request label May 13, 2026

hakula139 self-assigned this May 13, 2026

hakula139 added the enhancement New feature or request label May 13, 2026

hakula139 added refactor Reorganize the code to be cleaner and easier readable documentation Improvements or additions to documentation and removed enhancement New feature or request labels May 13, 2026

hakula139 added 2 commits May 13, 2026 14:44

feat(agent): make per-turn tool-round cap configurable and default to…

1308cca

… unbounded The hardcoded 25-round cap was tripping legitimate multi-hour agentic sessions. Default now runs unbounded; users opt into a cap via `[client] max_tool_rounds = N` or `OX_MAX_TOOL_ROUNDS=N`.

refactor(main): route SessionRolled through AgentSink::emit

97dd867

Folds the last inline `if let Err = sink.send()` / `tracing::error!` pattern into the canonical `emit` wrapper that the surrounding sweep introduced.

hakula139 changed the title ~~refactor: sweep auto-compaction follow-ups and style debt~~ feat: configurable tool-round cap and auto-compaction follow-ups May 13, 2026

hakula139 added bug Something isn't working enhancement New feature or request and removed bug Something isn't working labels May 13, 2026

hakula139 added bug Something isn't working refactor Reorganize the code to be cleaner and easier readable and removed refactor Reorganize the code to be cleaner and easier readable labels May 13, 2026

hakula139 merged commit 44529ea into main May 13, 2026
4 checks passed

hakula139 deleted the refactor/sweep branch May 13, 2026 06:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: configurable tool-round cap and auto-compaction follow-ups#83

feat: configurable tool-round cap and auto-compaction follow-ups#83
hakula139 merged 16 commits into
mainfrom
refactor/sweep

hakula139 commented May 13, 2026 •

edited

Loading

Uh oh!

codecov Bot commented May 13, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hakula139 commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Design decisions

Changes

Test plan

Uh oh!

codecov Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hakula139 commented May 13, 2026 •

edited

Loading

codecov Bot commented May 13, 2026 •

edited

Loading