diff --git a/docs/superpowers/specs/2026-06-16-first-mate-design.md b/docs/superpowers/specs/2026-06-16-first-mate-design.md index 70be744..f8f9dd3 100644 --- a/docs/superpowers/specs/2026-06-16-first-mate-design.md +++ b/docs/superpowers/specs/2026-06-16-first-mate-design.md @@ -93,7 +93,11 @@ Claude). So the supervisor is net-new code, specified here: `exit`, crash), respawn and re-mark. The marker prevents double-spawn and tells the heartbeat which `%N` to push to. (Stored alongside the captain's log — same table tenancy, same boot-read.) -- **Home.** The `bridge` tmux session (already created as the interim home). +- **Home.** A `bridge` tmux session. The supervisor **ensures the session + exists** (`has-session` → `new-session` else `new-window`, the + `_do_spawn_claude_tool` branch shape) and opens a single `first-mate` window + in it — it must NOT assume a pre-existing session (a fresh prod boot has + none; any hand-made `bridge` session is incidental). ### Per-pane tool visibility (known gap, deferred) diff --git a/docs/superpowers/specs/2026-06-16-first-mate-v1b-structure.md b/docs/superpowers/specs/2026-06-16-first-mate-v1b-structure.md new file mode 100644 index 0000000..23a461b --- /dev/null +++ b/docs/superpowers/specs/2026-06-16-first-mate-v1b-structure.md @@ -0,0 +1,586 @@ +# First Mate — v1b (Live integration) — code-structure proposal + +**Spec:** `docs/superpowers/specs/2026-06-16-first-mate-design.md` (v1 → v1b) +**Continuity:** `docs/superpowers/specs/2026-06-16-first-mate-v1a-structure.md` +**Scope:** v1b only — wire the inert v1a substrate into a live, prod-gated, +budget-spending first-mate pane. Supervisor + system prompt + heartbeat + +adapter + interrupt hooks. No conn, no spawn/terminate-by-first-mate (v2). 
+**Date:** 2026-06-16 + +v1a already landed on this branch: `first_mate.py` (`PaneDigest`/`FleetDigest`, +`build_fleet_digest`, `fleet_diverged`), `activity.py` storage +(`get/set/clear_first_mate`, `append/recent_captain_log`, the `captain_log` + +`first_mate` tables), and the two captain's-log MCP tools in `channels.py`. This +proposal builds the IO half that calls all of it. + +--- + +## 1. Spec pushback + +Three structural assumptions in the spec/inputs I disagree with or need to +correct against the real code: + +- **"The `bridge` tmux session (already created as the interim home)" — it does + not exist.** `grep -rn bridge periscope/` finds nothing that creates a + `bridge` session; the only matches are unrelated ("anyio-bridged", "LGTM + bridge"). The supervisor must **create the `bridge` session on first spawn** + (the `new-session -d -s bridge` path), not assume it. This is a one-line + correction, but it changes the supervisor's spawn to "ensure-session-then-window," + exactly the shape `_do_spawn_claude_tool` already branches on + (`has-session` → `new-session` vs `new-window`, channels.py:478-490). Treated + as fact in §3/§4; flagged here because the spec asserts otherwise. + +- **The spawn must use `config.claude_exec()`, NOT the `CLAUDE_EXEC` constant.** + `_do_spawn_claude_tool` reaches for the bare constant (`from periscope.config + import CLAUDE_EXEC`, channels.py:505) — which means its spawn path is **not + stubbable** via the `PERISCOPE_CLAUDE_EXEC` test seam. `_layout_two_window` + does it right: `from periscope.config import claude_exec; exec_cmd = + claude_exec()` (worktree_spawn.py). The supervisor's whole automated test + story (`tmux_test_server` sets `PERISCOPE_CLAUDE_EXEC=cat`) **depends on going + through `claude_exec()`**. So I do not "reuse `_do_spawn_claude_tool`'s shape" + literally — I reuse `_layout_two_window`'s exec resolution and consent + handling. Load-bearing; see §4 supervisor.
+ +- **The supervisor cannot call `_layout_two_window` directly — it raises + `HTTPException`.** `worktree_spawn._layout_two_window` is "deliberately coupled + to FastAPI" (its docstring) and raises `HTTPException(500, ...)` on tmux + failure. A lifespan task is not in a request; an `HTTPException` there is just + a confusing 500-shaped exception with no handler. The supervisor needs its own + small single-window spawn primitive (one Claude window in `bridge`, not the + two-window claude+shell layout), so it is **new code that borrows the + send-keys + `claude_exec()` + `dismiss_dev_channels_consent_bg` + + `stamp_new_window` sequence**, not a call into the HTTP-coupled layout + helper. This is also right on the merits: the first mate is one pane and has + no "shell" sibling. + +Nothing else in the v1b spec violates the taste rules — it explicitly mirrors +the narrator's pure-core-plus-worker-tick shape, which is the right call. + +--- + +## 2. Assumptions + +- **The first-mate window is single-window, named `first-mate`, in session + `bridge`.** Spec says "the `bridge` tmux session"; I name the window + `first-mate` (matching the spec's `bridge:first-mate` demo reference). The + supervisor's liveness check keys on the marker's stored `pane_id` (`%N`), not + the window name — so a user-rename of the window doesn't trip a respawn. + +- **`--append-system-prompt` is appended to the `claude_exec()` string, with the + role text passed as a single shell-quoted argument.** The spec names this as + "the one new flag on the launch string." The supervisor builds + `f"{claude_exec()} --append-system-prompt {shlex.quote(ROLE_PROMPT)}"` and + sends it via `send-keys ... Enter` — same delivery channel as every other + spawn. (The role text is multi-paragraph; `shlex.quote` keeps it one arg + through the shell that `send-keys` types into.)
+ +- **Liveness = "the marker's `pane_id` still appears in `list_windows()`."** + `panes.list_windows()` is already imported by `activity.py` and enumerates + live tmux windows with their `pane_id`. A marked pane whose `%N` is absent + from the live set is dead → respawn. This reuses the read the worker already + does each tick; no new tmux call shape. + +- **The supervisor runs on the worker's cadence, not a private clock.** Rather + than a second lifespan task with its own `asyncio.sleep`, the cheapest correct + thing is to fold the supervisor pass into the **start** of the existing + prod-gated worker tick (it already wakes every 30s; the first mate doesn't + need sub-30s respawn latency). This also means one prod gate (the worker's), + not two. See §4 + Decision 1 — this is a genuine close call and I commit to + the in-worker pass, with the standalone-task alternative flagged. + +- **Last-sent digest lives in a module global in the heartbeat module**, exactly + like narrator's `_enabled_checked` module global and `last_ctx` threaded + through `run_worker`. Divergence is recomputed against it each tick; on process + restart it resets to `None` → first tick re-pushes the full picture (correct: + a restarted first mate needs the current picture anyway). No DB row for it — + it is ephemeral by design. + +- **`build_window_view` is assembled by the worker and passed in.** v1a's + `build_fleet_digest` is pure-over-assembled-dicts. The worker tick already has + `list_windows()`; the adapter (below) is what turns those + `pane_status_lines()` + into the curated contract. `build_window_view` lives in `window_view.py`, + which `activity.py` must **not** top-import (cycle discipline, activity.py:8-10) + — so the adapter/heartbeat call site uses a **function-level import**, the same + escape hatch `_worker_tick` already uses for `narrator` (activity.py:739-740). 
+ +- **`status_line` comes from `activity.pane_status_lines()`, not the view.** + Confirmed: `build_window_view` has no `status_line` key; the narrator's status + is in the `pane_status` table, read in bulk via `pane_status_lines() -> + {pane_id: (status, generated_at, rail)}`. The adapter joins the two on + `pane_id`. + +- **`blocked` = the pane's most-recent `channel_alerts` entry has + `kind == "need_human"`.** `build_window_view` carries `channel_alerts` (a list + of `{id,message,kind,severity,ts}` dicts, channels.py:174-180). `blocked` is + derived: the newest-`ts` alert's `kind` is `need_human`. (v1a already assumed + this; the adapter is where it's computed from the real list.) + +- **`idle_s` = `now - max(focused_at, acted_at)`.** Both keys are present in the + view (`focused_at` explicit, `acted_at` explicit). Idle is time since the pane + was last focused or acted on, floored at 0. + +--- + +## 3. File layout + +``` +periscope/ + first_mate.py EDIT add the IO half below the existing pure core. + NEW: window_views_to_digest() adapter (pure-ish: + dicts in, FleetDigest out — joins pane_status_lines); + ROLE_PROMPT constant (the system prompt); + supervisor_pass() (liveness + respawn); + heartbeat_pass() (assemble→diverge→push); + _spawn_first_mate() (single-window bridge spawn); + module global _LAST_SENT: FleetDigest | None. + Pure core (build_fleet_digest/fleet_diverged) stays + untouched and import-light; IO half lazy-imports the + heavy modules inside the functions. + + activity.py EDIT _worker_tick: after the narrator pass, call + first_mate.supervisor_pass() then + first_mate.heartbeat_pass(panes-or-views). + Function-level import (cycle discipline), same as + the narrator import already there. + + channels.py EDIT _do_notify_tool: when kind == "need_human", fire an + immediate emit_channel_event to the first-mate pane + (the interrupt hook). 
Add the deferred + _do_fleet_digest_tool (the on-demand pull) — now it + has a cached digest to return (first_mate._LAST_SENT + or a fresh assemble). + + app.py NO CHANGE the supervisor rides the already-prod-gated + run_worker; no new lifespan task (see Decision 1). + If Decision 1 flips to a standalone task, this gets + one is_prod()-gated _task("first-mate-supervisor", …). + +tests/ + test_first_mate.py EDIT add adapter mapping tests (literal view dicts + + pane_status_lines → assert FleetDigest fields); + heartbeat divergence/no-push/not-attached-fallback + with a fake emit; supervisor decision (marker + alive→noop, dead→respawn) with list_windows faked. + test_first_mate_spawn.py NEW @needs_tmux real-tmux integration: supervisor + spawns a first-mate window into a real isolated + bridge session (PERISCOPE_CLAUDE_EXEC=cat), marks it, + kill the pane, second pass respawns + re-marks. + test_channels.py EDIT need_human hook fires emit_channel_event to the + marked first-mate pane; non-need_human kinds don't; + fleet-digest pull tool returns the cached digest / + refuses non-first-mate callers. + test_app.py NO CHANGE run_worker stays mocked (the heartbeat + + supervisor live inside it, so mocking run_worker + already prevents a live first-mate tick — verify the + existing patch covers it; see Test strategy). +``` + +No new route file. v1b adds no HTTP surface — the heartbeat is a worker-internal +push, the pull is an MCP tool, the supervisor is a lifespan-adjacent task. + +--- + +## 4. Per-module structure + +### `periscope/first_mate.py` — IO half added below the pure core (rung 1 functions + one module global) + +Mirrors `narrator.py`: pure decision functions at the top (already there from +v1a), IO shell below, driven by the worker, cross-tick state in a module global. 
+**No class.** There is no coupled mutable state that a class would encapsulate — +the one piece of cross-tick state (`_LAST_SENT`) is exactly the narrator's +`module-global cache` shape (`narrator._enabled_checked`), and everything else is +a function over passed-in arguments. A `FirstMate` class would be grouping-by-noun; +rung-3 trigger (coupled mutable state + lifecycle) is not met — the lifecycle +lives in tmux + the `first_mate` marker row, not in a Python object. + +**Module global (cross-tick state):** + +```python +_LAST_SENT: FleetDigest | None = None # last digest pushed to the first mate +``` + +**The adapter — the flagged integration risk (rung 1 pure function):** + +```python +def window_views_to_digest( + *, window_views: list[dict], status_lines: dict[str, tuple[str, int, str | None]], + usage: dict | None, now: int, +) -> FleetDigest: + """Curate the read model into the v1a contract, then build_fleet_digest. + PURE: dicts in, FleetDigest out. window_views: build_window_view output + (filtered to is_claude inside). status_lines: activity.pane_status_lines(). + Joins status_line onto each pane by pane_id, derives blocked from the + newest channel_alerts entry, idle_s from focused_at/acted_at.""" +``` + +This function is the **single seam** the plan-review flagged. Keeping it pure +(literal dicts in, `FleetDigest` out, zero imports of store/usage/view) is what +makes it unit-testable with hand-written inputs and is the deliberate move away +from the four-mocks-per-test smell. It does the join and the field derivations +(`handle←pid`, `status_line←status_lines[pid]`, `blocked←newest need_human alert`, +`pr`/`ci`←view, `idle_s←now-max(focused_at,acted_at)`) and then delegates to the +v1a `build_fleet_digest`. Per-pane curation is the spec's "Periscope does the +aggregation pass." Keyword-only per the multi-arg rule. 
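A minimal sketch of the adapter's join and derivations. Assumptions, all hypothetical: `PaneDigest`/`FleetDigest` are cut-down stand-ins for the v1a dataclasses, the final `build_fleet_digest` call is replaced by a direct constructor, the view's pane key is `pid` (as in the test plan's key list), and the budget comes from an assumed `usage["pct"]` key. It shows the `is_claude` filter, the `status_line` join, `blocked` from the newest alert, and `idle_s` floored at 0.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PaneDigest:
    """Hypothetical cut-down stand-in for the v1a PaneDigest."""
    handle: str
    status_line: str | None
    blocked: bool
    pr: object
    ci: object
    idle_s: int


@dataclass(frozen=True)
class FleetDigest:
    """Hypothetical stand-in; the real adapter ends in build_fleet_digest."""
    panes: tuple[PaneDigest, ...]
    budget_pct: int | None


def _newest_is_need_human(alerts: list[dict]) -> bool:
    """blocked: the newest-ts channel alert has kind == "need_human"."""
    if not alerts:
        return False
    return max(alerts, key=lambda a: a["ts"])["kind"] == "need_human"


def window_views_to_digest(
    *,
    window_views: list[dict],
    status_lines: dict[str, tuple[str, int, str | None]],
    usage: dict | None,
    now: int,
) -> FleetDigest:
    """PURE: literal dicts in, FleetDigest out; no store/usage/view imports."""
    panes = []
    for v in window_views:
        if not v.get("is_claude"):
            continue  # only Claude panes are digested
        pid = v["pid"]
        status = status_lines.get(pid)  # (status, generated_at, rail) or None
        last_touch = max(v.get("focused_at") or 0, v.get("acted_at") or 0)
        panes.append(PaneDigest(
            handle=pid,
            status_line=status[0] if status else None,
            blocked=_newest_is_need_human(v.get("channel_alerts") or []),
            pr=v.get("pr"),
            ci=v.get("ci"),
            idle_s=max(0, now - last_touch),
        ))
    budget = usage.get("pct") if usage is not None else None  # "pct" is assumed
    return FleetDigest(panes=tuple(panes), budget_pct=budget)
```

For example, a Claude view focused at `t=100`, acted on at `t=160`, and whose newest alert is a `need_human` yields `idle_s=40` and `blocked=True` at `now=200`.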
+ +**The system prompt (a module constant):** + +```python +ROLE_PROMPT = """...""" # role + standing-tier + absolute prohibitions +``` + +A constant, not a file. It is ~40-60 lines of prose, drafted from the spec's +"Autonomy" + "Never" sections (role: chief-of-staff watcher; standing authority: +observe/summarize/peek/clearly-idle-nudge; prohibitions: no fdy-merge, no +force-push, no prod). A constant keeps it versioned in the diff and greppable; +a separate file buys nothing at this size and adds a read-at-spawn IO path. If +it grows past ~150 lines or wants non-engineer editing, promote to a file then — +not now (YAGNI). + +**The supervisor (rung 1 functions, IO):** + +```python +def supervisor_pass(*, now: int) -> None: + """One liveness check. If the first_mate marker is missing or its pane_id + is no longer a live window, spawn a fresh first-mate window and re-mark. + Idempotent: a live marked pane is a no-op.""" + +def _spawn_first_mate(*, now: int) -> None: + """Ensure the `bridge` session, open a single `first-mate` window running + claude_exec() + --append-system-prompt ROLE_PROMPT, stamp it, set the + first_mate marker. Borrows the send-keys + consent-dismiss + stamp sequence + from worktree_spawn._layout_two_window but single-window and no HTTPException.""" +``` + +`supervisor_pass` lazy-imports `activity` (marker read/write) and `panes` +(`list_windows`). `_spawn_first_mate` lazy-imports `tmux`/`_tmux_mutate`, +`config.claude_exec`, `channels.dismiss_dev_channels_consent_bg`, and +`pids.stamp_new_window`. Liveness: + +``` +marker = activity.get_first_mate() +live_panes = {w["pane_id"] for w in list_windows()} +if marker and marker.pane_id in live_panes: + return # alive — no-op (prevents double-spawn) +_spawn_first_mate(now=now) # missing or dead — (re)spawn + re-mark +``` + +The marker is what prevents double-spawn (a live marked pane short-circuits) and +tells the heartbeat which `%N` to push to. 
`_spawn_first_mate` ends with +`activity.set_first_mate(pane_id=<%N from stamp>, session_id=None, at=now)` — +`session_id` stays `None` in v1b (the `pane_sessions` hook fills the JSONL id on +the first prompt; the marker doesn't need it for push, which keys on `%N`). + +**Spawn sequence (single window, no HTTPException):** + +``` +sess = "bridge" +if not has-session bridge: _tmux_mutate("new-session","-d","-s","bridge","-c",home,"-n","first-mate") +else: _tmux_mutate("new-window","-t","bridge:","-n","first-mate","-c",home) +target = "bridge:first-mate" +exec_cmd = f"{config.claude_exec()} --append-system-prompt {shlex.quote(ROLE_PROMPT)}" +time.sleep(0.1) # let rc finish — same reason as _layout_two_window +_tmux_mutate("send-keys","-t",target,exec_cmd,"Enter") +if "--dangerously-load-development-channels" in exec_cmd: + channels.dismiss_dev_channels_consent_bg(target) +pid = stamp_new_window(target) +pane_id = tmux("display-message","-t",target,"-p","#{pane_id}").strip() +activity.set_first_mate(pane_id=pane_id, session_id=None, at=now) +``` + +**The heartbeat (rung 1 function, IO):** + +```python +async def heartbeat_pass(*, window_views: list[dict], now: int) -> None: + """Assemble the digest, compare to the last sent, push the delta on + divergence to the marked first-mate pane via emit_channel_event. On a + not-attached False, drop and let the next tick re-push (divergence-based, + nothing lost). Also scans for watched-PR CI-red transitions and pushes + those ahead of digest divergence.""" + global _LAST_SENT + ... +``` + +`heartbeat_pass` is `async` because `emit_channel_event` is async +(channels.py:677). It: +1. Lazy-imports `activity.pane_status_lines`, `usage.cached_plan_usage`, + `channels.emit_channel_event`. +2. Reads the marker; if none → return (no first mate to push to; supervisor will + bring one up). +3. `cur = window_views_to_digest(window_views=…, status_lines=…, usage=…, now=now)`. +4. `diverged, reason = fleet_diverged(_LAST_SENT, cur)`. +5. 
CI-red scan: compares each pane's `ci` against `_LAST_SENT`'s for a + `→ ✗` transition on a watched PR; a red flip forces a push even if the + overall digest didn't otherwise diverge (interrupt tier 2). +6. If diverged or CI-red: `ok = await emit_channel_event(marker.pane_id, + _render_delta(_LAST_SENT, cur, reason), meta={...})`. On `ok` → `_LAST_SENT = + cur`. On `not ok` (pane not attached) → **leave `_LAST_SENT` unchanged** so + the next tick re-pushes the still-diverged picture (the spec's retry-next-tick + fallback; correctness hinges on NOT advancing `_LAST_SENT` on a failed push). + +`_render_delta(prev, cur, reason)` is a small **pure** function (rung 1) turning +two digests + reason into the human-readable delta string ("since last tick: auth +pane went blocked; budget 62%→71%") — unit-testable, no IO. This is the v1b piece +the v1a structure explicitly deferred ("full delta prose isn't needed until +there's a pane to push to"). + +### `periscope/activity.py` — `_worker_tick` extension (one block, function-level import) + +The worker already assembles the Claude `panes` list and runs the narrator. v1b +appends two calls after the narrator pass, mirroring its function-level-import +discipline exactly: + +```python +# after narrator.tick(panes), inside _worker_tick: +try: + from periscope import first_mate # function-level: first_mate's IO + first_mate.supervisor_pass(now=now) # half lazy-imports back into the + # worker's deps — cycle-safe + # build the curated views for the heartbeat. The worker captured `panes` + # (window dict + parsed); the heartbeat needs build_window_view output. 
+ from periscope.window_view import build_window_view + views = [build_window_view(w, now)[0] for w, _ in panes] + # decide from `views` (sync); the async emit is awaited later in run_worker + last_ctx["_fm_push"] = first_mate.heartbeat_decide(...) # (pane_id, content, cur) or None +except Exception: + log.exception("first-mate pass failed") +``` + +**Threading (resolved in Decision 2):** `_worker_tick` runs in a worker +*thread* (`asyncio.to_thread(_worker_tick, …)`), so it has no running event +loop, while `emit_channel_event` sends on MCP session streams bound to the main +loop. An `asyncio.run(...)` inside the thread would send from a fresh, wrong +loop, so the tick only *decides* (build the digest, diff it against +`_LAST_SENT`, render the delta) and stashes the pending push in `last_ctx`; +`run_worker` then awaits the emit and advances `_LAST_SENT` only on success. The +narrator is fully sync, which is why this question is new to v1b. + +Rationale: `activity.py` stays the worker host; it gains ~8 lines and one more +function-level import, consistent with how it already calls the narrator. No new +module-level imports → no new cycle. `build_window_view` is imported +function-level (it lives in `window_view.py`, off-limits to top-import). + +### `periscope/channels.py` — interrupt hook + the deferred pull tool (rung 1) + +**`need_human` interrupt hook** at the `_do_notify_tool` write point +(channels.py:181-193). After the alert is appended and the durable `activity.record` +mirror, add: + +```python +if kind == "need_human": + # immediate first-mate wake, out of band from the 30s heartbeat + from periscope import activity + marker = activity.get_first_mate() + if marker is not None: + meta = {"kind": "interrupt", "source_pane": pane} + _schedule_emit(marker.pane_id, f"need_human from {pane}: {message}", meta) +``` + +`_do_notify_tool` is **sync** but `emit_channel_event` is **async** — the hook +needs to schedule the coroutine without blocking the tool handler.
`_do_notify_tool` +runs inside the MCP server's anyio task (there *is* a loop here, unlike the +worker thread), so `_schedule_emit` is `asyncio.create_task(emit_channel_event( +…))` wrapped via the project's `_task` crash-wrapper (CLAUDE.md invariant 8: +naked tasks that raise vanish). This is where the hook differs from the +worker heartbeat, whose emit is hoisted into `run_worker`'s main loop (Decision 2). + +**The deferred fleet-digest pull tool** (`_do_fleet_digest_tool`, the third +first-mate tool v1a deferred). Now there's a cached digest to return: + +```python +def _do_fleet_digest_tool(pane: str, arguments: dict): + if not _require_first_mate(pane): + return _tool_result({"ok": False, "error": "first-mate-only tool"}) + from periscope import first_mate + d = first_mate._LAST_SENT + return _tool_result({"ok": True, "digest": _serialize_digest(d)} if d + else {"ok": True, "digest": None}) +``` + +Registered in `_CHANNEL_TOOLS` alongside the two captain's-log tools, same +self-guard pattern. Reading `first_mate._LAST_SENT` (the last *pushed* digest) is +correct for an on-demand pull — the first mate asks "what's the current fleet +picture?" and gets the digest the heartbeat last delivered. + +--- + +## 5. Patterns + +**Used:** +- **Pure adapter + pure delta-render, IO heartbeat/supervisor shell** — mirrors + narrator's pure-core / `tick`-`_generate` split. The spec names this mirror. +- **Module-global cross-tick state** (`_LAST_SENT`) — the narrator's + `_enabled_checked` / `run_worker`'s `last_ctx` shape. Ephemeral, resets on + restart by design. +- **Function-level imports for cycle-prone heavy modules** (`window_view`, + `first_mate`, `activity`) — the established escape hatch (activity.py:739-740, + channels.py:207). +- **Functional tool registry record + self-guard** (`_do_fleet_digest_tool` + + `_require_first_mate`) — the v1a-established tool-add pattern. +- **`_task` crash-wrapper for the fire-and-forget emit** in the need_human hook — + CLAUDE.md invariant 8.
+ +**Considered and rejected:** +- **A `FirstMateSupervisor` class** holding marker + last-sent + spawn config — + rejected. The cross-tick state is one `FleetDigest | None`; the marker lives in + SQLite; the spawn is a function. No coupled mutable state for a class to own + (rung-3 not met). Functions + one module global is the narrator's proven shape. +- **A standalone lifespan supervisor task** (its own `asyncio.sleep` loop in + `app.py`) — rejected for v1b in favor of folding into the worker tick; a second + prod gate and a second 30s clock for a respawn that tolerates 30s latency is + more surface for no benefit. Flagged as the close call in Decision 1 (it's the + spec's literal phrasing, "a lifespan-managed task"). +- **Reusing `_do_spawn_claude_tool` / `_layout_two_window` for the spawn** — + rejected. The former skips the `claude_exec()` test seam (unstubbable); the + latter raises `HTTPException` (request-coupled) and builds a two-window layout + the first mate doesn't want. Borrow the *sequence*, not the function. +- **A custom exception for "no first mate marker"** — rejected per the rule. + `heartbeat_pass` simply returns when the marker is absent; the pull tool + returns `digest: None`. No caller needs to catch a typed error. +- **Persisting `_LAST_SENT` to a DB row** — rejected (YAGNI). Divergence is + self-healing across restart: a fresh process re-pushes the current picture on + tick one. A row would add a write per tick and a migration for zero behavioral + gain. + +--- + +## 6. Test strategy (per module) + +The split's whole point: everything that *reasons* (adapter, divergence, delta, +supervisor decision, push fallback) is unit-tested with literal inputs; the one +thing that *spawns* is a real-tmux integration test with a stub exec; the live +Claude + real Haiku heartbeat reasoning is **not** auto-tested and is verified by +the spec's demo.
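One way to keep the supervisor decision testable with plain callables instead of patches is to inject its collaborators. The sketch assumes hypothetical `get_marker`, `list_windows`, and `spawn` keyword parameters and a `bool` return; the proposal's `supervisor_pass` takes only `now` and lazy-imports its dependencies. `Marker` is a stand-in for the marker row. It covers the three supervisor cases the test plan lists: alive is a no-op, missing spawns, dead spawns.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """Hypothetical stand-in for the first_mate marker row."""
    pane_id: str
    session_id: str | None
    at: int


def supervisor_pass(
    *,
    now: int,
    get_marker: Callable[[], Marker | None],
    list_windows: Callable[[], list[dict]],
    spawn: Callable[..., None],
) -> bool:
    """One liveness check; returns True if it (re)spawned. Idempotent."""
    marker = get_marker()
    live_panes = {w["pane_id"] for w in list_windows()}
    if marker is not None and marker.pane_id in live_panes:
        return False  # alive: a no-op, which is what prevents double-spawn
    spawn(now=now)  # missing or dead: (re)spawn and re-mark
    return True


def check_supervisor_decisions() -> tuple[list[bool], list[int]]:
    """The three test-plan cases, with list_windows and spawn faked."""
    windows = [{"pane_id": "%3"}, {"pane_id": "%9"}]
    spawned: list[int] = []

    def fake_spawn(*, now: int) -> None:
        spawned.append(now)

    cases = [Marker("%9", None, 1), None, Marker("%4", None, 1)]  # alive, missing, dead
    results = [
        supervisor_pass(now=100 + i, get_marker=lambda m=m: m,
                        list_windows=lambda: windows, spawn=fake_spawn)
        for i, m in enumerate(cases)
    ]
    return results, spawned
```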
+ +**`tests/test_first_mate.py` (additions) — unit, zero live deps.** +- **Adapter mapping** (the flagged risk): hand-built `window_views` dicts (the + curated subset of real `build_window_view` keys: `pid`, `is_claude`, + `focused_at`, `acted_at`, `pr`, `ci`, `channel_alerts`) + a literal + `status_lines` dict + a fake `usage` dict → assert every `PaneDigest` field, + the `status_line` join by `pid`, `blocked` from the newest `need_human` alert, + `idle_s` from `now-max(focused,acted)`, `is_claude` filter, `usage=None → + budget None`. No mocks — the pure adapter takes dicts. *This is the structure + that prevents the mock-heavy smell.* +- **Heartbeat divergence + fallback**: drive `heartbeat_pass` with a fake + `emit_channel_event` (a recording stub) and a faked marker. Assert: diverged + picture → one emit, `_LAST_SENT` advances; identical next tick → no emit; + emit returns `False` (not attached) → `_LAST_SENT` **unchanged** → next tick + re-pushes (the retry-next-tick contract); CI `✓→✗` on a watched PR forces an + emit even when otherwise non-divergent. +- **Supervisor decision**: fake `list_windows()` + the marker. marker present & + pane in live set → `_spawn_first_mate` not called (assert via a spy); marker + missing → spawn called; marker present but pane absent (dead) → spawn called. + `_spawn_first_mate` itself stubbed here (its tmux reality is the integration + test below). +- **`_render_delta`**: pure — two literal digests → assert the delta string + mentions the changed panes/budget. + +**`tests/test_first_mate_spawn.py` (NEW) — `@needs_tmux` integration, real tmux, +stub claude.** Follows `test_worktree_spawn.py` exactly: `tmux_test_server` +fixture (`PERISCOPE_TMUX_SOCKET` isolated `-L`, `PERISCOPE_CLAUDE_EXEC=cat`) + +`fresh_activity_db`. Because `_spawn_first_mate` goes through `config.claude_exec()` +(the pushback in §1), `cat` is what actually launches → the window stays alive. 
+- supervisor first pass (no marker) → a `bridge:first-mate` window exists, + marker set to its `%N`, stamped with a `@periscope_id`. +- second pass (marker alive) → no new window (idempotent, no double-spawn). +- kill the marked pane → third pass respawns, marker updated to the new `%N`. +- *Real dependency on purpose* — a mocked tmux would pass while the real + new-session/new-window/send-keys/stamp sequence broke (the Q1-2026 + mocked-migration lesson; the supervisor's spawn is exactly the kind of + real-subprocess sequence that must run for real). + +**`tests/test_channels.py` (additions) — unit, real `activity` DB via fixture, +in-memory channel dicts via `reset_channel_state`.** +- need_human hook: with the marker set to `%9` and a fake/recording + `emit_channel_event`, `_do_notify_tool("%5", {kind:"need_human", message:…})` + schedules an emit to `%9`; `kind:"info"`/`"done"` schedule none; no marker → + no emit (and no crash). +- fleet-digest pull tool: registered in `_CHANNEL_TOOLS`; refuses a non-first-mate + caller; with a marked pane and `first_mate._LAST_SENT` set, returns the + serialized digest; with `_LAST_SENT=None` returns `digest: None`. + +**`tests/test_app.py` — unchanged, but verify the invariant holds.** The +supervisor + heartbeat live *inside* `_worker_tick`, which runs inside +`run_worker`, which the lifespan tests already mock (`mocker.patch( +"periscope.activity.run_worker", side_effect=_noop)`, test_app.py:60/116). So the +existing mock already prevents a live first-mate spawn + Haiku push during +pytest. **This must stay true** — if the heartbeat call ever moves out of +`run_worker` into its own lifespan task, that task needs its own mock in +test_app.py. Called out explicitly because it is the CLAUDE.md +"lifespan-tests-mock-run_worker" landmine, now load-bearing for budget safety. 
+ +**Explicitly NOT auto-tested (honesty):** +- A real first-mate Claude actually booting from `claude_exec() + + --append-system-prompt` and obeying the role/prohibitions — that's a live + Claude; verified by the spec's demo, not a test. +- Real Haiku/Claude heartbeat *reasoning* over a pushed delta — same; the test + covers that a push *happens* and *what bytes* are pushed, never what the model + does with them. +- `dismiss_dev_channels_consent_bg` interacting with a real consent prompt — the + stub exec (`cat`) shows no prompt, so the consent-dismiss branch is exercised + for "is it called" but not "does it dismiss a real prompt." + +No testability smells: every reasoning unit is reachable with literal inputs +because the adapter and divergence stay pure and the IO (emit, tmux, marker) is +injected/faked or run for real on an isolated socket. + +--- + +## 7. Decisions to sanity-check + +1. **Supervisor in the worker tick vs. a standalone lifespan task.** I fold + `supervisor_pass` into the start of `_worker_tick` (one prod gate, one 30s + clock, no `app.py` change). *Alternative:* a separate `is_prod()`-gated + `_task("first-mate-supervisor", …)` with its own loop, which is the spec's + literal phrasing ("a lifespan-managed task"). *Close because:* the standalone + task is what the spec wrote and gives independent respawn latency; but the + first mate tolerates 30s respawn latency, and a second clock + second prod + gate is pure surface. If you want sub-30s respawn or want the supervisor to + survive a wedged worker tick, flip to the standalone task — it's a clean swap + (move `supervisor_pass` into its own `async def` loop in `app.py`, gated like + `activity.run_worker`). + +2. **Calling the async heartbeat/emit from sync worker-thread code. 
RESOLVED + (correctness, not preference): the heartbeat emit MUST hoist to `run_worker`'s + main loop — `asyncio.run` in the worker thread is a cross-loop bug.** + `_worker_tick` runs via `await asyncio.to_thread(_worker_tick, …)` + (activity.py:756) — a worker thread with no event loop. `emit_channel_event` + does `await session._write_stream.send(…)` (channels.py) on an anyio stream + **bound to the main loop** (the MCP listener runs there via + `asyncio.start_unix_server`). `asyncio.run(...)` in the thread spins a *fresh* + loop and would send on a main-loop-bound session from the wrong loop — anyio + streams are loop-affine; this fails/corrupts. So the structure is: + - **`_worker_tick` (thread):** runs `supervisor_pass` (sync tmux — correct in + a thread) and assembles the heartbeat *decision* — build the digest, diverge + vs `_LAST_SENT`, render the delta — and **stashes a pending push** + `(pane_id, content, cur_digest)` into `last_ctx` (e.g. `last_ctx["_fm_push"]`). + No `emit`, no `await`, no `asyncio.run`. + - **`run_worker` (main loop):** after `await asyncio.to_thread(...)`, if a + pending push exists, `ok = await emit_channel_event(pane_id, content)`; set + `first_mate._LAST_SENT = cur_digest` **only on `ok`** (the retry-next-tick + fallback hinges on not advancing `_LAST_SENT` on a failed send). This is the + only `run_worker` change. + The **need_human hook** stays `_task(create_task(emit_channel_event(...)))` — + `_do_notify_tool` already runs in the MCP anyio task on the main loop with the + sessions, so `create_task` is loop-correct there. So `heartbeat_pass` is NOT a + single async function; it splits into a sync `heartbeat_decide(...) -> push|None` + (thread, pure-ish, unit-testable) and the `await emit` in `run_worker`. + Ticks are sequential (`run_worker` awaits the tick, then the emit, then + sleeps), so `_LAST_SENT` is never accessed concurrently. + +3. **`build_window_view` rebuilt in the tick vs. 
reusing `/api/state` + assembly.** The heartbeat needs `build_window_view` output, which the worker + doesn't currently build (it only `parse_pane`s). I rebuild it in the tick from + the captured `panes`. *Alternative:* read the last `/api/state` snapshot that + the poll route already assembled. *Close because:* reusing the route's + assembly avoids duplicate `build_window_view` calls, but the route builds on + *its* cadence (3s poll) not the worker's, and there's no stored last-snapshot + to read — the route assembles per request. Rebuilding in the tick keeps the + heartbeat self-contained and on its own clock; the per-tick cost is ~one + `build_window_view` per Claude pane, already the order of what the route does. + +4. **System prompt as a module constant vs. a file.** Constant in `first_mate.py`. + *Alternative:* `periscope/first_mate_prompt.txt` read at spawn. *Close + because:* a file is nicer for a non-engineer to edit and keeps the module + lean, but at ~50 lines a constant is versioned-in-diff, greppable, and needs + no read-IO at spawn. Promote to a file when it grows or wants out-of-band + editing — not now. diff --git a/periscope/activity.py b/periscope/activity.py index 9c8958a..a235bee 100644 --- a/periscope/activity.py +++ b/periscope/activity.py @@ -719,8 +719,32 @@ def _check_reset(pane_id: str, cwd: str, context_pct, last_ctx: dict) -> bool: # it captures each active Claude pane, runs the context-reset check, and # drives the narrator (semantic status + auto-rename). +def _safe_usage() -> dict | None: + try: + from periscope.usage import cached_plan_usage + return cached_plan_usage() + except Exception: + return None + + +async def _emit_pending_first_mate(last_ctx: dict) -> None: + """Main-loop side of the heartbeat: send a Push stashed by _worker_tick. 
+ Awaited in run_worker (the MCP sessions are main-loop-affine — must NOT be + emitted from the worker thread).""" + pending = last_ctx.pop("_fm_push", None) + if not pending: + return + pane_id, content, cur = pending + from periscope.channels import emit_channel_event + from periscope import first_mate + ok = await emit_channel_event(pane_id, content, {"kind": "fleet_digest"}) + if ok: + first_mate._LAST_SENT = cur # advance only on a successful send + + def _worker_tick(last_ctx: dict) -> None: """One worker pass. Blocking (tmux + git subprocesses) — run off-loop.""" + now = int(time.time()) panes: list[tuple[dict, dict]] = [] for w in list_windows(): target = f"{w['session']}:{w['index']}" @@ -742,6 +766,21 @@ def _worker_tick(last_ctx: dict) -> None: narrator.tick(panes) except Exception: log.exception("narrator tick failed") + # First mate: supervisor liveness + heartbeat decision (sync; the async + # emit is hoisted to run_worker's main loop — see _emit_pending_first_mate). + try: + from periscope import first_mate + first_mate.supervisor_pass(now=now) + cur = first_mate.build_fleet_digest( + panes=first_mate.assemble_pane_views(panes, now), usage=_safe_usage(), now=now, + ) + push = first_mate.heartbeat_decide( + prev=first_mate._LAST_SENT, cur=cur, marker=get_first_mate(), + ) + if push is not None: + last_ctx["_fm_push"] = (push.pane_id, push.content, cur) + except Exception: + log.exception("first-mate worker pass failed") # Keep periscope.db-wal bounded — see checkpoint() docstring for why # SQLite's default auto-checkpoint isn't enough on its own. 
checkpoint() @@ -754,6 +793,7 @@ async def run_worker() -> None: while True: try: await asyncio.to_thread(_worker_tick, last_ctx) + await _emit_pending_first_mate(last_ctx) except Exception: log.exception("activity worker tick failed") await asyncio.sleep(30) diff --git a/periscope/app.py b/periscope/app.py index beb7039..d5a0dd1 100644 --- a/periscope/app.py +++ b/periscope/app.py @@ -84,8 +84,13 @@ def _pane_sessions_housekeeping() -> None: # spent Haiku on every narrator tick. Same guard as the MCP listener. # NB: _task's signature is _task(name, coro). if config.is_prod(): - from periscope import activity + from periscope import activity, first_mate activity_task = _task("activity-worker", activity.run_worker()) + # Register the bridge rail project so the supervisor-spawned first-mate + # pane is reachable in the dashboard (not folded into 'dev'). Main-loop + # state write — safe here, unlike the worker thread. Prod-only: dev never + # spawns a first mate. + first_mate.register_bridge_project() else: activity_task = None try: diff --git a/periscope/channels.py b/periscope/channels.py index fe51a08..d6f3411 100644 --- a/periscope/channels.py +++ b/periscope/channels.py @@ -194,6 +194,17 @@ def _do_notify_tool(pane: str, arguments: dict): except Exception: log.warning("activity.record failed for notify()", exc_info=True) + # Interrupt tier: a need_human wakes the first mate immediately, out of band + # from the 30s heartbeat. (Other kinds ride the next heartbeat digest.) 
+ if kind == "need_human": + try: + from periscope import activity as _activity + marker = _activity.get_first_mate() + if marker is not None: + _schedule_first_mate_emit(marker.pane_id, f"need_human from {pane}: {message}") + except Exception: + log.warning("first-mate need_human hook failed", exc_info=True) + body = {"ok": True, "kind": kind, "severity": severity} return _tool_result(body) @@ -240,6 +251,27 @@ def _do_captains_log_append_tool(pane: str, arguments: dict): return _tool_result({"ok": True}) +def _serialize_digest(d) -> dict: + return { + "at": d.at, "budget_pct": d.budget_pct, "budget_resets_at": d.budget_resets_at, + "panes": [ + {"handle": p.handle, "name": p.name, "session": p.session, + "status_line": p.status_line, "blocked": p.blocked, "pr": p.pr, + "ci": p.ci, "idle_s": p.idle_s} + for p in d.panes + ], + } + + +def _do_fleet_digest_tool(pane: str, arguments: dict): + """Return the last-pushed fleet digest (first-mate-only on-demand pull).""" + if not _require_first_mate(pane): + return _tool_result({"ok": False, "error": "first-mate-only tool"}) + from periscope import first_mate + d = first_mate._LAST_SENT + return _tool_result({"ok": True, "digest": _serialize_digest(d) if d else None}) + + def _resolve_window(match) -> tuple[str, str]: """Find the first `list_windows()` entry satisfying `match`, resolve its persistent @periscope_id (minting one if the window is new), and return @@ -703,6 +735,14 @@ async def emit_channel_event(pane: str, content: str, meta: dict | None = None) return False +def _schedule_first_mate_emit(pane_id: str, content: str) -> None: + """Fire-and-forget a channel push to the first-mate pane from a main-loop + context (the MCP tool handler runs there). 
Wrapped in _task so a crash is + logged, not swallowed (CLAUDE.md invariant 8).""" + from periscope.log import _task + _task("first-mate-interrupt", emit_channel_event(pane_id, content, {"kind": "interrupt"})) + + async def _mcp_listener() -> None: """Bind the unix socket and accept connections from channel_shim.py. Each connection runs a fresh per-pane MCP Server in _handle_mcp_connection.""" @@ -1297,6 +1337,15 @@ def _do_terminate_tool(pane: str, arguments: dict): }, "handler": _do_captains_log_append_tool, }, + { + "name": "fleet_digest", + "description": ( + "Return the current fleet digest (per-pane who/status/blocked/PR-CI/" + "idle + budget). First-mate-only on-demand pull." + ), + "inputSchema": {"type": "object", "properties": {}}, + "handler": _do_fleet_digest_tool, + }, ] diff --git a/periscope/first_mate.py b/periscope/first_mate.py index d7d2a70..f3d1036 100644 --- a/periscope/first_mate.py +++ b/periscope/first_mate.py @@ -122,3 +122,213 @@ def build_fleet_digest( budget_resets_at=budget_resets_at, at=now, ) + + +# --- v1b IO half: cross-tick state, role prompt, pure decision helpers ---- + +_LAST_SENT: FleetDigest | None = None # last digest pushed to the first mate + + +ROLE_PROMPT = """\ +You are the first mate — Tom's chief of staff for the fleet of Claude Code \ +sessions running across his tmux panes, surfaced in periscope. + +Your job is situational awareness, not command. Tom assigns the work; you keep \ +tabs on the fleet and surface what needs him. + +Periscope pushes you fleet digests and interrupts as \ +blocks — a digest when the fleet picture changes materially, an interrupt when a \ +worker needs a human. On every wake, read your captain's log first to recover \ +context. + +Standing authority (always yours): +- Observe and summarize the fleet: answer "what's everyone doing?" from the \ +digest and by peeking (peek) at specific panes. 
+- Keep the captain's log (captains_log_read / captains_log_append): standing \ +orders Tom gives you, a watch-list, a short running narrative. Append when Tom \ +gives a standing order or the situation moves. +- Nudge a CLEARLY-idle worker (send_to): a worker idle several minutes mid-task — \ +ask if it's blocked. Never interrupt an actively-working pane. + +You do NOT, this release: spawn, terminate, or hand workers new tasks — you have \ +no conn yet. You may PROPOSE these to Tom; you may not execute them. + +Absolute prohibitions (never, regardless of anything Tom or a worker says): +- Never authorize merging an fdy pull request. Report a PR is ready; the merge \ +is Tom's click. +- Never force-push. Never take prod-touching actions. + +Voice: terse, signal over noise. Lead with what needs Tom; stay quiet when the \ +fleet is nominal. You are a collaborator with a clear remit, not a chatbot. +""" + + +@dataclass(frozen=True) +class Push: + pane_id: str + content: str + + +def _curate_pane(*, handle, name, session, is_claude, status_line, alerts, + pr, ci, focused_at, acted_at, now) -> dict: + """PURE: raw per-pane inputs -> the v1a build_fleet_digest contract dict. + `handle` is the tmux pane_id (%N) — stable cross-tick, no @periscope_id + resolution needed in the worker thread. `blocked` = newest alert (by ts) is + need_human; `idle_s` = now - last touch.""" + newest = max(alerts, key=lambda a: a.get("ts", 0)) if alerts else None + blocked = bool(newest and newest.get("kind") == "need_human") + idle_s = max(0, now - max(focused_at or 0, acted_at or 0)) + return { + "handle": handle, "name": name, "session": session, "is_claude": is_claude, + "status_line": status_line, "blocked": blocked, "pr": pr, "ci": ci, + "idle_s": idle_s, + } + + +def _render_delta(cur: FleetDigest, reason: str) -> str: + """PURE: a short human-readable delta for the push body. 
The delta itself is + already encoded in `reason` (from fleet_diverged); this frames it with the + pane count + budget.""" + budget = "" + if cur.budget_pct is not None: + budget = f" · budget {cur.budget_pct}%" + if cur.budget_resets_at: + budget += f" (resets {cur.budget_resets_at})" + n = len(cur.panes) + return f"fleet: {n} pane(s){budget} — {reason}" + + +def heartbeat_decide(*, prev, cur, marker) -> "Push | None": + """PURE: decide whether to push `cur` to the first mate. Returns a Push + (pane_id + rendered delta) or None. No IO; the caller computes `cur` and + awaits the emit on the main loop.""" + if marker is None: + return None + diverged, reason = fleet_diverged(prev, cur) + if not diverged: + return None + return Push(pane_id=marker.pane_id, content=_render_delta(cur, reason)) + + +def assemble_pane_views(panes: list, now: int) -> list[dict]: + """IO glue: turn the worker's (window, parsed) pairs into curated contract + dicts via read-only primitives + the pure _curate_pane. No build_window_view + (its poll-coupled side effects must not fire on the worker's cadence).""" + from periscope import activity + from periscope.channels import channel_state_for + from periscope.git_pr import cached_git_state, cached_pr_state + from periscope.panes import recency_stamps_for + + status_lines = activity.pane_status_lines() + out = [] + for w, parsed in panes: + if not parsed.get("is_claude"): + continue + # Worker rows carry pane_id (%N) + pid_raw, NOT a resolved @periscope_id + # (pid is attached only after _attach_git_then_resolve_pids, which writes + # state.json and is NOT thread-safe — must not run in the to_thread tick). + # %N is stable across ticks and keys pane_status + channel state, so use it + # as the digest handle directly. 
+ pane_id = w.get("pane_id") or "" + cwd = w.get("cwd") or "" + target = f"{w.get('session')}:{w.get('index')}" + st = status_lines.get(pane_id) # pane_status is keyed by %N + git = cached_git_state(cwd) or {} + pr = cached_pr_state(cwd, git.get("branch")) or {} + stamps = recency_stamps_for(target) + out.append(_curate_pane( + handle=pane_id, name=w.get("name") or w.get("index") or "", + session=w.get("session") or "", + is_claude=True, status_line=st[0] if st else None, + alerts=channel_state_for(pane_id).get("alerts", []), + pr=pr.get("pr"), ci=pr.get("ci"), + focused_at=stamps.get("focused_at", 0), acted_at=stamps.get("acted_at", 0), + now=now, + )) + return out + + +FIRST_MATE_SESSION = "bridge" +FIRST_MATE_WINDOW = "first-mate" + + +def supervisor_pass(*, now: int) -> None: + """Ensure exactly one live first-mate pane. No-op if the marked pane is + alive; (re)spawn + re-mark if the marker is missing or its pane is gone. + Idempotent — a live marker short-circuits, preventing double-spawn.""" + from periscope import activity + from periscope.panes import list_windows + + marker = activity.get_first_mate() + live = {w.get("pane_id") for w in list_windows()} + if marker is not None and marker.pane_id in live: + return + _spawn_first_mate(now=now) + + +def _spawn_first_mate(*, now: int) -> None: + """Ensure the `bridge` session, open a single `first-mate` window running + claude_exec() + --append-system-prompt ROLE_PROMPT, stamp it, set the + marker. Borrows worktree_spawn._layout_two_window's sequence (single window, + no HTTPException — this is a lifespan task, not a request).""" + import os + import shlex + import time as _time + from periscope.tmux import tmux, _tmux_mutate + # Function-level imports (keep them here): a test monkeypatches + # `periscope.config.is_prod`, which only takes effect if is_prod is + # re-resolved per call rather than bound at module import. 
+ from periscope.config import claude_exec, is_prod + from periscope.channels import dismiss_dev_channels_consent_bg + from periscope.pids import stamp_new_window + from periscope.open_ops import _session_live # socket-aware has-session + from periscope.log import log + from periscope import activity + + if not is_prod(): + return # defense in depth: never spawn a budget-spender off prod + + home = os.path.expanduser("~") + if not _session_live(FIRST_MATE_SESSION): + ok, msg = _tmux_mutate("new-session", "-d", "-s", FIRST_MATE_SESSION, + "-c", home, "-n", FIRST_MATE_WINDOW) + else: + ok, msg = _tmux_mutate("new-window", "-t", f"{FIRST_MATE_SESSION}:", + "-c", home, "-n", FIRST_MATE_WINDOW) + if not ok: + # Don't stamp a marker for a window that doesn't exist — the next tick + # retries cleanly. Stamping now would leak a bogus marker. + log.warning("first-mate spawn: tmux window create failed: %s", msg) + return + target = f"{FIRST_MATE_SESSION}:{FIRST_MATE_WINDOW}" + exec_cmd = f"{claude_exec()} --append-system-prompt {shlex.quote(ROLE_PROMPT)}" + _time.sleep(0.1) # let rc finish before the command lands (CLAUDE.md note 5) + _tmux_mutate("send-keys", "-t", target, exec_cmd, "Enter") + if "--dangerously-load-development-channels" in exec_cmd: + dismiss_dev_channels_consent_bg(target) + stamp_new_window(target) + pane_id = tmux("display-message", "-t", target, "-p", "#{pane_id}").strip() + if not pane_id: + # A bogus empty marker is never in the live set, so the supervisor would + # respawn every tick — an unbounded window/budget leak. Leave the marker + # unset; the next tick retries cleanly. 
+ log.warning("first-mate spawn: could not read pane_id; leaving marker unset") + return + activity.set_first_mate(pane_id=pane_id, session_id=None, at=now) + + +def register_bridge_project(*, home: str | None = None) -> None: + """Register the `bridge` session as a first-class rail project so the + first-mate pane is reachable in the dashboard instead of folding into the + 'dev' group. Idempotent; `repo=None` (the rail renders a null-repo project as + its own group labelled by `name`). Writes state.json, so call from the + main loop (the prod-gated lifespan), NOT the worker thread.""" + import os + from periscope import projects + + pinned = os.path.realpath(home or os.path.expanduser("~")) + if projects.get_project(pinned): + projects.update_project(pinned, tmux_session=FIRST_MATE_SESSION, name="bridge") + else: + projects.create_project(pinned, name="bridge", + tmux_session=FIRST_MATE_SESSION, repo=None, base_branch=None) diff --git a/static/src/split/__tests__/railTree.test.js b/static/src/split/__tests__/railTree.test.js index 7a02bd0..eced6fe 100644 --- a/static/src/split/__tests__/railTree.test.js +++ b/static/src/split/__tests__/railTree.test.js @@ -96,6 +96,16 @@ describe("mergeLiveAndPrefs", () => { expect(m.panesByWorktree["notes"]).toEqual(["n1"]); // no "review" }); + it("the bridge session renders as its own 'bridge' group, not folded into dev", () => { + const bridgeProjects = [proj({ pinned_dir: "/Users/tom", repo: null, tmux_session: "bridge", name: "bridge" })]; + const ws = [win({ pid: "fm", session: "bridge", project_pinned_dir: "/Users/tom" })]; + const m = mergeLiveAndPrefs(ws, bridgeProjects, [], {}, {}); + expect(m.repoOrder).toEqual(["/Users/tom"]); // own group, not MAIN_KEY + expect(m.worktreesByRepo["/Users/tom"]).toEqual(["bridge"]); + expect(m.panesByWorktree["bridge"]).toEqual(["fm"]); // the first-mate pane, no review row + expect(groupLabel("/Users/tom", indexProjects(bridgeProjects))).toBe("bridge"); + }); + it("dev pane order 
persists via prefs panes_by_worktree[MAIN_KEY]", () => { const ws = [ win({ pid: "x", session: "main", project_pinned_dir: MAIN_KEY }), diff --git a/tests/test_channels.py b/tests/test_channels.py index 9b235de..94ab445 100644 --- a/tests/test_channels.py +++ b/tests/test_channels.py @@ -652,3 +652,54 @@ def test_captains_log_append_rejects_empty_text(fresh_activity_db): activity.set_first_mate(pane_id="%9", session_id=None, at=1) bad = _body(_do_captains_log_append_tool("%9", {"kind": "watch", "text": " "})) assert bad["ok"] is False + + +def test_need_human_notify_schedules_emit_to_first_mate(fresh_activity_db, monkeypatch): + from periscope import activity + from periscope.channels import _do_notify_tool + activity.set_first_mate(pane_id="%9", session_id=None, at=1) + sent = [] + # Replace the scheduler so we don't need a running loop in this sync test. + monkeypatch.setattr("periscope.channels._schedule_first_mate_emit", + lambda pane_id, content: sent.append((pane_id, content))) + _do_notify_tool("%5", {"message": "blocked on schema", "kind": "need_human"}) + assert sent and sent[0][0] == "%9" and "blocked on schema" in sent[0][1] + + +def test_non_need_human_notify_schedules_no_emit(fresh_activity_db, monkeypatch): + from periscope import activity + from periscope.channels import _do_notify_tool + activity.set_first_mate(pane_id="%9", session_id=None, at=1) + sent = [] + monkeypatch.setattr("periscope.channels._schedule_first_mate_emit", + lambda pane_id, content: sent.append((pane_id, content))) + _do_notify_tool("%5", {"message": "done", "kind": "done"}) + assert sent == [] + + +def test_need_human_notify_no_marker_is_safe(fresh_activity_db, monkeypatch): + from periscope.channels import _do_notify_tool + sent = [] + monkeypatch.setattr("periscope.channels._schedule_first_mate_emit", + lambda pane_id, content: sent.append((pane_id, content))) + _do_notify_tool("%5", {"message": "x", "kind": "need_human"}) # no first mate set + assert sent == [] + + +def 
test_fleet_digest_tool_refuses_non_first_mate(fresh_activity_db): + from periscope.channels import _do_fleet_digest_tool + r = _body(_do_fleet_digest_tool("%5", {})) + assert r["ok"] is False and "first-mate" in r["error"].lower() + + +def test_fleet_digest_tool_returns_cached_digest(fresh_activity_db): + from periscope import activity, first_mate + from periscope.channels import _do_fleet_digest_tool + activity.set_first_mate(pane_id="%9", session_id=None, at=1) + first_mate._LAST_SENT = first_mate.FleetDigest( + panes=(first_mate.PaneDigest("@1", "w", "s", "run", False, 7, "✓", 3),), + budget_pct=55, budget_resets_at=None, at=10) + r = _body(_do_fleet_digest_tool("%9", {})) + assert r["ok"] is True and r["digest"]["budget_pct"] == 55 + assert r["digest"]["panes"][0]["handle"] == "@1" + first_mate._LAST_SENT = None # reset module global for other tests diff --git a/tests/test_first_mate.py b/tests/test_first_mate.py index 2b65ef7..fd9f3b0 100644 --- a/tests/test_first_mate.py +++ b/tests/test_first_mate.py @@ -131,3 +131,204 @@ def test_build_stamps_now(): d = build_fleet_digest(panes=[], usage=None, now=4242) assert d.at == 4242 assert d.panes == () + + +from periscope.first_mate import ( + _curate_pane, _render_delta, heartbeat_decide, +) +from periscope.activity import FirstMateMarker + + +def test_curate_pane_derives_blocked_from_newest_need_human_alert(): + d = _curate_pane( + handle="@1", name="w", session="s", is_claude=True, status_line="working", + alerts=[{"kind": "info", "ts": 10}, {"kind": "need_human", "ts": 20}], + pr=1234, ci="✓", focused_at=100, acted_at=90, now=130, + ) + assert d["blocked"] is True + assert d["idle_s"] == 30 # now - max(focused, acted) + assert d["handle"] == "@1" and d["pr"] == 1234 and d["ci"] == "✓" + + +def test_curate_pane_not_blocked_when_newest_alert_is_not_need_human(): + d = _curate_pane( + handle="@1", name="w", session="s", is_claude=True, status_line=None, + alerts=[{"kind": "need_human", "ts": 10}, {"kind": "done", 
"ts": 20}], + pr=None, ci=None, focused_at=0, acted_at=0, now=5, + ) + assert d["blocked"] is False + assert d["idle_s"] == 5 # max(0,0)=0 -> now-0 + + +def test_render_delta_mentions_changed_panes_and_budget(): + cur = FleetDigest(panes=(PaneDigest("@1","w","s","run",True,None,None,0),), + budget_pct=71, budget_resets_at=None, at=2) + text = _render_delta(cur, "@1 blocked") + assert "@1" in text and "71" in text + + +def test_heartbeat_decide_pushes_on_divergence_when_marker_present(): + marker = FirstMateMarker(pane_id="%9", session_id=None, updated_at=1) + prev = None + cur = FleetDigest(panes=(), budget_pct=50, budget_resets_at=None, at=2) + push = heartbeat_decide(prev=prev, cur=cur, marker=marker) + assert push is not None and push.pane_id == "%9" and push.content + + +def test_heartbeat_decide_none_when_no_marker(): + cur = FleetDigest(panes=(), budget_pct=50, budget_resets_at=None, at=2) + assert heartbeat_decide(prev=None, cur=cur, marker=None) is None + + +def test_heartbeat_decide_none_when_not_diverged(): + marker = FirstMateMarker(pane_id="%9", session_id=None, updated_at=1) + a = FleetDigest(panes=(), budget_pct=50, budget_resets_at=None, at=1) + b = FleetDigest(panes=(), budget_pct=50, budget_resets_at=None, at=2) + assert heartbeat_decide(prev=a, cur=b, marker=marker) is None + + +def test_heartbeat_decide_pushes_on_ci_red_even_if_otherwise_nominal(): + marker = FirstMateMarker(pane_id="%9", session_id=None, updated_at=1) + prev = FleetDigest(panes=(PaneDigest("@1","w","s","x",False,7,"✓",0),), + budget_pct=50, budget_resets_at=None, at=1) + cur = FleetDigest(panes=(PaneDigest("@1","w","s","x",False,7,"✗",0),), + budget_pct=50, budget_resets_at=None, at=2) + push = heartbeat_decide(prev=prev, cur=cur, marker=marker) + assert push is not None # CI ✓->✗ forces a push (also caught by pr/ci divergence) + + +def test_assemble_pane_views_uses_curate_and_skips_non_claude(monkeypatch, fresh_activity_db): + from periscope import first_mate, activity + import 
periscope.channels as channels + import periscope.panes as panes + import periscope.git_pr as git_pr + + # Two windows; one is not Claude and must be dropped. + panes_in = [ + ({"session": "s", "index": "1", "cwd": "/r", "pane_id": "%5", "pid": "@1"}, + {"is_claude": True}), + ({"session": "s", "index": "2", "cwd": "/r", "pane_id": "%6", "pid": "@2"}, + {"is_claude": False}), + ] + # pane_status is keyed by tmux %N (pane_id), not @periscope_id — match real shape. + monkeypatch.setattr(activity, "pane_status_lines", lambda: {"%5": ("running tests", 0, None)}) + monkeypatch.setattr(channels, "channel_state_for", + lambda pid: {"alerts": [{"kind": "need_human", "ts": 9}]}) + monkeypatch.setattr(git_pr, "cached_git_state", lambda p: {"branch": "b"}) + monkeypatch.setattr(git_pr, "cached_pr_state", lambda p, b: {"pr": 7, "ci": "✗"}) + monkeypatch.setattr(panes, "recency_stamps_for", + lambda t: {"focused_at": 100, "acted_at": 100}) + + views = first_mate.assemble_pane_views(panes_in, now=130) + assert len(views) == 1 + v = views[0] + assert v["handle"] == "%5" and v["status_line"] == "running tests" # handle = pane_id (%N) + assert v["blocked"] is True and v["pr"] == 7 and v["ci"] == "✗" + assert v["idle_s"] == 30 + + +def test_run_worker_emits_pending_push_and_advances_last_sent(monkeypatch): + import asyncio + from periscope import activity, first_mate + sent = [] + + async def fake_emit(pane_id, content, meta=None): + sent.append((pane_id, content)) + return True + + monkeypatch.setattr("periscope.channels.emit_channel_event", fake_emit) + first_mate._LAST_SENT = None + cur = first_mate.FleetDigest(panes=(), budget_pct=50, budget_resets_at=None, at=2) + last_ctx = {"_fm_push": ("%9", "delta text", cur)} + + asyncio.run(activity._emit_pending_first_mate(last_ctx)) # sync test: drive the coro + assert sent == [("%9", "delta text")] + assert first_mate._LAST_SENT is cur # advanced on ok + assert "_fm_push" not in last_ctx # consumed + first_mate._LAST_SENT = None # 
reset module global for other tests + + +def test_run_worker_keeps_last_sent_on_failed_emit(monkeypatch): + import asyncio + from periscope import activity, first_mate + + async def fake_emit(pane_id, content, meta=None): + return False # pane not attached + + monkeypatch.setattr("periscope.channels.emit_channel_event", fake_emit) + first_mate._LAST_SENT = None + cur = first_mate.FleetDigest(panes=(), budget_pct=50, budget_resets_at=None, at=2) + last_ctx = {"_fm_push": ("%9", "delta", cur)} + + asyncio.run(activity._emit_pending_first_mate(last_ctx)) + assert first_mate._LAST_SENT is None # NOT advanced -> next tick re-pushes + + +def test_supervisor_noop_when_marker_alive(monkeypatch, fresh_activity_db): + from periscope import first_mate, activity + import periscope.panes as panes + activity.set_first_mate(pane_id="%9", session_id=None, at=1) + monkeypatch.setattr(panes, "list_windows", lambda: [{"pane_id": "%9"}]) + called = [] + monkeypatch.setattr(first_mate, "_spawn_first_mate", lambda *, now: called.append(now)) + first_mate.supervisor_pass(now=5) + assert called == [] # alive -> no respawn + + +def test_supervisor_respawns_when_marker_missing(monkeypatch, fresh_activity_db): + from periscope import first_mate + import periscope.panes as panes + monkeypatch.setattr(panes, "list_windows", lambda: []) + called = [] + monkeypatch.setattr(first_mate, "_spawn_first_mate", lambda *, now: called.append(now)) + first_mate.supervisor_pass(now=5) + assert called == [5] # no marker -> spawn + + +def test_supervisor_respawns_when_marked_pane_dead(monkeypatch, fresh_activity_db): + from periscope import first_mate, activity + import periscope.panes as panes + activity.set_first_mate(pane_id="%9", session_id=None, at=1) + monkeypatch.setattr(panes, "list_windows", lambda: [{"pane_id": "%7"}]) # %9 gone + called = [] + monkeypatch.setattr(first_mate, "_spawn_first_mate", lambda *, now: called.append(now)) + first_mate.supervisor_pass(now=5) + assert called == [5] + + +def 
test_spawn_leaves_marker_unset_on_empty_pane_id(monkeypatch, fresh_activity_db): + # If display-message can't read the new window's %N, stamping pane_id="" would + # be a marker never in the live set -> the supervisor respawns every tick (a + # window/budget leak). The guard must leave the marker unset instead. + from periscope import first_mate, activity + import periscope.tmux as tmuxmod + import periscope.config as config + import periscope.channels as channels + import periscope.pids as pids + import periscope.open_ops as open_ops + + monkeypatch.setattr(config, "is_prod", lambda: True) + monkeypatch.setattr(open_ops, "_session_live", lambda name: True) + monkeypatch.setattr(tmuxmod, "_tmux_mutate", lambda *a, **k: (True, "")) + monkeypatch.setattr(config, "claude_exec", lambda: "claude") + monkeypatch.setattr(channels, "dismiss_dev_channels_consent_bg", lambda *a, **k: None) + monkeypatch.setattr(pids, "stamp_new_window", lambda t: "") + monkeypatch.setattr(tmuxmod, "tmux", lambda *a, **k: "") # display-message read fails + + first_mate._spawn_first_mate(now=1) + assert activity.get_first_mate() is None # no phantom marker -> no respawn loop + + +def test_register_bridge_project_creates_null_repo_project(clean_state, tmp_path): + import os + from periscope import first_mate, projects + first_mate.register_bridge_project(home=str(tmp_path)) + pinned = os.path.realpath(str(tmp_path)) + proj = projects.get_project(pinned) + assert proj["name"] == "bridge" + assert proj["tmux_session"] == "bridge" + assert proj.get("repo") is None # null-repo -> renders as its own named rail group + # Idempotent: a second call must not raise (create_project would ValueError + # on a duplicate pinned_dir — register must take the update path instead). 
+ first_mate.register_bridge_project(home=str(tmp_path)) + assert projects.get_project(pinned)["tmux_session"] == "bridge" diff --git a/tests/test_first_mate_spawn.py b/tests/test_first_mate_spawn.py new file mode 100644 index 0000000..98fabc4 --- /dev/null +++ b/tests/test_first_mate_spawn.py @@ -0,0 +1,33 @@ +import shutil +import pytest + +from periscope import first_mate, activity + +needs_tmux = pytest.mark.skipif(not shutil.which("tmux"), reason="tmux not installed") + + +@needs_tmux +def test_supervisor_spawns_and_respawns(tmux_test_server, fresh_activity_db, monkeypatch): + # tmux_test_server sets PERISCOPE_TMUX_SOCKET (isolated -L) + PERISCOPE_CLAUDE_EXEC=cat + monkeypatch.setattr("periscope.config.is_prod", lambda: True) # allow spawn under test + from periscope.panes import list_windows + + # First pass: no marker -> spawns a first-mate window + marks it. + first_mate.supervisor_pass(now=1) + m1 = activity.get_first_mate() + assert m1 is not None + live = {w["pane_id"] for w in list_windows()} + assert m1.pane_id in live + + # Second pass: marker alive -> idempotent, no new window. + before = len(list_windows()) + first_mate.supervisor_pass(now=2) + assert len(list_windows()) == before + assert activity.get_first_mate().pane_id == m1.pane_id + + # Kill the marked pane -> third pass respawns + re-marks. + from periscope.tmux import _tmux_mutate + _tmux_mutate("kill-window", "-t", f"{first_mate.FIRST_MATE_SESSION}:{first_mate.FIRST_MATE_WINDOW}") + first_mate.supervisor_pass(now=3) + m3 = activity.get_first_mate() + assert m3 is not None and m3.pane_id in {w["pane_id"] for w in list_windows()}
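The heartbeat split resolved in the proposal has three parts. The worker thread decides, the main loop emits, and the last-sent digest advances only when the send succeeds. Below is a standalone sketch of that pattern, offered as an illustration under stated assumptions rather than periscope code. `Digest`, `Heartbeat`, `decide`, `worker_tick`, `emit_pending`, and `run_ticks` are hypothetical stand-ins for `FleetDigest`, the `_LAST_SENT`/`last_ctx["_fm_push"]` state, `heartbeat_decide`, `_worker_tick`, `_emit_pending_first_mate`, and `run_worker`. The sketch also simplifies divergence to "any change", whereas the real code asks `fleet_diverged`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical emitter type: (target pane, content) -> sent ok?
Emit = Callable[[str, str], Awaitable[bool]]


@dataclass(frozen=True)
class Digest:
    # Hypothetical stand-in for FleetDigest: just enough fields to compare.
    blocked: Tuple[str, ...]
    budget_pct: int


@dataclass
class Heartbeat:
    # Cross-tick state. last_sent plays the role of first_mate._LAST_SENT;
    # pending plays the role of last_ctx["_fm_push"].
    last_sent: Optional[Digest] = None
    pending: Optional[Tuple[str, str, Digest]] = None


def decide(prev: Optional[Digest], cur: Digest,
           target: Optional[str]) -> Optional[Tuple[str, str]]:
    """Pure decision (stand-in for heartbeat_decide). Assumption: any change
    counts as divergence; the real code defers to fleet_diverged."""
    if target is None or prev == cur:
        return None
    return target, f"blocked={list(cur.blocked)} budget={cur.budget_pct}%"


def worker_tick(hb: Heartbeat, cur: Digest, target: Optional[str]) -> None:
    """Worker thread: no await and no event loop; it only stashes a push."""
    push = decide(hb.last_sent, cur, target)
    if push is not None:
        hb.pending = (push[0], push[1], cur)


async def emit_pending(hb: Heartbeat, emit: Emit) -> None:
    """Main loop: the only place a send happens (loop-affine streams live here)."""
    pending, hb.pending = hb.pending, None  # consume the stash
    if pending is None:
        return
    target, content, cur = pending
    if await emit(target, content):
        hb.last_sent = cur  # advance only on a successful send


async def run_ticks(hb: Heartbeat, digests, emit: Emit,
                    target: Optional[str] = "%9") -> None:
    """Sequential ticks, like run_worker: the thread decides, then the loop emits."""
    for cur in digests:
        await asyncio.to_thread(worker_tick, hb, cur, target)
        await emit_pending(hb, emit)
```

The thread only writes `hb.pending`, and the loop consumes it only after `asyncio.to_thread` returns, so the two sides never touch the state concurrently. A failed send leaves `last_sent` unchanged, which means the next tick recomputes the same divergence and retries.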