fix(web): cap live session subscriptions to reduce lag with many sessions #1320
…ions Every opened session stayed subscribed to its WebSocket event stream across reconnects, so opening hundreds of sessions turned background events into a constant reducer and sidebar recompute storm. Keep only the four most-recently-opened sessions subscribed; evicted sessions resume from their tracked cursor on re-open.
🦋 Changeset detected. Latest commit: a12b756. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 53b633790b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
    const tail = wsSubscriptionOrder.at(-1);
    if (tail === undefined || tail === rawState.activeSessionId) break;
    wsSubscriptionOrder.pop();
    eventConn?.unsubscribe(tail);
Keep evicted cursors from skipping missed events
When this unsubscribes an older session, that session can still receive global session events (for example event.session.status_changed / session.meta.updated are broadcast to all connections), and reduceAppEvent advances lastSeqBySession for every delivered event. If an evicted session emits per-session durable events and then a global status/meta event, the UI cursor jumps past the missed events; subscribeToSessionEvents() later resumes from that advanced cursor, so the daemon will not replay the skipped transcript/task updates. This leaves reopened sessions stale unless eviction forces a snapshot or global events stop advancing cursors for sessions that are no longer subscribed.
Some session events (status_changed, meta_updated, ...) are broadcast to every connection and still advance lastSeqBySession for an unsubscribed session. If an evicted session emits per-session durable events and then a global event, the cursor jumps past the missed events, so resuming from it later would skip them and leave the reopened session stale. Track evicted sessions and reset their cursor on the next re-subscribe so the daemon replays or snapshots what was missed.
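The hazard this commit addresses can be reproduced with a toy model. The sketch below is not the project's reducer: it assumes that global and per-session events for a session are numbered from the same per-session counter, and the helpers `deliver`, `evict`, and `trustedCursor` are hypothetical. Only `lastSeqBySession` and `sessionsWithStaleCursor` are names taken from the PR.

```typescript
// Toy model only. Assumption: global and per-session events for one session
// share a single increasing sequence counter.
type SessionEvent = { sessionId: string; seq: number; global: boolean };

const lastSeqBySession: Record<string, number> = {};
const sessionsWithStaleCursor = new Set<string>();
const subscribed = new Set<string>();

// Returns true when the event reaches the client. Global events always do;
// per-session events only reach sessions that are still subscribed.
function deliver(ev: SessionEvent): boolean {
  if (!ev.global && !subscribed.has(ev.sessionId)) return false;
  lastSeqBySession[ev.sessionId] = ev.seq; // every delivered event advances the cursor
  return true;
}

// Dropping the live subscription also marks the cursor as untrustworthy.
function evict(sessionId: string): void {
  subscribed.delete(sessionId);
  sessionsWithStaleCursor.add(sessionId);
}

// A cursor is only safe to resume from if the session was never evicted.
function trustedCursor(sessionId: string): number | undefined {
  return sessionsWithStaleCursor.has(sessionId)
    ? undefined
    : lastSeqBySession[sessionId];
}

subscribed.add("s1");
deliver({ sessionId: "s1", seq: 10, global: false });
evict("s1");

// Per-session events 11 and 12 are never delivered while evicted...
const missed = [11, 12].filter(
  (seq) => !deliver({ sessionId: "s1", seq, global: false }),
);
// ...but a broadcast event (for example a status change) still moves the cursor to 13.
deliver({ sessionId: "s1", seq: 13, global: true });
```

After this runs, the raw cursor for `s1` sits at 13 even though 11 and 12 were never seen, which is why the marker, rather than the cursor, has to decide how the session is rebuilt on re-open.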
💡 Codex Review
Reviewed commit: 8142210a6d
    if (sessionsWithStaleCursor.delete(sessionId)) {
      delete rawState.lastSeqBySession[sessionId];
      delete epochBySession[sessionId];
Refresh evicted sessions via snapshot instead of seq 0 replay
When an evicted session is reopened, deleting its cursor makes subscribe() send {seq: 0}. For any short/normal session whose journal gap is within the server replay cap, getBufferedSince replays all durable events instead of returning resync_required; the raw projector generates fresh assistant/tool message ids for those replayed turns, so an already-loaded transcript gains duplicate/empty historical messages just by opening a fifth session and returning. Force a snapshot/resync for stale cursors, or keep a safe cursor, instead of resuming from zero.
    const tail = wsSubscriptionOrder.at(-1);
    if (tail === undefined || tail === rawState.activeSessionId) break;
Continue evicting when the active session is last
The active session is not guaranteed to be at the front: first-time opens call retainWsSubscription() only after syncSessionFromSnapshot() awaits, so rapid clicks can complete out of order. If the current active session completed first and sits at the tail, this break leaves every older completion subscribed and the array can grow far beyond 4, recreating the lag for the multi-session/slow-snapshot case this cap targets. Skip over the active id or move it to the front before enforcing the cap.
Two fixes for the subscription cap:
- Re-opening a session that was evicted now rebuilds it from a snapshot instead of resuming from seq 0. Replaying from zero made the projector regenerate assistant/tool message ids, which duplicated the already-loaded transcript; resuming from the kept cursor could skip per-session events that arrived while unsubscribed.
- Eviction now skips the active session wherever it sits in the list, instead of breaking when it lands at the tail. First-time opens retain only after an awaited snapshot, so rapid clicks can complete out of order and leave the active session at the tail, which previously let the list grow past the cap.
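The second fix is easiest to see side by side. The following is a minimal sketch, not the PR's code: both functions are hypothetical, operate on a plain array ordered most-recently-opened first, and return the evicted ids. The first mirrors the earlier `break` behaviour quoted in the review; the second skips the active id wherever it sits.

```typescript
// Earlier behaviour: stop as soon as the active session is the tail.
function evictBreakingAtActive(order: string[], active: string, cap: number): string[] {
  const evicted: string[] = [];
  while (order.length > cap) {
    const tail = order[order.length - 1];
    if (tail === undefined || tail === active) break;
    order.pop();
    evicted.push(tail);
  }
  return evicted;
}

// Fixed behaviour: pick the least-recent entry that is not the active session.
function evictSkippingActive(order: string[], active: string, cap: number): string[] {
  const evicted: string[] = [];
  while (order.length > cap) {
    let victimIdx = -1;
    for (let i = order.length - 1; i >= 0; i--) {
      if (order[i] !== active) {
        victimIdx = i;
        break;
      }
    }
    if (victimIdx === -1) break; // only the active session is left
    const [victim] = order.splice(victimIdx, 1);
    if (victim === undefined) break;
    evicted.push(victim);
  }
  return evicted;
}

// Out-of-order completion: the active session "a" ended up at the tail.
const stuck = ["e", "d", "c", "b", "a"];
const stuckEvicted = evictBreakingAtActive(stuck, "a", 4);

const fixed = ["e", "d", "c", "b", "a"];
const fixedEvicted = evictSkippingActive(fixed, "a", 4);
```

With the active session at the tail, the first variant evicts nothing and the list stays at five entries, while the second evicts `b` and keeps `a`.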
💡 Codex Review
Reviewed commit: 6b58c69f99
    if (sessionsWithStaleCursor.delete(sessionId)) {
      return syncSessionFromSnapshot(sessionId);
Preserve stale cursor marker until snapshot succeeds
When an evicted session is reopened and getSessionSnapshot fails transiently, this delete consumes the only marker that forces a safe rebuild. syncSessionFromSnapshot returns 'failed' for non-404 errors without subscribing, so the now-active session can be left without live events; after switching away and back it will resume from rawState.lastSeqBySession, which this patch notes may have jumped past missed per-session events while evicted. Keep the marker until the snapshot returns 'ok' (or otherwise fall back to a safe resync path) so a temporary snapshot failure cannot permanently skip events.
Re-opening an evicted session deleted the stale-cursor marker before the snapshot ran. If the snapshot failed transiently, the marker was gone and a later re-open would fall back to subscribeToSessionEvents, resuming from a cursor that may have skipped per-session events while evicted. Read the marker instead of deleting it, and let syncSessionFromSnapshot clear it once the snapshot succeeds.
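The ordering matters more than the individual calls, so here is a simplified sketch of it. It is synchronous for brevity (the real snapshot call is awaited), `fetchSnapshot` and `reopen` are hypothetical stand-ins, and the `'ok'` / `'failed'` results follow the wording of the review comment rather than the actual signature of `syncSessionFromSnapshot`.

```typescript
type SnapshotResult = "ok" | "failed";
type ReopenPath = "snapshot-ok" | "snapshot-failed" | "resume-from-cursor";

const sessionsWithStaleCursor = new Set<string>();

// Hypothetical stand-in for the snapshot fetch: fails on the first call only,
// to model a transient error.
let snapshotAttempts = 0;
function fetchSnapshot(_sessionId: string): SnapshotResult {
  snapshotAttempts += 1;
  return snapshotAttempts === 1 ? "failed" : "ok";
}

function syncSessionFromSnapshot(sessionId: string): SnapshotResult {
  const result = fetchSnapshot(sessionId);
  // The marker is cleared here, and only once the snapshot has succeeded.
  if (result === "ok") sessionsWithStaleCursor.delete(sessionId);
  return result;
}

function reopen(sessionId: string): ReopenPath {
  // Read the marker; do not consume it before the snapshot has run.
  if (sessionsWithStaleCursor.has(sessionId)) {
    return syncSessionFromSnapshot(sessionId) === "ok"
      ? "snapshot-ok"
      : "snapshot-failed";
  }
  return "resume-from-cursor";
}

sessionsWithStaleCursor.add("s1"); // s1 was evicted earlier
const firstTry = reopen("s1");     // snapshot fails, marker survives
const secondTry = reopen("s1");    // snapshot succeeds, marker is cleared
const thirdTry = reopen("s1");     // cursor is trustworthy again
```

Because the failed attempt leaves the marker in place, the second re-open still takes the snapshot path instead of resuming from a cursor that may have skipped events.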
💡 Codex Review
Reviewed commit: a12b7568f1
    if (victimIdx === -1) break;
    const [victim] = wsSubscriptionOrder.splice(victimIdx, 1);
    if (victim === undefined) break;
    eventConn?.unsubscribe(victim);
Keep blocking request events for evicted sessions
When a user has opened more than four sessions, this removes the LRU session from the server-side subscription entirely. The server only broadcasts isGlobalSessionEvent frames to all connections (packages/server/src/services/gateway/wsBroadcastService.ts:112-114), and approval/question request frames are not in that global set (packages/server/src/services/gateway/wsBroadcastService.ts:239-256), so an evicted background session that later reaches event.question.requested or event.approval.requested will not populate pending badges or trigger onQuestionRequested notifications/sounds until the user manually reopens it. That can leave an agent blocked without the existing attention signal in exactly the many-session scenario this cap targets.
Related Issue
No linked issue — this comes from a user report that the web UI gets sluggish once they have a hundred or so sessions.
Problem
Every session the user opens subscribes to its WebSocket event stream, and the socket keeps all subscriptions across reconnects (re-sending them in client_hello). Subscriptions were only dropped on archive / delete, never on session switch. After opening many sessions, every background session's status / meta / usage event flows through the reducer and dirties the sidebar computeds, so the whole UI slows down as the session count grows.
What changed
Cap the number of live WebSocket session subscriptions with a small MRU list (4). The active session is always retained; when the cap is exceeded, the least-recently-opened subscription is evicted. Eviction only drops the live subscription: the per-session seq / epoch cursor is kept, so re-opening an evicted session resumes from the cursor (the daemon replays missed durable events, or answers resync_required). Reconnects also get cheaper because client_hello only carries the retained subscriptions.
Trade-off: a background session that has fallen out of the 4-slot window no longer lights up its unread dot / completion notification in real time; it syncs when the user switches back to it.
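The capped MRU list can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `EventConnection`, `createSubscriptionWindow`, `retain`, and `retained` are hypothetical names, and only the cap of 4 and the rule that the active session is never evicted come from the description above.

```typescript
const MAX_LIVE_SUBSCRIPTIONS = 4;

// Hypothetical minimal view of the WebSocket connection.
interface EventConnection {
  subscribe(sessionId: string): void;
  unsubscribe(sessionId: string): void;
}

function createSubscriptionWindow(
  conn: EventConnection,
  cap: number = MAX_LIVE_SUBSCRIPTIONS,
) {
  const order: string[] = []; // most-recently-opened first

  // Marks a session as most recently opened and returns the ids it evicted.
  function retain(sessionId: string, activeSessionId: string): string[] {
    const existing = order.indexOf(sessionId);
    if (existing === -1) conn.subscribe(sessionId);
    else order.splice(existing, 1);
    order.unshift(sessionId);

    const evicted: string[] = [];
    while (order.length > cap) {
      // Walk from the least-recent end, never picking the active session.
      let victimIdx = -1;
      for (let i = order.length - 1; i >= 0; i--) {
        if (order[i] !== activeSessionId) {
          victimIdx = i;
          break;
        }
      }
      if (victimIdx === -1) break;
      const [victim] = order.splice(victimIdx, 1);
      if (victim === undefined) break;
      conn.unsubscribe(victim);
      evicted.push(victim);
    }
    return evicted;
  }

  // The ids a reconnect would need to re-send.
  function retained(): string[] {
    return [...order];
  }

  return { retain, retained };
}

// Usage: opening a fifth session evicts the least-recently-opened one.
const calls: string[] = [];
const subscriptionWindow = createSubscriptionWindow({
  subscribe: (id) => { calls.push("sub:" + id); },
  unsubscribe: (id) => { calls.push("unsub:" + id); },
});
for (const id of ["s1", "s2", "s3", "s4"]) subscriptionWindow.retain(id, id);
const evictedOnFifth = subscriptionWindow.retain("s5", "s5");
```

Here `evictedOnFifth` is `["s1"]` and `retained()` returns `["s5", "s4", "s3", "s2"]`, so a reconnect would carry four ids regardless of how many sessions have been opened.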
Checklist
- gen-changesets skill, or this PR needs no changeset.
- gen-docs skill, or this PR needs no doc update.