feat(slack-app): stream live agent status to Slack thread (chat.startStream) #64293
Draft
VojtechBartos wants to merge 12 commits into
Replace the static "Working on task…" placeholder with a single live status
message that gets chat_update'd as the agent works ("Reading api.py" →
"Running tests" → completion).
Architecture: relay_sandbox_events consumes the new _posthog/status
notification (emitted by the agent in PostHog/code) and signals the parent
ProcessTaskWorkflow on per-turn boundaries. The parent spawns a per-turn
SlackStatusRelayWorkflow child workflow that owns a debounce + throttle
flusher (1s debounce, ≥2s throttle, one in-flight activity, idle timeout
5m, execution ceiling 1h) and dispatches update_slack_status activities.
Region-aware feature flag posthog-code-slack-agent-status gates the whole
thing; requires chat:write on the integration. With the flag off the
legacy placeholder path is unchanged.
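A minimal sketch of the debounce + throttle coalescing described above, written as a plain class with an injected clock so it runs outside Temporal. `StatusCoalescer` and its method names are hypothetical and not taken from the PR; only the 1 s debounce and 2 s throttle values come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

DEBOUNCE_S = 1.0  # quiet period required after the latest update
THROTTLE_S = 2.0  # minimum gap between two dispatches


@dataclass
class StatusCoalescer:
    """Hypothetical helper: keeps only the newest status and decides when to send it."""

    pending: Optional[str] = None
    last_update_at: float = 0.0
    last_dispatch_at: float = float("-inf")
    last_dispatched: Optional[str] = None

    def push(self, text: str, now: float) -> None:
        # Later updates overwrite earlier ones; only the newest text is kept.
        self.pending = text
        self.last_update_at = now

    def next_flush_at(self) -> Optional[float]:
        # A flush is due once BOTH the debounce and the throttle windows have elapsed.
        if self.pending is None:
            return None
        return max(self.last_update_at + DEBOUNCE_S, self.last_dispatch_at + THROTTLE_S)

    def flush(self, now: float) -> Optional[str]:
        due = self.next_flush_at()
        if due is None or now < due:
            return None
        text, self.pending = self.pending, None
        if text == self.last_dispatched:
            return None  # nothing new to show
        self.last_dispatch_at = now
        self.last_dispatched = text
        return text
```

With a 2 s throttle, at most 60 / 2 = 30 dispatches can happen per minute per turn, whatever the signal rate.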
Contributor
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
products/tasks/backend/temporal/process_task/slack_status_relay.py:64-81
**Last pending status silently dropped at turn boundary**
When `complete_turn` and a recent `agent_status_update` signal arrive in the same scheduler tick, `wait_condition` returns with both `_turn_complete = True` and `_pending_text != None` already set. The `if self._turn_complete: break` guard fires immediately, bypassing the debounce/dispatch path and discarding the pending text. The last agent status message before a turn ends is therefore silently lost.
To flush the final status, a one-shot dispatch before `break` would prevent the loss — e.g. check `self._pending_text` after `wait_condition` returns, and dispatch if `_pending_text` is set and differs from `_last_dispatched_text`, even when `_turn_complete` is True.
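One possible shape for that one-shot flush, as a standalone function rather than the workflow's actual loop. `drain_on_turn_complete` and the `dispatch` callback are hypothetical names; in the workflow, the callback would stand in for executing the status activity.

```python
from typing import Callable, Optional


def drain_on_turn_complete(
    pending_text: Optional[str],
    last_dispatched_text: Optional[str],
    dispatch: Callable[[str], None],
) -> Optional[str]:
    """Run just before `break`; returns the updated last-dispatched text."""
    # Send the final status only if something is pending and it is not a repeat.
    if pending_text is not None and pending_text != last_dispatched_text:
        dispatch(pending_text)
        return pending_text
    return last_dispatched_text
```

Calling this when `_turn_complete` is observed, before leaving the loop, would deliver the last status instead of discarding it.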
### Issue 2 of 2
products/tasks/backend/temporal/process_task/activities/evaluate_slack_streaming_gate.py:32
Loading `Integration` without `select_related('team')` causes an extra DB query when `should_stream_slack_status` accesses `integration.team.organization_id`, giving an N+1 roundtrip on every gate evaluation.
```suggestion
integration = Integration.objects.select_related("team").get(id=input.integration_id)
```
Reviews (1): Last reviewed commit: "feat(slack-app): stream live agent statu..."
    from products.slack_app.backend.services.agent_status import should_stream_slack_status

    try:
        integration = Integration.objects.get(id=input.integration_id)
Contributor
Loading `Integration` without `select_related('team')` causes an extra DB query when `should_stream_slack_status` accesses `integration.team.organization_id`, giving an N+1 roundtrip on every gate evaluation.
Suggested change
    - integration = Integration.objects.get(id=input.integration_id)
    + integration = Integration.objects.select_related("team").get(id=input.integration_id)
Switch the per-turn status flow from chat_update on a single message to Slack's native streaming methods (chat.startStream / appendStream / stopStream) with task_display_mode='plan'. Renders as the collapsible plan block in the channel thread, with task_update chunks transitioning each step from in_progress to complete as the agent moves between actions.
SlackThreadHandler grows three stream methods (start / append / stop) to replace the single update_status. The per-turn child workflow opens the stream on first signal, appends step transitions for each coalesced status, and finalizes in a try/finally so a cancellation or idle timeout still closes the stream cleanly.
Still only requires chat:write (no assistant:write needed for the channel surface). Rate limits (debounce 1s, throttle ≥2s, one in-flight, idle 5m, execution 1h) carry over unchanged.
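The open-lazily, close-in-`finally` lifecycle can be sketched with plain asyncio, assuming a handler with three coroutine methods. `RecordingHandler`, `run_turn`, and the made-up stream `ts` are illustrative only; the real handler methods and the Temporal workflow code are not shown here.

```python
import asyncio
from typing import AsyncIterator, List, Optional


class RecordingHandler:
    """Stand-in for the Slack handler; records calls instead of hitting Slack."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    async def start(self, title: str) -> str:
        self.calls.append(f"start:{title}")
        return "1700000000.000100"  # made-up stream ts

    async def append(self, ts: str, title: str) -> None:
        self.calls.append(f"append:{title}")

    async def stop(self, ts: str) -> None:
        self.calls.append("stop")


async def run_turn(handler: RecordingHandler, statuses: AsyncIterator[str]) -> None:
    stream_ts: Optional[str] = None
    try:
        async for title in statuses:
            if stream_ts is None:
                # Open the stream only when the first status arrives.
                stream_ts = await handler.start(title)
            else:
                await handler.append(stream_ts, title)
    finally:
        # Runs on normal completion and on errors, so an opened stream is always closed.
        if stream_ts is not None:
            await handler.stop(stream_ts)
```

If no status ever arrives, `stream_ts` stays `None` and no stop call is made, which keeps the sketch from closing a stream it never opened.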
Two refinements on top of the streaming-plan-block path:
1. Plan-block steps now show the bare tool name (e.g. "Read", "Bash") as the title and a short args preview (file path / command / query) as the details line, matching the Wordsmith / posthog-code agent UI. The `_posthog/status` notification from the agent carries tool_name and tool_args_preview; the orchestrator forwards them through as an agent_status_update payload dict.
2. Agent narrative text (assistant message chunks from session/update events) streams into the same Slack message as markdown_text chunks, appearing under the plan block exactly like the screenshot. A new agent_text_delta signal wires the relay activity → parent workflow → per-turn child workflow; the child's debounce/throttle flusher emits step transitions and narrative deltas in a single appendStream call.
Single-channel-message UX: one streamed chat message per turn carries both the thinking plan block and the streaming answer.
When the streaming path is on, the follow-up turn's assistant text has already streamed into the per-turn plan-block message via markdown_text chunks. The legacy posthog-code-agent-relay workflow would post the same text as a separate thread message — duplicating what the user just saw. Persist the gate decision to TaskRun.state under STREAMING_STATE_KEY when evaluate_slack_streaming_gate runs, then have forward_pending_message's _enqueue_pending_reply_relay early-return when that flag is set. Delivery-failure path (_enqueue_pending_delivery_failure_relay) still fires unconditionally — that's a critical error the user needs to see, and the streamed message can't carry it after chat.stopStream has been called.
Add a `kind` field to TaskRunRelayMessageRequestSerializer with values 'reply' (default) or 'question'. When the streaming plan-block path is active for a run (TaskRun.state[STREAMING_STATE_KEY] is True), the relay_message endpoint returns skipped for kind='reply' — the agent's narrative has already streamed into the per-turn plan-block message and posting it again duplicates the entire reply as a separate message. Questions always relay; they're a pause-and-wait interaction the user needs to see. The paired PostHog/code change tags the agent-server's initial-response relay as kind='reply' (default) and the question relay as kind='question'.
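The relay decision described above reduces to a small predicate. This is a sketch only: `should_relay` is a hypothetical name, and the string assigned to `STREAMING_STATE_KEY` is a placeholder because the real value is not shown in this PR text.

```python
from typing import Any, Mapping

# Placeholder value; the actual key string is an assumption.
STREAMING_STATE_KEY = "slack_streaming_active"


def should_relay(kind: str, run_state: Mapping[str, Any]) -> bool:
    """Return True when the relay endpoint should post the message to Slack."""
    if kind == "question":
        # Questions pause the agent and wait for the user, so they always relay.
        return True
    # Replies are skipped when the narrative has already streamed into the plan-block message.
    return run_state.get(STREAMING_STATE_KEY) is not True
```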
Contributor
Size Change: 0 B
Total Size: 64.4 MB
Slack's chat.startStream `recipient_user_id` is purely routing metadata — it doesn't trigger a notification. Append a markdown_text chunk with <@user_id> right before chat.stopStream so the original mention author gets pinged once when the message is finalized. The mention rides on the same streamed plan-block message; no separate ping post. User id sourced from SlackThreadContext.mentioning_slack_user_id — already plumbed through to the handler — so no signal-payload or activity-input changes are needed.
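A small sketch of building that final chunk. The helper name `final_mention_chunks` is hypothetical, and the `{"type": "markdown_text", "text": ...}` dict shape is an assumption about the chunk format rather than something confirmed by this PR.

```python
from typing import Dict, List, Optional


def final_mention_chunks(mentioning_slack_user_id: Optional[str]) -> List[Dict[str, str]]:
    """Chunks to append right before the stream is stopped."""
    if not mentioning_slack_user_id:
        # No known mention author: finalize without a ping.
        return []
    # <@USER_ID> is Slack's mention syntax inside message text.
    return [{"type": "markdown_text", "text": f"\n<@{mentioning_slack_user_id}>"}]
```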
… arrives
The child workflow refused to call chat.startStream until an agent_status_update signal arrived, but when the sandbox runs an @posthog/agent that doesn't emit _posthog/status notifications (e.g. the npm-published version that predates the paired agent PR), every turn produces only agent_text_delta signals. The markdown buffer accumulated forever and nothing reached Slack.
Synthesize a 'Thinking' placeholder step on first flush when no real step is pending, so the plan block always opens and the narrative body can stream regardless of whether the agent is emitting tool-status notifications.
Earlier flush logic deduplicated steps on (title, details), so two calls to the same tool inside one debounce window collapsed into one step on the plan block. Worse, the singleton _pending_step would overwrite any earlier signal in the same window, so a burst of N tools produced 1 step.
Switch _pending_step to a queue. On flush, build a chunks list that marks the previously-active step complete, every intermediate queued step complete, and the last queued step in_progress (with a fresh generated id per step). All transitions plus any pending markdown ride on a single chat.appendStream call, so a burst becomes one API request with N+1 chunks rather than N separate calls.
The handler API drops the (complete_*, new_*) shape in favor of a list[task_update] + markdown_text; the workflow builds the list since it owns the transition pattern.
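The transition pattern can be sketched as a pure function over the active step and the queue. `Step`, `new_step`, and `build_chunks` are hypothetical names, and the chunk dict fields are an assumed shape for `task_update`, not the handler's real API.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Step:
    id: str
    title: str
    details: str = ""


def new_step(title: str, details: str = "") -> Step:
    # A fresh id per step keeps repeated calls to the same tool distinct.
    return Step(id=uuid.uuid4().hex, title=title, details=details)


def _chunk(step: Step, status: str) -> Dict[str, str]:
    return {"type": "task_update", "id": step.id, "title": step.title,
            "details": step.details, "status": status}


def build_chunks(
    active: Optional[Step], queue: List[Step]
) -> Tuple[List[Dict[str, str]], Optional[Step]]:
    """Return (chunks for one append call, the step that is active afterwards)."""
    if not queue:
        return [], active
    chunks: List[Dict[str, str]] = []
    if active is not None:
        chunks.append(_chunk(active, "complete"))  # close the previous step
    for step in queue[:-1]:
        chunks.append(_chunk(step, "complete"))  # intermediate steps already finished
    chunks.append(_chunk(queue[-1], "in_progress"))  # newest step is the live one
    return chunks, queue[-1]
```

A burst of three queued tools on top of one active step yields four chunks in a single call; the caller would then clear the queue and keep the returned step as the new active one.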
Consume the new tool_intent field from PR A's _posthog/status payload. The orchestrator decides per-step whether to render intent as the plan-block details or leave it as body markdown, with a 256-char threshold (Slack's task_update.details limit):
- intent ≤ 256: use as step details, signal clear_buffer=True so the child workflow drops its markdown buffer; the same prose was also streamed via agent_message_chunk deltas and would otherwise duplicate.
- intent > 256 or absent: fall back to tool_args_preview for details, leave the markdown buffer alone so the prose still surfaces in the message body.
The child workflow respects clear_buffer in its agent_status_update signal handler. Existing turn behavior with older agent images (no tool_intent at all) is unchanged: they emit empty intent → fallback to args_preview, no buffer clearing.
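The per-step decision above can be written as one small function. `choose_details` and `DETAILS_LIMIT` are hypothetical names; the 256-character threshold and the `clear_buffer` semantics come from the commit message.

```python
from typing import Optional, Tuple

DETAILS_LIMIT = 256  # threshold stated in the commit message


def choose_details(tool_intent: Optional[str], tool_args_preview: str) -> Tuple[str, bool]:
    """Return (details, clear_buffer) for one plan-block step."""
    if tool_intent and len(tool_intent) <= DETAILS_LIMIT:
        # Short intent becomes the details line; the buffered prose would duplicate it.
        return tool_intent, True
    # Long or missing intent: show the args preview and keep the prose in the body.
    return tool_args_preview, False
```

Older agent images that send no `tool_intent` take the second branch, so their behavior stays as it was.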
…ils" This reverts commit 5d46c80.
Problem
Slack threads triggered by PostHog Code @-mentions show only a static "Working on task…" placeholder during multi-minute runs. We want a live status line that updates as the agent works — matching Slack's agent design for channel mentions.
Slack's full assistant UI (AGENT badge, collapsible plan block) is gated to DM assistant containers, but `chat.startStream` / `chat.appendStream` / `chat.stopStream` are available in channels and support `task_display_mode: "plan"`, the same primitive Claude uses for its in-channel "Summarising findings…" status. Only requires `chat:write` (no `assistant:write` needed).
Architecture
Changes
- `SlackThreadHandler` methods: `start_status_stream`, `append_status_step`, `stop_status_stream`. Each wraps the corresponding `slack_sdk` stream call.
- `start_slack_status_stream` (returns the stream `ts` or `None`), `append_slack_status_step`, `stop_slack_status_stream`. Failures logged + swallowed.
- `SlackStatusRelayWorkflow` per-turn child. Tracks `_stream_ts`, `_current_task_id`, `_current_task_title`. Opens the stream on first signal, appends step transitions, finalizes in `try / finally` so cancellation or idle timeout still closes the stream.
- `relay_sandbox_events` consumes the new `_posthog/status` notification (paired with feat(agent): emit status notifications for orchestrator streaming code#2732) and emits three signal types: `turn_started`, `agent_status_update`, `turn_completed`.
- `ProcessTaskWorkflow` gains three signal handlers that own the child workflow lifecycle. Static "Working on task…" placeholder is skipped when the streaming gate is on.
- `evaluate_slack_streaming_gate` activity wraps the region-aware feature flag `posthog-code-slack-agent-status` + `chat:write` scope check.
- `TASKS_TASK_QUEUE`.
Rate-limit math: `chat.appendStream` shares `chat.update`'s tier-3 ceilings (~50/min). Debounce 1 s + throttle ≥ 2 s caps us at ≤ 30 appendStream calls/min/turn, comfortably under.
How did you test this code?
Agent-authored. Follow-up tests planned (not yet committed):
End-to-end verification with the flag enabled is paired with PostHog/code; once both deploy I'll enable for one team locally and confirm the plan block renders in the channel thread.
🤖 Agent context
Autonomy: Human-driven (agent-assisted)
Model: Claude Opus 4.7. Paired with PostHog/code#2732 which emits the upstream `_posthog/status` events.
Decisions: `chat.startStream` (over DM-assistant surface with `assistant.threads.setStatus`) keeps the @-mention flow, which is where PostHog Code users actually live; channel streaming gets us the native plan-block UI without needing `assistant:write` granted to existing installs.