
refactor(automations): run dispatch inline, drop the thread-gate hop #3919

Merged
pedrofrxncx merged 1 commit into main from refactor/automations-inline-dispatch on Jun 15, 2026

@pedrofrxncx pedrofrxncx commented Jun 15, 2026

Collaborator

Summary

Follow-up to #3918 (now merged). Addresses the "why two dispatch queues, they feel like the same thing" smell.

Before, an automation fire passed through both partitioned queues:

cron/event → fireAutomationWorkflow      [automations queue: ≤10 per org]
               └─ awaitThreadRun → threadGateWorkflow   [thread-gate queue: 1 per thread]  ← extra hop + child workflow

But each fire creates a fresh thread (taskId from createRunThreadStep), so the thread-gate's per-thread serialization is a no-op for automations — they routed through it only to reuse the dispatch body.

This extracts that body as runDispatchSteps (track-started → dispatch → track-failed) and calls it directly from fireAutomationWorkflow, which already holds the org's queue slot.
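The shape of that extraction can be sketched roughly as below. This is a simplified, synchronous model rather than the code from this PR: the `DispatchInput` and `DispatchDeps` types, the `recordingDeps` helper, the thread id format, and the choice to rethrow after track-failed are all assumptions made for illustration. Only the names `runDispatchSteps`, `fireAutomationWorkflow`, and `threadGateWorkflow` and the step order come from the description. The helper also models the statement under Safety that both analytics steps no-op for `source: "automation"`.

```typescript
// Hypothetical model of the shared dispatch body; types and helpers are illustrative.
type Source = "user" | "automation";

interface DispatchInput {
  threadId: string;
  source: Source;
}

interface DispatchDeps {
  trackStarted(input: DispatchInput): void;
  dispatch(input: DispatchInput): void; // may throw
  trackFailed(input: DispatchInput, error: unknown): void;
}

// Shared body: track-started -> dispatch -> track-failed (only when dispatch throws).
function runDispatchSteps(input: DispatchInput, deps: DispatchDeps): void {
  deps.trackStarted(input);
  try {
    deps.dispatch(input);
  } catch (error) {
    deps.trackFailed(input, error);
    throw error; // assumption: the failure still propagates to the caller
  }
}

// Caller 1: user messages, which would run behind the per-thread slot.
function threadGateWorkflow(threadId: string, deps: DispatchDeps): void {
  runDispatchSteps({ threadId, source: "user" }, deps);
}

let nextThread = 0;

// Caller 2: automation fires, which already hold the org's queue slot.
function fireAutomationWorkflow(deps: DispatchDeps): string {
  const threadId = `thread-${++nextThread}`; // stand-in for createRunThreadStep
  runDispatchSteps({ threadId, source: "automation" }, deps);
  return threadId;
}

// Demo dependencies that record which steps did real work.
// The analytics steps skip automations, matching the Safety notes.
function recordingDeps(log: string[], failDispatch = false): DispatchDeps {
  return {
    trackStarted: (i) => {
      if (i.source !== "automation") log.push(`started:${i.threadId}`);
    },
    dispatch: (i) => {
      log.push(`dispatch:${i.threadId}`);
      if (failDispatch) throw new Error("dispatch failed");
    },
    trackFailed: (i) => {
      if (i.source !== "automation") log.push(`failed:${i.threadId}`);
    },
  };
}
```

In this model both callers share one body, and the only difference between them is which queue slot they hold when they call it.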

Result

The two queues now have crisp, non-overlapping roles:

  • thread-gate — user-message executor, serialized per thread (where ordering actually matters).
  • automations — org-capped automation executor.

No shared hop, one fewer workflow per fire, both guarantees preserved (per-org fairness + per-thread ordering for users).

Why not fully merge to one queue?

Considered it: you'd either lose per-org fairness, or merge interactive user traffic with background cron fan-out into one pool — letting a cron storm delay user chats. Two queues give that isolation; the only real redundancy was the hop, which this removes. (DBOS gives one partition key + one cap per queue, and these are two genuine axes: orgId fan-out vs threadId order.)
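To make the two axes concrete, here is a small in-memory model of a partitioned queue with a single partition key and a single cap. It is not the DBOS API and not code from this PR: `PartitionedQueue`, `tryStart`, and `finish` are hypothetical names, and the org and thread ids are made up. The caps (10 per org, 1 per thread) are the ones quoted in the diagram above.

```typescript
// Hypothetical in-memory model of a partitioned queue: one key, one cap per partition.
class PartitionedQueue {
  private running = new Map<string, number>();

  constructor(
    readonly name: string,
    readonly capPerPartition: number,
  ) {}

  // Try to take a slot in the partition; false means the item would have to wait.
  tryStart(partitionKey: string): boolean {
    const inFlight = this.running.get(partitionKey) ?? 0;
    if (inFlight >= this.capPerPartition) return false;
    this.running.set(partitionKey, inFlight + 1);
    return true;
  }

  // Release a slot when the work item completes.
  finish(partitionKey: string): void {
    const inFlight = this.running.get(partitionKey) ?? 0;
    if (inFlight <= 1) this.running.delete(partitionKey);
    else this.running.set(partitionKey, inFlight - 1);
  }
}

const automations = new PartitionedQueue("automations", 10); // keyed by orgId
const threadGate = new PartitionedQueue("thread-gate", 1); // keyed by threadId

// A cron storm in org-a fills only org-a's ten slots; org-b is unaffected.
const orgAStarts = Array.from({ length: 12 }, () => automations.tryStart("org-a"));
const orgBStart = automations.tryStart("org-b");

// Two user messages on the same thread: the second waits for the first.
const firstMessage = threadGate.tryStart("thread-1");
const secondMessage = threadGate.tryStart("thread-1");

// Automation fires each use a fresh thread id, so a per-thread gate never blocks them.
const freshThreadStarts = ["t-101", "t-102", "t-103"].map((id) => threadGate.tryStart(id));
```

In this model a single queue cannot express both "at most 10 per org" and "at most 1 per thread" at once, which is the constraint the parenthetical above attributes to DBOS. The last line also illustrates why routing automations through the per-thread gate added a hop without adding any serialization.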

Safety

  • Dispatch step keeps its exact config (retriesAllowed: false); recovery semantics are unchanged — it just lives in the fire workflow's journal now instead of a separate child. ⚠️ Reviewers: the one thing to confirm on staging is automation dispatch recovery on pod-death-mid-stream, since the non-retriable step changed host workflow. Config is identical, but this path is sensitive (recent decopilot-recovery work touches it).
  • awaitThreadRun had no remaining callers and is removed; enqueueThreadRun (user POST) unchanged.
  • Both analytics steps already no-op for source: "automation", so the shared body is correct for both callers.

Testing

  • bun run --cwd apps/mesh check — clean.
  • bun test apps/mesh/src/dispatch-queue apps/mesh/src/automations — 31 pass.
  • bun run fmt applied.

Summary by cubic

Runs automation dispatch inline in fireAutomationWorkflow and removes the extra thread-gate queue hop. This drops one child workflow per fire while keeping per-org fairness and per-thread ordering for user messages.

  • Refactors
    • Extracted dispatch executor to runDispatchSteps (track-started → dispatch → track-failed).
    • fireAutomationWorkflow now calls runDispatchSteps directly; automations skip thread-gate.
    • threadGateWorkflow wraps the same body for user messages only.
    • Removed awaitThreadRun; enqueueThreadRun remains.
    • Preserved dispatch step config (retriesAllowed: false) and recovery semantics.

Written for commit c32ab0b. Summary will update on new commits.


Automation fires went through two partitioned queues: the per-org
`automations` queue (fan-out fairness) AND the `thread-gate` queue
(per-thread serialization). But each fire creates a fresh thread, so the
per-thread gate never did anything for automations — they routed through it
only to reuse the dispatch-execution body, paying a second queue hop and an
extra child workflow per fire.

Extract that body as `runDispatchSteps` (track-started → dispatch →
track-failed) and call it directly from `fireAutomationWorkflow`, which
already holds the org's queue slot. `threadGateWorkflow` now just wraps the
same body behind its per-thread slot for user messages.

Result: the two queues have crisp, non-overlapping roles — `thread-gate` =
user-message executor (serialized per thread); `automations` = org-capped
automation executor. No shared hop, one fewer workflow per fire, both
guarantees preserved (per-org fairness + per-thread ordering for users).

The dispatch step keeps its exact config (`retriesAllowed: false`), so its
recovery semantics are unchanged — it just lives in the fire workflow's
journal now instead of a child. `awaitThreadRun` had no other callers and is
removed.
@pedrofrxncx pedrofrxncx merged commit bf76d7d into main Jun 15, 2026
15 checks passed
@pedrofrxncx pedrofrxncx deleted the refactor/automations-inline-dispatch branch June 15, 2026 12:54
decocms Bot pushed a commit that referenced this pull request Jun 15, 2026
PR: #3919 refactor(automations): run dispatch inline, drop the thread-gate hop
Bump type: patch

- decocms (apps/mesh/package.json): 3.18.11 -> 3.18.12