Add DBOS projector uplink pipeline + NATS tunnel transport#3854

Open: tlgimenes wants to merge 35 commits into main from tlgimenes/dbos-primitives-overview

tlgimenes (Contributor) commented Jun 12, 2026

What is this contribution about?

Two related changes to the Studio ↔ desktop-link path.

1. @decocms/tunnel — fetch-like HTTP over core NATS with streaming request/response bodies. Studio and the link daemon connect through POST /api/links/session using scoped, short-lived NATS credentials; JetStream is kept out of the tunnel transport.

2. Optimistic link presence — replaces the old heartbeat / KV-claim presence (which produced "CLI running but UI says offline" false-negatives) with a live probe:

  • The daemon serves GET /api/links/status over the tunnel ({ hostname, capabilities, cliVersion }).
  • The frontend polls it (via LINK_CURRENT_GET / /api/links/me, ~5s) to drive the desktop indicator, feature-gating, and the sandboxProviderKind it sends.
  • The backend is optimistic — no liveness gate on dispatch. resolveDispatchTarget just normalizes the kind, POST /messages no longer 409s on an offline desktop, and a work-publish that can't reach the daemon fails the run (forceFailIfInProgress) instead of hanging.
  • Deleted: the presence heartbeat loop, TunnelPresenceSubscriber, the studio_links KV claim registry, resolve-default-provider-kind, and the links.presence.* publish + its credential grant. A tunnel inter-frame idle timeout replaces the claim-watch as the in-flight abort.
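
The live probe described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `probeLinkStatus`, `FetchLike`, `MinimalResponse`, `Presence`, and the 2-second default timeout are hypothetical names and values, and the `LinkStatus` field types are assumed from the `{ hostname, capabilities, cliVersion }` shape quoted in the description. Any failure (timeout, transport error, non-2xx) is treated as "offline", with no stored heartbeat state.

```typescript
// Assumed response shape for GET /api/links/status (field types are guesses).
interface LinkStatus {
  hostname: string;
  capabilities: string[];
  cliVersion: string;
}

// Hypothetical result type for the indicator.
type Presence =
  | { online: true; status: LinkStatus }
  | { online: false; reason: string };

// Minimal subset of a fetch Response, so the sketch does not depend on a runtime.
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Any fetch-like function, e.g. a tunnel-backed client (hypothetical signature).
type FetchLike = (
  path: string,
  init: { signal: AbortSignal },
) => Promise<MinimalResponse>;

// Probe once: online only if the daemon answers in time with a 2xx response.
async function probeLinkStatus(
  fetchLike: FetchLike,
  timeoutMs = 2000,
): Promise<Presence> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchLike("/api/links/status", {
      signal: controller.signal,
    });
    if (!res.ok) return { online: false, reason: `status ${res.status}` };
    // The body is trusted as-is here; real code would validate it.
    const status = (await res.json()) as LinkStatus;
    return { online: true, status };
  } catch (err) {
    // Timeouts and transport errors both collapse to "offline".
    return {
      online: false,
      reason: err instanceof Error ? err.message : String(err),
    };
  } finally {
    clearTimeout(timer);
  }
}
```

The timeout only takes effect if the supplied `fetchLike` honors the abort signal. A caller would invoke `probeLinkStatus` on an interval (the description mentions roughly 5 s) and derive the indicator and feature gates from the latest result.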

Design + plan: docs/superpowers/specs/2026-06-12-link-presence-tunnel-status-design.md.

How to Test

  1. bun run fmt && bun run check && bun run lint
  2. bun test apps/mesh/src/links apps/mesh/src/link-daemon apps/mesh/src/tools/links apps/mesh/src/sandbox packages/tunnel/src
  3. E2E (apps/mesh/e2e/tests/link-tunnel.spec.ts): indicator flips online/offline via the live probe; an offline user-desktop send is accepted (202) and the run settles failed (no 409).

Migration Notes

  • Public tunnel deploys: set tunnel.nats.publicUrl, tunnel.nats.publicEnabled, tunnel.nats.sessionTtlSeconds, plus NATS_OPERATOR_JWT / NATS_ACCOUNT_JWT / NATS_ACCOUNT_SIGNING_KEY to mint daemon sessions; expose NATS websockets in the NATS subchart.
  • Behavior change: an offline desktop is now surfaced via the frontend probe (compose disabled) + an optimistic fail-fast run error — not a 409 on POST /messages.

Review Checklist

  • PR title is clear and descriptive
  • Changes are tested and working
  • Documentation is updated (if needed)
  • No breaking changes

Summary by cubic

Switches Studio↔desktop to an HTTP‑over‑NATS tunnel and makes link ingest stateless while completing the ingest → file‑backed JetStream → DBOS projector pipeline. Adds run‑status stages, a cross‑pod liveness reaper, and fixes a stale “running” indicator after stream completion.

  • New Features

    • Tunnel session: adds @decocms/tunnel and @nats-io/jwt. POST /api/links/session mints host‑scoped NATS creds or a token and returns connection.urls, credentials/token, expiresAt, and tunnelHostname (503 when disabled). Allowed connection types derive from the public URL: WEBSOCKET for wss:// (prod), STANDARD+WEBSOCKET for nats:// (dev). Session validates the NATS public URL/endpoint.
    • Tunnel‑only link: the daemon serves GET /api/links/status and handles work/control over the tunnel; all long‑poll work/control/proxy routes were removed. The frontend probes /api/links/me through the MCP path to keep the desktop indicator accurate.
    • Stateless ingest: each relay POST is self‑contained. ingestRun publishes raw chunks to file‑backed DECOPILOT_STREAMS with seq‑keyed Nats‑Msg‑Id = ${runId}:${fenceToken}:${seq}, then a fenced {done, finalSeq}. Retention ≥30 min, dedup window ≥2 min. A throttle writes the contiguous ack floor to threads.run_acked_seq (monotonic CAS) so reconnects resume at acked+1; replay dedup happens in ingest.
    • DBOS projector uplink: on the fenced done, the consumer enqueues one projectRunWorkflow per (runId, fenceToken). The workflow reconstructs the run from JetStream (handling fragments) and is the sole writer of parts, title, and terminal status. It also adds deterministic leader election and lag/poison metrics, ignores unfenced legacy tails, and captures failure text/kind in threads.failure_reason/failure_kind. The reactor no longer purges on live completion; cleanup moved into the workflow.
    • Run‑status stages: emit data-run-status chunks (e.g. “starting-run”, “preparing-tools”) near dispatch for eligible runs; kept monotonic, published behind gates, and ignored by the projector.
    • Cross‑pod liveness reaper: a DBOS workflow scans for stale in_progress runs and force‑fails them after a timeout so uncapped gates don’t hang on dead desktops.
    • UI: clearer desktop presence/agent copy, the mode picker shows runtime location, the mode tooltip includes the runtime, and the running state clears on stream finish.
    • Internal: renamed the tunnel NATS subject prefix (no external behavior change).
  • Migration

    • Helm: set tunnel.nats.publicUrl, tunnel.nats.publicEnabled, and tunnel.nats.sessionTtlSeconds (populates NATS_PUBLIC_URL, NATS_TUNNEL_PUBLIC_ENABLED, NATS_TUNNEL_SESSION_TTL_SECONDS). Provide operator auth for daemon sessions: NATS_OPERATOR_JWT, NATS_ACCOUNT_JWT, NATS_ACCOUNT_SIGNING_KEY; expose NATS WebSockets when using wss://. Mount the NATS cluster creds file and set NATS_CREDS so the app and cluster authenticate with the same creds. Helm exposes these NATS tunnel secrets to the app.
    • Run DB migrations adding threads.failure_reason, threads.failure_kind, and threads.run_acked_seq.
    • Remove deprecated gates: STREAM_OF_RECORD_V2_PERCENT and LINK_PUBLISH_THEN_CONSUME. Behavior change: offline desktops surface via the frontend probe + a fail‑fast run error (no 409 on POST /messages). New threads write v2 only; v1 remains read‑only.
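
The seq-keyed message ID and the contiguous ack floor from the "Stateless ingest" item can be illustrated with a small sketch. It is not the PR's code: `natsMsgId`, `AckFloor`, and `casAckedSeq` are hypothetical names, and it assumes sequence numbers start at 1 (so a floor of 0 means nothing is acked yet) and that the fence token is a string or number. Only the ID format `${runId}:${fenceToken}:${seq}` and the "resume at acked+1" rule come from the summary.

```typescript
// Dedup key in the format quoted in the summary: `${runId}:${fenceToken}:${seq}`.
function natsMsgId(
  runId: string,
  fenceToken: string | number,
  seq: number,
): string {
  return `${runId}:${fenceToken}:${seq}`;
}

// Tracks the highest N such that every seq in 1..N has been acked
// (assumes 1-based seqs; acks may arrive out of order or be replayed).
class AckFloor {
  private floor: number;
  private readonly pending = new Set<number>();

  constructor(initialFloor = 0) {
    this.floor = initialFloor;
  }

  // Records an ack and returns the (possibly advanced) contiguous floor.
  ack(seq: number): number {
    if (seq <= this.floor) return this.floor; // replayed chunk, already covered
    this.pending.add(seq);
    // Advance while the next expected seq has been seen.
    while (this.pending.delete(this.floor + 1)) this.floor += 1;
    return this.floor;
  }

  // Where a reconnecting sender should resume: acked + 1.
  get resumeFrom(): number {
    return this.floor + 1;
  }
}

// In-memory stand-in for the monotonic compare-and-set on threads.run_acked_seq:
// the stored value never moves backwards.
function casAckedSeq(stored: number, candidate: number): number {
  return candidate > stored ? candidate : stored;
}
```

For example, acking 1, 3, then 2 leaves the floor at 1, 1, then 3. A gap holds the floor back until it is filled, which is why a reconnect can safely resume from `resumeFrom` and let message-ID dedup absorb any overlap.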

Written for commit dad7c29. Summary will update on new commits.


tlgimenes force-pushed the tlgimenes/dbos-primitives-overview branch 3 times, most recently from 9600332 to c05c8cf, on June 12, 2026 19:47
tlgimenes changed the title from "Add core NATS tunnel transport" to "Add core NATS tunnel transport + optimistic link presence" on Jun 13, 2026
tlgimenes added a commit that referenced this pull request Jun 16, 2026
…nified DBOS-resumable projector pipeline

Squashes the tlgimenes/dbos-primitives-overview branch into one commit atop
origin/main for a linear history (PR #3854 squash-merges anyway).

- @decocms/tunnel: fetch-over-NATS transport (subject/protocol/stream/nats).
- Optimistic link presence: frontend probes /api/links/status; backend
  trusts + optimistically fetches; legacy heartbeat/claim subsystem removed.
- Unified ingest -> JetStream(DECOPILOT_STREAMS) -> projector pipeline:
  ingestRun publishes raw chunks + a fence-scoped {done}; the durable,
  DBOS-resumable projector is the SOLE DB writer (parts + title + terminal
  status), reconstructing each run from file-backed JetStream so a pod crash
  mid-run recovers. Direct cutover: legacy inline persistence + the v1 message
  write path + the LINK_DURABLE_PROJECTOR/publishThenConsume flags are deleted;
  new threads are v2; v1 read-compat is preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tlgimenes force-pushed the tlgimenes/dbos-primitives-overview branch 6 times, most recently from 1d0bec4 to 86bb4ad, on June 17, 2026 17:53
tlgimenes changed the title from "Add core NATS tunnel transport + optimistic link presence" to "Add DBOS projector uplink pipeline + NATS tunnel transport" on Jun 17, 2026
tlgimenes force-pushed the tlgimenes/dbos-primitives-overview branch from a3a22f3 to 1495c27 on June 18, 2026 15:32
tlgimenes force-pushed the tlgimenes/dbos-primitives-overview branch from eadc603 to 29bd268 on June 18, 2026 17:35