Add DBOS projector uplink pipeline + NATS tunnel transport #3854
Open
tlgimenes wants to merge 35 commits into
tlgimenes added a commit that referenced this pull request on Jun 16, 2026
…nified DBOS-resumable projector pipeline

Squashes the tlgimenes/dbos-primitives-overview branch into one commit atop origin/main for a linear history (PR #3854 squash-merges anyway).

- @decocms/tunnel: fetch-over-NATS transport (subject/protocol/stream/nats).
- Optimistic link presence: frontend probes /api/links/status; backend trusts + optimistically fetches; legacy heartbeat/claim subsystem removed.
- Unified ingest -> JetStream(DECOPILOT_STREAMS) -> projector pipeline: ingestRun publishes raw chunks + a fence-scoped {done}; the durable, DBOS-resumable projector is the SOLE DB writer (parts + title + terminal status), reconstructing each run from file-backed JetStream so a pod crash mid-run recovers. Direct cutover: legacy inline persistence + the v1 message write path + the LINK_DURABLE_PROJECTOR/publishThenConsume flags are deleted; new threads are v2; v1 read-compat is preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What is this contribution about?
Two related changes to the Studio ↔ desktop-link path.
1. `@decocms/tunnel`: fetch-like HTTP over core NATS with streaming request/response bodies. Studio and the link daemon connect through `POST /api/links/session` using scoped, short-lived NATS credentials; JetStream is kept out of the tunnel transport.
2. Optimistic link presence: replaces the old heartbeat / KV-claim presence (which produced "CLI running but UI says offline" false negatives) with a live probe:
   - `GET /api/links/status` over the tunnel (`{ hostname, capabilities, cliVersion }`).
   - The frontend polls it (`LINK_CURRENT_GET` / `/api/links/me`, ~5s) to drive the desktop indicator, feature-gating, and the `sandboxProviderKind` it sends.
   - `resolveDispatchTarget` just normalizes the kind, `POST /messages` no longer 409s on an offline desktop, and a work-publish that can't reach the daemon fails the run (`forceFailIfInProgress`) instead of hanging.
   - Removed: `TunnelPresenceSubscriber`, the `studio_links` KV claim registry, `resolve-default-provider-kind`, and the `links.presence.*` publish + its credential grant. A tunnel inter-frame idle timeout replaces the claim-watch as the in-flight abort.
Design + plan: `docs/superpowers/specs/2026-06-12-link-presence-tunnel-status-design.md`.
How to Test
- `bun run fmt && bun run check && bun run lint`
- `bun test apps/mesh/src/links apps/mesh/src/link-daemon apps/mesh/src/tools/links apps/mesh/src/sandbox packages/tunnel/src`
- E2E (`apps/mesh/e2e/tests/link-tunnel.spec.ts`): the indicator flips online/offline via the live probe; an offline `user-desktop` send is accepted (202) and the run settles `failed` (no 409).
Migration Notes
- Set `tunnel.nats.publicUrl`, `tunnel.nats.publicEnabled`, `tunnel.nats.sessionTtlSeconds`, plus `NATS_OPERATOR_JWT` / `NATS_ACCOUNT_JWT` / `NATS_ACCOUNT_SIGNING_KEY` to mint daemon sessions; expose NATS WebSockets in the NATS subchart.
- Offline desktops no longer return `409` on `POST /messages`.
Review Checklist
Summary by cubic
Switches Studio↔desktop to an HTTP‑over‑NATS tunnel and makes link ingest stateless while completing the ingest → file‑backed JetStream → DBOS projector pipeline. Adds run‑status stages, a cross‑pod liveness reaper, and fixes a stale “running” indicator after stream completion.
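As a rough illustration of the ingest side of this pipeline, the sketch below builds the seq-keyed dedup id in the `${runId}:${fenceToken}:${seq}` form and tracks a contiguous ack floor so that a reconnect can resume at `acked + 1`. The names `natsMsgId` and `AckFloor` are hypothetical and not taken from the PR. The sketch also assumes sequences start at 1 and keeps the floor in memory, whereas the PR persists it to `threads.run_acked_seq` via a throttled monotonic CAS.

```typescript
// Hypothetical helper: builds the dedup id in the format quoted in the summary.
function natsMsgId(runId: string, fenceToken: string | number, seq: number): string {
  return `${runId}:${fenceToken}:${seq}`;
}

// Hypothetical in-memory tracker for the contiguous ack floor.
// Assumption: seq starts at 1, so a floor of 0 means "nothing acked yet".
class AckFloor {
  private floor: number;
  private readonly pending = new Set<number>();

  constructor(initialFloor = 0) {
    this.floor = initialFloor;
  }

  // Record an ack (possibly out of order) and return the new floor.
  ack(seq: number): number {
    if (seq <= this.floor) return this.floor; // replayed chunk: already covered
    this.pending.add(seq);
    // Advance only while the next sequence has been acked, so the floor never
    // skips a gap and never moves backwards.
    while (this.pending.delete(this.floor + 1)) {
      this.floor += 1;
    }
    return this.floor;
  }

  // Sequence a reconnecting producer should resume from.
  get resumeFrom(): number {
    return this.floor + 1;
  }
}
```

Because the floor only advances across contiguous acks, a gap (for example, seq 2 missing while 3 is acked) holds the resume point at 2, and replays at or below the floor are ignored.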
New Features
- `@decocms/tunnel` and `@nats-io/jwt`. `POST /api/links/session` mints host-scoped NATS creds or a token and returns `connection.urls`, `credentials`/`token`, `expiresAt`, and `tunnelHostname` (503 when disabled). Allowed connection types derive from the public URL: WEBSOCKET for `wss://` (prod), STANDARD+WEBSOCKET for `nats://` (dev). Session validates the NATS public URL/endpoint.
- The link daemon serves `GET /api/links/status` and handles work/control over the tunnel; all long-poll work/control/proxy routes were removed. The frontend probes `/api/links/me` through the MCP path to keep the desktop indicator accurate.
- `ingestRun` publishes raw chunks to file-backed `DECOPILOT_STREAMS` with seq-keyed `Nats-Msg-Id = ${runId}:${fenceToken}:${seq}`, then a fenced `{done, finalSeq}`. Retention ≥30 min, dedup window ≥2 min. A throttle writes the contiguous ack floor to `threads.run_acked_seq` (monotonic CAS) so reconnects resume at `acked+1`; replay dedup happens in ingest.
- `projectRunWorkflow` per `(runId, fenceToken)` on the fenced done. The workflow reconstructs from JetStream (handles fragments) and is the sole writer of parts/title/terminal status. Includes deterministic leader election, lag/poison metrics, ignoring of unfenced legacy tails, and capture of failure text/kind to `threads.failure_reason`/`failure_kind`. Reactor no longer purges on live completion; cleanup moved into the workflow.
- Run-status stages: `data-run-status` chunks (e.g. “starting-run”, “preparing-tools”) near dispatch for eligible runs; kept monotonic, published behind gates, and ignored by the projector.
- Cross-pod liveness reaper: watches `in_progress` runs and force-fails them after a timeout so uncapped gates don’t hang on dead desktops.
Migration
- Set `tunnel.nats.publicUrl`, `tunnel.nats.publicEnabled`, and `tunnel.nats.sessionTtlSeconds` (populates `NATS_PUBLIC_URL`, `NATS_TUNNEL_PUBLIC_ENABLED`, `NATS_TUNNEL_SESSION_TTL_SECONDS`). Provide operator auth for daemon sessions: `NATS_OPERATOR_JWT`, `NATS_ACCOUNT_JWT`, `NATS_ACCOUNT_SIGNING_KEY`; expose NATS WebSockets when using `wss://`. Mount the NATS cluster creds file and set `NATS_CREDS` so the app and cluster authenticate with the same creds. Helm exposes these NATS tunnel secrets to the app.
- `threads.failure_reason`, `threads.failure_kind`, and `threads.run_acked_seq`.
- Removed flags: `STREAM_OF_RECORD_V2_PERCENT` and `LINK_PUBLISH_THEN_CONSUME`. Behavior change: offline desktops surface via the frontend probe + a fail-fast run error (no 409 on `POST /messages`). New threads write v2 only; v1 remains read-only.
Written for commit dad7c29. Summary will update on new commits.
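When choosing a value for `tunnel.nats.publicUrl`, it may help to see how the summary's rule for allowed connection types could be expressed. The following is a minimal sketch, not the PR's implementation: `allowedConnectionTypes` is a hypothetical name, and rejecting any scheme other than `wss://` or `nats://` is an assumption, since the summary only describes those two cases.

```typescript
type ConnectionType = "STANDARD" | "WEBSOCKET";

// Hypothetical helper mirroring the rule stated in the summary:
//   wss://  -> WEBSOCKET only (prod)
//   nats:// -> STANDARD + WEBSOCKET (dev)
function allowedConnectionTypes(publicUrl: string): ConnectionType[] {
  // new URL() throws a TypeError on malformed input, which doubles as basic validation.
  const { protocol } = new URL(publicUrl);
  switch (protocol) {
    case "wss:":
      return ["WEBSOCKET"];
    case "nats:":
      return ["STANDARD", "WEBSOCKET"];
    default:
      // Assumption: other schemes are rejected; the PR text does not say how they are handled.
      throw new Error(`Unsupported NATS public URL scheme: ${protocol}`);
  }
}
```

Under this reading, a `wss://` public URL only works if the NATS WebSocket listener is actually exposed, which matches the migration note above.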