From 61a35545dae3f9420929020e99667d234913f8bb Mon Sep 17 00:00:00 2001
From: Tim Beyer
Date: Wed, 13 May 2026 23:07:19 +0200
Subject: [PATCH 1/4] docs(task): plan to make clawctl create truly idempotent
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`clawctl create --config ` against an existing instance was not truly
idempotent: `bootstrapOpenclaw` re-ran `openclaw onboard`, rotated the
gateway auth token on every invocation, and re-sent the bootstrap
prompt. `patchAuthProfiles` also no-op'd when the configured provider
differed from what was already on disk, leaving the prior provider's
profile bound.

Plan two narrow fixes in the existing path — gate first-run-only steps
on the `data/config` sentinel and generalise `patchAuthProfiles` to
converge on the configured provider — rather than adding a parallel
reconfigure command.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../TASK.md | 156 ++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 tasks/2026-05-13_2253_clawctl-create-truly-idempotent/TASK.md

diff --git a/tasks/2026-05-13_2253_clawctl-create-truly-idempotent/TASK.md b/tasks/2026-05-13_2253_clawctl-create-truly-idempotent/TASK.md
new file mode 100644
index 0000000..85e0a3b
--- /dev/null
+++ b/tasks/2026-05-13_2253_clawctl-create-truly-idempotent/TASK.md
@@ -0,0 +1,156 @@
+# Make `clawctl create` truly idempotent
+
+## Status: In Progress
+
+## Scope
+
+In:
+
+- Gate first-run-only steps in `bootstrapOpenclaw` on the existing
+  `data/config` sentinel so re-running `clawctl create --config `
+  against an existing instance doesn't re-run `openclaw onboard`,
+  rotate the gateway auth token, or re-send the bootstrap prompt.
+- Generalise `patchAuthProfiles` so a provider change between runs
+  cleanly evicts the prior `:default` profile and binds the new one.
+- Unit tests for the auth-profile swap logic.
+
+Out:
+
+- No new CLI command. The fixes live in the existing path so
+  `clawctl create --config ` becomes the apply-state operation
+  by virtue of being idempotent.
+- No changes to `provisionVM` / `claw provision …` — capabilities are
+  already idempotent per project convention.
+- No changes to `buildOnboardCommand` or the `PROVIDERS` registry.
+
+## Context
+
+The project convention is "all provisioning is idempotent" and "the
+project directory is the source of truth". The intended workflow when
+changing an instance's config is: edit `clawctl.json`, re-run
+`clawctl create`, watch state converge. In practice that path was
+_almost_ right — capabilities are idempotent and `provisionVM` skips
+Lima VM creation if the VM exists — but `bootstrapOpenclaw` conflated
+first-run-only steps with apply-state steps:
+
+- `openclaw onboard` ran unconditionally on every invocation.
+- The gateway auth token was generated fresh (`randomBytes(24)`) on
+  every run and pushed through `openclaw config set gateway.auth.token`,
+  rotating a token that may be wired into remote tooling.
+- The bootstrap prompt re-sent on every reapply.
+- `patchAuthProfiles` looked up `:default` and patched
+  it in place. When the provider type had _changed_ relative to what was
+  on disk it logged "Profile not found — skipping" and the prior
+  provider's profile stayed bound.
+
+Together those mean a config edit + re-run can produce incorrect state
+(rotated gateway token, wrong provider still active). The fix is two
+narrow patches in the existing path; a parallel "reconfigure" command
+would split the source-of-truth model and is rejected.
+
+Key invariant: re-running `clawctl create` must converge state without
+touching anything that isn't a function of the current clawctl.json —
+gateway token preserved, capability state re-applied (no-op when
+already installed), auth profile for a different provider replaced.
+
+## Plan
+
+### Fix 1 — Gate first-run-only steps in `bootstrapOpenclaw`
+
+Detect "already onboarded" via `${PROJECT_MOUNT_POINT}/data/config`
+existing — the same sentinel onboard's own fault-tolerance check
+already uses.
+
+- Sentinel absent → run `openclaw onboard`, generate fresh gateway
+  token, send bootstrap prompt (when configured). First-run path.
+- Sentinel present → skip onboard. Read the existing
+  `gateway.auth.token` from `data/config` and reuse it. Skip the
+  bootstrap prompt. Continue with all apply-state steps:
+  `openclaw models set`, `openclaw config set …`, channels, secret
+  migration, daemon restart, doctor, bootstrap-phase capability hooks.
+
+Implementation: read `data/config` once at the top of
+`bootstrapOpenclaw`, branch on existence, thread the existing-or-fresh
+token through. Token precedence: explicit `config.network.gatewayToken`
+override > existing on-disk token > fresh `randomBytes(24)`.
+
+### Fix 2 — Generalise `patchAuthProfiles` for provider changes
+
+Refactor into a pure transformation
+`applyAuthProfileSwap(authProfiles, newProviderType, apiKeyPath)` that
+takes the parsed file and returns the converged one. The VM-IO wrapper
+reads, calls the pure function, writes back.
+
+Pure behaviour:
+
+1. New profile key: `:default`.
+2. New profile:
+   `{ type: "token", provider: newProviderType, tokenRef: makeSecretRef(apiKeyPath) }`.
+   If a same-key profile exists, preserve unknown extra fields but
+   normalise the canonical ones (`type`, `provider`, `tokenRef`).
+3. Evict any other-provider `:default` profile whose `provider` field
+   is set and differs from the new provider. Conservative: leave
+   profiles whose `provider` field is unset, and non-`:default` keys,
+   alone (forward-compat with profile shapes we don't recognise).
+4. Reset `lastGood = { [newProviderType]: }`.
+5. Filter `usageStats` to keys still in `profiles`.
+
+Re-apply semantics:
+
+- Same provider, same apiKey path → no-op.
+- Different provider → swap cleanly.
+- No `apiKey` resolved → IO wrapper returns early (correct for
+  inline-plaintext or no-key flows).
+
+### Rejected alternatives
+
+- A parallel `clawctl reconfigure` command. Would duplicate the
+  existing path and split the source-of-truth model.
+- Re-running `openclaw onboard --force`. Onboard issues the gateway
+  auth token and configures the daemon; even if it accepted a force
+  flag it's doing more than what's needed.
+- Fix 1 only, without Fix 2. When the provider changed the prior
+  profile would remain bound and the new provider would have no
+  credentials. Both fixes are needed.
+- Touching `patchMainConfig`. Verified re-runnable as-is — it
+  overwrites `secrets.providers.infra` with the same value and
+  replaces channel secret fields with structurally equivalent
+  SecretRefs. No double-encoding because it sets the value rather than
+  transforming it.
+
+## Steps
+
+- [x] Get timestamp, create task dir, branch.
+- [x] Commit task first.
+- [x] Refactor `patchAuthProfiles` into pure + IO layers.
+- [x] Unit tests for `applyAuthProfileSwap`: fresh-slate,
+  same-provider re-apply (no-op), provider switch (removes old,
+  adds new, resets `lastGood`, filters `usageStats`), conservative
+  handling of unknown-provider/non-`:default` profiles, no input
+  mutation.
+- [x] Gate first-run-only steps in `bootstrapOpenclaw`. Read existing
+  `data/config` if present; reuse `gateway.auth.token`.
+- [x] Run `bun test`, `bun run lint`, `bun run format:check`.
+- [ ] End-to-end smoke on an existing instance: re-run
+  `clawctl create` against a clawctl.json with a different
+  provider, assert `gateway.auth.token` byte-for-byte preserved
+  against a `data/config.bak.*` snapshot, assert prior auth profile
+  cleanly evicted, `openclaw doctor` green.
+- [ ] Open PR.
+
+## Notes
+
+- The sentinel `${PROJECT_MOUNT_POINT}/data/config` was already used
+  by onboard's own fault-tolerance check, so we're piggy-backing on an
+  invariant openclaw itself relies on.
+- Gateway token precedence is intentional: explicit
+  `config.network.gatewayToken` wins so a user who wants to rotate can
+  set the field; default behaviour preserves.
+- `readExistingGatewayToken` swallows JSON parse errors and
+  missing-field cases and returns `undefined`, falling through to fresh
+  generation, so a corrupt or partially-written `data/config` doesn't
+  wedge the reapply path.
+
+## Outcome
+
+(Written at resolution.)

From 504ca0a92c64f0f41f4a5cd26e4253cc81922400 Mon Sep 17 00:00:00 2001
From: Tim Beyer
Date: Wed, 13 May 2026 23:07:25 +0200
Subject: [PATCH 2/4] fix(host-core): handle provider switch in patchAuthProfiles
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Extract the pure transformation as `applyAuthProfileSwap`. On first run
it still migrates plaintext `token` → `tokenRef`. When the configured
provider differs from what's already on disk it now evicts the prior
provider's `:default` profile, adds the new one, resets `lastGood`,
and drops orphaned `usageStats` entries.

Previously the function looked up `:default` and patched it in
place — when the provider type had changed relative to the existing
file it would log "Profile not found — skipping" and leave the dead
profile active.

Conservative: only evicts `:default` profiles whose `provider` field is
set and differs from the new provider. Unknown/legacy profile shapes
are preserved.
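
For illustration, a minimal TypeScript sketch of the eviction rule
above. `shouldEvict` and the simplified `Profile` type are hypothetical,
written for this message only; the patch inlines equivalent checks in
the eviction loop of `applyAuthProfileSwap`:

```typescript
// Hypothetical helper, not part of this patch: restates the per-entry
// eviction checks under an assumed, simplified profile shape.
type Profile = { provider?: unknown; [field: string]: unknown };

function shouldEvict(key: string, profile: Profile | null, newProviderType: string): boolean {
  if (key === `${newProviderType}:default`) return false; // slot being (re)bound
  if (!key.endsWith(":default")) return false; // non-:default keys are left alone
  if (!profile || typeof profile !== "object") return false;
  if (typeof profile.provider !== "string") return false; // unrecognised shape: keep
  return profile.provider !== newProviderType; // other-provider :default slot
}

// Example: the configured provider changes from anthropic to zai.
const existing: Record<string, Profile> = {
  "anthropic:default": { type: "token", provider: "anthropic" },
  "anthropic:secondary": { type: "token", provider: "anthropic" },
  "legacy:default": { type: "exotic" },
};

const evicted = Object.entries(existing)
  .filter(([key, profile]) => shouldEvict(key, profile, "zai"))
  .map(([key]) => key);

console.log(evicted); // only "anthropic:default"
```

Only `anthropic:default` meets every condition: `anthropic:secondary`
is not a `:default` slot and `legacy:default` has no `provider` field,
so both are kept.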

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 packages/host-core/src/infra-secrets.test.ts | 286 +++++++++++++++----
 packages/host-core/src/infra-secrets.ts | 102 +++++++++--
 2 files changed, 308 insertions(+), 80 deletions(-)

diff --git a/packages/host-core/src/infra-secrets.test.ts b/packages/host-core/src/infra-secrets.test.ts
index 5cb32e3..515d83e 100644
--- a/packages/host-core/src/infra-secrets.test.ts
+++ b/packages/host-core/src/infra-secrets.test.ts
@@ -1,12 +1,15 @@
 import { describe, test, expect } from "bun:test";
 import { sanitizeKey } from "./secrets-sync.js";
+import { applyAuthProfileSwap } from "./infra-secrets.js";

 /**
  * Unit tests for infra-secrets logic.
  *
- * The patchMainConfig and patchAuthProfiles functions do VM I/O (shellExec)
- * so they're tested in VM integration tests. Here we test the pure logic
- * that underpins them: key sanitization and SecretRef construction.
+ * patchMainConfig and patchAuthProfiles do VM I/O (driver.exec) so they're
+ * exercised in VM integration tests. Here we drive the pure transformations
+ * underpinning them: key sanitization, SecretRef construction, and the
+ * auth-profile swap that handles both first-run migration and
+ * provider-switch reconfiguration.
  */

 describe("SecretRef construction", () => {
@@ -19,87 +22,244 @@ describe("SecretRef construction", () => {
     const id = `/${sanitizeKey(["channels", "telegram", "botToken"])}`;
     expect(id).toBe("/channels_telegram_bottoken");
   });
+});
+
+describe("applyAuthProfileSwap", () => {
+  const apiKeyPath = ["provider", "apiKey"];
+  const expectedRef = {
+    source: "file",
+    provider: "infra",
+    id: "/provider_apikey",
+  };

-  test("produces correct ref shape", () => {
-    const path = ["provider", "apiKey"];
-    const ref = {
-      source: "file" as const,
-      provider: "infra" as const,
-      id: `/${sanitizeKey(path)}`,
+  test("first-run migration: plaintext token → tokenRef", () => {
+    const input = {
+      version: 1,
+      profiles: {
+        "zai:default": {
+          type: "token",
+          provider: "zai",
+          token: "sk-plaintext",
+        },
+      },
+      lastGood: { zai: "zai:default" },
+      usageStats: { "zai:default": { errorCount: 0 } },
     };
-    expect(ref).toEqual({
-      source: "file",
-      provider: "infra",
-      id: "/provider_apikey",
+
+    const out = applyAuthProfileSwap(input, "zai", apiKeyPath);
+
+    expect(out.profiles).toEqual({
+      "zai:default": {
+        type: "token",
+        provider: "zai",
+        tokenRef: expectedRef,
+      },
     });
+    expect(out.lastGood).toEqual({ zai: "zai:default" });
+    expect(out.usageStats).toEqual({ "zai:default": { errorCount: 0 } });
   });
-});

-describe("config patching logic", () => {
-  test("file provider config has correct shape", () => {
-    const provider = {
-      source: "file",
-      path: "~/.openclaw/secrets/infrastructure.json",
-      mode: "json",
+  test("same-provider re-apply is a no-op on tokenRef structure", () => {
+    const input = {
+      version: 1,
+      profiles: {
+        "zai:default": {
+          type: "token",
+          provider: "zai",
+          tokenRef: expectedRef,
+        },
+      },
+      lastGood: { zai: "zai:default" },
+      usageStats: { "zai:default": { errorCount: 0, lastUsed: 12345 } },
     };
-    expect(provider.source).toBe("file");
-    expect(provider.mode).toBe("json");
-    expect(provider.path).toContain("infrastructure.json");
+
+    const out = applyAuthProfileSwap(input, "zai", apiKeyPath);
+
+    expect(out).toEqual(input);
   });

-  test("auth profile patch replaces token with tokenRef", () => {
-    // Simulate the transformation patchAuthProfiles performs
-    const profile: Record = {
-      type: "token",
-      provider: "zai",
-      token: "sk-plaintext",
+  test("provider switch: anthropic → zai removes old, adds new", () => {
+    const input = {
+      version: 1,
+      profiles: {
+        "anthropic:default": {
+          type: "token",
+          provider: "anthropic",
+          token: "sk-ant-oat01-dead",
+        },
+      },
+      lastGood: { anthropic: "anthropic:default" },
+      usageStats: {
+        "anthropic:default": { errorCount: 0, lastUsed: 12345 },
+      },
     };

-    // The operation: delete token, add tokenRef
-    delete profile.token;
-    profile.tokenRef = {
-      source: "file",
-      provider: "infra",
-      id: "/provider_apikey",
+    const out = applyAuthProfileSwap(input, "zai", apiKeyPath);
+
+    expect(out.profiles).toEqual({
+      "zai:default": {
+        type: "token",
+        provider: "zai",
+        tokenRef: expectedRef,
+      },
+    });
+    expect(out.lastGood).toEqual({ zai: "zai:default" });
+    expect(out.usageStats).toEqual({});
+  });
+
+  test("provider switch preserves extra fields on existing same-key profile", () => {
+    const input = {
+      version: 1,
+      profiles: {
+        "zai:default": {
+          type: "token",
+          provider: "zai",
+          token: "stale-plaintext",
+          customMetadata: { addedByAgent: true },
+        },
+        "anthropic:default": {
+          type: "token",
+          provider: "anthropic",
+          token: "sk-ant-oat01-dead",
+        },
+      },
+      lastGood: { anthropic: "anthropic:default" },
+      usageStats: {
+        "anthropic:default": { errorCount: 0 },
+        "zai:default": { errorCount: 2, lastUsed: 99 },
+      },
     };

-    expect(profile.token).toBeUndefined();
-    expect(profile.tokenRef).toEqual({
-      source: "file",
-      provider: "infra",
-      id: "/provider_apikey",
+    const out = applyAuthProfileSwap(input, "zai", apiKeyPath);
+
+    expect(out.profiles).toEqual({
+      "zai:default": {
+        type: "token",
+        provider: "zai",
+        tokenRef: expectedRef,
+        customMetadata: { addedByAgent: true },
+      },
+    });
+    expect(out.lastGood).toEqual({ zai: "zai:default" });
+    expect(out.usageStats).toEqual({
+      "zai:default": { errorCount: 2, lastUsed: 99 },
     });
-    // Original fields preserved
-    expect(profile.type).toBe("token");
-    expect(profile.provider).toBe("zai");
   });

-  test("telegram botToken gets replaced with SecretRef", () => {
-    // Simulate the transformation patchMainConfig performs
-    const config: Record = {
-      channels: {
-        telegram: {
-          enabled: true,
-          botToken: "123:ABC",
-          dmPolicy: "allowlist",
+  test("fresh-slate: empty profiles object adds the new one", () => {
+    const input = {
+      version: 1,
+      profiles: {},
+    };
+
+    const out = applyAuthProfileSwap(input, "zai", apiKeyPath);
+
+    expect(out.profiles).toEqual({
+      "zai:default": {
+        type: "token",
+        provider: "zai",
+        tokenRef: expectedRef,
+      },
+    });
+    expect(out.lastGood).toEqual({ zai: "zai:default" });
+  });
+
+  test("missing profiles object: creates one and adds the new profile", () => {
+    const input = { version: 1 };
+
+    const out = applyAuthProfileSwap(input, "zai", apiKeyPath);
+
+    expect(out.profiles).toEqual({
+      "zai:default": {
+        type: "token",
+        provider: "zai",
+        tokenRef: expectedRef,
+      },
+    });
+  });
+
+  test("conservative: keeps profile with unset provider field", () => {
+    // Forward-compat: a future profile shape we don't recognise must not be
+    // evicted just because we're swapping providers.
+    const input = {
+      version: 1,
+      profiles: {
+        "anthropic:default": {
+          type: "token",
+          provider: "anthropic",
+          token: "sk-ant-dead",
+        },
+        "legacy:default": {
+          type: "exotic",
+          // no `provider` field
         },
       },
     };

-    const telegram = (config.channels as Record>).telegram;
-    telegram.botToken = {
-      source: "file",
-      provider: "infra",
-      id: "/telegram_bottoken",
+    const out = applyAuthProfileSwap(input, "zai", apiKeyPath);
+    const profiles = out.profiles as Record;
+
+    expect(profiles["zai:default"]).toBeDefined();
+    expect(profiles["anthropic:default"]).toBeUndefined();
+    expect(profiles["legacy:default"]).toEqual({ type: "exotic" });
+  });
+
+  test("conservative: keeps non-:default profile for a different provider", () => {
+    const input = {
+      version: 1,
+      profiles: {
+        "anthropic:default": {
+          type: "token",
+          provider: "anthropic",
+          token: "sk-ant-dead",
+        },
+        "anthropic:secondary": {
+          type: "token",
+          provider: "anthropic",
+          token: "secondary-key",
+        },
+      },
     };

-    expect(telegram.botToken).toEqual({
+    const out = applyAuthProfileSwap(input, "zai", apiKeyPath);
+    const profiles = out.profiles as Record;
+
+    expect(profiles["zai:default"]).toBeDefined();
+    expect(profiles["anthropic:default"]).toBeUndefined();
+    // Non-:default profile preserved — only :default is treated as the
+    // active-binding slot we're managing.
+    expect(profiles["anthropic:secondary"]).toBeDefined();
+  });
+
+  test("does not mutate input", () => {
+    const input = {
+      version: 1,
+      profiles: {
+        "anthropic:default": {
+          type: "token",
+          provider: "anthropic",
+          token: "sk-ant",
+        },
+      },
+      lastGood: { anthropic: "anthropic:default" },
+    };
+    const snapshot = JSON.parse(JSON.stringify(input));
+
+    applyAuthProfileSwap(input, "zai", apiKeyPath);
+
+    expect(input).toEqual(snapshot);
+  });
+});
+
+describe("patchMainConfig logic (file provider shape)", () => {
+  test("file provider config has correct shape", () => {
+    const provider = {
       source: "file",
-      provider: "infra",
-      id: "/telegram_bottoken",
-    });
-    // Other fields preserved
-    expect(telegram.enabled).toBe(true);
-    expect(telegram.dmPolicy).toBe("allowlist");
+      path: "~/.openclaw/secrets/infrastructure.json",
+      mode: "json",
+    };
+    expect(provider.source).toBe("file");
+    expect(provider.mode).toBe("json");
+    expect(provider.path).toContain("infrastructure.json");
   });
 });
diff --git a/packages/host-core/src/infra-secrets.ts b/packages/host-core/src/infra-secrets.ts
index 79ef257..4f905c3 100644
--- a/packages/host-core/src/infra-secrets.ts
+++ b/packages/host-core/src/infra-secrets.ts
@@ -3,7 +3,7 @@
  *
  * Two operations on known JSON paths:
  * 1. Patch main config — add file provider, replace channel secrets with SecretRefs
- * 2. Patch auth-profiles.json — replace token with tokenRef
+ * 2. Patch auth-profiles.json — converge active provider profile to the configured one
  */
 import type { VMDriver, OnLine } from "./drivers/types.js";
 import type { ResolvedSecretRef } from "./secrets.js";
@@ -102,10 +102,89 @@ export async function patchMainConfig(
 }

 /**
- * Patch auth-profiles.json to use tokenRef instead of plaintext token.
+ * Pure transformation: converge an auth-profiles.json structure on the
+ * configured provider.
  *
- * Finds the profile matching `:default` and replaces
- * `token` with `tokenRef` pointing to the file provider.
+ * Behaviour:
+ * - Adds (or refreshes) `:default` with the file-provider
+ *   tokenRef. Preserves any extra fields on an existing same-key profile.
+ * - Removes other-provider `:default` profiles whose `provider` field is set
+ *   and differs from `newProviderType` (so an anthropic→zai swap cleanly
+ *   evicts the dead anthropic profile). Conservative: leaves profiles whose
+ *   `provider` field is unset or whose key doesn't end in `:default`.
+ * - Resets `lastGood` to point at the new provider only.
+ * - Filters `usageStats` to keys still in `profiles`.
+ *
+ * Pure / no I/O — drives `patchAuthProfiles` and is unit-tested in isolation.
+ */
+export function applyAuthProfileSwap(
+  authProfiles: Record,
+  newProviderType: string,
+  apiKeyPath: string[],
+): Record {
+  const result = structuredClone(authProfiles);
+  const profiles = (result.profiles ?? (result.profiles = {})) as Record<
+    string,
+    Record
+  >;
+  const newProfileKey = `${newProviderType}:default`;
+  const newTokenRef = makeSecretRef(apiKeyPath);
+
+  // Add or refresh the new provider's profile.
+  const existing = profiles[newProfileKey];
+  if (existing && typeof existing === "object") {
+    // Same-key profile already there — preserve extra fields, but ensure
+    // type/provider/tokenRef are canonical and any plaintext token is removed.
+    const merged: Record = { ...existing };
+    delete merged.token;
+    merged.type = "token";
+    merged.provider = newProviderType;
+    merged.tokenRef = newTokenRef;
+    profiles[newProfileKey] = merged;
+  } else {
+    profiles[newProfileKey] = {
+      type: "token",
+      provider: newProviderType,
+      tokenRef: newTokenRef,
+    };
+  }
+
+  // Evict conflicting other-provider :default profiles.
+  for (const [key, profile] of Object.entries(profiles)) {
+    if (key === newProfileKey) continue;
+    if (!key.endsWith(":default")) continue;
+    if (!profile || typeof profile !== "object") continue;
+    const provider = (profile as Record).provider;
+    if (typeof provider !== "string") continue;
+    if (provider === newProviderType) continue;
+    delete profiles[key];
+  }
+
+  // Reset lastGood to the converged profile only.
+  result.lastGood = { [newProviderType]: newProfileKey };
+
+  // Drop usageStats for profiles that no longer exist.
+  if (result.usageStats && typeof result.usageStats === "object") {
+    const filtered: Record = {};
+    for (const [key, value] of Object.entries(result.usageStats as Record)) {
+      if (key in profiles) filtered[key] = value;
+    }
+    result.usageStats = filtered;
+  }
+
+  return result;
+}
+
+/**
+ * Patch auth-profiles.json to bind the configured provider's credentials.
+ *
+ * On first run (after `openclaw onboard`) this replaces the plaintext `token`
+ * field with a `tokenRef` pointing at the file provider. On re-apply (e.g.
+ * switching from one provider to another via a clawctl.json edit) it evicts
+ * the prior provider's profile and adds the new one.
+ *
+ * Skips entirely when no `provider.apiKey` is in `resolvedMap` (e.g. inline
+ * plaintext or no-key configurations).
  */
 export async function patchAuthProfiles(
   driver: VMDriver,
@@ -131,19 +210,8 @@ export async function patchAuthProfiles(
   }

   const authProfiles = await readVMJson(driver, vmName, AUTH_PROFILES_PATH);
-  const profiles = (authProfiles.profiles ?? {}) as Record>;
-  const profileKey = `${providerType}:default`;
-  const profile = profiles[profileKey];
-
-  if (!profile) {
-    onLine?.(`Profile "${profileKey}" not found — skipping auth-profiles patch`);
-    return;
-  }
-
-  // Replace token with tokenRef
-  delete profile.token;
-  profile.tokenRef = makeSecretRef(apiKeyRef.path);
+  const updated = applyAuthProfileSwap(authProfiles, providerType, apiKeyRef.path);

-  await writeVMJson(driver, vmName, AUTH_PROFILES_PATH, authProfiles);
+  await writeVMJson(driver, vmName, AUTH_PROFILES_PATH, updated);
   onLine?.("auth-profiles.json patched");
 }

From 6ed53cfae70f2ec4df973b30b9072eaea175906c Mon Sep 17 00:00:00 2001
From: Tim Beyer
Date: Wed, 13 May 2026 23:07:33 +0200
Subject: [PATCH 3/4] fix(host-core): gate first-run-only steps in bootstrapOpenclaw
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`clawctl create` re-runs `bootstrapOpenclaw` on an existing instance
(provisionVM detects the VM exists and skips Lima creation but still
calls bootstrap). That meant:

- `openclaw onboard` ran every time — onboard issues the gateway auth
  token and configures the daemon, so re-running it would rotate
  credentials that may be wired into remote tooling.
- The gateway token was regenerated each run via `randomBytes(24)` and
  pushed through `openclaw config set gateway.auth.token`, rotating it
  on every reapply.
- The bootstrap prompt re-sent on every reapply.

Use `${PROJECT_MOUNT_POINT}/data/config` (the file onboard creates) as
the "already onboarded" sentinel. On reapply, skip onboard, read the
existing `gateway.auth.token` from data/config, and skip the bootstrap
prompt. Continue with all the apply-state steps (`openclaw config set`,
`models set`, channels, secret migration, daemon restart, doctor,
bootstrap-phase capability hooks) so a clawctl.json edit converges
state.

Together with the patchAuthProfiles provider-switch fix this makes
`clawctl create` idempotent in the strong sense.
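
For illustration, the resulting token precedence (explicit
`config.network.gatewayToken` override, then the token already in
data/config, then a fresh `randomBytes(24)` value) as a standalone
TypeScript sketch. `resolveGatewayToken` is a hypothetical name; the
patch inlines the same `??` chain in `bootstrapOpenclaw` and obtains the
on-disk value via `readExistingGatewayToken`:

```typescript
import { randomBytes } from "crypto";

// Hypothetical helper, not part of this patch: the override > on-disk >
// fresh-random precedence, written as one function for illustration.
function resolveGatewayToken(override: string | undefined, onDisk: string | undefined): string {
  // `??` only falls through on null/undefined, so an explicit override always
  // wins, then the token already recorded in data/config, then a new one.
  return override ?? onDisk ?? randomBytes(24).toString("hex");
}

// Re-apply without an override keeps the existing token unchanged.
console.log(resolveGatewayToken(undefined, "existing-token")); // existing-token
// An explicit override still wins, which is how a user opts into rotation.
console.log(resolveGatewayToken("rotated-token", "existing-token")); // rotated-token
// First run (nothing on disk yet): 24 random bytes, hex-encoded to 48 chars.
console.log(resolveGatewayToken(undefined, undefined).length); // 48
```

Leaving the override field unset is what preserves the token across
reapplies; setting it is the opt-in path to rotation.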

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 packages/host-core/src/bootstrap.ts | 110 +++++++++++++++-------
 1 file changed, 83 insertions(+), 27 deletions(-)

diff --git a/packages/host-core/src/bootstrap.ts b/packages/host-core/src/bootstrap.ts
index 3e012a9..02fbe38 100644
--- a/packages/host-core/src/bootstrap.ts
+++ b/packages/host-core/src/bootstrap.ts
@@ -2,7 +2,7 @@ import { randomBytes } from "crypto";
 import { mkdir, rm } from "fs/promises";
 import { join } from "path";
 import type { VMDriver, OnLine } from "./drivers/types.js";
-import { GATEWAY_PORT, CLAW_BIN_PATH, CHANNEL_REGISTRY } from "@clawctl/types";
+import { GATEWAY_PORT, CLAW_BIN_PATH, CHANNEL_REGISTRY, PROJECT_MOUNT_POINT } from "@clawctl/types";
 import type { InstanceConfig, ChannelDef } from "@clawctl/types";
 import { buildOnboardCommand } from "./providers.js";
 import { patchMainConfig, patchAuthProfiles } from "./infra-secrets.js";
@@ -11,6 +11,43 @@ import { redactSecrets } from "./redact.js";
 import { getTailscaleHostname } from "./tailscale.js";
 import type { ResolvedSecretRef } from "./secrets.js";

+const OPENCLAW_CONFIG_PATH = `${PROJECT_MOUNT_POINT}/data/config`;
+
+/**
+ * True if openclaw has already been onboarded on this instance. data/config
+ * is created by `openclaw onboard`, so its presence on the mount is the
+ * sentinel: it means the gateway auth token has already been issued and the
+ * daemon is configured. Re-running onboard would rotate that token and
+ * re-issue credentials, so subsequent `clawctl create` invocations skip
+ * onboard and only apply the post-onboard config delta.
+ */
+async function isAlreadyOnboarded(driver: VMDriver, vmName: string): Promise {
+  const r = await driver.exec(vmName, `test -f ${OPENCLAW_CONFIG_PATH}`);
+  return r.exitCode === 0;
+}
+
+/**
+ * Read the existing gateway.auth.token from data/config. Returns undefined
+ * if the file isn't readable as JSON or the field isn't a string — callers
+ * fall back to generating a fresh token in that case.
+ */
+async function readExistingGatewayToken(
+  driver: VMDriver,
+  vmName: string,
+): Promise {
+  const result = await driver.exec(vmName, `cat ${OPENCLAW_CONFIG_PATH}`);
+  if (result.exitCode !== 0) return undefined;
+  try {
+    const parsed = JSON.parse(result.stdout) as Record;
+    const gateway = parsed.gateway as Record | undefined;
+    const auth = gateway?.auth as Record | undefined;
+    const token = auth?.token;
+    return typeof token === "string" && token.length > 0 ? token : undefined;
+  } catch {
+    return undefined;
+  }
+}
+
 export interface BootstrapResult {
   gatewayToken: string;
   dashboardUrl: string;
@@ -40,34 +77,51 @@ export async function bootstrapOpenclaw(
   const workspaceDir = join(config.project, "data", "workspace");
   await mkdir(workspaceDir, { recursive: true });

-  // b) Run openclaw onboard --non-interactive (always plaintext — we migrate to
-  // file provider SecretRefs post-onboard)
-  const onboardCmd = buildOnboardCommand(provider, GATEWAY_PORT);
-
-  onLine?.(`Running openclaw onboard (${provider.type})...`);
-  const onboardResult = await driver.exec(vmName, onboardCmd, onLine);
-  if (onboardResult.exitCode !== 0) {
-    // Onboard may exit non-zero due to gateway startup timing (websocket close
-    // before the service is fully ready). Check if config was actually written —
-    // if so, onboard did its job and we can continue. The daemon restart (step g)
-    // and openclaw doctor (step i) will verify the gateway later.
-    const configCheck = await driver.exec(vmName, "test -f /mnt/project/data/config");
-    if (configCheck.exitCode !== 0) {
-      throw new Error(
-        `openclaw onboard failed (exit ${onboardResult.exitCode}): ${onboardResult.stderr}`,
-      );
+  // Detect prior onboard. data/config is openclaw's own state file; if it
+  // exists the gateway auth token has already been issued and the daemon
+  // configured. Re-running onboard would rotate the token, so on re-apply we
+  // skip onboard and just thread the existing token through the apply-state
+  // steps below. This makes `clawctl create` idempotent in the strong sense:
+  // first run bootstraps, subsequent runs apply the clawctl.json diff.
+  const alreadyOnboarded = await isAlreadyOnboarded(driver, vmName);
+
+  if (!alreadyOnboarded) {
+    // b) Run openclaw onboard --non-interactive (always plaintext — we migrate to
+    // file provider SecretRefs post-onboard)
+    const onboardCmd = buildOnboardCommand(provider, GATEWAY_PORT);
+
+    onLine?.(`Running openclaw onboard (${provider.type})...`);
+    const onboardResult = await driver.exec(vmName, onboardCmd, onLine);
+    if (onboardResult.exitCode !== 0) {
+      // Onboard may exit non-zero due to gateway startup timing (websocket close
+      // before the service is fully ready). Check if config was actually written —
+      // if so, onboard did its job and we can continue. The daemon restart (step g)
+      // and openclaw doctor (step i) will verify the gateway later.
+      const configCheck = await driver.exec(vmName, `test -f ${OPENCLAW_CONFIG_PATH}`);
+      if (configCheck.exitCode !== 0) {
+        throw new Error(
+          `openclaw onboard failed (exit ${onboardResult.exitCode}): ${onboardResult.stderr}`,
+        );
+      }
+      onLine?.("Onboard exited with warnings but config was written — continuing");
     }
-    onLine?.("Onboard exited with warnings but config was written — continuing");
-  }

-  // OpenClaw's onboard creates a nested .git in the workspace — remove it.
-  // The project repo tracks data/workspace/ directly; no nested repos.
-  const wsGit = join(workspaceDir, ".git");
-  await rm(wsGit, { recursive: true, force: true });
+    // OpenClaw's onboard creates a nested .git in the workspace — remove it.
+    // The project repo tracks data/workspace/ directly; no nested repos.
+    const wsGit = join(workspaceDir, ".git");
+    await rm(wsGit, { recursive: true, force: true });
+  } else {
+    onLine?.(`Skipping onboard — instance already initialized (provider: ${provider.type})`);
+  }

   // c) Post-onboard config (including gateway token — must be before daemon
-  // restart so the daemon picks it up)
-  const gatewayToken = config.network?.gatewayToken ?? randomBytes(24).toString("hex");
+  // restart so the daemon picks it up). Precedence for the token:
+  // explicit config override > existing token on disk > fresh random.
+  const existingToken = alreadyOnboarded
+    ? await readExistingGatewayToken(driver, vmName)
+    : undefined;
+  const gatewayToken =
+    config.network?.gatewayToken ?? existingToken ?? randomBytes(24).toString("hex");
   const secrets = [gatewayToken, ...collectChannelSecrets(config)].filter(Boolean) as string[];
   const safeLog = (msg: string) => onLine?.(redactSecrets(msg, secrets));

@@ -184,10 +238,12 @@ export async function bootstrapOpenclaw(
     onLine?.("Warning: openclaw doctor reported issues");
   }

-  // j) Send bootstrap prompt to agent (if configured)
+  // j) Send bootstrap prompt to agent (first run only, if configured).
+  // The bootstrap prompt seeds the agent's initial state; re-sending it on
+  // every `clawctl create` would re-run the seeding work each time.
   // Uses `openclaw agent --message` inside the VM — simpler and more reliable
   // than hitting the gateway HTTP API from the host.
-  if (config.bootstrap) {
+  if (!alreadyOnboarded && config.bootstrap) {
     const prompt =
       typeof config.bootstrap === "string"
         ? config.bootstrap

From 73adb06a35b55b2638dea295e0d9b6b5dc86d6ae Mon Sep 17 00:00:00 2001
From: Tim Beyer
Date: Wed, 13 May 2026 23:41:43 +0200
Subject: [PATCH 4/4] docs(infra-secrets): explain why applyAuthProfileSwap does surgery
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Direct editing of auth-profiles.json is the standing exception to
bootstrap.ts's "delegate to openclaw" pattern.
Document the rationale on the function itself, with links to the upstream issues that, once resolved, would let us replace the surgery with two delegate calls (openclaw/openclaw#16134, openclaw/openclaw#10244). Co-Authored-By: Claude Opus 4.7 (1M context) --- packages/host-core/src/infra-secrets.ts | 29 +++++++++++++++++-- .../TASK.md | 21 +++++++++++--- 2 files changed, 43 insertions(+), 7 deletions(-) diff --git a/packages/host-core/src/infra-secrets.ts b/packages/host-core/src/infra-secrets.ts index 4f905c3..39962cf 100644 --- a/packages/host-core/src/infra-secrets.ts +++ b/packages/host-core/src/infra-secrets.ts @@ -109,12 +109,35 @@ export async function patchMainConfig( * - Adds (or refreshes) `:default` with the file-provider * tokenRef. Preserves any extra fields on an existing same-key profile. * - Removes other-provider `:default` profiles whose `provider` field is set - * and differs from `newProviderType` (so an anthropic→zai swap cleanly - * evicts the dead anthropic profile). Conservative: leaves profiles whose - * `provider` field is unset or whose key doesn't end in `:default`. + * and differs from `newProviderType`, so a provider swap via a clawctl.json + * edit cleanly evicts the prior provider's profile. Conservative: leaves + * profiles whose `provider` field is unset or whose key doesn't end in + * `:default` (forward-compat with profile shapes we don't recognise). * - Resets `lastGood` to point at the new provider only. * - Filters `usageStats` to keys still in `profiles`. * + * Why surgery rather than delegating to openclaw's CLI: + * + * The rest of bootstrap.ts delegates state mutations to `openclaw config set`, + * `openclaw models set`, `openclaw onboard`, etc. This function is the + * standing exception because openclaw's current CLI surface doesn't expose + * the operations we need: + * + * 1. `openclaw onboard` re-runs skip Model/Auth setup entirely (upstream + * openclaw/openclaw#16134), so we can't delegate provider swap to it. 
+ * 2. `openclaw models auth paste-token` exists but only accepts plaintext — + * there is no `--token-ref` flag for the file-provider SecretRef shape we + * use, so delegating would write plaintext to auth-profiles.json that + * we'd then have to surgically migrate to a tokenRef anyway. + * 3. There is no `openclaw models auth remove` command yet — eviction of a + * prior provider's `:default` profile is tracked upstream in + * openclaw/openclaw#10244 and is currently only doable via manual edits. + * + * So this function does the add, the eviction, and the tokenRef shaping in + * one atomic read-modify-write. Once openclaw ships both (a) a + * `models auth remove` command and (b) a `--token-ref` flag on `paste-token`, + * this can be replaced with two delegate calls and the surgery retired. + * + * Pure / no I/O — drives `patchAuthProfiles` and is unit-tested in isolation. */ export function applyAuthProfileSwap( diff --git a/tasks/2026-05-13_2253_clawctl-create-truly-idempotent/TASK.md b/tasks/2026-05-13_2253_clawctl-create-truly-idempotent/TASK.md index 85e0a3b..04409fa 100644 --- a/tasks/2026-05-13_2253_clawctl-create-truly-idempotent/TASK.md +++ b/tasks/2026-05-13_2253_clawctl-create-truly-idempotent/TASK.md @@ -131,12 +131,15 @@ Re-apply semantics: - [x] Gate first-run-only steps in `bootstrapOpenclaw`. Read existing `data/config` if present; reuse `gateway.auth.token`. - [x] Run `bun test`, `bun run lint`, `bun run format:check`. -- [ ] End-to-end smoke on an existing instance: re-run +- [x] End-to-end smoke on an existing instance: re-run `clawctl create` against a clawctl.json with a different provider, assert `gateway.auth.token` byte-for-byte preserved - against a `data/config.bak.*` snapshot, assert prior auth profile - cleanly evicted, `openclaw doctor` green. -- [ ] Open PR. + against a `data/config.bak.*` snapshot, the prior auth profile + cleanly evicted, and the agent answering via the new provider. +- [x] Open PR.
+- [x] Document why `applyAuthProfileSwap` does surgery rather than + delegate, including links to upstream issues that would let us + retire the surgery. ## Notes @@ -150,6 +153,16 @@ Re-apply semantics: missing-field cases and returns `undefined`, falling through to fresh generation, so a corrupt or partially-written `data/config` doesn't wedge the reapply path. +- The choice to do surgery in `applyAuthProfileSwap` rather than + delegate to openclaw's CLI was validated against the upstream surface: + `openclaw onboard` re-runs skip Model/Auth (upstream + openclaw/openclaw#16134); `openclaw models auth paste-token` is + plaintext-only with no `--token-ref` flag; there is no + `openclaw models auth remove` (upstream openclaw/openclaw#10244). + Delegating today would mean writing plaintext to disk and _then_ doing + surgery, which is worse than the single atomic read-modify-write here. + The docblock on `applyAuthProfileSwap` records the rationale and the + upstream issues that, once resolved, would let us retire the surgery. ## Outcome
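The swap rules listed in the `applyAuthProfileSwap` docblock above can be illustrated with a small pure function. This is a minimal sketch, not the code in `infra-secrets.ts`: the name `sketchAuthProfileSwap`, the `AuthProfile`/`AuthProfilesFile` types, the assumption that profile keys take the form `<providerType>:default`, and the assumption that `lastGood` maps a provider type to a profile key are all hypothetical, because the real `auth-profiles.json` schema is not shown in this patch. The `tokenRef` value is treated as opaque.

```typescript
// Hypothetical shapes: only the fields the docblock mentions are modelled.
interface AuthProfile {
  provider?: string;
  tokenRef?: unknown;
  [extra: string]: unknown;
}

interface AuthProfilesFile {
  profiles: Record<string, AuthProfile>;
  lastGood?: Record<string, string>; // assumed: provider type -> profile key
  usageStats?: Record<string, unknown>;
}

// Pure sketch of the swap: builds a new object and never mutates `current`.
export function sketchAuthProfileSwap(
  current: AuthProfilesFile,
  newProviderType: string,
  tokenRef: unknown,
): AuthProfilesFile {
  const newKey = `${newProviderType}:default`; // assumed key format
  const profiles: Record<string, AuthProfile> = {};

  for (const [key, profile] of Object.entries(current.profiles)) {
    // Evict only profiles positively identified as another provider's default;
    // keys without the `:default` suffix or with no `provider` field are kept.
    const isOtherProviderDefault =
      key.endsWith(":default") &&
      profile.provider !== undefined &&
      profile.provider !== newProviderType;
    if (!isOtherProviderDefault) profiles[key] = profile;
  }

  // Add or refresh the new provider's default, keeping any extra fields
  // already present on an existing same-key profile.
  profiles[newKey] = { ...profiles[newKey], provider: newProviderType, tokenRef };

  // lastGood points at the new provider only.
  const lastGood = { [newProviderType]: newKey };

  // Keep usage stats only for profiles that survived the swap.
  const kept = new Set(Object.keys(profiles));
  const usageStats = Object.fromEntries(
    Object.entries(current.usageStats ?? {}).filter(([key]) => kept.has(key)),
  );

  return { ...current, profiles, lastGood, usageStats };
}
```

Because eviction requires both a `:default` suffix and a set, differing `provider` field, a hand-added profile such as `custom` or an unrecognised entry with no `provider` survives the swap, matching the conservative behaviour the docblock describes.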