- Fix: add --skip-git-repo-check to all codex exec invocations (#319)
- Patch bundle: perplexity stdin + nested-JSON fix (#307/#310), v9.29 migration advisory + write-intent guardrail (#312), hook hardening eliminating silent failures (#313/#314), model-config banner fix (#301/#302), cache byte-format env compat.
- Role default refresh based on April 2026 benchmarks:
architect,strategist, and newsecurity-reviewerrole now default to Claude Opus 4.7 (SWE-bench Pro 64.3 vs 57.7, MCP-Atlas tool use +9.2, LMArena #1).code-reviewerandimplementerstay on GPT-5.4 (Terminal-Bench 75.1, edge-case review).revieweris preserved as an alias forcode-reviewer. - New opt-in
implementer-heavyrole for greenfield / large refactors / UI-heavy builds — routes to Claude Opus 4.7. Not auto-selected; callers must request it explicitly. - New
plugin/docs/GPT-5.4-PROMPTING.md— condensed OpenAI prompt guidance (reasoning effort tiers, output contracts, tool persistence,phasefield,gpt-5.4-minipatterns). Referenced from Codex dispatchers and code-reviewer persona. - Migration prompt in
/octo:setupfires once for users upgrading from ≤9.28: explains the routing change, surfaces the Opus 4.7 cost impact (~2x GPT-5.4), offersOCTOPUS_LEGACY_ROLES=1opt-out to restore v9.28 mapping.
Set OCTOPUS_LEGACY_ROLES=1 to restore the v9.28 role mapping verbatim.
- QA hardening, perplexity stdin fix (#305), review timeout scaling (#303), macOS compat, dead code removal
- fix(probe): port awk-header-guard from
spawn_agenttoprobe_single_agent— codex output was silently empty in/octo:discoverand all probe-based skills (#300) - fix(perplexity): remove
env -iwrapper for shell-function providers (perplexity, openrouter) —envcannot exec bash functions, causing exit 127 (#300)
- fix(dispatch):
claude-opusxhigh effort dispatch brokeread -raword splitting — bareCLAUDE_CODE_EFFORT_LEVEL=xhighprefix treated as binary name bytimeout; wrapped withenv(#289 follow-up) - fix(qwen): remove invalid
--no-ask-userflag fromqwen.sh— Copilot CLI cross-contamination (#279) - fix(agents): add
tools: ["All tools"]to all 10 droids andpython-propersona — subagents silently lost file/bash access (#298 BUG-001, BUG-002) - fix(skill-extract): description now notes beta status for unimplemented features (#298 BUG-003)
- fix(hooks):
user-prompt-submit.shfalls back tojqwhenpython3is absent (#298 BUG-004) - fix(security):
telemetry-webhook.shrejects non-HTTPS webhook URLs, localhost exempted (#298 FINDING-03)
- Progress counter drift for Agent Teams dispatch (#276 item 7) —
subagent-result-capture.sh(SubagentStop hook) now incrementscompleted_agentsinprogress.jsondirectly after writing the result file. Previously the Agent Teams path returned without callingupdate_agent_status, so the counter lagged behind the actual number of completed agents. - Fork PRs silently 403 on review comment post (#276 item 2) —
pr-reviewjob inclaude-octopus.ymlnow guards withgithub.event.pull_request.head.repo.full_name == github.repository. Fork PRs have no access to secrets and a read-onlyGITHUB_TOKEN; they see CodeRabbit review instead.
- 95 legacy test files migrated to
test-framework.sh(#276 item 3) — all test files now use the shared framework for consistent output formatting, unified pass/fail tracking, and a single summary block. No test logic was changed.
/octo:reviewRound 1 silent timeout (#289) —review_run()was missing theOCTOPUS_FORCE_LEGACY_DISPATCHguard that the probe phase already had. Whenorchestrate.shruns as a Bash tool subprocess, Agent TeamsAGENT_TEAMS_DISPATCH:signals are never consumed by the host, leaving all result files empty and causing a 300s "ALL Round 1 providers failed" timeout. All parallel fleet spawn sites (review_run,tangle_execute,yaml_workflow_execute) now usefleet_dispatch_begin/endhelpers instead of rawexport/unset.--bareflag breaks subprocess auth (#288) — CC v2.1.114 regression whereclaude --bare --printexits 0 but emits "Not logged in", silently poisoning every Claude agent dispatch.providers.shnow probes--bareauth at detection time and disables it when broken.doctor.shreports the failure with a clear remediation (OCTOPUS_DISABLE_BARE=1).discipline-inject.shnever fires (#288) — the secondSessionStarthook block in.claude-plugin/hooks.jsonwas missing"matcher": {}. CC's hook dispatcher silently dropped it. Also fixed the same omission inStopFailure,CwdChanged,TaskCreated, andPermissionDeniedhook blocks.cursor-agentfallback/config gaps (#282–#287) — cursor-agent was missing from three dispatch locations added in the v9.23.0 provider expansion:find_capable_fallback()indispatch.sh(models: composer-2-fast, composer-2, grok-4-20, grok-4-20-thinking),set_provider_model/reset_provider_modelwhitelists inprovider-routing.sh, andbuild_architecture_fleet()inbuild-fleet.sh.- Factory Droid install command (#277) — README had
octo@claude-octopus(wrong namespace) and a bare URL without.git. Corrected toocto@nyldn-pluginswith.gitsuffix, matching the Claude Code install path. ((VAR++))silent test abort underset -e(#276) — postfix increment evaluates to0whenVAR=0, causing bashset -eto abort 15 test files before any assertions run. Applied|| trueguard across all affected files.- BSD
sedrange with command grouping (#276) —build-factory-skills.shused GNU-onlysed -n '/pat/,/pat/{...}'syntax that fails on macOS/BSDsed. Replaced with portableawkstate machine.
- Fleet dispatch guard helpers —
fleet_dispatch_begin()/fleet_dispatch_end()inagent-sync.shwrap all parallel fleet spawn loops. Replaces the copy-pasteexport OCTOPUS_FORCE_LEGACY_DISPATCH=truepattern. A new smoke test (tests/smoke/test-fleet-dispatch-guard.sh) statically enforces that all fleet call sites use the helpers and that allhooks.jsonblocks have a"matcher"key — prevents regression of #288/#289.
scripts/lib/resilience.sh(176 LOC) andscripts/lib/run-store.sh(154 LOC) — never sourced by any production code path; only referenced by their own unit tests. Removed from shipped bundle.scripts/test-claude-octopus.sh(1,889 LOC) — orphaned legacy test runner superseded bytests/structure; was shipping to users via"scripts/"inpackage.json.
debate.sh,auto-route.sh, andaudit.share now lazy-loaded inorchestrate.sh— sourced only inside the dispatch branches that need them (grapple,auto/optimize,review/audit) rather than unconditionally on every hook invocation.
- Claude Opus 4.7 support — the
claude-opusagent type now resolves toclaude-opus-4.7when Claude Code v2.1.111+ is detected, falling back toclaude-opus-4.6otherwise. Opus 4.7 is same-priced as 4.6 ($5/$25 MTok), takes a step change on SWE-bench Pro/Verified, has 1M native context, and is adaptive-thinking only.OCTOPUS_OPUS_MODELenv var overrides the default (e.g. pin toclaude-opus-4.6for legacy behavior). xhigheffort level — Opus 4.7's new effort tier betweenhighandmax. Plugin defaults the tangle/develop and ink/deliver phases toxhighon complex work (complexity=3). Automatically falls back tohighon Opus 4.6. Override withOCTOPUS_EFFORT_OVERRIDE=low|medium|high|xhigh|max.- 17 new
SUPPORTS_*feature flags covering Claude Code v2.1.105–112 (now 154 total):SUPPORTS_PRECOMPACT_BLOCKING(2.1.105) — PreCompact hook can veto compactionSUPPORTS_PLUGIN_MONITORS(2.1.105) —monitorsmanifest key for background processesSUPPORTS_ENTER_WORKTREE_PATH(2.1.105) —pathparam on EnterWorktreeSUPPORTS_MCP_TRUNCATE_RECIPES(2.1.105) — format-specific MCP truncationSUPPORTS_PROMPT_CACHE_1H(2.1.108) —ENABLE_PROMPT_CACHING_1Henv varSUPPORTS_SESSION_RECAP(2.1.108) —/recapand auto-context on session returnSUPPORTS_BUILTIN_SLASH_VIA_SKILL(2.1.108) — model invokes built-in/review,/security-reviewSUPPORTS_TASKCREATED_HOOK(2.1.110) — newTaskCreatedhook eventSUPPORTS_PERMISSIONREQ_RECHECK(2.1.110) —updatedInputre-validated vspermissions.denySUPPORTS_PRETOOL_CTX_ON_FAIL(2.1.110) —additionalContextsurvives tool-call failureSUPPORTS_TUI_FULLSCREEN(2.1.110) —/tui fullscreenrenderingSUPPORTS_OTEL_RAW_BODIES(2.1.110) —OTEL_LOG_RAW_API_BODIESenv varSUPPORTS_POWERSHELL_TOOL(2.1.110) — Windows PowerShell tool (progressive rollout)SUPPORTS_XHIGH_EFFORT(2.1.111) — Opus 4.7 effort levelSUPPORTS_OPUS_4_7(2.1.111) — gates Opus 4.7 resolutionSUPPORTS_AUTO_MODE_GA(2.1.111) —--enable-auto-modeno longer requiredSUPPORTS_ULTRAREVIEW(2.1.111) —/ultrareviewcloud parallel review (complements/octo:review)
hooks/pre-compact.shnow blocks compaction during active workflow phases — on Claude Code v2.1.105+, when 1+ agents are in flight duringtangle/develop/ink/deliver/discover-dispatch, the hook emits{"decision":"block"}and the compaction is deferred. Opt out withOCTOPUS_PRECOMPACT_BLOCK=off. On older CC versions, hook continues to warn-only as before.task-dependency-validator.shalso fires onTaskCreated— cleaner than the existingPreToolUse(TaskCreate)registration because it runs after creation with access to the task ID. The PreToolUse entry is retained as fallback for CC <2.1.110; the validator is idempotent so firing twice is safe.- W3C trace headers propagate into external CLI subshells — when
TRACEPARENTand/orTRACESTATEare set,build_provider_envnow forwards them into theenv -iisolated shell for codex/gemini/perplexity invocations so those CLIs participate in the same distributed trace as the host Claude Code session. /octo:reviewpositioning updated — the command header now distinguishes it from Claude Code's native/reviewand the new/ultrareview(v2.1.111+ cloud parallel review). Plugin's multi-LLM review remains the right tool when provider diversity or adversarial cross-check matters./octo:setupoffersENABLE_PROMPT_CACHING_1Hopt-in when Claude Code v2.1.108+ is detected (Step 4b). Documents that this affects Claude-Claude round-trips only, not external CLI subshells.scripts/lib/agents.sheffort mapping — tangle/ink phases at complexity=3 now emitxhigh(nothigh) whenSUPPORTS_XHIGH_EFFORT=true. Effort is threaded through the subshell asCLAUDE_CODE_EFFORT_LEVEL=xhighso the user's persistent/effortsetting is not mutated.OCTOPUS_EFFORT_OVERRIDEacceptsxhighandmax— previously restricted tolow|medium|high.- Model catalog refreshed —
claude-opus-4.7added (1M context, premium tier, active);claude-opus-4.6andclaude-opus-4.6-fastmarked legacy.
- No breaking changes. Users on Claude Code <2.1.111 transparently continue on Opus 4.6 behavior. Pinning to a specific Opus version via
OCTOPUS_OPUS_MODELremains the escape hatch. - Opus 4.7 has no "fast" variant.
OCTOPUS_OPUS_MODE=fastexplicitly targetsclaude-opus-4.6 --fast— a deliberate choice over silent mapping to something like--effort low, because fast mode is a latency feature distinct from effort. - Opus 4.7 API breakages (no
temperature/top_p/top_k, nothinking_budget, new tokenizer up to 1.35× token count) are handled by Claude Code itself — the plugin invokesclaudesubshells via--model opus, so all API-layer concerns stay inside CC.
- SessionStart hook crashed for returning users —
hooks/session-start-memory.sh:96usedlocaloutside a function underset -euo pipefail, exiting 1 whenSUPPORTS_MANAGED_SETTINGS_D=trueand an existing prefs file was found. Dropped thelocalkeyword; hook now completes steps 4-5 (managed-settings fragment + claude-mem context query) instead of aborting. Also removed the overly-permissive fallback glob at:38("$MEMORY_DIR"/*/memory) that could apply another project's preferences to the current session. bypassPermissionsstring-match bypass — four hooks (codex-exec-guard.sh,scheduler-security-gate.sh,careful-check.sh,freeze-check.sh) usedgrep -q '"bypassPermissions"'which matchedfalseand commented lines, effectively making the gates always-bypassed. Removed the block entirely — these gates enforce correctness or opt-in policy the user explicitly configured (via/octo:careful,/octo:freeze, or scheduled job allowlists) and shouldn't be disabled by a global CC prompt-skip setting. Opt-out levers remain:OCTO_CAREFUL_MODE=off,OCTO_FREEZE_MODE=off.scripts/test-claude-octopus.shgreped orchestrate.sh only — 4 assertions used$SCRIPT(orchestrate.sh) instead of$SCRIPTS_ALL(orchestrate + lib/*.sh) to locate extracted functions. Switched togrep -rq ... $SCRIPTS_ALLmatching the sibling test pattern.
- Worktree credential hygiene —
hooks/worktree-setup.shnow writes.octopus-envunderumask 077+ explicitchmod 600(previously world-readable under default umask 022). Refuses worktree paths outside$HOME,/tmp,/private/tmp,/var/foldersto harden against malformed CC payloads. - All 35 hook entries now have explicit timeouts in
.claude-plugin/hooks.json(previously 15 lacked"timeout":and could hang the session indefinitely). Validators: 10s; mid hooks: 30s; session export and quality-gate: 60s. orchestrate.shreduced by 724 lines — extracteddetect_providers(118 lines) →lib/providers.sh;embrace_full_workflow(387 lines) →lib/workflows.sh;is_agent_available_v2+get_tiered_agent_v2+get_fallback_agent(219 lines) →lib/model-resolver.sh. Strict-source (no2>/dev/null || true) on those 3 critical libs so syntax errors surface instead of silently degrading.- Untrusted external CLI output now nonce-wrapped —
scripts/lib/spawn.shwraps the## Outputfence of codex/gemini/perplexity results in<!-- BEGIN-UNTRUSTED:provider=X:nonce=Y -->/<!-- END-UNTRUSTED -->boundaries so downstream synthesis prompts can distinguish provider-authored text from trusted context. Complements the existingsanitize_external_contentwrapping. sanitize_external_contentnonce fallback fixed for macOS —date +%s%Nreturns a literalNon BSD date, collapsing the fallback nonce to ~10 predictable digits. Replaced with${RANDOM}${RANDOM}${RANDOM}$(date +%s)for non-predictable uniqueness when/dev/urandomis unreadable.- Manifest cleanup — canonical
.claude-plugin/plugin.jsonand.claude-plugin/marketplace.jsonnow agree on description/keywords/author/homepage. Keywords trimmed 20→10,author.urladded, duplicatedhomepagedropped (repository field already present). Description prefix handling unchanged —release.shcontinues to strip-then-prepend on version bump.
.claude-plugin/settings.json(17OCTOPUS_*defaults) — Claude Code's plugin schema doesn't read this path; env vars are delivered viahooks.jsonenv blocks and frontmatter. Dead config, no callers..gitmodules(0-byte stray) — repo has no submodules; file produced noisygit submodulewarnings.
SECURITY.mdrefreshed: soften "no eval with user data" claim to reflect the reality thatevalis used only on scrubbed synthesized variable names inlib/model-resolver.shandlib/quality.sh. Added note thatsysadmin-safety-gate.shis defense-in-depth, not a security boundary. Supported-versions table updated to 9.22.x.
- Removed
set -euo pipefailleak from sourcedlib/memory.shthat cascaded failures across all orchestrator commands (#270, closes #269) - Added missing
PROGRESS_FILEvariable definition inorchestrate.sh, fixing crashes indiscoverandembracefor users with jq installed (#271) - Rewrote
score_result_filecounting inlib/heuristics.shwithsafe_count()helper to handlegrep -cexit-1-on-no-match correctly, fixing arithmetic syntax errors that caused silent hangs during probe synthesis (#275)
- Memory provider contract (
scripts/lib/memory.sh) — unified façade over backends; callers usememory_search,memory_observe,memory_context,memory_availableinstead of touching bridges directly. Auto-detectsmcp-memory-serviceviamcpServersconfig signature; falls back toclaude-mem. Env overrides:OCTOPUS_MEMORY_BACKEND,OCTOPUS_MEMORY_SCOPE,OCTOPUS_MEMORY_SEARCH_MERGE. Detection never spawnsuvxspeculatively — avoids accidental Torch/CUDA pull. Closes discussion in #220. - Gemini in-band model fallback (
scripts/helpers/gemini-exec.sh) — on404 / ModelNotFoundError, retries with next entry inOCTOPUS_GEMINI_FALLBACK_MODELS(default:gemini-2.5-flash). Transient errors (429, 5xx) are not retried — stays in the circuit-breaker's lane. Stdin cached to tempfile so replay works across attempts. - Agent output cap —
run_agent_syncnow truncates atOCTOPUS_AGENT_MAX_OUTPUT_BYTES(default 256 KiB, 0 disables). Tail-biased: preserves first 4 KiB + last ~252 KiB so Codex-style deliverable summaries (always at the end) survive. Banner reports original size. - Partial-writes diagnostic on timeout — when
run_agent_syncexits 124/143,find -newermtsurfaces files written before SIGTERM so users know completed deliverables exist. GNU-only check skips silently on macOS BSD find.
doctor smokesilently aborting — five converging defects: (1)((var++))underset -eo pipefailexits 1 when var=0 — changed to((++var)); (2) doubleshiftinorchestrate.shdiscarded thesmokecategory arg before it reacheddo_doctor; (3) Codex smoke test passed prompt as positional arg — codex 0.120.0 rejects it, now piped via stdin; (4) Gemini cold-start (~12–18s) exceeded hardcoded 10s smoke timeout — nowOCTOPUS_GEMINI_SMOKE_TIMEOUT(default 30s); (5)/tmp/octo-model-cache-*.jsoncould hold two concatenated JSON documents from a concurrent-write race — validated withjq -cse, discarded and rebuilt on corrupt payload.- Scheduler version hardcoded to
v8.16.0— 7 major versions stale. Newoctopus_plugin_version()inlib/common.shreads from.claude-plugin/plugin.jsonat runtime (sed fallback if jq absent).validate-release.shnow warns when no git tag matches current version.
- session.sh routes phase-completion observations through
memory_observeinstead of callingclaude-mem-bridge.shdirectly — existing claude-mem deployments unaffected; mcp-memory-service users get observations routed to their backend. - README — update and clean-reinstall steps now include
marketplace update/marketplace removecommands to prevent stale cached plugin versions. - CI: bump
actions/github-scriptv8 → v9.
- CC v2.1.89-101 sync — 15 new feature flags (137 total), PermissionDenied audit hook, session auto-titling, macOS CI matrix, BSD/GNU portability lint
- Doctor false failure on Windows/Git Bash —
jq.exeon Windows outputs CRLF line endings. Indoctor_check_hooks(), the trailingprevented quote-stripping from matching, leaving a stale"in hook script paths. The-ftest then failed, reporting a false "Hook script missing" error. Fixed by piping jq output throughtr -d ' 'before path resolution. No impact on Unix. Closes #258.
- Broken symlinks in vendor skill —
vendors/ui-ux-pro-max-skillwas a git submodule with 3 internal symlinks (.shared/ui-ux-pro-max,.claude/skills/.../scripts,.claude/skills/.../data). Claude Code's plugin installer doesn't recurse submodules, so these broke on install. Replaced the submodule with plain vendored files, resolving all symlinks to real copies. Fixes E2E B10 failure.
orchestrate.shnot found by LLM Bash tool —${CLAUDE_PLUGIN_ROOT}is only available in hook execution context, not in the LLM's Bash shell. All skill, command, persona, and OpenClaw files referenced this variable, causing multi-LLM dispatch to silently fall back to Claude-only. Replaced with${HOME}/.claude-octopus/plugin/across 104 files, with a stable symlink created by session-manager.sh at session start.- Shared template block (
skills/blocks/provider-check.md) also used${CLAUDE_PLUGIN_ROOT}with a brokendirnamefallback — fixed at source sogen-skill-docs.shpropagates correctly. - Hardcoded provider metrics —
update_metrics "provider" "codex/gemini/claude"in flow templates replaced; metrics should track actual providers used, not assume a fixed set. - RTK install URL contained upstream repo attribution (
rtk-ai) in skill-doctor.md — replaced with generic cargo install target. - README command count — "49 commands" corrected to 48.
- 15 test suites fixed — Removed 2 stale v8.x tests (testing deleted
get_agent_commandand non-existentembrace.yaml). Fixed skill-verify path lookup, hooks.json registration assertions for opt-in hooks, flow-develop self-regulation assertions, OpenClaw registry sync, skill count expectation (50→51), README badge/count checks. - CLAUDE.md — Added Enforcement Best Practices section with Validation Gate Pattern documentation.
- embrace.md — Added answer incorporation instructions for intent questions.
- EXECUTION MECHANISM enforcement — All 13 multi-LLM workflow commands now have explicit
NON-NEGOTIABLEblocks prohibiting agents from substituting Claude-native tools for orchestrate.sh dispatch. Covers embrace, discover, define, develop, deliver, multi, review, security, debate, research, factory, staged-review, prd. - Embrace chains skill invocations —
/octo:embracenow invokes/octo:discover→/octo:define→/octo:develop→/octo:deliveras sequential Skill calls. Each phase loads fresh enforcement instructions, surviving context compaction in long sessions. - Post-compaction enforcement re-injection —
post-compact.shnow detects active multi-LLM workflows and re-injects execution enforcement text after compaction drops the original skill instructions. - Workflow verification hook — New
workflow-verification.sh(SessionEnd) detects when a multi-LLM workflow ran but produced no result files, warning that orchestrate.sh dispatch may not have executed. - Interactive
/octo:model-configwizard (v4.0) — No-args invocation now shows a dashboard + AskUserQuestion menu: provider defaults, phase routing, debate/multi-LLM participants, consensus threshold, cost mode, reset. CLI-style direct arguments still work. - Never-dismiss guardrails —
/octo:setupand/octo:model-configcan no longer be silently dismissed for returning users. Both always show interactive UI. - New test suites —
test-execution-mechanism.sh(32 assertions),test-interactive-commands.sh(10 assertions) guard against enforcement regressions.
/octo:embracenot dispatching to external providers — Agent displayed workflow banner but used only Claude-native tools (Agent, WebFetch) instead of calling orchestrate.sh. Root cause: missing explicit prohibition + context compaction dropping skill instructions in long sessions./octo:setupdismissing returning users — Agent said "you're already set up" instead of showing interactive menu. Fixed with mandatory first-output-line and never-dismiss guardrails.
- First-run auto-setup — SessionStart hook detects first install and auto-prompts
/octo:setup. Marker file at~/.claude-octopus/.setup-complete. - Interactive
/octo:setupwizard — Rewritten with AskUserQuestion for provider install (Codex/Gemini/Copilot/Qwen), OAuth/API-key auth, RTK install + hook config, and work mode selection. Replaces passive instruction dump.
sys-configureskill — Now redirects to/octo:setupinstead of duplicating setup logic. "configure", "config", and "setup" all route to the same interactive wizard.
/octo:doctorinteractive remediation — Doctor now uses AskUserQuestion to offer fixes for every fixable issue: RTK install (brew/cargo), RTK hook config, missing providers, expired auth, missing deps. Batches multiple issues into multiSelect prompts.- Token optimization report — Doctor includes RTK status, hook config, compressor analytics, and octo-compress availability at the end of every run.
/octo:optimizecommand — Folded entirely into/octo:doctorwhich now handles both diagnostics and interactive remediation. 48 commands total (was 49).
- Private VPS details — Removed from
docs/DEVELOPER.md(E2E infrastructure references).
- MCP server opt-in —
octo-clawMCP server no longer auto-registers in.mcp.json, preventing permanent✘ failedstatus in/mcppanel. Now requiresOCTO_CLAW_ENABLED=trueto start. (#240, thanks @everton-dgn) - MCP security hardening — Blocked security-governing env vars (
OCTOPUS_SECURITY_V870,OCTOPUS_GEMINI_SANDBOX, etc.) from being overridden via MCP client environment. - IDE editor context — New
octopus_set_editor_contextMCP tool injects IDE state (file, selection, cursor) into orchestration. 50KB selection limit. - Self-regulation in develop loops — WTF score tracking added to
flow-develop.mdfor runaway iteration detection (hard cap: 50 iterations).
- Claude Code v2.1.87-92 sync — 13 new
SUPPORTS_*flags (122 total): PostCompact hook (v2.1.76+), Elicitation hooks (v2.1.76+),--bareflag (v2.1.87+), model capability env vars (v2.1.87+), console auth (v2.1.87+), worktree HTTP hooks (v2.1.87+), deep link 5K (v2.1.88+), session ID header (v2.1.89+), marketplace offline (v2.1.90+), plugin executables (v2.1.91+), MCP result size (v2.1.91+), disable skill shell (v2.1.91+), multiline deep links (v2.1.91+). - PostCompact context recovery — New
post-compact.shhook reads workflow state snapshot saved bypre-compact.shand re-injects phase/workflow/autonomy context after compaction. 10-minute staleness window. - Elicitation hooks —
ElicitationandElicitationResulthook events log MCP structured input for observability. - Plugin CLI executable —
bin/octopusbare command (CC v2.1.91+ auto-discoversbin/). Subcommands:doctor,version,session,fleet. - Headroom-inspired token compression —
hooks/output-compressor.shPostToolUse hook auto-detects large outputs (JSON arrays, logs, HTML, verbose text >3K chars) and injects compressed summaries.bin/octo-compressstandalone CLI for pipe-based compression (npm install 2>&1 | octo-compress). HUD "Saved" column tracks cumulative savings. - Rate limit HUD fallback —
octopus-hud.mjsuses CC-providedrate_limitsfrom stdin when OAuth API is unavailable (enterprise, API-billing, expired creds). - managed-settings.d fragment — Deploys
octopus-defaults.json(git instructions off, auto-memory dir) on session start. Atomic write with tmpfile+mv. - Token optimization command (
/octo:optimize) — RTK analysis, context usage, guided setup. 49 commands total. - RTK-aware context nudges — RTK gain stats at WARNING+CRITICAL+AUTO_COMPACT severity levels.
- HUD RTK column — Cumulative tokens saved and average compression percentage.
- 20 new doctor tips — PostCompact, bare flag, model caps, console auth, plugin executables, MCP result size, marketplace offline, disable skill shell, elicitation hooks, session ID header, deep link 5K, worktree HTTP hooks, multiline deep links, rate limit fallback, managed settings, output compressor, octo-compress CLI.
- 67-test suite —
test-cc-v2184-91-sync.shcovers all v9.19 flags, cascade blocks, hooks, executables, wiring, doctor tips, HUD fallback, orphan cleanup, hook consistency.
- Token savings (~7,300 tokens/session):
- Hook conditional
ifgates on 4 hooks (careful-check, freeze-check, telemetry, output-compressor) — skip process spawns when conditions aren't met - PostToolUse consolidation — single
post-tool-dispatch.shreplaces 3 blanket hooks - Context-reinforcement trim — 750→150 tokens (compact gate names)
- Lazy skill
paths:on 9 specialized skills — only listed when relevant files present - CLAUDE.md diet — 3,800→2,418 tokens (dev sections moved to
docs/DEVELOPER.md) - additionalContext minimization —
[🐙 Octopus]→[🐙]across all hooks
- Hook conditional
--bareflag — Allclaude -psubprocess calls use--bareon CC v2.1.87+ for faster synthesis (skips hooks/LSP/plugin sync).- Version cascade ordering — Fixed v2.1.30 and v2.1.80 block inversions in
providers.sh. Merged duplicate v2.1.33 blocks. - Hook consistency — Added
set -euo pipefailtoworktree-setup.sh,worktree-teardown.sh,config-change-handler.sh,telemetry-webhook.sh.
- HUD cache bypass — Error-cached OAuth result no longer blocks CC-provided rate limit fallback for 15 seconds.
- JSON heredoc injection —
session-start-memory.shfallback path now usesjq -n --arginstead of raw variable expansion in heredoc. - Post-compact staleness — Window raised from 5 to 10 minutes for large context compactions.
session-sync.sh— Orphaned hook (merged intosession-start-memory.sh). Removed fromhook-profile.shallowlist."executables"manifest field — Not a validplugin.jsonschema field; CC auto-discoversbin/by convention.
- Embrace workflow silent exit —
cleanup_old_results()andcleanup_cache()insemantic-cache.shused bare[[ cond ]] && cmdpatterns that returned exit code 1 underset -ewhen no files needed cleaning. Added|| trueto prevent premature script termination. (#241) - SESSION_FILE path expansion —
SESSION_FILEwas derived fromWORKSPACE_DIRat source-time inquality.sh, beforeWORKSPACE_DIRwas defined inorchestrate.sh, causing it to expand to/session.json. Re-derived afterWORKSPACE_DIRis set. (#241)
- Claude Code v2.1.84-87 sync — 9 new
SUPPORTS_*flags: skill effort frontmatter (v2.1.80+), rate limit statusline (v2.1.80+), TaskCreated hook (v2.1.84+), skill paths globs (v2.1.84+), plugin userConfig (v2.1.84+), conditional hookiffield (v2.1.85+), PreToolUse AskUserQuestion answering (v2.1.85+), skill description 250 char cap (v2.1.86+), TaskOutput deprecation (v2.1.83+). - Skill
effort:frontmatter — 10 research/analysis skills set toeffort: high, 7 quick/diagnostic skills set toeffort: low. Saves tokens on light tasks, allocates more thinking on deep work. CC v2.1.80+ reads this automatically. - Skill
paths:frontmatter — 4 skills scoped to relevant file globs (TDD → test files, doc-sync → markdown, security-framing → env/auth files, coverage-audit → test/coverage dirs). CC v2.1.84+ auto-activates matching skills. - TaskCreated discipline hook — When discipline mode is on, fires brainstorm gate reminder when tasks are created. Prevents jumping into implementation without a plan.
- Marketplace sync counts from
.claude/commands/— Source of truth for command count (was counting Codexcommands/dir which lagged).
- Windows/Git Bash compatibility — add
--skip-git-repo-checkto all Codex CLI dispatch commands; fix pipe chain stdout loss with MINGW-aware file-based capture fallback; addWORKSPACE_DIRfallback to smoke test and tier cache paths (#235) - Model resolver cross-provider routing — routing phases targeting a different provider now skipped instead of contaminating model selection (#235)
- Scope drift skill enforcement — add MANDATORY COMPLIANCE block (#236)
- Test: "Which Tentacle?" heading renamed — matches "Pick a Command by Goal" heading.
- test-codex-compat.sh — skill count pattern updated to range.
- OpenClaw registry sync —
skill-verify→skill-verification-gate, adddisciplinecommand.
- Discipline mode (
/octo:discipline on) — 8 auto-invoke gates enforced at SessionStart. 5 development gates (brainstorm, verification, review, response, investigation) + 3 knowledge work gates (context detection, structured decisions, intent locking). Off by default, persists across sessions./octo:quickbypasses all gates. - Cursor IDE plugin support —
.cursor-plugin/plugin.jsonfor Cursor marketplace compatibility. - OpenCode install guide —
.opencode/INSTALL.mdwith symlink-based skill discovery. - Codex CLI compatibility layer —
scripts/build-codex-skills.shgenerates.codex/skills/from.claude/skills/,OCTOPUS_HOSTdetects codex/gemini hosts, graceful degradation for non-Claude hosts. 80-test suite. - Verification gate skill — "Evidence before claims" iron law. Replaces and consolidates old
skill-verify. Red-green regression examples. - Review response skill — How to handle code review feedback. Verify before implementing, push back when wrong, never agree blindly.
- Two-stage post-implementation review —
flow-developnow runs spec compliance check first, code quality review second, E2E verification third — all in parallel. - Comparison table — Claude Code vs Superpowers vs Octopus in collapsible README section.
- Built with Claude badge + CI status badge + test count badge in README.
- GitHub Discussions enabled — pinned "Start Here" post with FAQ.
- 3 good-first-issue tickets created (#221, #222, #223).
- README opening rewritten — leads with the problem (blind spots) and the benefit (they surface before you ship), not a feature list.
- README headings renamed — benefit-first titles (e.g., "Top 8 Tentacles" → "8 Commands That Matter Most", "Reaction Engine" → "Built-in Reaction Engine").
- Root directory streamlined — 25 → 19 visible items. Moved CODE_OF_CONDUCT, CONTRIBUTING, PRIVACY to
docs/, templates toconfig/templates/, workflows toconfig/workflows/, assets todocs/assets/. - Marketplace description — benefit-driven copy instead of version-note changelog summary.
.claude-plugin/README.mdrewritten — 27-line internal dev note → 65-line user-facing landing page with before/after example, quickstart, common jobs table.- Star history chart moved from mid-page to bottom of README.
- What's New v9 row updated with circuit breakers, loop self-regulation, HUD, cache-aligned prompts.
- Marketplace sync —
sync-marketplace.shnow counts skills from.claude/skills/(source of truth, 51) instead ofskills/*/SKILL.md(Codex copies, 45). - CI green — docs-sync test matches renamed headings + emoji prefix, plugin expert review accepts
docs/assets/, emptyStop: []hook array removed. - Hooks.json — removed empty Stop array that caused validation failure in E2E runner.
- PostHog telemetry — unreliable hook delivery (CLAUDE_PLUGIN_ROOT not always set, events only flush on SessionEnd). PRIVACY.md already stated "no telemetry" — now that's actually true.
skill-verify— consolidated intoskill-verification-gate(examples preserved, multi-provider context added).
- Sentinel canary monitoring —
/octo:sentinelauto-detects deployments and runs post-deploy health checks: HTTP status, load time regression (flagged at >50% baseline), console error detection, and Core Web Vitals comparison. Auto-triggers after/octo:delivercompletes — no manual flags needed. - Security auto-escalation —
/octo:securitynow auto-detects Quick vs Deep mode from the git diff. Touching auth, security, CI/CD, or dependency files auto-escalates to Deep mode with secrets archaeology (git history scan for leaked credentials), CI/CD pipeline audit (GitHub Actions injection risks), skill supply chain verification, and STRIDE threat modeling. - Design shotgun —
/octo:design-ui-uxauto-dispatches to 3+ providers for parallel design variant generation when enough providers are available. Each provider produces an independent style direction; results presented as a side-by-side comparison board. Falls back to standard single-direction mode with fewer providers. - Ship pipeline —
skill-finish-branchnow always runs a multi-provider diff review before shipping (no size threshold). Adds optional version bump (patch/minor/major) and auto-generated changelog entries from commit history. - Scope drift detection — New
skill-scope-driftcompares diff against stated intent (TODOS.md, PR body, commit messages) and flags scope creep or missing requirements. Auto-integrated into/octo:reviewStep 1b — informational only, never blocks. - Dynamic fleet dispatch —
build-fleet.shenforces model family diversity across agents. Providers are spread across OpenAI, Google, Microsoft, Alibaba, and Anthropic families to avoid agreement bias from same-family models.
- Statusline identity fix — Tier 3 statusline now shows
[🐙 Octopus]instead of[🐙 Claude]. Tier 2 idle mode shows[🐙 Octopus]instead of just[🐙]. - Standardized hook prefixes — All hook
additionalContextmessages now use[🐙 Octopus]prefix. Previously varied:[Octopus Context Monitor],[Compound Task],[Octopus Strategy Rotation]. - Consolidated provider check — New
scripts/helpers/check-providers.shreplaces 7 inline copies of the 8-line provider check block across skill files. - Output helpers — New
octopus_header(),octopus_separator(),octopus_phase_banner(),octopus_complete()inlib/common.shstandardize box-drawing output. Phase banners, config display, and error boxes all use consistent 60-char width. - Compact banner mode — Set
OCTOPUS_COMPACT_BANNERS=truefor single-line activation banners instead of full provider blocks. - Clear action descriptions — Replaced whimsical tentacle messages ("Extending empathy tentacles...") with clear provider dispatch descriptions across 6 files.
- Consistent completion messages — All workflow completion messages now use
octopus_complete()helper:✓ [Workflow] complete.
- Codex compatibility layer — Host platform detection for Codex and Gemini runtimes with graceful degradation.
- PostHog telemetry removed — Unreliable hook delivery; telemetry hooks removed.
- README polish — Hero demo GIF, Built with Claude badge, streamlined comparison table.
- Silent error swallowing in provider dispatch — Added
set -o pipefailto spawn_agent subshell. Pipelineprintf | codex | teewas reporting tee's exit code (always 0), silently hiding Codex/Gemini failures. - Codex explicit stdin flag — All
codex execcommands now include-for explicit stdin reading instead of relying on auto-detection. - Gemini stdout noise filter — MCP status messages, extension loading, and keychain fallback messages no longer pollute results.
- Windows PATH space-splitting —
build_provider_env()skipsenv -icredential isolation on Windows (MINGW/MSYS/CYGWIN) whereC:\Program Filespaths break word-splitting. - Error classification expanded —
classify_error()now handles permission-denied, module-not-found, and MCP-issues patterns for proper circuit breaker response. - MANDATORY COMPLIANCE added to 9 commands/skills (factory, prd, sentinel, resume, schedule, code-review, parallel-agents, debug, writing-plans).
- PostHog telemetry reads key from settings.json when env var unset.
- Codex review dispatch — Strengthened JSON output format requirement to prevent unstructured diff dumps.
- MANDATORY COMPLIANCE audit test — New
test-mandatory-compliance.sh(38 tests) catches missing enforcement automatically.
- dispatch.sh Codex
--full-autoflag — All fourcodex execvariants inget_agent_command()now include--full-auto, preventing hangs in non-interactive execution (debate, sync dispatch, spawn). (#212, #213) - doctor hook validation false positives — Hook script path parser now handles
bash-wrapped commands and env-var prefixed commands (KEY=value script.sh), eliminating 5 false failures in/octo:doctorhooks check. (#214) - MCP server zod compatibility — Bumped
zodfrom 3.24.1 to 3.25.67 inmcp-server/package.jsonto resolveERR_PACKAGE_PATH_NOT_EXPORTEDonzod/v4subpath required by@modelcontextprotocol/sdk1.26.0. (#215)
- RTK companion detection —
/octo:setupand/octo:doctornow detect RTK (Rust Token Killer) and recommend it for 60-90% bash output compression. Context-awareness hook suggests RTK at WARNING level when not installed. Fully optional — no hard dependency. - Cache-aligned prompt construction — Restructured
spawn_agent()andrun_agent_sync()to place stable content (persona, skills, boilerplate) before variable content (timestamps, session state, provider history). Enables Claude's 90% cached-token discount on repeated prompt prefixes. - Anomaly-preserving output truncation —
guard_output()now preserves error/failure lines (ERROR, FATAL, FAIL, PANIC, Traceback, Exception, CRITICAL) when truncating large outputs. Shows head + anomalous lines with line numbers + tail instead of blind truncation. Falls back to original behavior when no anomalies found. - 3 new test suites:
test-rtk-detection.sh(17),test-cache-alignment.sh(29),test-anomaly-truncation.sh(20). 132/132 tests passing.
- test-v8.5.0 Agent Teams grep window — Widened
grep -A 400to-A 500for spawn_agent function growth from cache-alignment restructuring.
- Loop self-regulation — Configurable weights for WTF-likelihood scoring and sliding-window stuck detection. Users can override defaults (revert penalty, unrelated-files penalty, threshold, hard cap, window size) via
~/.claude-octopus/loop-config.conf. - Self-regulation wired into flow-develop — Iterative development cycles now track WTF score and pattern detection, preventing runaway implementation loops.
- Self-regulation wired into skill-debug — Debug fix loops now track WTF score alongside the existing 3-strike rule, adding quantitative drift detection to fix attempts.
- 13 new tests for configurable weights, flow-develop wiring, and skill-debug wiring (33 total in test-loop-self-regulation.sh).
- Provider Reliability Layer (CONSOLIDATED-01) — Circuit breaker state persists across sessions in
provider-state/(viaCLAUDE_PLUGIN_DATAor~/.claude-octopus/).spawn_agent()checksis_provider_available()before dispatch, records success/failure to circuit, classifies errors as transient/permanent viaclassify_error(). Transient errors (429, 500, timeouts) trigger graduated backoff; permanent errors (401, billing) open circuit immediately. Half-open probe after cooldown enables automatic recovery. - Doctor circuit breaker status —
/octo:doctornow shows open circuit breakers and provider health. - Bash 3.2 compatibility fix —
classify_error()no longer uses${var,,}(bash 4+ only).
- CC v2.1.78-83 feature detection — 8 new
SUPPORTS_*flags: StopFailure hook, PLUGIN_DATA dir, agent effort/maxTurns/disallowedTools, CwdChanged/FileChanged hooks, managed-settings.d, env scrub, initialPrompt. - CLAUDE_PLUGIN_DATA workspace —
WORKSPACE_DIRnow prefers${CLAUDE_PLUGIN_DATA}when available (CC v2.1.78+), with backward-compatible fallback to~/.claude-octopus/. - Agent
effort+maxTurnsfrontmatter — All 32 agents configured: research agentseffort: high/maxTurns: 25, balanced agentseffort: medium/maxTurns: 20, lightweight agentsmaxTurns: 15. - Agent
initialPrompt— 4 key agents auto-submit first turn: code-reviewer, security-auditor, debugger, performance-engineer. - CwdChanged hook —
hooks/cwd-changed.shre-detects project context (language, framework) on directory change. - StopFailure hook —
hooks/stop-failure-log.shlogs API errors toerror-log.jsonlfor diagnostics. - Agent Teams bridge: task dependencies —
bridge_register_task()acceptsdepends_onparameter;bridge_is_task_unblocked()blocks claiming until dependencies complete. - Agent Teams bridge: shutdown protocol —
bridge_shutdown_teammate()marks tasks asshutting_down;bridge_cleanup()warns about running tasks before archiving. - Agent Teams bridge: nested guard —
bridge_init_ledger()refuses to create a new team when an active workflow is running. - Agent Teams bridge: native discovery —
bridge_discover_native_team()reads CC's official~/.claude/teams/config. - Agent Teams enable check —
bridge_is_enabled()logs whenCLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMSis not set; doctor tip suggests enabling it. - PostHog usage analytics —
hooks/telemetry-posthog.shsends anonymous, opt-in session/workflow/error events to PostHog. Random UUID identity, PII scrubbing, local buffering with batch flush on SessionEnd. Project key embedded insettings.json— users disable withPOSTHOG_OPT_OUT=1. - 4 new test suites:
test-cc-v2183-sync.sh(39),test-shell-safe-hooks-v2183.sh(8),test-agent-teams-bridge.sh(27),test-posthog-telemetry.sh(20).
- 128/128 tests passing (was 105/128) — 18 test files updated to search
ALL_SRC(orchestrate.sh + lib/*.sh) after v9.12.0 decomposition. Fixed NODE_NO_WARNINGS grep pattern, get_agent_command_array reference, YAML quoting, grep regex syntax, statusline fallback test, HTTP hook test. - Provider detection enforcement — Added
PROVIDER_CHECK_STARTbash snippet toskill-debate.md,flow-parallel.md,skill-ui-ux-design.md(were showing hallucinated banners). - Marketplace metadata version test —
test-version-consistency.shnow cross-checks bothmetadata.versionfields to catch desyncs like the v9.10.3 incident.
- orchestrate.sh decomposition wave 2 — Moved 27 functions to lib/ modules. New lib/completions.sh. orchestrate.sh: 4,944 → 3,707 lines (-25%), 70 → 41 functions (-41%).
- Dead code removal — Removed
OLD_init_interactive_impl(),get_fallback_agent_v2()(272 lines from interactive.sh). - Fork reduction — Converted 28
echo|tr/cut/wcpatterns to bash builtins. Fixedcat|head→headin factory-spec.sh. - Provider check template block — Extracted snippet to
skills/blocks/provider-check.md. Flow templates use{{PROVIDER_CHECK}}placeholder.
- OpenCode CLI provider — multi-provider router integration
- HUD: tool activity tracking — Statusline shows active tools and counts (
◐ Edit: auth.ts │ ✓ Read ×3 │ ✓ Grep ×2). Tracks Read, Write, Edit, Bash, Grep, Glob, WebSearch, WebFetch from transcript. - HUD: enhanced todo progress — Shows active task text, not just count (
▸ Fix auth bug (2/5)). - HUD: named presets —
{"preset": "developer"}in.hud-config.jsonc. Built-in: minimal, developer, full, performance. Preset indicator in Octo column. - PRIVACY.md — Privacy policy for official Anthropic marketplace submission.
- Cowork compatibility — Added homepage field, updated keywords with "cowork", "multi-llm", all 8 provider names. Plugin was already format-compatible.
- Smart router missing multi-LLM route —
/octo:multiwas unreachable via/octo:auto. Keywords "multi", "multi-llm", "multi-provider" now route toocto:multi. - sync-marketplace.sh duplicate text — "Run /octo:setup." appeared twice in marketplace description.
- test-skill-templates.sh — Updated for removed
skills/blocks/directory. - Build artifacts — Regenerated Factory skills, OpenClaw dist, new command wrappers.
- Hardened plugin validation (PR #208) — Factory YAML frontmatter normalization,
claude plugin validatein release workflow.
- embrace.sh dispatch — Now detects all 5 CLI providers (codex, gemini, copilot, qwen, ollama) and dynamically builds dispatch strategies. 3+ available CLIs → all join the fleet. Qwen and Ollama now participate in research, review, and architecture workflows.
- Debate participants — Copilot (🟢) and Qwen (🟤) join as supplementary participants when available, alongside core four (Codex/Gemini/Sonnet/Opus).
- Smart setup prompt — Detects when legacy users have new providers (Copilot/Qwen/Ollama) and proactively informs them of extra tentacles.
- Codex mini model — Updated
gpt-5-codex-mini→gpt-5.4-miniacross dispatch, models catalog, provider routing, and docs. GPT-5.4 Mini is 2x faster and uses 30% token quota vs GPT-5.4.
- Emoji conflict — Qwen 🟠→🟤 (Sonnet keeps 🟠 as established).
- SEO: "Multi-LLM orchestration" in opening paragraph — First sentence now leads with "Multi-LLM orchestration plugin for Claude Code" and names all 8 providers. This is the Google snippet zone (~155 chars). Repo description updated to match.
- README: outcome-first opening bullets — Lead with what it does for you, not which 8 providers it uses. Defined jargon inline (personas = role-specific agents, skills = reusable workflows).
- README: condensed What's New — 14 detailed changelog rows → 3-row table by major version (v9/v8/v7) with best end-user features.
- README: simplified Quickstart — 3 commands upfront, alternatives + troubleshooting in collapsible
<details>blocks.
- Qwen CLI as 8th provider: Free-tier research via Qwen OAuth (1,000-2,000 requests/day). Fork of Gemini CLI — same dispatch pattern. Agent types:
qwen,qwen-research. Detection, doctor, health check, dispatch, model resolver, circuit breaker, workflows, preflight, and install-deps all wired. - Copilot Coding Agent native files:
.github/agents/*.agent.mdfor all 10 agents. YAML frontmatter with Copilot tool aliases (read, edit, execute, search). Makes agents discoverable by GitHub's server-side coding agent. - Gemini .toml custom commands:
.gemini/commands/octo/with 4 persona commands (research, review, architect, implement) for human interactive use. Not used in headless dispatch (stdin+slash don't compose — verified via Codex source analysis). - Gemini provider test suite: 44 tests covering dispatch, detection, doctor, health, models, circuit breaker, workflows, embrace, MCP, .toml commands, pricing, and config.
- P0: json_extract reliability — Replaced brittle regex (
"field":"value") with 3-tier fallback: jq (if available) → python3 one-liner → improved regex that handles whitespace, escaped quotes, numeric values, and missing fields. - P1: OpenRouter hardening — Added
--max-time 60timeout, HTTP status code handling (429 retry with Retry-After, 502/503/524 error messages), deduplicatedopenrouter_execute()andopenrouter_execute_model()into one core function. - P1: DeepSeek model update —
deepseek/deepseek-r1→deepseek/deepseek-r1-0528across dispatch, model-resolver, models catalog, and docs. - CC version detection tests consolidated — 4 test files merged into
test-cc-version-detection.sh(103 tests).
- Copilot dispatch broken end-to-end (#206, PR #207 by @PavelPancocha): 5 bugs that prevented Copilot from ever running in workflows despite detection:
dispatch.shreturned bash function name (copilot_execute) instead of executable —timeoutcan't exec functions. Fixed:copilot --no-ask-user.validate_agent_command()in utils.sh rejectedcopilot— not in allowlist. Fixed: addedcopilotpattern.embrace.shnever included Copilot in dispatch strategies — only checked codex/gemini. Fixed: addedhas_copilotdetection + 3/4-provider strategies.- Headless
-p ""stdin flag only appended forgemini*agents — Copilot needs it too. Fixed: extended condition tocopilot*. - Provider metrics tracking fell through to wildcard for copilot/ollama. Fixed: added explicit cases.
- Stray
}at EOF in workflows.sh — caused syntax error when sourced (CodeRabbit catch from PR #207). - Codex smoke test timeout too short — hardcoded 10s, but MCP initialization takes 20-40s. Now configurable via
OCTOPUS_CODEX_SMOKE_TIMEOUT(default: 45s).
- README tagline — "turns one model into three" → "orchestrates seven AI providers"
- SECURITY.md — supported versions 4.x → 9.x, fixed package names, added Copilot/Ollama to deps
- CONTRIBUTING.md — removed dead Python/coordinator.py refs, added real test commands, bash 3.x compat
- PR template — removed dead
coordinator.pycheck, added real test/registry/version-bump checklist - Issue templates — upgraded from markdown to YAML forms with provider dropdowns and version fields
- CODE_OF_CONDUCT.md — Contributor Covenant v2.1
- Repo topics — 12 discoverable tags (claude-code, multi-ai, ai-orchestration, etc.)
- 39 stale remote branches — all merged/orphaned branches cleaned up
- Wiki and Projects tabs — disabled (unused)
- Discussions — disabled
- Documentation consolidation: Removed 9 stale/redundant docs from plugin (archived to dev repo). Kept 7 user-facing docs + 5 provider configs. Rewrote
docs/README.mdindex. - Provider counts normalized to 7 across README.md ("Seven Providers"), ARCHITECTURE.md (Copilot no longer "aspirational"), CLAUDE.md (detection section, modular config tree), COMMAND-REFERENCE.md ("47 commands"), copilot-instructions.md.
- Debate references updated to four-way across COMMAND-REFERENCE.md (was "3-way").
config/providers/copilot/CLAUDE.md: New provider config file for GitHub Copilot CLI (was missing).
docs/CLI-REFERENCE.md— CLI flags are in orchestrate.sh--helpdocs/PLUGIN-ARCHITECTURE.md— Overlapped ARCHITECTURE.md, perpetually staledocs/FACTORY-AI.md— Factory-specific, stale countsdocs/SANDBOX-CONFIGURATION.md— Documented invalid mode (danger-full-access); valid modes are in dispatch.shdocs/NATIVE-INTEGRATION.md— Outdated v8.15 contentdocs/INTERACTIVE_QUESTIONS_GUIDE.md— Developer reference, rarely useddocs/PDF_PAGE_SELECTION.md— Belongs in document-skills plugindocs/RELEASE_AUTOMATION.md— Internal workflow, moved to dev repodocs/agent-decision-tree.md— Internal design doc, moved to dev repo
- Ollama CLAUDE.md: Corrected false "no streaming in CLI mode" claim.
- AGENTS.md: Fixed path
agents/→.claude/agents/.
- Ollama dispatch missing: Added
ollama|ollama-*case todispatch.shandollamatoAVAILABLE_AGENTS— v9.9.0 wired detection but missed the dispatch branch. - detect-providers incomplete:
detect_providers(),cmd_detect_providers(),install-deps.sh, andis_agent_available_v2()now include Perplexity, Ollama, and Copilot (were only in doctor.sh). - copilot-instructions.md wrong path:
marketplace.json→.claude-plugin/marketplace.json.
- Removed inline adversarial steps: Deleted STEP 6.5 (flow-define), STEP 3.5 (flow-develop), STEP 4.5 (flow-deliver) — superseded by centralized multi-LLM adversarial debate system (v9.4.0+v9.8.0).
- GitHub Copilot CLI as runtime provider (#198): Official
copilot -pprogrammatic mode (GA Feb 2026) with 5-tier fallback auth chain:COPILOT_GITHUB_TOKEN→GH_TOKEN→GITHUB_TOKEN→ keychain →ghCLI. Agent types:copilot,copilot-research. Zero additional cost (uses GitHub Copilot subscription). Graceful degradation when unavailable. - Ollama as local LLM provider: Primary integration via
ollama runCLI dispatch. Doctor checks CLI install + server health + model count. Added to provider health checks, circuit breaker, and model resolver (ollama*→llama3.3). SecondaryANTHROPIC_BASE_URLbridge path documented for drop-in compatibility. - Repo-level agent discovery files:
AGENTS.mdfor GitHub Copilot coding agent discovery,.github/copilot-instructions.mdfor Copilot-specific repo instructions. - Adapter integration tests (
test-adapter-flags.sh): 23 tests covering debate flag placement, quality_threshold forwarding, env var allowlists, and Copilot wiring.
- Debate flag placement in MCP/OpenClaw (CRITICAL): Both adapters placed grapple-specific flags (
-r,--mode) before the command, where orchestrate.sh's global parser consumed them incorrectly. OpenClaw's-dflag collided with the global--dirflag. AddedpostFlagsparameter to bothrunOrchestrate()andexecuteOrchestrate(). Debate flags now correctly go after the subcommand. quality_thresholdsilently ignored: Both MCP and OpenClaw accepted the parameter but never forwarded it. Now passes-qflag to orchestrate.sh when non-default.- MCP/OpenClaw env var allowlists: Added
ANTHROPIC_BASE_URL,ANTHROPIC_AUTH_TOKEN(Ollama bridge),COPILOT_GITHUB_TOKEN,GH_TOKEN,GITHUB_TOKEN(Copilot auth),PERPLEXITY_API_KEY(was missing from OpenClaw). - OpenClaw registry stale: Regenerated to 97 entries matching current skills/commands.
- OpenClaw debate description: "Three-way" → "Four-way" (Sonnet was added as 4th participant in v9.4.0).
- OpenClaw debate style param: Removed broken
styleparam (no CLI mapping) and-dflag. Replaced withmodeparam (cross-critique/blinded) matching orchestrate.sh's actual--modeflag. test-openclaw-compat.shearly abort:test_build_check_modeandtest_validate_script_passesused command substitution underset -e, causing the entire suite to abort on first failure. Fixed with&& exit_code=0 || exit_code=$?pattern.
- ARCHITECTURE.md: Updated from "three providers" to 5 core + 2 optional (Codex, Gemini, Claude, Perplexity, OpenRouter + Ollama, Copilot). Updated provider table and ASCII diagram.
- skill-copilot-provider.md v2.0: Rewritten from
gh copilot(retired) to officialcopilot -pprogrammatic mode. Documents auth chain, PAT setup, and premium request quota. - setup.md: Added Copilot CLI setup section with install and auth instructions.
- skill-doctor.md: Updated providers table to match actual doctor checks.
- test-copilot-provider.sh: Updated assertions for v2.0 skill content (37 tests).
- Adversarial debate in 9 workflows: Multi-LLM cross-checking now wired into
/octo:multi(mandatory synthesis with disagreement surfacing),/octo:spec(completeness challenge),/octo:define(requirements challenge),/octo:factory(pre-embrace scenario coverage gate),/octo:develop(pre-implementation devil's advocate),/octo:prd(draft adversarial review),/octo:staged-review(multi-LLM Stage 2 with Codex logic + Gemini security),/octo:parallel(WBS decomposition cross-check),/octo:tdd(test design review). All skippable with--fast. - Visual activation indicators on all commands: Every
/octo:*command now shows a 🐙 indicator line when activated. 19 commands and 10 skills that were missing indicators now have them. 7 skills that falsely claimedvisual_indicators_displayedin their contract now actually display one. 4 existing banners missing the 🐙 emoji prefix now include it.
- test-debate-skill.sh CI failure: Wrong helper path (
tests/smoke/test-helpers.sh→tests/helpers/test-framework.sh) caused "Missing test-helpers.sh" on every CI run. - test-packaging-integrity.sh CI failure:
set -euo pipefail+eval "source ..."subshell broke on CI when sourced scripts referenced unset runtime variables. Replaced with file-existence check that doesn't require executing sourced code.
- Windows
${USER}unbound variable crash (#201):$USERis unset on Windows (Git Bash) — Windows uses$USERNAMEinstead. All 6 occurrences in the model cache path now use${USER:-${USERNAME:-unknown}}to handle both platforms. - Codex smoke test false negative outside git repos (#202):
codex execrequires a git repository, so the smoke test always failed with "Not inside a trusted directory" when run from a non-git directory. Now creates a temp git repo for the test and cleans up after. AddedGIT_REPO_REQUIREDerror classifier for a clearer message if the workaround fails.
- Broken Skill() dispatch in 9 commands:
doctor,claw,loop,debug,deck,docs,security,staged-review,tddall usedSkill(skill: "skill-name")which failed with "Unknown skill" because the Skill tool requires plugin-qualified names. Replaced with direct file read instructions. Net -93 lines. - Factory AI manifest stale at v8.41.0: Bumped
.factory-plugin/plugin.jsonto 9.7.7 with correct command/skill counts. - HTTP webhook hook no-op: Removed the
type: httphook entry that fired with an emptyOCTOPUS_WEBHOOK_URL. The shell script fallback (telemetry-webhook.sh) already has the guard. - MCP server Node version guard: Added
check-node-version.jsthat fails fast with a clear error on Node < 18 instead of silently crashing.
- PostToolUse context-awareness scoped: Changed from blanket
{}matcher toBash|Agent|Write|Editonly. Eliminates a bash process spawn on every Read/Grep/Glob call. - SessionStart hooks consolidated (5 → 4): Merged
session-sync.shintosession-start-memory.sh, reducing process spawns per session start/resume/compact. - context-awareness.sh timeout guard: Added
timeout 3 catpattern for stdin drain consistency with other hooks.
- Dependency installer (
scripts/install-deps.sh): Newcheckandinstallmodes that auto-detect and install missing CLIs (Codex, Gemini), jq, and the statusline resolver. Reports recommended plugin status (claude-mem, document-skills) with copy-paste/plugin installcommands. - Setup dependency check:
/octo:setupnow runsinstall-deps.sh checkfirst — shows what's missing before provider detection. Offersinstallto fix everything in one shot. - Doctor deps category:
/octo:doctorgains adepscheck category and install step (Step 3) for fixing missing software dependencies.
- Statusline version goes stale on plugin update:
settings.jsoncontained a versioned cache path (e.g.,.../octo/9.6.1/hooks/...) that never updated when the plugin upgraded. Addedstatusline-resolver.sh— a version-agnostic wrapper that finds the latest cached version viasort -V. Newstatusline-auto-repair.shSessionStart hook auto-installs the resolver to~/.claude-octopus/statusline.shand patchessettings.jsonif it detects a stale versioned path.
- 3-tier adaptive statusline: Tier 1 (Node 16+ HUD with smart columns), Tier 2 (bash + jq with context bar/cost/phase), Tier 3 (pure bash with grep/cut — zero external dependencies). Works on any POSIX system regardless of installed tools.
- Node version check: Verifies Node >= 16 before attempting ESM HUD delegation. Node 14-15 users gracefully fall to Tier 2 instead of crashing on
node:protocol imports. - Removed unnecessary timeout from statusline: Claude Code cancels in-flight statusline scripts on new updates per official docs, so
timeoutguard is unnecessary (kept on hooks where it's still needed).
localoutside function —octopus-statusline.shusedlocal wt_suffixat script scope, which aborts underset -e. Broke the entire bash statusline fallback when worktrees were active. Same bug inscheduler-security-gate.shsilently bypassed file path restrictions.- Atomic credential writes —
writeBackCredentialsnow uses temp file +renameSyncwithmode: 0o600. Prevents concurrent sessions from clobbering~/.claude/.credentials.json. - Atomic cache writes —
writeUsageCacheuses temp +renameSyncto prevent torn JSON from concurrent sessions. - Python injection in context-awareness — Bridge file path was interpolated into
python3 -cstring literal. Now passed viaos.environ['BRIDGE_PATH']. - Unsafe
/tmpglob removed —context-awareness.shno longer falls back tols -t /tmp/octopus-ctx-*.json. Exits cleanly whenCLAUDE_SESSION_IDis unset. - 5 additional timeout guards —
plan-mode-interceptor.sh,scheduler-security-gate.sh,sysadmin-safety-gate.sh,telemetry-webhook.sh,agent-teams-phase-gate.shnow have thecommand -v timeoutfallback pattern. Total: 10 hooks hardened. - HUD stdin timeout —
readStdin()now usesPromise.racewith a 5s guard to prevent indefinite hang on unclosed pipes. contextBarclamp —Math.min(10, Math.max(0, ...))preventsRangeErrorif pct > 100 reaches the function.- Bridge file permissions — Written with
umask 0177(owner-only) instead of default umask.
- Smart HUD columns:
smartColumns()auto-detects context and adjusts visible columns — hides Cost for OAuth subscription users, shows Cache/Session/Changes/Tokens only when data is meaningful. Column factory pattern ensures config-ordered rendering."smart": trueis the default; set"smart": falsein.hud-config.jsoncfor manual control. - Octo brand column: New
Octo:column (always first) displays octopus icon, plugin version, and effort level dot. Model column moved to second position, Context column anchors the end. - Context bridge session_id fix: Both statusline hooks now extract
session_idfrom stdin JSON instead of relying onCLAUDE_SESSION_IDenv var (which isn't set for statusLine commands). Context-awareness hook falls back to finding the most recent bridge file when env var is missing. test-hud-smart-mode.sh— 31 tests across 5 groups covering timeout fallback, smart mode, Octo column, context bridge, and functional HUD output.
- Timeout fallback for macOS: All 6 hook files now check
command -v timeoutbefore using GNUtimeout. Falls back to plaincatwhentimeout(GNU coreutils) isn't installed — fixes silent stdin read failures on stock macOS that caused model showing "unknown" and 0% context in the statusline.
- Enhanced HUD rewrite: Full async rewrite of
octopus-hud.mjs(295 → 880 lines). Concurrent API/transcript/version fetching viaPromise.all. First call ~300-500ms, subsequent calls <10ms (all cache hits). - Rate limit tracking: 5h/7d usage from Anthropic OAuth API with color-coded percentages and reset countdown timers. Credential reading from
.credentials.jsonwith macOS Keychain fallback. Token refresh on expiry. 60s/15s cache TTLs. - Transcript-based agent tracking: Parses JSONL transcripts for running/completed agents (Task/proxy_Task tool_use blocks). Background agent tracking, stale agent detection (30 min timeout), max 100 agents in memory. Agent detail tree with
├─/└─prefixes showing type, model, elapsed time, description. - Cache hit rate: Computes cache read vs total tokens from
current_usagefields. Displayed as percentage with color coding. - Version check: Fetches latest Claude Code version from npm registry with 1h cache. Shows update indicator dot when current differs from latest.
- Configurable column system:
~/.claude-octopus/.hud-config.jsoncwith JSONC parsing (supports//comments). 14 columns available, 5 default ON. Vertical (2-row labels+values) and horizontal (single-row compact) layouts. - Tailwind color palette: Replaced basic ANSI (31-37) with 24-bit Tailwind colors — Emerald-600 for good, Amber-600 for warning, Red-600 for critical, Slate-600/700/800 for data/labels/separators.
- Updated
test-enhanced-hud.sh— 30 tests across 6 groups covering rate limit functions, display, enhanced features, Octopus preserved functions, config system, and layout support.
- Enhanced statusline: Gradient context bar (
▰▱), auto-compact warning indicators (⚠at 80%,💀at 90%), active agent name display, project state from.octo/STATE.mdwhen idle. Performance-cached with 2s TTL. - Workflow-aware context warnings:
context-awareness.shnow reads session.json and gives phase-specific advice (probe→"use /octo:quick", tangle→"split into smaller /octo:develop", ink→"focus on verification"). New 80% AUTO_COMPACT severity level. - Session handoff file:
.octo-continue.mdauto-written on PreCompact and SessionEnd. Contains workflow state, pending work, key decisions, blockers, and resume instructions. Read by/octo:resume. - Enhanced intent detection:
user-prompt-submit.shnow has HIGH/LOW confidence levels (2+ keyword hits = HIGH). HIGH confidence injects persona context (security auditor, code reviewer, debugger, TDD orchestrator hints). Provider pre-warming writesprimed_providersto session.json. - New script:
scripts/write-handoff.sh— standalone handoff file generator. - 4 new test suites: enhanced-hud (18), context-awareness-v2 (14), handoff (12), prompt-submit-v2 (12) — 56 new assertions.
- Stdin timeout guards: All 6 hook files now use
timeout 3 catinstead of barecatreads, preventing hook hangs on stdin stalls. - 50KB output guard:
guard_output()inlib/utils.shredirects oversized output to temp files with@file:pointers. Wired intoaggregate_results()andsynthesize_probe_results(). - Agent permission audit: Removed
Agenttool from 7 read-only agents (backend-architect, code-reviewer, security-auditor, performance-engineer, docs-architect, cloud-architect, database-architect). Addedreadonly: trueto 6 agents. RemovedBashfrom security-auditor. - Context bridge: Both statusline hooks (bash + Node.js HUD) now write
/tmp/octopus-ctx-$SESSION.jsonwith context usage data for cross-hook awareness. - Context awareness hook: New
hooks/context-awareness.sh(PostToolUse, blanket) warns at 65% (WARNING) and 75% (CRITICAL) context usage. Debounced every 5 tool calls with severity escalation bypass. - Structured return contracts: All 10 agent files now have
## Output Contractwith COMPLETE/BLOCKED/PARTIAL status markers and per-agent customized sections. - Contract compliance scoring:
score_result_file()Factor 5 adds up to 20 pts for structured status markers in agent output. - Compound init command:
init-workflow)dispatch case returns full environment bundle (providers, models, capabilities, files, paths) as JSON in a single call. - Smart router renamed:
/octo:octo→/octo:auto. The old/octo:octocommand remains as a legacy redirect. 40 commands total. - 6 new test suites: stdin-timeout-guards (12), output-guard (6), agent-permissions-audit (12), context-bridge (12), agent-return-contracts (32), compound-init (17) — 91 new assertions.
- Legacy
claude-octopusinstall detection in doctor and preflight — users who installed before the v9.0 rename tooctonow see a clear diagnostic with the uninstall/reinstall command. (#196)
-
Round 2 speed optimization: 26 echo|grep → bash builtins, 22
$(cat) → $ (<), $(date +%s) caching in 5 hot functions, 124 separator literals → variables. ~100 additional forks eliminated per workflow. - Combined with Round 1 (v9.4.1): orchestrate.sh goes from ~900 subshell forks per workflow to ~70 — a 92% reduction in subprocess overhead.
archive_usage_session()dead function andcost-archivecommand (deprecated with message).
- Missing file guard on
generate_factory_scenarios()—$(<)without[[ -f ]]check could abort underset -e. - Newline regression in
match_routing_rulekeyword matching —grep -qwtreated newlines as word boundaries, space-padding didn't. - Redundant dual
nocasematchblocks inparse_factory_specmerged into single block +casestatement. -
_classify_smoke_errornocasematch wrapped in subshell to prevent leak on future early returns. - Timing skew:
start_time_msinspawn_agentandprobe_single_agentrestored to fresh$(date +%s)(metrics accuracy over micro-optimization).
- Flag pruning, speed optimization (~750 fewer subshell forks), pre-existing test fixes
- Four-way AI debates: Sonnet now participates as a permanent 4th debater alongside Codex, Gemini, and Claude/Opus. Dispatched via
Agent(model: "sonnet", run_in_background: true)— runs in parallel, no added latency, no extra cost. Skill version v4.7 → v4.8. - Auto code review + E2E verification: After any
/octo:develop,/octo:embrace, or/octo:deliverworkflow completes, two Sonnet agents automatically launch in parallel — one code reviewer, one E2E tester. Findings presented before the "what next?" prompt. No manual request needed. - Monolith guard test:
tests/smoke/test-monolith-guard.sh(15 tests) enforces orchestrate.sh line count threshold, lib file existence, no function duplication, and source guards. - Test infrastructure helper:
tests/helpers/grep-octopus.shsearches acrossorchestrate.sh+lib/*.shso tests survive function extraction.
- Wave 1 decomposition: Extracted 3 new lib modules from orchestrate.sh (22,668 → 22,377 lines):
lib/utils.sh(183 lines): json_extract, json_escape, sanitize_external_content, validate_agent_command, validate_output_file, sanitize_review_id, secure_tempfilelib/similarity.sh(103 lines): jaccard_similarity, extract_headings, check_convergence, generate_bigrams, bigram_similaritylib/models.sh(129 lines): get_model_catalog, is_known_model, get_model_capability, list_models
list_models --tierparsing bug:shiftinside aforloop produced wrong results. Replaced with properwhile [[ $# -gt 0 ]]pattern.log()forward-reference in utils.sh: Extracted functions calledlog()before it was defined. Added_utils_log()fallback that uses stderr whenlog()isn't available.validate_output_filesilent failure: WhenRESULTS_DIRwas unset, validation silently rejected all files with a misleading error. Now explicitly checks and reports the missing variable.- 9 review pipeline bugs silently dropping all findings (#182-#190) — see v9.3.1 below for individual fixes.
- awk filter drops codex exec clean stdout: The output filter expected a
--------header separator thatcodex execdoesn't emit on stdout. Now detects clean stdout and passes through directly. (#182) - claude-sonnet agent
-mflag rejected: Claude CLI v2.1.76 requires--model(long form). Updatedclaude-sonnet,claude-opus, andclaude-opus-fastagent commands. (#183) - log() INFO/WARN pollutes captured output:
log()INFO and WARN levels wrote to stdout, corrupting function return values captured via$(). Now all log levels write to stderr. (#183) - check_provider_health uses removed
codex auth status: Codex CLI v0.114 removedauth status. Now checks~/.codex/auth.jsondirectly. (#184) - Claude CLI not found in non-interactive shells: When
~/.local/binisn't on PATH, the script now probes common install locations before falling back. (#185) - Round 1 findings parser feeds full markdown to jq: The parser now extracts the
## Outputsection from result files before JSON parsing, instead of feeding the entire markdown document to jq. (#186) - Gemini provider status never written: Round 1 findings collection now writes provider status events for all agent types, not just codex. (#187)
- LLM JSON wrapped in markdown fences breaks jq: Added fence stripping after
run_agent_syncin Rounds 2, 3 (debate), and 3 (synthesis). (#188) - PURPLE unbound variable crashes setup_wizard: Added
PURPLEcolor variable as alias forMAGENTA. (#189) - Round 1
waitreturns immediately: Replaced barewait(which only catches direct children) with polling for## Status:markers in result files, with 5-minute timeout. (#190)
- Search spiral guard: Research agents get a prompt-level instruction preventing search loops without synthesis. Unconditional in
probe_single_agent(), role-gated (researcher) inspawn_agent(). - Per-role token budget proportions:
get_role_budget_proportion()scalesenforce_context_budget()by role — implementers/researchers get 60%, planners/reviewers 40%, verifiers/synthesizers 25%. Prevents one chatty agent from starving others. - Heuristic learning:
record_run_pattern()records file co-occurrence from successful agent runs to~/.claude-octopus/.octo/patterns.jsonl(capped 200 entries).build_heuristic_context()injects "when modifying X, successful runs usually first read Y" hints (≤500 chars) into future prompts. Kill switch:OCTOPUS_HEURISTIC_LEARNING=off.
enforce_context_budget()now accepts an optional second parameter (role) for budget scaling.
- Codex subagent dispatch intercepted by Codex superpowers skill system: When Codex CLI has "superpowers" skills installed, its skill system intercepts octo's dispatched prompts and forces its own brainstorming workflow instead of responding directly. Fixed by prepending a user-level override preamble to all Codex dispatches that tells the model to skip skills. (#176)
- jq parse error in
code-review: Bash${1:-{}}parameter expansion appended an extra}to the JSON profile string, causing jq parse errors. Fixed by quoting the default value. (#172) - "Argument list too long" with large diffs: The review pipeline passed prompts (including embedded diffs) as CLI arguments, exceeding
ARG_MAXfor PRs with >2000 lines. All agent types now use stdin-based prompt delivery. (#173)
- smart dispatch, blind spot library, skill name fix
- brainstorm Team mode multi-LLM, COMMAND-REFERENCE.md update
- Plugin install/uninstall mismatch: Aligned
marketplace.jsonplugin name from"claude-octopus"to"octo"to matchplugin.json. Install command is nowocto@nyldn-plugins. Fixes/plugin uninstalland/plugin updatefailures.
- 6 new
SUPPORTS_*detection flags (100 total, 31version_compareblocks) from CC v2.1.76. - v2.1.76:
SUPPORTS_MCP_ELICITATION(MCP servers can request structured user input mid-task),SUPPORTS_ELICITATION_HOOKS(Elicitation and ElicitationResult hook events),SUPPORTS_WORKTREE_SPARSE_PATHS(worktree.sparsePathssetting for sparse checkout),SUPPORTS_POST_COMPACT_HOOK(PostCompact hook event fires after compaction),SUPPORTS_EFFORT_COMMAND(/effortslash command for mid-session effort adjustment),SUPPORTS_BG_PARTIAL_RESULTS(killing background agent preserves partial results). test-cc-v2176-sync.sh— tests covering declarations, detection block, logging, wiring, doctor checks, and version comments.test-command-meta-prompt.sh— 8 tests: file integrity, frontmatter, skill reference, core techniques, registration.test-command-prd-score.sh— 11 tests: file integrity, frontmatter with arguments, scoring categories A-D, 100-point framework, grade scale, registration.test-command-staged-review.sh— 9 tests: file integrity, frontmatter, no broken references, compliance block, skill reference, cross-reference validation, registration.
spawn_agent(): Debug log whenSUPPORTS_BG_PARTIAL_RESULTSconfirms background agent partial result preservation (CC v2.1.76+)./octo:doctor: Surfaces/effortcommand availability for mid-session effort adjustment (CC v2.1.76+)./octo:doctor: Checksworktree.sparsePathssetting in~/.claude/settings.jsonfor large monorepo optimization (CC v2.1.76+)./octo:doctor: Surfaces MCP elicitation capability (CC v2.1.76+)./octo:doctor: Warns about--plugin-dirbehavioral change — one path per flag in v2.1.76+ (use repeated flags for multiple dirs)./octo:doctor: Detects claude-mem companion plugin (version, "pass" status) — surfaces MCP tool availability for cross-session memory.scripts/claude-mem-bridge.sh: Integration bridge for claude-mem HTTP API —available,search,observe,contextcommands. All operations non-blocking and fault-tolerant.save_session_checkpoint(): Writes phase completion observations to claude-mem when available (non-blocking background POST).session-start-memory.sh: Queries claude-mem for recent project context at session start and surfaces it.- 6 skill/command files with claude-mem MCP tool hints:
flow-discover.md,flow-define.md,flow-develop.md,flow-deliver.md,skill-debate.md,skill-deep-research.md. /octo:octosmart router: Added claude-mem search hint for routing correction learning.
/octo:reviewdefault focus:["correctness"]→["correctness","security","architecture","tdd"]— all areas reviewed by default./octo:reviewauto-skips interactive prompts whenOCTOPUS_WORKFLOW_PHASEis set (pipeline context from/octo:develop,/octo:embrace, etc.)./octo:review: Added "All areas (Recommended)" focus option — users no longer need to select 4 options individually./octo:brainstorm: Added Solo/Team mode selection — Team mode dispatches parallel brainstorm queries to available providers for diverse AI perspectives./octo:prd: Phase 1 research now dispatches parallel queries to available providers (Codex for technical patterns, Gemini for market landscape) when multi-provider is available./octo:prd-score: Added optional "Rigorous" multi-AI scoring mode — 2-3 providers score independently, then consensus synthesis reduces single-model bias./octo:staged-review: Rewritten with mandatory compliance block, AskUserQuestion for scope selection, interactive next steps, and correct related command references./octo:model-config: Updated staleGPT-5.3-Codex-Sparkreferences toGPT-5.4to match current orchestrate.sh model mappings.
/octo:staged-review: Removed broken references to non-existent/octo:verifyand/octo:shipcommands — replaced with/octo:deliverand/octo:review./octo:review: Codex auth preflight viacheck_codex_auth_freshness()— warns user before silent fallback to claude-sonnet./octo:review: Visible⚠warnings when Codex falls back to claude-sonnet in Round 2 (verification) and Round 3 (debate gate). Users now see why Codex API usage doesn't change.
- 8 new
SUPPORTS_*detection flags (94 total, 30version_compareblocks) from CC v2.1.72 (2 untracked) and v2.1.74 (6 new). - v2.1.72:
SUPPORTS_PARALLEL_TOOL_RESILIENCE(failed Read/WebFetch/Glob no longer cancels sibling tool calls),SUPPORTS_PLAN_WITH_ARGS(/planaccepts description argument). - v2.1.74:
SUPPORTS_AUTO_MEMORY_DIR(autoMemoryDirectorysetting),SUPPORTS_FULL_MODEL_IDS(full model IDs e.g.claude-opus-4-6in agent frontmatter),SUPPORTS_SESSION_END_TIMEOUT(CLAUDE_CODE_SESSIONEND_HOOKS_TIMEOUT_MSenv var),SUPPORTS_CONTEXT_SUGGESTIONS(/contextwith actionable optimization tips),SUPPORTS_PLUGIN_DIR_OVERRIDE(--plugin-diroverrides marketplace),SUPPORTS_MANAGED_POLICY_FIX(managed policyaskrules fix). test-cc-v2174-sync.sh— 36 tests covering declarations, detection blocks, logging, wiring, and version comments.
spawn_agent(): Positive debug log whenSUPPORTS_FULL_MODEL_IDSconfirms full model ID support in agent frontmatter (CC v2.1.74+)./octo:doctor: Surfaces/contextcommand as diagnostic tool for context-heavy sessions (CC v2.1.74+)./octo:doctor: ChecksautoMemoryDirectorysetting in~/.claude/settings.json(CC v2.1.74+).
test-version-check.shTest 5:head -30→head -40— fragile against growing log line count from new flags.
- Smart router v2.0 (
/octo:octo) — Complete rewrite of the natural language workflow router. Routing coverage expanded from 8 → 17 workflows with 9 new intents: debug, security, tdd, docs, quick, design-ui-ux, prd, brainstorm, deck. - Decision tree confidence — Replaced ambiguous percentage-based scoring (
matching/total * 100 + adjustments) with explicit HIGH/MEDIUM/LOW decision tree. Single matched intent + specific target = auto-route. Same-priority conflicts = ask user. - 3-tier priority ordering — Specialized workflows (P1) > Core workflows (P2) > Build workflows (P3). "Analyze the security of our API" now correctly routes to
/octo:security(P1) over/octo:discover(P2). - Context efficiency — 382 → 204 lines (47% reduction). Deduplicated 3x-repeated routing table (docs, execution contract, examples) to single authoritative source in execution contract.
- Meta command handler —
/octo:octo helpdisplays all 17 workflows in 4 categories (Core, Engineering, Creative & Documentation, Quick). - Input length guard — Queries >500 chars truncated for intent analysis; full query passed to target workflow.
- Routing analytics — Decisions appended to
~/.claude-octopus/routing.logwith timestamp, intent, confidence, and target. - Routing memory — Auto-memory corrections on rejected suggestions enable preference learning across sessions.
test-smart-router.sh— 65 static analysis tests: routing table integrity, backing file existence for all 17 targets, P0 fix validation, decision tree verification, priority ordering, meta commands, category groupings, removed features, file size.
- P0: Broken validation routing —
Skill: "validate"invoked non-existent skill. Changed toSkill: "review". Any query with validation intent was silently failing. - Flaky
test-debug-mode-simple.sh— Tests 4 & 5 checked for "Command:" and "spawn_agent:" in--debug --dry-runoutput, but probe caching short-circuited beforespawn_agent()runs. Replaced with static analysis of orchestrate.sh source.
- Unimplemented "chain workflows" documentation (set false user expectations).
- Model override example from command docs (
OCTOPUS_CODEX_MODELin examples — minor prompt injection surface).
- Multi-agentic
/octo:research— Refactored from singleBash(orchestrate.sh probe)call (120s timeout) to parallelAgent(run_in_background=true)subagents. Each perspective callsorchestrate.sh probe-singleindependently — no timeout constraint. Claude synthesizes in-conversation instead of Gemini synthesis that frequently timed out. - User-configurable research intensity —
/octo:researchand/octo:discovernow ask intensity before launching: Quick (2 agents, 1-2 min), Standard (4-5 agents, 2-4 min), Deep (6-7 agents with web search, 3-6 min). Intensity passed via[intensity=quick|standard|deep]in Skill args. - Gemini-first launch ordering — Higher-latency Gemini agents launch first, then Codex, then Claude Sonnet, then Perplexity, reducing total wall-clock time.
probe_single_agent()— Standalone single-perspective probe function in orchestrate.sh. Handles persona application, context budget, credential isolation, auth retry, and result file writing.probe-singledispatch command — Callsprobe_single_agent()from Agent tool subagents.test-probe-single.sh— 26 static analysis tests for probe-single function, dispatch, flow-discover integration, command alignment, and backward compatibility.
test-knowledge-routing.sh— Fixed pre-existing SIGPIPE flake caused bygrep -qwithset -eo pipefail(replaced withgrep -c >/dev/nullper known gotcha).
flow-discover.mdSTEP 3.5-7 rewritten: fleet building by intensity, parallel Agent dispatch, result collection with graceful degradation (min 2 results), structured in-conversation synthesis.discover.md4-option depth → 3-option intensity question, aligned withresearch.md.test-enforcement-pattern.shscoped exceptions: flow-discover may use Agent tool (not Bash) and direct synthesis file pattern (notfind -mmin).- Backward compatible:
probe_discover(),discover|research|probedispatch, and/octo:embracepath all untouched.
readonly: truefrontmatter — Addreadonly: trueto any agent persona.mdfile to enforce read-only tool policy (blocks Write/Edit/Bash modifications). Implemented viaget_agent_readonly()with awk-based frontmatter parsing, newagent_nameparam inapply_tool_policy()andapply_persona().backend-architectadded as live example.- User-scope agents (
~/.claude/agents/) — Personal agent personas placed in~/.claude/agents/*.mdare automatically discovered for description lookup and agent listing.USER_AGENTS_DIRconstant; plugin agents take precedence on name collision. /octo:resume <agent-id>— Resume a previous Claude agent by transcript ID. Wrapsresume_agent()via newagent-resumedispatch case. RequiresSUPPORTS_CONTINUATION(CC v2.1.55+) andSUPPORTS_STABLE_AGENT_TEAMS.
get_agent_readonly()— awk-based YAML frontmatter parser (nothead -20 | grep) to avoid false positives in body contentapply_persona()4th paramagent_name, threaded toapply_tool_policy()3rd paramspawn_agent()pre-computescurated_name_earlybeforeapply_personacall- OpenClaw registry rebuilt (89 entries)
- 39 commands, 50 skills
- CC v2.1.73 feature sync — 6 new detection flags (86 total, 28 version_compare blocks):
SUPPORTS_MODEL_OVERRIDES— CCmodelOverridessetting for custom provider model IDs (e.g. Bedrock inference profile ARNs)./octo:doctorsurfaces this on enterprise backends.SUPPORTS_LOOP_ENTERPRISE_FIX—/loopnow works on Bedrock/Vertex/Foundry and when telemetry is disabledSUPPORTS_SUBAGENT_MODEL_FIX—model: opus/sonnet/haikufrontmatter no longer silently downgraded on enterprise.spawn_agent()warns when running on enterprise without this fix.SUPPORTS_SESSION_RESUME_HOOK_FIX—SessionStarthooks fire exactly once on--resume/--continue(was double-firing)SUPPORTS_BG_PROCESS_CLEANUP— background bash processes spawned by subagents are cleaned up on agent exitSUPPORTS_SKILL_DEADLOCK_FIX— no deadlock when 50 skill files load duringgit pull./octo:doctorwarns on CC < v2.1.73.
- Multi-LLM /octo:review — 3-round parallel fleet (Codex + Gemini + Claude + Perplexity), inline PR comments, REVIEW.md support, verified findings
- Fix /octo:setup command name mismatch
- Update setup.md troubleshooting with correct manual reinstall steps for broken plugin update UI (#17)
- Relevance-aware synthesis, CC pre-prompt alignment, model catalog, usage reporting, test fixes
- Provider activation reliability: synthesis timeout recovery, claude-sonnet agent capture, model updates
- Cost estimate placement in embrace workflow (test regression fix)
- Claude Code v2.1.72 feature sync: 8 new detection flags, effort symbols, cron control
- Codex OAuth freshness check in preflight
synthesize-proberecovery command for timeout resilienceOCTOPUS_FORCE_LEGACY_DISPATCHfor reliable claude-sonnet capture
- Dual-backend scheduler: guided wizard, job dashboard, coworkd/daemon detection
- Skill directive WHY reasoning, improved descriptions for better triggering
- Reaction engine —
scripts/reactions.shprovides configurable auto-response to agent lifecycle events. Detects CI failures, review comments, stuck agents, and PR approvals. Dispatches actions: forward CI logs to agents, forward review comments, notify, escalate. Retry tracking with max retries and escalation timeout (default 30m for CI, 60m for reviews). - 13-state PR lifecycle — agent registry expanded from 4 statuses (running, retrying, done, failed) to 13: running, retrying, pr_open, ci_pending, ci_failed, review_pending, changes_requested, approved, mergeable, merged, done, failed, stuck.
- Reaction inbox — agents receive CI failure logs and review comments in
~/.claude-octopus/agents/reactions/inbox/<agent-id>/for processing. - Escalation with timeout — if an agent exceeds max retries or escalation timeout, the
reaction engine displays a prominent escalation notice and logs to
escalations.log. - Project-level reaction config —
.octo/reactions.confoverrides embedded defaults using pipe-delimited rules (EVENT|ACTION|MAX_RETRIES|ESCALATE_AFTER_MIN|ENABLED).
agent-registry.sh health --react— new--reactflag fires the reaction engine after detecting state changes. Health checks now monitor all active agents (not just running/retrying).flow-parallel.mdmonitoring loop — reaction engine fires between poll cycles to auto-handle CI failures and review comments while work packages execute./octo:sentinel— execution contract now includes reaction engine step after triage, so CI failures and review comments are auto-forwarded to agents during monitoring.- Agent registry cleanup —
mergedstatus treated as terminal alongsidedoneandfailed.
- Agent registry —
scripts/agent-registry.shprovides persistent lifecycle tracking for spawned coding agents. Tracks agent ID, branch, worktree path, status, PR number, and CI status across sessions. Commands: register, update, get, list, health, cleanup. - Worktree-per-agent in
/octo:parallel— each work package now runs in its own isolated git worktree, eliminating file write contention when multiple agents modify files simultaneously. Worktrees are auto-created before launch and cleaned up after completion. - PR comment posting —
/octo:review,/octo:staged-review, and/octo:delivernow detect open PRs on the current branch and post review findings as PR comments viagh pr comment. Auto-posts in automated workflows (embrace, factory), asks first in standalone mode.
flow-parallel.mdlaunch template — work packages create isolated worktrees, register in agent registry on spawn, and update registry status on completion or failure.flow-deliver.md— Step 7 now includes PR comment posting after validation report.skill-code-review.md— added post-review PR comment section with auto/ask behavior.skill-staged-review.md— combined report posted to PR when available.
- Context-aware quality injection —
flow-develop.mdandflow-deliver.mdnow detect 6 dev subtypes (frontend-ui, cli-tool, api-service, infra, data, general) and inject domain-specific quality criteria into provider prompts. Frontend tasks get accessibility and self-containment rules; CLI tasks get exit code and help text checks; API tasks get input validation and auth requirements. - BM25 design intelligence auto-injection — when
frontend-uisubtype is detected in the develop phase, the BM25 search engine is queried for style and UX patterns relevant to the task, injected directly into the provider prompt. - Reference integrity gate —
quality-gate.shnow scans recently created HTML, shell scripts, and Docker Compose files for broken file references (missing scripts, stylesheets, sourced files, Dockerfiles). Blocks with actionable error listing each broken reference. - Three-way adversarial design critique —
/octo:design-ui-uxnow runs a mandatory critique step between Define and Develop phases. Codex (implementation critique), Gemini (ecosystem critique), and Claude (independent design critique) all review the proposed design direction in parallel. Issues are triaged, fixes applied, and a visible revision diff is shown before tokens/components are generated.
- Implementer persona — added deliverable integrity rules: every referenced file must exist, prefer self-contained deliverables, single artifacts stay as one file.
- Researcher persona — added output quality bar: evidence-backed claims, trade-off disclosure, explicit uncertainty acknowledgment.
- Synthesizer persona — added synthesis integrity rules: explicit conflict surfacing, completeness validation against original request, standalone output requirement.
- Task decomposition — both
tangle_develop()andmap_reduce()now include cohesion rules preventing single-deliverable fragmentation. "2-6 subtasks; fewer is better when tightly coupled" replaces the old "4-6 independent subtasks." aggregate_results()— now synthesizes via Gemini instead of concatenating markdown files. Falls back to concatenation if Gemini unavailable.- Design workflow banner — now shows provider availability (Codex, Gemini, Claude) and the critique phase in the pipeline indicator.
- Mandatory compliance blocks on all 8 workflow commands (embrace, discover, define,
develop, deliver, plan, review, security) — Claude is now explicitly prohibited from
skipping workflows it judges "too simple." Addresses user reports of
/octo:embracebeing bypassed for straightforward tasks. - Interactive next-steps after every workflow completes — all phase commands and embrace
now ask the user what to do next via
AskUserQuestioninstead of ending silently. - Anti-injection nonces (
sanitize_external_content()in orchestrate.sh) — wraps file-sourced content (memory files, provider history, earned skills) in random hex boundary tokens to prevent prompt injection from untrusted external content. - Session learnings layer —
session-end.shnow writesoctopus-learnings.mdto auto-memory with per-session meta-reflection (workflow, phase, agent calls, errors, debate). - Feature gap analysis —
docs/FEATURE-GAP.mdliving document tracks all 72 CC feature flags with Green/Yellow/Red adoption status and gap closure history. - Multi-LLM debate gates in embrace, plan, review, security, and define commands — optional Claude + Codex + Gemini deliberation at workflow transition points.
- Reinstated
/octo:debateand/octo:researchcommands wrongly removed in v8.41.0 consolidation. These had unique standalone functionality (three-way AI debates and deep multi-AI research respectively). - Removed "Don't use for" sections from phase commands that contradicted mandatory compliance blocks and encouraged Claude to skip workflows.
- Command count corrected: 36 → 38 (debate + research reinstated).
- OpenClaw registry updated: 86 → 88 entries (debate + research commands).
- All debate-related options across commands now explicitly say "Multi-LLM" and name all three models (Claude + Codex + Gemini) so users understand what they're enabling.
- 3 new hook events registered in hooks.json:
PreCompact— persists workflow state (phase, decisions, blockers) before context compactionSessionEnd— finalizes metrics, persists preferences to auto-memory, cleans up session artifactsUserPromptSubmit— classifies task intent via keyword matching for improved skill routing
- 10 native agent definitions in
.claude/agents/mirroring top personas:- security-auditor, code-reviewer, backend-architect, tdd-orchestrator, debugger, performance-engineer, frontend-developer, docs-architect, cloud-architect, database-architect
- Persona-agent sync test ensuring every agent definition has a matching persona file
- Auto-memory integration: SessionEnd hook writes
octopus-preferences.mdto project memory with autonomy mode, provider config, and last update timestamp enable-http-telemetry.shscript for converting shell-based telemetry to native HTTP hooks (CC v2.1.63+)- Mixed models integration:
_get_agent_model_raw()now checksCLAUDE_MODELenv var (Priority 0.5) for Claude-side agents, respecting native CC model settings without duplicate config - Spec mode plan view alignment:
flow-spec.mdStep 7.5 usesEnterPlanModefor NLSpec review when VSCode plan view is available (CC v2.1.70+), with graceful terminal fallback - 89-test suite (
test-v8.41.0-feature-adoption.sh) covering hooks, agents, sync, droids, telemetry, and auto-memory - Factory droid generation in
build-factory-skills.sh— generatesagents/droids/from.claude/agents/so Factory AI discovers native droids alongside Claude Code agent definitions - Native HTTP telemetry hook in hooks.json (
"type": "http") alongside shell fallback; shell hook skips whenSUPPORTS_HTTP_HOOKS=trueto avoid double telemetry - SessionStart auto-memory restoration (
session-start-memory.sh) — reads persisted preferences fromoctopus-preferences.mdon session start and injects them intosession.json
- Command consolidation: 13 thin wrapper commands removed (49 → 36 commands)
- 8 pure wrappers deleted: issues, ship, rollback, debate, resume, setup, validate, status
- 5 flow aliases deleted: probe, grasp, tangle, ink, research
- Matching skills now have
user-invocable: truefrontmatter for direct invocation
- Hook event count: 10 → 13 (PreCompact, SessionEnd, UserPromptSubmit)
- Total hook scripts: 25 → 29
- Task manager simplified:
create_embrace_tasks()andcreate_phase_task()deprecated in favor of native TodoWrite for Claude-side task tracking - Telemetry webhook updated: native HTTP hook entry in hooks.json with shell fallback;
shell hook has
SUPPORTS_HTTP_HOOKSguard to skip when HTTP hooks are active
- 6 new Claude Code feature detection flags for v2.1.70-71:
SUPPORTS_VSCODE_PLAN_VIEW— VSCode full markdown plan view with comments (v2.1.70+)SUPPORTS_IMAGE_CACHE_COMPACTION— compaction preserves images for prompt cache reuse (v2.1.70+)SUPPORTS_RENAME_WHILE_PROCESSING—/renameworks during processing (v2.1.70+)SUPPORTS_NATIVE_LOOP— native/loopcommand + cron scheduling tools (v2.1.71+)SUPPORTS_RUNTIME_DEBUG—/debugtoggle mid-session (v2.1.71+)SUPPORTS_FAST_BRIDGE_RECONNECT— bridge reconnects in seconds instead of 10 minutes (v2.1.71+)
- Effort level callout in agent spawn output when
SUPPORTS_EFFORT_CALLOUTis true (wires previously dead flag) - Agent-type capture in SubagentStop hook for per-agent cost attribution (
SUPPORTS_HOOK_AGENT_FIELDS) - Memory-safe timeout boost: complex/debate/audit tasks get +60s timeout when CC has memory leak fixes (v2.1.63+)
- Total feature detection flags: 66 → 72 (covering CC v2.1.12 through v2.1.71)
- Detection thresholds: 22 → 24 version checkpoints
- Codex agent 401 auth failure:
build_provider_env()output contained escaped quotes that became literal characters afterread -ra, corruptingHOMEpath and preventing Codex CLI from finding~/.codex/auth.json(Issue #117) - Added regression tests for literal quote detection in credential isolation
- GPT-5.4 model support:
gpt-5.4($2.50/$15 MTok) andgpt-5.4-pro($30/$180 MTok, API-key only) -
gpt-5-codex-mini($0.25/$2.00 MTok) — budget model replacinggpt-5.1-codex-mini -
gpt-5base model ($1.25/$10 MTok) -
o3-pro($20/$80 MTok) ando3-mini($1.10/$4.40 MTok) reasoning models (API-key only) - OAuth vs API-key availability documentation for all OpenAI models
- Default codex premium model:
gpt-5.3-codex→gpt-5.4 - Default codex-max model:
gpt-5.3-codex→gpt-5.4 - Default codex-mini model:
gpt-5.1-codex-mini→gpt-5-codex-mini - Default codex-review model:
gpt-5.3-codex→gpt-5.4 - Stale model migration targets updated to
gpt-5.4
-
gpt-5.1-codex-minipricing corrected: $0.30/$1.25 → $0.25/$2.00 per MTok - Bash 3.2 compatibility: replaced
${var^}and${var,,}(Bash 4+) with POSIX-compatible_ucfirst()/_lowercase()helpers — fixesocto:embraceon stock macOS (Issue #108)
- Factory AI command discoverability: all commands now prefixed with
octo-(e.g.,/octo-embrace,/octo-discover) to mirror Claude Code's/octo:*namespace — Factory has no automatic plugin namespacing so commands were invisible when typing/octo
- Factory AI commands not working:
build-factory-skills.shnow strips Claude Code-specific frontmatter (command,aliases,redirect,version,category,tags) from generated commands, keeping only Factory-compatible fields (description,argument-hint,allowed-tools,disable-model-invocation)
scripts/build-factory-skills.sh— generates Factory AI-compatibleskills/<name>/SKILL.mddirectories from.claude/skills/*.mdsources- Generated
skills/directory at plugin root with 44 Factory-format skill files (6 human_only skills excluded)
- Factory skill discovery: replaced symlink approach (v8.38.0) with build-generated skill directories — Factory clones strip symlinks
- Factory skills use simplified frontmatter (
name,version,description) with trigger content merged into descriptions - Updated
docs/FACTORY-AI.mdto document build-based approach and Factory's skills-only model
- Root-level
commandsandskillssymlinks (Factory clone doesn't preserve symlinks; Factory has no commands concept)
- Factory AI Droid not discovering skills after plugin install (symlinks from v8.38.0 broken by Factory's clone process)
- Root-level
commandsandskillssymlinks pointing to.claude/commandsand.claude/skillsfor Factory AI Droid auto-discovery - Cross-platform discovery documentation in
docs/FACTORY-AI.md
- Simplified
.factory-plugin/plugin.json— removedskillsandcommandsarrays (Factory uses directory-based auto-discovery, not manifest arrays) - Updated troubleshooting in
docs/FACTORY-AI.mdwith symlink verification steps
- Factory AI Droid not discovering slash commands after plugin install (no
commands/orskills/at plugin root)
STEELMAN.md— internal competitive analysis moved out of public repoSAFEGUARDS.md— plugin name lock docs consolidated intodocs/PLUGIN_NAME_SAFEGUARDS.mddeploy.shandscripts/deploy.sh— deployment validation redundant with CIinstall.sh— marketplace install is the supported method.npmignore— not published to npm
- Trimmed
CHANGELOG.mdfrom 5,382 to ~220 lines — pre-8.22.0 history available via GitHub Releases - Updated
package.jsonfilesarray to remove deleted files - Updated safeguard references in
.claude-plugin/README.mdanddocs/PLUGIN_NAME_SAFEGUARDS.md
- Factory AI dual-platform support —
.factory-plugin/plugin.jsonmanifest, auto-detection of Claude Code vs Factory Droid runtime - Platform detection shim in
orchestrate.sh—OCTOPUS_HOSTvariable (claude/factory/standalone) detect_claude_code_version()now handles Factory Droid viadroid --versionwith feature parity assumptiondocs/FACTORY-AI.md— install guide, architecture notes, troubleshooting for Factory AI users- Factory AI install instructions in README with marketplace and direct install methods
- Adaptive reasoning effort per phase —
get_effort_level()now wired intospawn_agent(), gated bySUPPORTS_OPUS_MEDIUM_EFFORT(CC v2.1.68+) - Worktree branch display in statusline — shows active worktree branch when agents run in isolation (CC v2.1.69+)
- InstructionsLoaded hook — injects dynamic workflow context (phase, autonomy, recent results) when CLAUDE.md loads (CC v2.1.69+)
- Recurrence detection, issue categorization, JSONL decision logging, CodeRabbit integration
- UI/UX design workflow with BM25 design intelligence
- Marketing, finance, legal personas and IDE integration
- Add /octo:batch alias and strengthen parallel quality defaults
- Multi-model intelligence improvements
- Agent continuation/resume for iterative tangle retries
- Context Compaction Survival (P0): SessionStart hook (
context-reinforcement.sh) re-injects Iron Laws after context compaction. Enforcement rules no longer lost on conversation compression. - Description Trap Audit (P1): 5 skill descriptions rewritten to opaque, outcome-focused format. Prevents model from skipping full skill reads.
- XML Enforcement Tags (P1):
<HARD-GATE>tags on 5 Iron Laws for higher model compliance. Applied to skill-deep-research, skill-factory, skill-tdd, skill-verify, skill-debug. - Human-Only Skill Flag (P1):
invocation: human_onlyon 5 expensive skills — prevents auto-triggering without explicit user invocation. - Two-Stage Review Pipeline (P2): New
skill-staged-review.md— Stage 1 validates spec compliance against intent contract, Stage 2 runs stub detection and code quality. Gate between stages. - EnterPlanMode Interception (P2): PreToolUse hook (
plan-mode-interceptor.sh) re-injects enforcement rules when entering plan mode.
- Changelog Integration (Claude Code v2.1.46-v2.1.59): 9 new feature flags, 2 new version detection blocks (v2.1.51+, v2.1.59+). Tracks remote control, npm registries, fast Bash, disk persistence, account env vars, managed settings, native auto-memory, agent memory GC, smart Bash prefixes.
- Worktree Lifecycle Hooks: WorktreeCreate and WorktreeRemove handlers (
worktree-setup.sh,worktree-teardown.sh). Propagates provider env vars, copies.octostate, cleans up on teardown. 8 hook event types (was 6). - Settings Enhancement: 8 new configurable defaults — Codex sandbox, memory injection, persona packs, worktree isolation, parallel agent limit, quality gate threshold, cost warnings, tool policies.
- Doctor Agents Category: 10th diagnostic category. Checks agent definitions, worktree coverage, native CLI registration, version compatibility warnings.
- Native Auto-Memory Delegation: When v2.1.59+ detected, skip redundant project/user memory injection. Retain provider-specific cross-session context.
- Agent Isolation Expansion: security-auditor and deployment-engineer now use worktree isolation (10 agents total, was 8).
- Dark Factory Mode (closes #37): Spec-in, software-out autonomous pipeline with
/octo:factorycommand. Wraps embrace workflow with scenario holdout testing (E19), satisfaction scoring (E21), and non-interactive execution (E22). 7 new functions:parse_factory_spec,generate_factory_scenarios,split_holdout_scenarios,run_holdout_tests,score_satisfaction,generate_factory_report,factory_run. Weighted 4-dimension scoring (behavior 40%, constraints 20%, holdout 25%, quality 15%) with PASS/WARN/FAIL verdicts. Retry on failure with remediation context. Artifacts stored at.octo/factory/<run-id>/.
- Add missing /octo:claw and /octo:doctor command files
- Add /octo:claw OpenClaw sysadmin command, /octo:doctor health diagnostics, and openclaw-admin standalone repo
- OpenClaw Runtime API Mismatch: Rewrite OpenClaw extension to match the actual
OpenClawPluginApicontract fromopenclaw@2026.2.22-2. Replacesapi.getConfig()(non-existent method) withapi.pluginConfig,api.log()withapi.logger, and migrates tool format from custom{run, parameters: JSON}to the realAgentToolinterface using{execute, parameters: TypeBox, label}with properAgentToolResultreturn type (closes #50).
- Add release.sh automation script
- OpenClaw Register Crash: Guard
api.getConfig()with?? {}fallback — OpenClaw passesundefinedconfig during initial registration, causingTypeError: Cannot read properties of undefined (reading 'enabledWorkflows')(closes #48).
- Coverage CI Job: Removed the coverage report CI job that consistently failed due to 37% coverage (below 80% threshold) and missing GitHub API permissions.
- OpenClaw Install Registration: Changed
package.jsonname from@octo-claw/openclawto@octo-claw/octo-clawso install directory matches manifest idocto-claw. OpenClaw derives config entry key from unscoped package name, so it must match manifest id or config validation fails withplugin not found(closes #45). - CI Coverage Permissions: Added
pull-requests: writeandissues: writepermissions to test workflow. Made PR comment step non-fatal withcontinue-on-error.
- Validation: Added check that
openclaw.plugin.jsonid matches unscoped package name to prevent registration mismatch.
- OpenClaw Dist Shipping: Whitelisted
openclaw/dist/andmcp-server/dist/in.gitignoreso compiled extension files ship with the repo — fixes install failure (closes #41). - CI Test Suite: Fixed
((0++))arithmetic crashes underset -ein 3 unit tests andbuild-openclaw.sh. Fixed integration test assertions for.gitignorepatterns and insufficient grep context windows. All 58 tests now pass.
- Branch Protection: Enabled on
mainrequiring Smoke Tests, Unit Tests, and Integration Tests CI checks. Enforced for admins. - Pre-push Hook: Added git pre-push hook running full test suite before every push.
- Validation: Added
dist/index.jsexistence check totests/validate-openclaw.shto prevent regression.
- Test Suite: Resolved all 24 pre-existing test failures — 22/22 tests now pass. Deleted 10 tests for non-existent features or architectural incompatibility. Fixed 12 tests covering path calculation, bash arithmetic under
set -e, plugin name assertions, insufficient grep context windows, and pattern mismatches. - OpenClaw Manifest: Added required
idfield toopenclaw.plugin.json— fixes gateway crash on startup (closes #40).
- OpenClaw Identity: Renamed OpenClaw-facing identity from
claude-octopustoocto-clawacross plugin manifest, package names (@octo-claw/openclaw,@octo-claw/mcp-server), MCP server name, and.mcp.jsonserver key. GitHub repo URLs unchanged. - Validation: Added
idfield check totests/validate-openclaw.shto prevent regression.
OpenClaw Compatibility Layer — Three new components enable cross-platform usage without modifying the core Claude Code plugin:
-
MCP Server (
mcp-server/): Model Context Protocol server exposing 10 Octopus tools (octopus_discover,octopus_define,octopus_develop,octopus_deliver,octopus_embrace,octopus_debate,octopus_review,octopus_security,octopus_list_skills,octopus_status). Auto-starts via.mcp.jsonwhen plugin is enabled. Built with@modelcontextprotocol/sdk. -
OpenClaw Extension (
openclaw/): Adapter package for OpenClaw AI assistant framework. Registers Octopus workflows as native OpenClaw tools. Configurable viaopenclaw.plugin.jsonwith workflow selection, autonomy modes, and path resolution. -
Shared Skill Schema (
mcp-server/src/schema/skill-schema.json): Universal JSON Schema for skill metadata supporting both Claude Code and OpenClaw platforms. Defines name, description, parameters, triggers, aliases, and platform-specific configuration.
Build Tooling:
scripts/build-openclaw.sh— Generates OpenClaw tool registry from skill YAML frontmatter (90 entries).--checkmode for CI drift detection.tests/validate-openclaw.sh— 13-check validation suite covering plugin integrity, OpenClaw manifest, MCP config, registry sync, and schema validation.
Zero modifications to existing plugin files. Compatibility layers wrap around the plugin via:
.mcp.jsonat plugin root (Claude Code auto-discovers this)openclaw/directory with separatepackage.jsonand extension entry pointmcp-server/directory with separatepackage.jsonand MCP server
All execution routes through orchestrate.sh — behavioral parity guaranteed.
For versions prior to 8.22.0, see the GitHub Releases page.