ADR-273: Implement core step-machine — dev_team.py, dev-team.md loop, task-runner agent by jodavis-claude · Pull Request #27 · jodavis/agent-plugins

jodavis-claude · 2026-06-09T23:53:39Z

Work item

ADR-273 — Refactor dev_team.py into a re-entrant step machine that emits JSON action descriptors instead of spawning sub-agents, introduce an orchestration loop in dev-team.md, and create a new task-runner agent so sub-agents inherit full MCP credentials from the top-level Claude Code session.

Changes

plugins/dev-team/scripts/dev_team.py — Refactored from subprocess-spawn orchestrator to step machine. Removes call_agent(), _MdStreamWriter, _resolve_claude(), CLAUDE_CMD, concurrent.futures, and threading. Adds exit_with_action(), compute_context_path(), _apply_counter_updates(), _handle_agent_failure(), and _handle_agent_success(). Adds --context-file as a required CLI argument; log directory is now derived from context_path.parent / "logs". PipelineContext gains signoff_cycle_count, consecutive_failures, review_cycle_count, troubleshooter_input, pending_agent, signoff_review, and signoff_research with full save/load roundtrip. Every call_agent() call site is replaced with exit_with_action({...}). Workflow-file mechanism is unchanged. DevTeamPipeline.run() calls _apply_counter_updates() after each step and checks troubleshooter thresholds.
plugins/dev-team/scripts/test_dev_team.py — New pytest file with 31 tests covering exit_with_action, compute_context_path (with and without DEV_TEAM_STATE_DIR), and counter increment/reset logic for all three counters with save/load roundtrips.
plugins/dev-team/agents/task-runner.md — New agent definition with Read, Write, Edit, Skill, Bash, Glob, Grep tools. Implements the seven-step read-context → invoke-skill → write-result → return-one-line protocol.
plugins/dev-team/commands/dev-team.md — New orchestration loop skill. Computes the context file path from DEV_TEAM_STATE_DIR/~/.dev-team/<repo-slug>/<id>.md, drives the step-machine loop, spawns task-runner or troubleshooter agents, and handles all three troubleshooter outcomes (continue, terminate, needs_user_input).
plugins/dev-team/commands/implement.md — Updated to invoke the dev-team skill instead of run-workflow.
plugins/dev-team/commands/fix.md — Updated to invoke the dev-team skill instead of run-workflow.
plugins/dev-team/commands/run-workflow.md — Deprecated with a redirect header pointing to the new mechanism.

Design decisions

Script as a step machine, not a daemon. dev_team.py runs to the first agent boundary, emits {"action": "spawn_agent", ...} on stdout, and exits 0. dev-team.md loops — it re-invokes the script after every agent return. The script never blocks waiting for an agent; the loop lives in the top-level Claude Code session.

Context file is the only communication channel. Each step's re-entrant entry point reads the context file to determine whether its agent sub-task has already been completed (e.g., ResearchStep.run() checks if ctx.brief: return "research_done" before emitting the exit descriptor). The task-runner writes the agent's output into the named section that the script checks on the next invocation.

SignoffStep sub-step tracking. Because only one agent can be spawned per script invocation, the former parallel ThreadPoolExecutor pattern is replaced with sequential sub-step completion tracked in frontmatter fields (signoff_review, signoff_research), requiring up to 3 loop iterations for one signoff pass.

compute_context_path() helper retained in dev_team.py. The spec requires unit tests for path computation; the helper is exported from the script (tested but not called in main() since --context-file is always passed by dev-team.md).

… task-runner agent - dev_team.py: refactored from subprocess-spawn orchestrator to step machine. Each step exits with a JSON descriptor when an agent is needed (exit_with_action). Removes call_agent(), _MdStreamWriter, _resolve_claude(). Adds --context-file required arg; log_dir derived from context file path. PipelineContext gains signoff_cycle_count, consecutive_failures, review_cycle_count, troubleshooter_input, pending_agent, signoff_review, signoff_research fields. Troubleshooter trigger conditions check thresholds and route to exit_with_action. - test_dev_team.py: 31 unit tests covering exit_with_action (subprocess isolation), compute_context_path (with/without DEV_TEAM_STATE_DIR), and counter increment/reset logic for all three counters (signoff_cycle_count, review_cycle_count, consecutive_failures) including save/load roundtrip. - agents/task-runner.md: new agent that owns the orchestration protocol — reads context sections, invokes a skill, writes result to context file, returns exactly one result_format line. - commands/dev-team.md: new orchestration loop skill that computes the context file path, drives the step-machine loop, and spawns task-runner or troubleshooter agents via the Agent tool. - commands/implement.md, fix.md: updated to invoke dev-team skill instead of run-workflow. - commands/run-workflow.md: deprecated with redirect to dev-team.md. Key decisions: SignoffStep runs reviewer-sign-off and researcher-validate sequentially (not in parallel) tracking completion via context file sections; PR URL parsed from "PR URL" section by PipelineContext.load(); compute_context_path provided as tested helper; review_cycle_count increments on every ReviewStep completion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ected

github-actions · 2026-06-09T23:53:47Z

build-and-test: Python test results

Status: ✅ Passed

Test log

jodavis-claude

Review: ADR-273 Core Step-Machine

Overall, the architecture is correct and the implementation covers most exit criteria. Three correctness issues found; one security note; one style point.

Priority 1 — Correctness

[P1-A] ReviewStep re-entry returns stale data after a fix-pr cycle. ReviewStep.run() checks if ctx.review_notes: at sub-step 2 and returns the parsed status immediately — but ctx.review_notes is never cleared between review cycles. When the workflow returns to the reviewing state after a FixPrStep pass, ctx.review_notes still holds the previous cycle's output, so the reviewer is never re-spawned. The step returns the old approved/changes_requested decision and never produces new feedback.

[P1-B] _handle_agent_success incorrectly resets consecutive_failures for non-agent steps. DevTeamPipeline.run() calls _handle_agent_success(self.ctx) after every step.run() return — including FindSpecStep and ValidateStep, which run in-process scripts with no agent involvement. A passing build clears the counter even when the surrounding agent calls were all failures, masking true consecutive failure counts.

[P1-C] researcher-validate result is checked by substring "failed", not JSON parsing. ctx.signoff_research holds the full researcher output (JSON array of exit criteria written by the task-runner). The old code properly parsed "status": "fail" / "partial" items; the new code does "failed" in ctx.signoff_research.lower(). A criterion JSON containing "fail" (not "failed") will be silently ignored. A description containing "never failed" will false-positive.

Priority 2 — Security note

[P2] troubleshooter_input Python one-liner embeds user answer unsanitised. The dev-team.md one-liner for needs_user_input interpolates <user_answer> directly into a Python string literal (answer = '<user_answer>'). A user answer containing a single-quote or backslash breaks the Python syntax or injects code. This should use a temp file or environment variable rather than shell string interpolation.

Priority 5 — Style (non-blocking)

[P5] Test helpers named _make_ctx() instead of make_sut(). CONTRIBUTING.md requires make_sut() for the subject-under-test factory function.

…er on next cycle

…p; add explicit calls in SignoffStep on each agent result detection

…ch with JSON-array fallback, replacing fragile substring check

… inline string interpolation to prevent shell injection

…convention

jodavis-claude

Sign-off review — all prior comments addressed.

[P1-A] review_notes clearing ✓ Resolved — ctx.review_notes = "" added in FixPrStep.run() on the agent-success path (line 895), ensuring ReviewStep always re-spawns the reviewer after a PR-fix cycle.

[P1-B] _handle_agent_success unconditional ✓ Resolved — removed from the pipeline loop; explicit calls now appear in SignoffStep.run() at the two points where agent results are detected (signoff_review populated at line 742, signoff_research populated at line 770).

[P1-C] Researcher-validate result parsing ✓ Resolved — new _researcher_validated() helper (lines 395–417) checks exact indicators first ("validated"/"failed"), then falls back to JSON-array parsing. Tests cover the "never failed" false-positive case.

[P2] Shell injection ✓ Resolved — user answer is now passed via stdin with a single-quoted heredoc (<<'ANSWER_HEREDOC'); Python reads it via sys.stdin.read(). No remaining shell injection vector from the user-supplied answer.

New issue noted (non-blocking, style/correctness): The re.sub replacement in dev-team.md uses f'troubleshooter_input: {answer}' as the replacement string. Python's re.sub interprets backslash sequences in replacement strings (e.g. \1, \n). If a user answer contains a literal backslash followed by a digit, it may raise re.error. Low severity given that context file paths are system-generated and the answer is plain text, but worth a follow-up.

No unresolved threads remain. No new Priority 1–4 issues found in the modified files.

Updated script paths in the build and test workflow.

- Item 1: Fix re.sub lambda in dev-team.md troubleshooter_input update - Item 2: Clarify agent field in task-runner.md is display/logging only - Item 3: task-runner spawns sub-agent via Agent tool instead of calling Skill directly; remove Skill/Bash/Glob/Grep from frontmatter tools - Item 4: Add --print-context-path CLI flag to dev_team.py; update dev-team.md Step 1 to use it - Item 5: Add message field to every exit_with_action call site; orchestration loop displays it - Item 6: ReviewStep extracts pr_url from PR URL section and saves to frontmatter; remove section fallback from load() - Item 7: ValidateStep already runs build/test before review cycle — no change needed - Item 8: SignoffStep emits parallel actions descriptor; FixPrStep uses gh pr checks when PR exists Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jodavis-claude

ADR-273 Review

Seven exit criteria are not met. All are Priority 1. Changes requested.

P1-A — exit_with_action not renamed to exit_with_actions (core invariant missing)

dev_team.py still defines exit_with_action(descriptor: dict) taking a single dict and emitting a JSON object. The spec requires exit_with_actions(descriptors: list[dict]) emitting a JSON array. Every call site still passes a single dict. The JSON-array-is-the-invariant-output-format rule is the centrepiece of this refactor and it is entirely absent.

P1-B — dev-team.md still parses stdout as a JSON object, not a JSON array

Step 2b reads: 'Extract the last line from stdout that is a valid JSON object.' It must parse a JSON array (last line starting with [). All branch conditions still reference descriptor.field instead of descriptors[0].field. No array-aware dispatch loop exists.

P1-C — dev-team.md missing run_script dispatch

The spec requires: for run_script items in the descriptor list, spawn Agent(subagent_type='script-runner'). This branch does not exist. The parallel section still checks for an 'actions' list (old nested-dict format).

P1-D — task-runner.md tools list still includes Agent, not Skill

The spec requires removing Agent, adding Skill/Bash/Glob/Grep, and invoking the skill directly via Skill() in the task-runner's own conversation. The current file keeps Agent and still spawns sub-agents.

P1-E — script-runner.md not created

plugins/dev-team/agents/script-runner.md does not appear in the diff at all.

P1-F — SignOffStep parallel case uses old nested actions format; build/test still runs in-process

The spec requires a flat top-level array with a run_script item for build/test plus two spawn_agent items. The current implementation still emits a nested 'actions' list inside a single dict and still runs build/test synchronously in-process.

P1-G — Tests not updated for plural function / array output

TestExitWithAction tests exit_with_action (singular, emits a JSON object). TestExitWithActionParallel tests the old nested actions dict format, not a flat top-level array. The required test for exit_with_actions([{'action': 'done'}]) emitting [{'action': 'done'}] is absent.

P4-A — Spec inconsistency: agent vs skill for troubleshooter detection

_spec_AgentOrchestration.md step d was updated in this PR to check agent == 'troubleshooter', and the spec JSON example shows 'agent': 'troubleshooter'. But _troubleshooter_descriptor() sets 'skill': 'troubleshooter' (no agent field), and dev-team.md checks skill == 'troubleshooter'. Align them — the task brief says skill == 'troubleshooter' is the intended key.

jodavis-claude · 2026-06-10T19:34:03Z

ADR-273 Fix Round — Response to P1-A through P4-A

All P1 issues from the second reviewer were addressed in commit 26b8e26 (the most recent commit on this branch before this fix round). Here is the disposition of each issue:

P1-A (exit_with_actions rename) — Fixed. exit_with_actions(descriptors: list[dict]) emits json.dumps(descriptors); all ~17 call sites wrap the dict in a list.

P1-B + P1-C (dev-team.md JSON array + run_script dispatch) — Fixed. Step 2b extracts the last line starting with [; parallel dispatch handles both spawn_agent (→ task-runner) and run_script (→ script-runner) items.

P1-D (task-runner tools) — Fixed. Tools: Read, Write, Edit, Skill, Bash, Glob, Grep. Step 3 calls Skill(<skill-name>) directly.

P1-E (script-runner.md) — Fixed. File exists at plugins/dev-team/agents/script-runner.md (haiku model; tools: Bash, Write).

P1-F (SignOffStep flat array) — Fixed. SignOffStep.run() emits a flat three-item array: reviewer-sign-off, researcher-validate, and a run_script item for build/test.

P1-G (tests updated) — Fixed. TestExitWithActions (plural), TestExitWithActionsParallel (flat array), TestSignoffBuildResult all present. 81 tests pass.

P4-A (spec troubleshooter key) — Fixed in commit b6e275a (this round). Reverted three occurrences of "agent": "troubleshooter" / agent == "troubleshooter" in _spec_AgentOrchestration.md back to "skill": "troubleshooter" / skill == "troubleshooter", consistent with _troubleshooter_descriptor() and dev-team.md.

…g extraction Replaces the model-interpreted git URL derivation with a standalone bash script that handles HTTPS/.git, HTTPS, SSH/.git, and SSH remote URL forms via sed, then delegates to dev_team.py --print-context-path.

…script call Removes the prose asking the model to parse `git remote get-url origin` output manually. Step 1 now calls get-context-path.sh, making the slug derivation cheap and deterministic. Adds a note that Git Bash (bundled with Git-for-Windows) is the runtime on Windows.

Adds TestGetContextPathShSlugExtraction (15 cases covering four remote URL forms) and TestGetContextPathShErrorHandling (git-fails and no-args). Uses GIT_REMOTE_URL_OVERRIDE env seam to avoid cross-platform fake-binary issues.

jodavis-claude

Sign-off review: 6 prior threads resolved. New P1 issue found -- inline comment on dev_team.py:899.

jodavis-claude

Sign-off review for ADR-273.

All 17 prior review threads are resolved.

Thread resolutions:

P1-A (review_notes not cleared): resolved — cleared in FixPrStep.run()
P1-B (_handle_agent_success for non-agent steps): resolved — removed unconditional call; added explicit calls in agent-backed steps
P1-C (signoff_research substring false-positive): resolved — replaced with _researcher_validated() helper with JSON fallback
P2 (user answer injection risk): resolved — heredoc stdin pattern prevents injection
Human comments (task-runner agent/Skill, scripted context path, status reporting, frontmatter vs section, build/test validation, parallel agents): all resolved
P1 (signoff_build_result equality check always fails): resolved in commit efcc054 — changed to .startswith("passed"); tests updated with real script-runner format strings; two new explicit startswith tests added

No new Priority 1-4 issues found in the three files modified by commit efcc054:

dev_team.py: single-line fix at line 899, correct and sufficient
test_dev_team.py: real format strings throughout TestSignoffBuildResult; round-trip tests now exercise the actual stored value; startswith tests verify the logic directly
task-runner.md: removed misleading "spawn via Agent tool" wording from agent field description — no functional change

Decision: approved.

…rray; add script-runner agent - exit_with_action(dict) -> exit_with_actions(list[dict]): stdout is now a JSON array so the orchestration loop can dispatch mixed spawn_agent + run_script items in a single step - All ~17 call sites wrapped in [...] - SignOffStep.run(): parallel case now emits a flat array with reviewer, researcher, and run_script (validate-build + validate-tests) items; in-process script execution removed; sequential fallbacks emit single-item arrays; re-entry checks signoff_build_result section (populated by dev-team.md from script-runner result) - PipelineContext gains signoff_build_result field (section, not frontmatter) to track build/test pass-fail across pipeline re-invocations - dev-team.md: parse JSON array from stdout; remove nested "actions" list branch; add run_script dispatch via script-runner agent; write run_script result to write_section in context file after parallel collect - task-runner.md: remove Agent tool, add Skill/Bash/Glob/Grep; Step 3 uses Skill(<skill-name>) directly instead of Agent(subagent_type=...) - agents/script-runner.md: new agent — runs one command, writes log, returns "passed — log: <path>" or "failed — log: <path>" - test_dev_team.py: updated helper and test classes to verify array output format; added TestSignoffBuildResult (5 tests); 81 tests total, all pass Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ts to real format script-runner returns "passed — log: <path>", not bare "passed"; the equality check `== "passed"` would always fail. Changed to `.startswith("passed")`. Updated TestSignoffBuildResult round-trip tests to store the real script-runner format string; added two tests that explicitly verify the startswith logic.

…I-backed signoff

…rallel block handles it

ElwoodMoves and others added 3 commits June 9, 2026 14:09

Updating spec with work item IDs

ab78a0a

Move validate scripts to the REPO_ROOT/scripts folder where it is exp…

d37e746

…ected

jodavis-claude commented Jun 9, 2026

View reviewed changes

Comment thread plugins/dev-team/scripts/dev_team.py

jodavis-claude commented Jun 9, 2026

View reviewed changes

Comment thread plugins/dev-team/scripts/dev_team.py

jodavis-claude commented Jun 9, 2026

View reviewed changes

Comment thread plugins/dev-team/scripts/dev_team.py

jodavis-claude commented Jun 9, 2026

View reviewed changes

Comment thread plugins/dev-team/commands/dev-team.md

ElwoodMoves added 5 commits June 9, 2026 17:04

ADR-273: Clear review_notes in FixPrStep so ReviewStep re-runs review…

ceaea30

…er on next cycle

ADR-273: Remove unconditional _handle_agent_success from pipeline loo…

0c984cf

…p; add explicit calls in SignoffStep on each agent result detection

ADR-273: Fix researcher-validate result parsing — exact indicator mat…

e24517a

…ch with JSON-array fallback, replacing fragile substring check

ADR-273: Pass troubleshooter user answer via stdin heredoc instead of…

9c99e78

… inline string interpolation to prevent shell injection

ADR-273: Rename _make_ctx() to make_sut() per CONTRIBUTING.md naming …

49e8778

…convention

jodavis-claude commented Jun 10, 2026

View reviewed changes

jodavis-claude marked this pull request as ready for review June 10, 2026 03:09

jodavis-claude requested a review from jodavis June 10, 2026 03:10

Fixscript paths in build-and-test.yml

0fbb8a0

Updated script paths in the build and test workflow.

jodavis requested changes Jun 10, 2026

View reviewed changes

ElwoodMoves and others added 3 commits June 10, 2026 09:25

Update spec to support parallelism and agent-run scripts

37a527c

Bump version from 1.0.2 to 1.2.0

b0297f6

jodavis changed the base branch from main to feature/ADR-269-agent-orchestration June 10, 2026 18:53

jodavis-claude commented Jun 10, 2026

View reviewed changes

Comment thread plugins/dev-team/scripts/dev_team.py Outdated

Comment thread plugins/dev-team/agents/task-runner.md

Comment thread plugins/dev-team/commands/dev-team.md

Comment thread plugins/dev-team/scripts/dev_team.py Outdated

Comment thread plugins/dev-team/scripts/test_dev_team.py Outdated

ElwoodMoves added 4 commits June 10, 2026 12:36

Update spec with Step protocol refactor task (ADR-277)

38734e2

jodavis-claude commented Jun 10, 2026

View reviewed changes

Comment thread plugins/dev-team/scripts/dev_team.py Outdated

jodavis-claude commented Jun 10, 2026

View reviewed changes

ElwoodMoves and others added 2 commits June 10, 2026 13:27

jodavis force-pushed the dev/claude/ADR-273-core-step-machine branch from efcc054 to 1b5ce3e Compare June 10, 2026 20:29

jodavis requested changes Jun 10, 2026

View reviewed changes

Comment thread plugins/dev-team/commands/dev-team.md

ElwoodMoves added 2 commits June 10, 2026 14:11

ADR-273: Replace local build/test with wait-pr-checks.sh script for C…

ba3434b

…I-backed signoff

ADR-273: Remove redundant single-spawn_agent special case; general pa…

436e7a4

…rallel block handles it

jodavis approved these changes Jun 10, 2026

View reviewed changes

jodavis merged commit bfa5c97 into feature/ADR-269-agent-orchestration Jun 10, 2026
1 check passed

jodavis deleted the dev/claude/ADR-273-core-step-machine branch June 10, 2026 21:18

Conversation

jodavis-claude commented Jun 9, 2026

Work item

Changes

Design decisions

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

build-and-test: Python test results

Uh oh!

jodavis-claude left a comment

Choose a reason for hiding this comment

Review: ADR-273 Core Step-Machine

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jodavis-claude left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jodavis-claude left a comment

Choose a reason for hiding this comment

ADR-273 Review

P1-A — exit_with_action not renamed to exit_with_actions (core invariant missing)

P1-B — dev-team.md still parses stdout as a JSON object, not a JSON array

P1-C — dev-team.md missing run_script dispatch

P1-D — task-runner.md tools list still includes Agent, not Skill

P1-E — script-runner.md not created

P1-F — SignOffStep parallel case uses old nested actions format; build/test still runs in-process

P1-G — Tests not updated for plural function / array output

P4-A — Spec inconsistency: agent vs skill for troubleshooter detection

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jodavis-claude commented Jun 10, 2026

ADR-273 Fix Round — Response to P1-A through P4-A

Uh oh!

jodavis-claude left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jodavis-claude left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions Bot commented Jun 9, 2026 •

edited

Loading