Skip to content

ADR-273: Implement core step-machine — dev_team.py, dev-team.md loop, task-runner agent#27

Merged
jodavis merged 20 commits into
feature/ADR-269-agent-orchestrationfrom
dev/claude/ADR-273-core-step-machine
Jun 10, 2026
Merged

ADR-273: Implement core step-machine — dev_team.py, dev-team.md loop, task-runner agent#27
jodavis merged 20 commits into
feature/ADR-269-agent-orchestrationfrom
dev/claude/ADR-273-core-step-machine

Conversation

@jodavis-claude

Copy link
Copy Markdown
Collaborator

Work item

ADR-273 — Refactor dev_team.py into a re-entrant step machine that emits JSON action descriptors instead of spawning sub-agents, introduce an orchestration loop in dev-team.md, and create a new task-runner agent so sub-agents inherit full MCP credentials from the top-level Claude Code session.

Changes

  • plugins/dev-team/scripts/dev_team.py — Refactored from subprocess-spawn orchestrator to step machine. Removes call_agent(), _MdStreamWriter, _resolve_claude(), CLAUDE_CMD, concurrent.futures, and threading. Adds exit_with_action(), compute_context_path(), _apply_counter_updates(), _handle_agent_failure(), and _handle_agent_success(). Adds --context-file as a required CLI argument; log directory is now derived from context_path.parent / "logs". PipelineContext gains signoff_cycle_count, consecutive_failures, review_cycle_count, troubleshooter_input, pending_agent, signoff_review, and signoff_research with full save/load roundtrip. Every call_agent() call site is replaced with exit_with_action({...}). Workflow-file mechanism is unchanged. DevTeamPipeline.run() calls _apply_counter_updates() after each step and checks troubleshooter thresholds.
  • plugins/dev-team/scripts/test_dev_team.py — New pytest file with 31 tests covering exit_with_action, compute_context_path (with and without DEV_TEAM_STATE_DIR), and counter increment/reset logic for all three counters with save/load roundtrips.
  • plugins/dev-team/agents/task-runner.md — New agent definition with Read, Write, Edit, Skill, Bash, Glob, Grep tools. Implements the seven-step read-context → invoke-skill → write-result → return-one-line protocol.
  • plugins/dev-team/commands/dev-team.md — New orchestration loop skill. Computes the context file path from DEV_TEAM_STATE_DIR/~/.dev-team/<repo-slug>/<id>.md, drives the step-machine loop, spawns task-runner or troubleshooter agents, and handles all three troubleshooter outcomes (continue, terminate, needs_user_input).
  • plugins/dev-team/commands/implement.md — Updated to invoke the dev-team skill instead of run-workflow.
  • plugins/dev-team/commands/fix.md — Updated to invoke the dev-team skill instead of run-workflow.
  • plugins/dev-team/commands/run-workflow.md — Deprecated with a redirect header pointing to the new mechanism.

Design decisions

Script as a step machine, not a daemon. dev_team.py runs to the first agent boundary, emits {"action": "spawn_agent", ...} on stdout, and exits 0. dev-team.md loops — it re-invokes the script after every agent return. The script never blocks waiting for an agent; the loop lives in the top-level Claude Code session.

Context file is the only communication channel. Each step's re-entrant entry point reads the context file to determine whether its agent sub-task has already been completed (e.g., ResearchStep.run() checks if ctx.brief: return "research_done" before emitting the exit descriptor). The task-runner writes the agent's output into the named section that the script checks on the next invocation.

SignoffStep sub-step tracking. Because only one agent can be spawned per script invocation, the former parallel ThreadPoolExecutor pattern is replaced with sequential sub-step completion tracked in frontmatter fields (signoff_review, signoff_research), requiring up to 3 loop iterations for one signoff pass.

compute_context_path() helper retained in dev_team.py. The spec requires unit tests for path computation; the helper is exported from the script (tested but not called in main() since --context-file is always passed by dev-team.md).

ElwoodMoves and others added 3 commits June 9, 2026 14:09
… task-runner agent

- dev_team.py: refactored from subprocess-spawn orchestrator to step machine.
  Each step exits with a JSON descriptor when an agent is needed (exit_with_action).
  Removes call_agent(), _MdStreamWriter, _resolve_claude(). Adds --context-file
  required arg; log_dir derived from context file path. PipelineContext gains
  signoff_cycle_count, consecutive_failures, review_cycle_count, troubleshooter_input,
  pending_agent, signoff_review, signoff_research fields. Troubleshooter trigger
  conditions check thresholds and route to exit_with_action.

- test_dev_team.py: 31 unit tests covering exit_with_action (subprocess isolation),
  compute_context_path (with/without DEV_TEAM_STATE_DIR), and counter
  increment/reset logic for all three counters (signoff_cycle_count,
  review_cycle_count, consecutive_failures) including save/load roundtrip.

- agents/task-runner.md: new agent that owns the orchestration protocol —
  reads context sections, invokes a skill, writes result to context file,
  returns exactly one result_format line.

- commands/dev-team.md: new orchestration loop skill that computes the context
  file path, drives the step-machine loop, and spawns task-runner or
  troubleshooter agents via the Agent tool.

- commands/implement.md, fix.md: updated to invoke dev-team skill instead of
  run-workflow.

- commands/run-workflow.md: deprecated with redirect to dev-team.md.

Key decisions: SignoffStep runs reviewer-sign-off and researcher-validate
sequentially (not in parallel) tracking completion via context file sections;
PR URL parsed from "PR URL" section by PipelineContext.load(); compute_context_path
provided as tested helper; review_cycle_count increments on every ReviewStep
completion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

build-and-test: Python test results

Status: ✅ Passed

Test log

@jodavis-claude jodavis-claude left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: ADR-273 Core Step-Machine

Overall, the architecture is correct and the implementation covers most exit criteria. Three correctness issues found; one security note; one style point.

Priority 1 — Correctness

[P1-A] ReviewStep re-entry returns stale data after a fix-pr cycle. ReviewStep.run() checks if ctx.review_notes: at sub-step 2 and returns the parsed status immediately — but ctx.review_notes is never cleared between review cycles. When the workflow returns to the reviewing state after a FixPrStep pass, ctx.review_notes still holds the previous cycle's output, so the reviewer is never re-spawned. The step returns the old approved/changes_requested decision and never produces new feedback.

[P1-B] _handle_agent_success incorrectly resets consecutive_failures for non-agent steps. DevTeamPipeline.run() calls _handle_agent_success(self.ctx) after every step.run() return — including FindSpecStep and ValidateStep, which run in-process scripts with no agent involvement. A passing build clears the counter even when the surrounding agent calls were all failures, masking true consecutive failure counts.

[P1-C] researcher-validate result is checked by substring "failed", not JSON parsing. ctx.signoff_research holds the full researcher output (JSON array of exit criteria written by the task-runner). The old code properly parsed "status": "fail" / "partial" items; the new code does "failed" in ctx.signoff_research.lower(). A criterion JSON containing "fail" (not "failed") will be silently ignored. A description containing "never failed" will false-positive.

Priority 2 — Security note

[P2] troubleshooter_input Python one-liner embeds user answer unsanitised. The dev-team.md one-liner for needs_user_input interpolates <user_answer> directly into a Python string literal (answer = '<user_answer>'). A user answer containing a single-quote or backslash breaks the Python syntax or injects code. This should use a temp file or environment variable rather than shell string interpolation.

Priority 5 — Style (non-blocking)

[P5] Test helpers named _make_ctx() instead of make_sut(). CONTRIBUTING.md requires make_sut() for the subject-under-test factory function.

Comment thread plugins/dev-team/scripts/dev_team.py
Comment thread plugins/dev-team/scripts/dev_team.py
Comment thread plugins/dev-team/scripts/dev_team.py
Comment thread plugins/dev-team/commands/dev-team.md

@jodavis-claude jodavis-claude left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign-off review — all prior comments addressed.

[P1-A] review_notes clearing ✓ Resolved — ctx.review_notes = "" added in FixPrStep.run() on the agent-success path (line 895), ensuring ReviewStep always re-spawns the reviewer after a PR-fix cycle.

[P1-B] _handle_agent_success unconditional ✓ Resolved — removed from the pipeline loop; explicit calls now appear in SignoffStep.run() at the two points where agent results are detected (signoff_review populated at line 742, signoff_research populated at line 770).

[P1-C] Researcher-validate result parsing ✓ Resolved — new _researcher_validated() helper (lines 395–417) checks exact indicators first ("validated"/"failed"), then falls back to JSON-array parsing. Tests cover the "never failed" false-positive case.

[P2] Shell injection ✓ Resolved — user answer is now passed via stdin with a single-quoted heredoc (<<'ANSWER_HEREDOC'); Python reads it via sys.stdin.read(). No remaining shell injection vector from the user-supplied answer.

New issue noted (non-blocking, style/correctness): The re.sub replacement in dev-team.md uses f'troubleshooter_input: {answer}' as the replacement string. Python's re.sub interprets backslash sequences in replacement strings (e.g. \1, \n). If a user answer contains a literal backslash followed by a digit, it may raise re.error. Low severity given that context file paths are system-generated and the answer is plain text, but worth a follow-up.

No unresolved threads remain. No new Priority 1–4 issues found in the modified files.

@jodavis-claude jodavis-claude marked this pull request as ready for review June 10, 2026 03:09
@jodavis-claude jodavis-claude requested a review from jodavis June 10, 2026 03:10
Updated script paths in the build and test workflow.
Comment thread plugins/dev-team/agents/task-runner.md Outdated
Comment thread plugins/dev-team/agents/task-runner.md Outdated
Comment thread plugins/dev-team/commands/dev-team.md
Comment thread plugins/dev-team/commands/dev-team.md
Comment thread plugins/dev-team/scripts/dev_team.py Outdated
Comment thread plugins/dev-team/scripts/dev_team.py
Comment thread plugins/dev-team/scripts/dev_team.py Outdated
ElwoodMoves and others added 3 commits June 10, 2026 09:25
- Item 1: Fix re.sub lambda in dev-team.md troubleshooter_input update
- Item 2: Clarify agent field in task-runner.md is display/logging only
- Item 3: task-runner spawns sub-agent via Agent tool instead of calling Skill directly; remove Skill/Bash/Glob/Grep from frontmatter tools
- Item 4: Add --print-context-path CLI flag to dev_team.py; update dev-team.md Step 1 to use it
- Item 5: Add message field to every exit_with_action call site; orchestration loop displays it
- Item 6: ReviewStep extracts pr_url from PR URL section and saves to frontmatter; remove section fallback from load()
- Item 7: ValidateStep already runs build/test before review cycle — no change needed
- Item 8: SignoffStep emits parallel actions descriptor; FixPrStep uses gh pr checks when PR exists

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jodavis jodavis changed the base branch from main to feature/ADR-269-agent-orchestration June 10, 2026 18:53

@jodavis-claude jodavis-claude left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ADR-273 Review

Seven exit criteria are not met. All are Priority 1. Changes requested.

P1-A — exit_with_action not renamed to exit_with_actions (core invariant missing)

dev_team.py still defines exit_with_action(descriptor: dict) taking a single dict and emitting a JSON object. The spec requires exit_with_actions(descriptors: list[dict]) emitting a JSON array. Every call site still passes a single dict. The JSON-array-is-the-invariant-output-format rule is the centrepiece of this refactor and it is entirely absent.

P1-B — dev-team.md still parses stdout as a JSON object, not a JSON array

Step 2b reads: 'Extract the last line from stdout that is a valid JSON object.' It must parse a JSON array (last line starting with [). All branch conditions still reference descriptor.field instead of descriptors[0].field. No array-aware dispatch loop exists.

P1-C — dev-team.md missing run_script dispatch

The spec requires: for run_script items in the descriptor list, spawn Agent(subagent_type='script-runner'). This branch does not exist. The parallel section still checks for an 'actions' list (old nested-dict format).

P1-D — task-runner.md tools list still includes Agent, not Skill

The spec requires removing Agent, adding Skill/Bash/Glob/Grep, and invoking the skill directly via Skill() in the task-runner's own conversation. The current file keeps Agent and still spawns sub-agents.

P1-E — script-runner.md not created

plugins/dev-team/agents/script-runner.md does not appear in the diff at all.

P1-F — SignOffStep parallel case uses old nested actions format; build/test still runs in-process

The spec requires a flat top-level array with a run_script item for build/test plus two spawn_agent items. The current implementation still emits a nested 'actions' list inside a single dict and still runs build/test synchronously in-process.

P1-G — Tests not updated for plural function / array output

TestExitWithAction tests exit_with_action (singular, emits a JSON object). TestExitWithActionParallel tests the old nested actions dict format, not a flat top-level array. The required test for exit_with_actions([{'action': 'done'}]) emitting [{'action': 'done'}] is absent.

P4-A — Spec inconsistency: agent vs skill for troubleshooter detection

_spec_AgentOrchestration.md step d was updated in this PR to check agent == 'troubleshooter', and the spec JSON example shows 'agent': 'troubleshooter'. But _troubleshooter_descriptor() sets 'skill': 'troubleshooter' (no agent field), and dev-team.md checks skill == 'troubleshooter'. Align them — the task brief says skill == 'troubleshooter' is the intended key.

Comment thread plugins/dev-team/scripts/dev_team.py Outdated
Comment thread plugins/dev-team/agents/task-runner.md
Comment thread plugins/dev-team/commands/dev-team.md
Comment thread plugins/dev-team/scripts/dev_team.py Outdated
Comment thread plugins/dev-team/scripts/test_dev_team.py Outdated
@jodavis-claude

Copy link
Copy Markdown
Collaborator Author

ADR-273 Fix Round — Response to P1-A through P4-A

All P1 issues from the second reviewer were addressed in commit 26b8e26 (the most recent commit on this branch before this fix round). Here is the disposition of each issue:

P1-A (exit_with_actions rename) — Fixed. exit_with_actions(descriptors: list[dict]) emits json.dumps(descriptors); all ~17 call sites wrap the dict in a list.

P1-B + P1-C (dev-team.md JSON array + run_script dispatch) — Fixed. Step 2b extracts the last line starting with [; parallel dispatch handles both spawn_agent (→ task-runner) and run_script (→ script-runner) items.

P1-D (task-runner tools) — Fixed. Tools: Read, Write, Edit, Skill, Bash, Glob, Grep. Step 3 calls Skill(<skill-name>) directly.

P1-E (script-runner.md) — Fixed. File exists at plugins/dev-team/agents/script-runner.md (haiku model; tools: Bash, Write).

P1-F (SignOffStep flat array) — Fixed. SignOffStep.run() emits a flat three-item array: reviewer-sign-off, researcher-validate, and a run_script item for build/test.

P1-G (tests updated) — Fixed. TestExitWithActions (plural), TestExitWithActionsParallel (flat array), TestSignoffBuildResult all present. 81 tests pass.

P4-A (spec troubleshooter key) — Fixed in commit b6e275a (this round). Reverted three occurrences of "agent": "troubleshooter" / agent == "troubleshooter" in _spec_AgentOrchestration.md back to "skill": "troubleshooter" / skill == "troubleshooter", consistent with _troubleshooter_descriptor() and dev-team.md.

…g extraction

Replaces the model-interpreted git URL derivation with a standalone bash
script that handles HTTPS/.git, HTTPS, SSH/.git, and SSH remote URL forms
via sed, then delegates to dev_team.py --print-context-path.
…script call

Removes the prose asking the model to parse `git remote get-url origin`
output manually. Step 1 now calls get-context-path.sh, making the slug
derivation cheap and deterministic. Adds a note that Git Bash (bundled
with Git-for-Windows) is the runtime on Windows.
Adds TestGetContextPathShSlugExtraction (15 cases covering four remote URL
forms) and TestGetContextPathShErrorHandling (git-fails and no-args). Uses
GIT_REMOTE_URL_OVERRIDE env seam to avoid cross-platform fake-binary issues.

@jodavis-claude jodavis-claude left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign-off review: 6 prior threads resolved. New P1 issue found -- inline comment on dev_team.py:899.

Comment thread plugins/dev-team/scripts/dev_team.py Outdated

@jodavis-claude jodavis-claude left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign-off review for ADR-273.

All 17 prior review threads are resolved.

Thread resolutions:

  • P1-A (review_notes not cleared): resolved — cleared in FixPrStep.run()
  • P1-B (_handle_agent_success for non-agent steps): resolved — removed unconditional call; added explicit calls in agent-backed steps
  • P1-C (signoff_research substring false-positive): resolved — replaced with _researcher_validated() helper with JSON fallback
  • P2 (user answer injection risk): resolved — heredoc stdin pattern prevents injection
  • Human comments (task-runner agent/Skill, scripted context path, status reporting, frontmatter vs section, build/test validation, parallel agents): all resolved
  • P1 (signoff_build_result equality check always fails): resolved in commit efcc054 — changed to .startswith("passed"); tests updated with real script-runner format strings; two new explicit startswith tests added

No new Priority 1-4 issues found in the three files modified by commit efcc054:

  • dev_team.py: single-line fix at line 899, correct and sufficient
  • test_dev_team.py: real format strings throughout TestSignoffBuildResult; round-trip tests now exercise the actual stored value; startswith tests verify the logic directly
  • task-runner.md: removed misleading "spawn via Agent tool" wording from agent field description — no functional change

Decision: approved.

ElwoodMoves and others added 2 commits June 10, 2026 13:27
…rray; add script-runner agent

- exit_with_action(dict) -> exit_with_actions(list[dict]): stdout is now a JSON
  array so the orchestration loop can dispatch mixed spawn_agent + run_script items
  in a single step
- All ~17 call sites wrapped in [...]
- SignOffStep.run(): parallel case now emits a flat array with reviewer, researcher,
  and run_script (validate-build + validate-tests) items; in-process script execution
  removed; sequential fallbacks emit single-item arrays; re-entry checks
  signoff_build_result section (populated by dev-team.md from script-runner result)
- PipelineContext gains signoff_build_result field (section, not frontmatter) to
  track build/test pass-fail across pipeline re-invocations
- dev-team.md: parse JSON array from stdout; remove nested "actions" list branch;
  add run_script dispatch via script-runner agent; write run_script result to
  write_section in context file after parallel collect
- task-runner.md: remove Agent tool, add Skill/Bash/Glob/Grep; Step 3 uses
  Skill(<skill-name>) directly instead of Agent(subagent_type=...)
- agents/script-runner.md: new agent — runs one command, writes log, returns
  "passed — log: <path>" or "failed — log: <path>"
- test_dev_team.py: updated helper and test classes to verify array output format;
  added TestSignoffBuildResult (5 tests); 81 tests total, all pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ts to real format

script-runner returns "passed — log: <path>", not bare "passed"; the equality
check `== "passed"` would always fail. Changed to `.startswith("passed")`.

Updated TestSignoffBuildResult round-trip tests to store the real script-runner
format string; added two tests that explicitly verify the startswith logic.
@jodavis jodavis force-pushed the dev/claude/ADR-273-core-step-machine branch from efcc054 to 1b5ce3e Compare June 10, 2026 20:29
Comment thread plugins/dev-team/commands/dev-team.md
Comment thread plugins/dev-team/commands/dev-team.md
Comment thread plugins/dev-team/commands/dev-team.md Outdated
Comment thread plugins/dev-team/scripts/dev_team.py
Comment thread _spec_AgentOrchestration.md
Comment thread plugins/dev-team/scripts/dev_team.py
Comment thread plugins/dev-team/commands/dev-team.md
@jodavis jodavis merged commit bfa5c97 into feature/ADR-269-agent-orchestration Jun 10, 2026
1 check passed
@jodavis jodavis deleted the dev/claude/ADR-273-core-step-machine branch June 10, 2026 21:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants