ADR-273: Implement core step-machine — dev_team.py, dev-team.md loop, task-runner agent#27
Conversation
… task-runner agent - dev_team.py: refactored from subprocess-spawn orchestrator to step machine. Each step exits with a JSON descriptor when an agent is needed (exit_with_action). Removes call_agent(), _MdStreamWriter, _resolve_claude(). Adds --context-file required arg; log_dir derived from context file path. PipelineContext gains signoff_cycle_count, consecutive_failures, review_cycle_count, troubleshooter_input, pending_agent, signoff_review, signoff_research fields. Troubleshooter trigger conditions check thresholds and route to exit_with_action. - test_dev_team.py: 31 unit tests covering exit_with_action (subprocess isolation), compute_context_path (with/without DEV_TEAM_STATE_DIR), and counter increment/reset logic for all three counters (signoff_cycle_count, review_cycle_count, consecutive_failures) including save/load roundtrip. - agents/task-runner.md: new agent that owns the orchestration protocol — reads context sections, invokes a skill, writes result to context file, returns exactly one result_format line. - commands/dev-team.md: new orchestration loop skill that computes the context file path, drives the step-machine loop, and spawns task-runner or troubleshooter agents via the Agent tool. - commands/implement.md, fix.md: updated to invoke dev-team skill instead of run-workflow. - commands/run-workflow.md: deprecated with redirect to dev-team.md. Key decisions: SignoffStep runs reviewer-sign-off and researcher-validate sequentially (not in parallel) tracking completion via context file sections; PR URL parsed from "PR URL" section by PipelineContext.load(); compute_context_path provided as tested helper; review_cycle_count increments on every ReviewStep completion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
build-and-test: Python test resultsStatus: ✅ Passed Test log |
jodavis-claude
left a comment
There was a problem hiding this comment.
Review: ADR-273 Core Step-Machine
Overall, the architecture is correct and the implementation covers most exit criteria. Three correctness issues found; one security note; one style point.
Priority 1 — Correctness
[P1-A] ReviewStep re-entry returns stale data after a fix-pr cycle. ReviewStep.run() checks if ctx.review_notes: at sub-step 2 and returns the parsed status immediately — but ctx.review_notes is never cleared between review cycles. When the workflow returns to the reviewing state after a FixPrStep pass, ctx.review_notes still holds the previous cycle's output, so the reviewer is never re-spawned. The step returns the old approved/changes_requested decision and never produces new feedback.
[P1-B] _handle_agent_success incorrectly resets consecutive_failures for non-agent steps. DevTeamPipeline.run() calls _handle_agent_success(self.ctx) after every step.run() return — including FindSpecStep and ValidateStep, which run in-process scripts with no agent involvement. A passing build clears the counter even when the surrounding agent calls were all failures, masking true consecutive failure counts.
[P1-C] researcher-validate result is checked by substring "failed", not JSON parsing. ctx.signoff_research holds the full researcher output (JSON array of exit criteria written by the task-runner). The old code properly parsed "status": "fail" / "partial" items; the new code does "failed" in ctx.signoff_research.lower(). A criterion JSON containing "fail" (not "failed") will be silently ignored. A description containing "never failed" will false-positive.
Priority 2 — Security note
[P2] troubleshooter_input Python one-liner embeds user answer unsanitised. The dev-team.md one-liner for needs_user_input interpolates <user_answer> directly into a Python string literal (answer = '<user_answer>'). A user answer containing a single-quote or backslash breaks the Python syntax or injects code. This should use a temp file or environment variable rather than shell string interpolation.
Priority 5 — Style (non-blocking)
[P5] Test helpers named _make_ctx() instead of make_sut(). CONTRIBUTING.md requires make_sut() for the subject-under-test factory function.
…p; add explicit calls in SignoffStep on each agent result detection
…ch with JSON-array fallback, replacing fragile substring check
… inline string interpolation to prevent shell injection
jodavis-claude
left a comment
There was a problem hiding this comment.
Sign-off review — all prior comments addressed.
[P1-A] review_notes clearing ✓ Resolved — ctx.review_notes = "" added in FixPrStep.run() on the agent-success path (line 895), ensuring ReviewStep always re-spawns the reviewer after a PR-fix cycle.
[P1-B] _handle_agent_success unconditional ✓ Resolved — removed from the pipeline loop; explicit calls now appear in SignoffStep.run() at the two points where agent results are detected (signoff_review populated at line 742, signoff_research populated at line 770).
[P1-C] Researcher-validate result parsing ✓ Resolved — new _researcher_validated() helper (lines 395–417) checks exact indicators first ("validated"/"failed"), then falls back to JSON-array parsing. Tests cover the "never failed" false-positive case.
[P2] Shell injection ✓ Resolved — user answer is now passed via stdin with a single-quoted heredoc (<<'ANSWER_HEREDOC'); Python reads it via sys.stdin.read(). No remaining shell injection vector from the user-supplied answer.
New issue noted (non-blocking, style/correctness): The re.sub replacement in dev-team.md uses f'troubleshooter_input: {answer}' as the replacement string. Python's re.sub interprets backslash sequences in replacement strings (e.g. \1, \n). If a user answer contains a literal backslash followed by a digit, it may raise re.error. Low severity given that context file paths are system-generated and the answer is plain text, but worth a follow-up.
No unresolved threads remain. No new Priority 1–4 issues found in the modified files.
Updated script paths in the build and test workflow.
- Item 1: Fix re.sub lambda in dev-team.md troubleshooter_input update - Item 2: Clarify agent field in task-runner.md is display/logging only - Item 3: task-runner spawns sub-agent via Agent tool instead of calling Skill directly; remove Skill/Bash/Glob/Grep from frontmatter tools - Item 4: Add --print-context-path CLI flag to dev_team.py; update dev-team.md Step 1 to use it - Item 5: Add message field to every exit_with_action call site; orchestration loop displays it - Item 6: ReviewStep extracts pr_url from PR URL section and saves to frontmatter; remove section fallback from load() - Item 7: ValidateStep already runs build/test before review cycle — no change needed - Item 8: SignoffStep emits parallel actions descriptor; FixPrStep uses gh pr checks when PR exists Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
jodavis-claude
left a comment
There was a problem hiding this comment.
ADR-273 Review
Seven exit criteria are not met. All are Priority 1. Changes requested.
P1-A — exit_with_action not renamed to exit_with_actions (core invariant missing)
dev_team.py still defines exit_with_action(descriptor: dict) taking a single dict and emitting a JSON object. The spec requires exit_with_actions(descriptors: list[dict]) emitting a JSON array. Every call site still passes a single dict. The JSON-array-is-the-invariant-output-format rule is the centrepiece of this refactor and it is entirely absent.
P1-B — dev-team.md still parses stdout as a JSON object, not a JSON array
Step 2b reads: 'Extract the last line from stdout that is a valid JSON object.' It must parse a JSON array (last line starting with [). All branch conditions still reference descriptor.field instead of descriptors[0].field. No array-aware dispatch loop exists.
P1-C — dev-team.md missing run_script dispatch
The spec requires: for run_script items in the descriptor list, spawn Agent(subagent_type='script-runner'). This branch does not exist. The parallel section still checks for an 'actions' list (old nested-dict format).
P1-D — task-runner.md tools list still includes Agent, not Skill
The spec requires removing Agent, adding Skill/Bash/Glob/Grep, and invoking the skill directly via Skill() in the task-runner's own conversation. The current file keeps Agent and still spawns sub-agents.
P1-E — script-runner.md not created
plugins/dev-team/agents/script-runner.md does not appear in the diff at all.
P1-F — SignOffStep parallel case uses old nested actions format; build/test still runs in-process
The spec requires a flat top-level array with a run_script item for build/test plus two spawn_agent items. The current implementation still emits a nested 'actions' list inside a single dict and still runs build/test synchronously in-process.
P1-G — Tests not updated for plural function / array output
TestExitWithAction tests exit_with_action (singular, emits a JSON object). TestExitWithActionParallel tests the old nested actions dict format, not a flat top-level array. The required test for exit_with_actions([{'action': 'done'}]) emitting [{'action': 'done'}] is absent.
P4-A — Spec inconsistency: agent vs skill for troubleshooter detection
_spec_AgentOrchestration.md step d was updated in this PR to check agent == 'troubleshooter', and the spec JSON example shows 'agent': 'troubleshooter'. But _troubleshooter_descriptor() sets 'skill': 'troubleshooter' (no agent field), and dev-team.md checks skill == 'troubleshooter'. Align them — the task brief says skill == 'troubleshooter' is the intended key.
ADR-273 Fix Round — Response to P1-A through P4-AAll P1 issues from the second reviewer were addressed in commit 26b8e26 (the most recent commit on this branch before this fix round). Here is the disposition of each issue: P1-A (exit_with_actions rename) — Fixed. P1-B + P1-C (dev-team.md JSON array + run_script dispatch) — Fixed. Step 2b extracts the last line starting with P1-D (task-runner tools) — Fixed. Tools: P1-E (script-runner.md) — Fixed. File exists at P1-F (SignOffStep flat array) — Fixed. P1-G (tests updated) — Fixed. P4-A (spec troubleshooter key) — Fixed in commit b6e275a (this round). Reverted three occurrences of |
…g extraction Replaces the model-interpreted git URL derivation with a standalone bash script that handles HTTPS/.git, HTTPS, SSH/.git, and SSH remote URL forms via sed, then delegates to dev_team.py --print-context-path.
…script call Removes the prose asking the model to parse `git remote get-url origin` output manually. Step 1 now calls get-context-path.sh, making the slug derivation cheap and deterministic. Adds a note that Git Bash (bundled with Git-for-Windows) is the runtime on Windows.
Adds TestGetContextPathShSlugExtraction (15 cases covering four remote URL forms) and TestGetContextPathShErrorHandling (git-fails and no-args). Uses GIT_REMOTE_URL_OVERRIDE env seam to avoid cross-platform fake-binary issues.
jodavis-claude
left a comment
There was a problem hiding this comment.
Sign-off review: 6 prior threads resolved. New P1 issue found -- inline comment on dev_team.py:899.
jodavis-claude
left a comment
There was a problem hiding this comment.
Sign-off review for ADR-273.
All 17 prior review threads are resolved.
Thread resolutions:
- P1-A (review_notes not cleared): resolved — cleared in FixPrStep.run()
- P1-B (_handle_agent_success for non-agent steps): resolved — removed unconditional call; added explicit calls in agent-backed steps
- P1-C (signoff_research substring false-positive): resolved — replaced with _researcher_validated() helper with JSON fallback
- P2 (user answer injection risk): resolved — heredoc stdin pattern prevents injection
- Human comments (task-runner agent/Skill, scripted context path, status reporting, frontmatter vs section, build/test validation, parallel agents): all resolved
- P1 (signoff_build_result equality check always fails): resolved in commit efcc054 — changed to
.startswith("passed"); tests updated with real script-runner format strings; two new explicit startswith tests added
No new Priority 1-4 issues found in the three files modified by commit efcc054:
dev_team.py: single-line fix at line 899, correct and sufficienttest_dev_team.py: real format strings throughout TestSignoffBuildResult; round-trip tests now exercise the actual stored value; startswith tests verify the logic directlytask-runner.md: removed misleading "spawn via Agent tool" wording fromagentfield description — no functional change
Decision: approved.
…rray; add script-runner agent - exit_with_action(dict) -> exit_with_actions(list[dict]): stdout is now a JSON array so the orchestration loop can dispatch mixed spawn_agent + run_script items in a single step - All ~17 call sites wrapped in [...] - SignOffStep.run(): parallel case now emits a flat array with reviewer, researcher, and run_script (validate-build + validate-tests) items; in-process script execution removed; sequential fallbacks emit single-item arrays; re-entry checks signoff_build_result section (populated by dev-team.md from script-runner result) - PipelineContext gains signoff_build_result field (section, not frontmatter) to track build/test pass-fail across pipeline re-invocations - dev-team.md: parse JSON array from stdout; remove nested "actions" list branch; add run_script dispatch via script-runner agent; write run_script result to write_section in context file after parallel collect - task-runner.md: remove Agent tool, add Skill/Bash/Glob/Grep; Step 3 uses Skill(<skill-name>) directly instead of Agent(subagent_type=...) - agents/script-runner.md: new agent — runs one command, writes log, returns "passed — log: <path>" or "failed — log: <path>" - test_dev_team.py: updated helper and test classes to verify array output format; added TestSignoffBuildResult (5 tests); 81 tests total, all pass Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ts to real format
script-runner returns "passed — log: <path>", not bare "passed"; the equality
check `== "passed"` would always fail. Changed to `.startswith("passed")`.
Updated TestSignoffBuildResult round-trip tests to store the real script-runner
format string; added two tests that explicitly verify the startswith logic.
efcc054 to
1b5ce3e
Compare
…rallel block handles it
Work item
ADR-273 — Refactor
dev_team.pyinto a re-entrant step machine that emits JSON action descriptors instead of spawning sub-agents, introduce an orchestration loop indev-team.md, and create a newtask-runneragent so sub-agents inherit full MCP credentials from the top-level Claude Code session.Changes
plugins/dev-team/scripts/dev_team.py— Refactored from subprocess-spawn orchestrator to step machine. Removescall_agent(),_MdStreamWriter,_resolve_claude(),CLAUDE_CMD,concurrent.futures, andthreading. Addsexit_with_action(),compute_context_path(),_apply_counter_updates(),_handle_agent_failure(), and_handle_agent_success(). Adds--context-fileas a required CLI argument; log directory is now derived fromcontext_path.parent / "logs".PipelineContextgainssignoff_cycle_count,consecutive_failures,review_cycle_count,troubleshooter_input,pending_agent,signoff_review, andsignoff_researchwith full save/load roundtrip. Everycall_agent()call site is replaced withexit_with_action({...}). Workflow-file mechanism is unchanged.DevTeamPipeline.run()calls_apply_counter_updates()after each step and checks troubleshooter thresholds.plugins/dev-team/scripts/test_dev_team.py— New pytest file with 31 tests coveringexit_with_action,compute_context_path(with and withoutDEV_TEAM_STATE_DIR), and counter increment/reset logic for all three counters with save/load roundtrips.plugins/dev-team/agents/task-runner.md— New agent definition withRead,Write,Edit,Skill,Bash,Glob,Greptools. Implements the seven-step read-context → invoke-skill → write-result → return-one-line protocol.plugins/dev-team/commands/dev-team.md— New orchestration loop skill. Computes the context file path fromDEV_TEAM_STATE_DIR/~/.dev-team/<repo-slug>/<id>.md, drives the step-machine loop, spawns task-runner or troubleshooter agents, and handles all three troubleshooter outcomes (continue,terminate,needs_user_input).plugins/dev-team/commands/implement.md— Updated to invoke thedev-teamskill instead ofrun-workflow.plugins/dev-team/commands/fix.md— Updated to invoke thedev-teamskill instead ofrun-workflow.plugins/dev-team/commands/run-workflow.md— Deprecated with a redirect header pointing to the new mechanism.Design decisions
Script as a step machine, not a daemon.
dev_team.pyruns to the first agent boundary, emits{"action": "spawn_agent", ...}on stdout, and exits 0.dev-team.mdloops — it re-invokes the script after every agent return. The script never blocks waiting for an agent; the loop lives in the top-level Claude Code session.Context file is the only communication channel. Each step's re-entrant entry point reads the context file to determine whether its agent sub-task has already been completed (e.g.,
ResearchStep.run()checksif ctx.brief: return "research_done"before emitting the exit descriptor). The task-runner writes the agent's output into the named section that the script checks on the next invocation.SignoffStep sub-step tracking. Because only one agent can be spawned per script invocation, the former parallel
ThreadPoolExecutorpattern is replaced with sequential sub-step completion tracked in frontmatter fields (signoff_review,signoff_research), requiring up to 3 loop iterations for one signoff pass.compute_context_path()helper retained indev_team.py. The spec requires unit tests for path computation; the helper is exported from the script (tested but not called inmain()since--context-fileis always passed bydev-team.md).