diff --git a/_spec_AgentOrchestration.md b/_spec_AgentOrchestration.md index 2850b67..fb7a0fb 100644 --- a/_spec_AgentOrchestration.md +++ b/_spec_AgentOrchestration.md @@ -569,7 +569,7 @@ All three components needed to reach a runnable state. `dev_team.py` exits with --- -### Task 2: Step protocol refactor — `get_actions` / `handle_results` / `ParallelSteps` 🤖 +### Task 2: Step protocol refactor — `get_actions` / `handle_results` / `ParallelSteps` 🤖 (ADR-277) _Depends on Task 1. Refactor only — no behaviour changes visible to the pipeline._ diff --git a/plugins/dev-team/agents/developer.md b/plugins/dev-team/agents/developer.md index 4e3794c..86cc751 100644 --- a/plugins/dev-team/agents/developer.md +++ b/plugins/dev-team/agents/developer.md @@ -21,6 +21,7 @@ tools: - mcp__jira__transitionJiraIssue - mcp__jira__editJiraIssue - mcp__jira__addCommentToJiraIssue + - mcp__plugin_github_github__create_pull_request --- You are the Developer for the AdaptiveRemote development team. diff --git a/plugins/dev-team/agents/reviewer.md b/plugins/dev-team/agents/reviewer.md index 2d58f52..b69b2a2 100644 --- a/plugins/dev-team/agents/reviewer.md +++ b/plugins/dev-team/agents/reviewer.md @@ -12,6 +12,11 @@ tools: - Grep - Bash - Skill + - mcp__plugin_github_github__pull_request_read + - mcp__plugin_github_github__pull_request_review_write + - mcp__plugin_github_github__add_comment_to_pending_review + - mcp__plugin_github_github__add_reply_to_pull_request_comment + - mcp__plugin_github_github__update_pull_request --- You are the Reviewer for the AdaptiveRemote development team. diff --git a/plugins/dev-team/agents/task-runner.md b/plugins/dev-team/agents/task-runner.md index 8e9e110..65fbf5f 100644 --- a/plugins/dev-team/agents/task-runner.md +++ b/plugins/dev-team/agents/task-runner.md @@ -86,7 +86,7 @@ Agent( The sub-agent invokes the named skill and returns its full output. Capture that output as the skill result for Step 4. -### Step 4 — Write result to context file +### Step 4 — Write result to context file (Edit only — never Write) Overwrite `write_section` in `context_file` with the skill's full output. @@ -97,10 +97,24 @@ The section format in the file is: ``` -If the section already exists in the file, replace it entirely (including the sentinel -line). If it does not exist, append it after the last existing section. +Use `Read` to read the current file content, then use **`Edit` only — never `Write`**. +Using `Write` would overwrite the entire file and erase sections written by other +concurrent agents. + +**If the sentinel `` already exists in the file:** + +Use `Edit` where: +- `old_string` = the sentinel line, the blank line after it, and all content up to + (but not including) the next `\n\n\n` + +**If the sentinel does not exist (new section):** + +Find the last `\n\n\n` -Use `Read` to read the current file content, then `Edit` or `Write` to update it. **Overwrite the entire section — never append to it.** ### Step 5 — Return result diff --git a/plugins/dev-team/commands/developer-create-pr.md b/plugins/dev-team/commands/developer-create-pr.md index c0d9dba..04ba8c7 100644 --- a/plugins/dev-team/commands/developer-create-pr.md +++ b/plugins/dev-team/commands/developer-create-pr.md @@ -37,29 +37,38 @@ stop: {"pr_url": "$PR_URL"} ``` -### 2 — Determine the base branch +### 2 — Determine the base branch and repo coordinates Using Bash: -1. Run `git fetch --all --quiet` to ensure remote branches are up to date. -2. List candidate base branches in priority order: +1. Get the current branch name: `git branch --show-current` +2. Run `git fetch --all --quiet` to ensure remote branches are up to date. +3. List candidate base branches in priority order: - `main` - Any remote `feature/*` branches: `git branch -r | grep "feature/" | sed "s|.*origin/||"` -3. For each candidate, count how many commits HEAD is ahead of it: +4. For each candidate, count how many commits HEAD is ahead of it: ```bash git rev-list --count origin/..HEAD 2>/dev/null || echo 99999 ``` -4. Select the candidate with the fewest commits (the closest ancestor to HEAD). If two +5. Select the candidate with the fewest commits (the closest ancestor to HEAD). If two candidates tie, prefer `main`. If no candidate is reachable, fall back to `main`. +6. Parse owner and repo from the remote URL: + ```bash + git remote get-url origin + ``` + Extract `owner` and `repo` from formats like `https://github.com/owner/repo.git` + or `git@github.com:owner/repo.git`. ### 3 — Create the draft PR -Use `gh pr create` with: +Use `mcp__plugin_github_github__create_pull_request` with: -- `--draft` -- `--base ` -- `--title ": "` -- `--body` A well-structured description with these sections: +- `owner` and `repo` from step 2 +- `head`: the current branch name from step 2 +- `base`: the base branch from step 2 +- `draft`: `true` +- `title`: `": "` +- `body`: A well-structured description with these sections: - **Work item:** `` with a one-sentence summary of what the task required - **Changes:** A bullet list drawn from the work summaries — one bullet per logical change (new file, modified interface, new test scenario, etc.) @@ -75,6 +84,4 @@ The PR title and body are read by human reviewers — write them clearly and pre Output the PR URL as the final JSON line: -```json {"pr_url": "https://github.com/..."} -``` diff --git a/plugins/dev-team/commands/researcher-validate.md b/plugins/dev-team/commands/researcher-validate.md index e66b15a..79a1713 100644 --- a/plugins/dev-team/commands/researcher-validate.md +++ b/plugins/dev-team/commands/researcher-validate.md @@ -48,23 +48,21 @@ exercises the scenario — not just that the code path exists. ### 3 — Return the result -Return a JSON array. Each entry is one exit criterion. Do not omit criteria even if they -clearly pass — include them all so the caller has a complete picture. - -```json -[ - { - "criterion": "Exact text of the exit criterion from the spec", - "status": "pass | fail | partial", - "finding": "One-sentence explanation. Required for fail and partial; omit for pass." - } -] +Output a JSON object as the very last line of your response (bare, not inside a code block). +Set `status` to `"validated"` if all criteria passed, `"failed"` if any criterion is +`fail` or `partial`. Include the full criteria array so the developer has details. + +``` +{"status": "validated|failed", "criteria": [{"criterion": "...", "status": "pass|fail|partial", "finding": "..."}]} ``` +Each criterion entry: +- `criterion` — exact text of the exit criterion from the spec +- `status` — `pass`, `fail`, or `partial` +- `finding` — one-sentence explanation (required for `fail` and `partial`; omit for `pass`) + **Status definitions:** - `pass` — criterion is fully and demonstrably met by the code and tests - `partial` — criterion is met in part but not completely (e.g., happy path covered but error cases are not, or implementation exists but no test verifies it) - `fail` — criterion is not met - -Return only the JSON array — no surrounding text. diff --git a/plugins/dev-team/commands/reviewer-review.md b/plugins/dev-team/commands/reviewer-review.md index 3b6d26d..c5a8b57 100644 --- a/plugins/dev-team/commands/reviewer-review.md +++ b/plugins/dev-team/commands/reviewer-review.md @@ -36,15 +36,27 @@ error and stop. ## Step 4 — Retrieve and read the diff -Fetch the PR diff using the `gh` CLI: +Extract the pull number from `$PR_URL` (last path segment). Parse `owner` and `repo` from +the URL (e.g. `https://github.com/owner/repo/pull/123`). -```bash -gh pr diff +Fetch the PR diff using the GitHub MCP: + +``` +mcp__plugin_github_github__pull_request_read(method="get_diff", owner=, repo=, pullNumber=) ``` Read all changed files in full to understand the complete context of each change. -## Step 5 — Review the changes +## Step 5 — Create a pending review + +Before reviewing, create a pending review so inline comments can be added as issues are +discovered: + +``` +mcp__plugin_github_github__pull_request_review_write(method="create", owner=, repo=, pullNumber=) +``` + +## Step 6 — Review the changes Evaluate the diff against each dimension below, in priority order. For each issue you find, note the file, line number, and a clear description of the problem. @@ -88,38 +100,35 @@ find, note the file, line number, and a clear description of the problem. - Do tests use `MockBehavior.Strict` and `Expect_*` helpers? - Is there a `CreateSut()` method? -## Step 6 — Post the GitHub PR review +## Step 7 — Post each inline issue as you find it -Create a **pull request review** (not a regular PR comment) using the `gh` CLI: +For each Priority 1–4 issue discovered in step 6, post it immediately to the pending +review using the **source file line number** (not the diff position): -```bash -gh api "repos/${GITHUB_REPOSITORY}/pulls//reviews" \ - --method POST \ - --field body='' \ - --field event='COMMENT' \ - --field comments='[{"path":"","position":,"body":""}]' ``` +mcp__plugin_github_github__add_comment_to_pending_review(owner=, repo=, pullNumber=, path=, line=, side="RIGHT", subjectType="LINE", body=) +``` + +## Step 8 — Submit the review + +Submit the pending review with an overall summary. Use `event: COMMENT` — do not use +`APPROVE` or `REQUEST_CHANGES`, as GitHub rejects those when the reviewer and PR author +share the same account. -Notes: -- Use `event: COMMENT`, not `APPROVE` or `REQUEST_CHANGES` — GitHub rejects those from - the PR author's account, and the developer and reviewer agents share the same account. -- Each entry in `comments` must use the **diff position** (line number within the unified - diff), not the source file line number. Run `gh pr diff ` - and count lines from the start of each file's hunk to get the position. -- Do NOT use `POST /repos/.../issues/{number}/comments` — that creates a plain conversation - comment, not a structured review thread. +``` +mcp__plugin_github_github__pull_request_review_write(method="submit_pending", owner=, repo=, pullNumber=, body=, event="COMMENT") +``` -## Step 7 — Output +## Step 9 — Output Write a concise plain-text summary of all issues found (one bullet per issue, Priority 1–4 first, style issues last). This summary will be passed to the developer so they can address each point without re-reading the full PR thread. -Then output the JSON result as the final line: +Then output the following JSON object as the very last line of your response. +Write it as a bare JSON object — do not wrap it in a code block or add any text after it: -```json {"status": "approved|changes_requested", "pr_url": "https://github.com/..."} -``` Use `"approved"` if no Priority 1–4 issues were found; `"changes_requested"` otherwise. Always include the PR URL even when approving. diff --git a/plugins/dev-team/commands/reviewer-sign-off.md b/plugins/dev-team/commands/reviewer-sign-off.md index d0c8cf8..50715c0 100644 --- a/plugins/dev-team/commands/reviewer-sign-off.md +++ b/plugins/dev-team/commands/reviewer-sign-off.md @@ -23,57 +23,47 @@ The pipeline pushes the latest commits before invoking this sign-off. Check ## Step 3 — Retrieve prior review threads -Fetch all review threads on the PR using the GitHub GraphQL API. This is required because -the REST comments API returns numeric comment IDs, but `resolveReviewThread` requires a -Node ID (`PRRT_...` format) that only the GraphQL API provides. - -Extract the pull number from `$PR_URL` (last path segment), then run: - -```bash -OWNER="${GITHUB_REPOSITORY%%/*}" -REPO="${GITHUB_REPOSITORY##*/}" -gh api graphql -f query=' - query($owner: String!, $repo: String!, $pullNumber: Int!) { - repository(owner: $owner, name: $repo) { - pullRequest(number: $pullNumber) { - reviewThreads(first: 100) { - nodes { - id - isResolved - comments(first: 10) { - nodes { - body - path - author { login } - createdAt - } - } - } - } - } - } - }' -F owner="$OWNER" -F repo="$REPO" -F pullNumber= +Extract the pull number from `$PR_URL` (last path segment). Parse `owner` and `repo` from +the URL (e.g. `https://github.com/owner/repo/pull/123`). + +Fetch all review threads using the GitHub MCP: + +``` +mcp__plugin_github_github__pull_request_read(method="get_review_comments", owner=, repo=, pullNumber=) ``` -Each node's `id` field is the Node ID needed for `resolveReviewThread`. For each thread -where `isResolved` is `false`, note: +Each thread has: +- `id` — the node ID (`PRRT_...`) needed to resolve the thread +- `isResolved` — whether the thread is already resolved +- `comments.nodes[0]` — the first comment: its `body`, `path`, and numeric `id` + +For each thread where `isResolved` is `false`, note: - What was the original issue (from the first comment body)? - What file was it on (from the first comment path)? - Has that file been modified since the comment was posted? -## Step 4 — Check each prior thread for resolution +## Step 4 — Create a pending review + +Create a pending review before checking threads, so any replies or new inline comments +can be added to it as they are written: + +``` +mcp__plugin_github_github__pull_request_review_write(method="create", owner=, repo=, pullNumber=) +``` + +## Step 5 — Check each prior thread for resolution For each unresolved review comment: 1. Read the relevant section of the latest code and any developer replies in the thread. 2. Determine the outcome: - **Addressed satisfactorily:** the problem no longer exists in the code. Resolve the - thread using the GitHub GraphQL API: - ```bash - gh api graphql -f query='mutation { resolveReviewThread(input: {threadId: ""}) { thread { isResolved } } }' + thread using the GitHub MCP: + ``` + mcp__plugin_github_github__pull_request_review_write(method="resolve_thread", owner=, repo=, pullNumber=, threadId=) ``` - **Developer disagreed (posted rationale):** read the developer's rationale. - - **Accept the rationale:** add a reply acknowledging it and resolve the thread. + - **Accept the rationale:** add a reply acknowledging it, then resolve the thread. - **Reject the rationale:** add a reply restating the requirement and explaining why the rationale doesn't address the concern. Leave the thread unresolved. - **Partially addressed:** the developer made a change but the underlying problem @@ -82,7 +72,12 @@ For each unresolved review comment: - **Not addressed:** the code is unchanged and no rationale was posted. Add a follow-up comment restating what is needed and why. Leave the thread unresolved. -## Step 5 — Scan modified files for new issues + To reply to a thread, use the numeric comment ID of the thread's first comment: + ``` + mcp__plugin_github_github__add_reply_to_pull_request_comment(owner=, repo=, pullNumber=, commentId=, body=) + ``` + +## Step 6 — Scan modified files for new issues Identify all files that were modified since the last review push (use the PR diff or git log). Scan **only those files** for new issues introduced by the developer's fix — do not @@ -92,54 +87,48 @@ Apply the same priority order as the first-pass review: 1. Correctness/fault tolerance, 2. Security, 3. Performance, 4. Documentation, 5. Code style (note only) -Post new inline review comments for any new Priority 1–5 issues found in the modified -files. - -## Step 6 — Submit the sign-off review - -Submit a **pull request review** (not a plain PR comment) using the `gh` CLI: +For each new Priority 1–4 issue found, post it immediately to the pending review: -```bash -gh api "repos/${GITHUB_REPOSITORY}/pulls//reviews" \ - --method POST \ - --field body='' \ - --field event='COMMENT' \ - --field comments='[{"path":"","position":,"body":""}]' +``` +mcp__plugin_github_github__add_comment_to_pending_review(owner=, repo=, pullNumber=, path=, line=, side="RIGHT", subjectType="LINE", body=) ``` -Any new issues should be inline review comments attached to the specific file and line. -Submit with event type `COMMENT`. Do not use `APPROVE` or `REQUEST_CHANGES` — GitHub -rejects those from the PR author's account, and the developer and reviewer agents share -the same GitHub account. +## Step 7 — Submit the sign-off review **Sign-off decision:** Set `approved` if all review threads are resolved (no unresolved threads remain) AND no new Priority 1–4 issues were found in the modified files. Set `changes_requested` if any threads remain unresolved or if new blocking issues were found. -## Step 6a — Hand off to human reviewer (approved only) +Submit the pending review. Use `event: COMMENT` — do not use `APPROVE` or +`REQUEST_CHANGES`, as GitHub rejects those when the reviewer and PR author share the same account. + +``` +mcp__plugin_github_github__pull_request_review_write(method="submit_pending", owner=, repo=, pullNumber=, body=, event="COMMENT") +``` + +## Step 7a — Hand off to human reviewer (approved only) If the review outcome is **approved**, do the following before writing the output summary: 1. Convert the PR from draft to Ready for Review: - ```bash - gh pr ready + ``` + mcp__plugin_github_github__update_pull_request(owner=, repo=, pullNumber=, draft=false) ``` 2. Call `mcp__jira__lookupJiraAccountId` with `$REVIEW_ASSIGNEE_EMAIL` to get the human reviewer's account ID. 3. Assign the Jira issue to that account with `mcp__jira__editJiraIssue`. -4. Request a review from `$REVIEW_ASSIGNEE_EMAIL` on the PR: - ```bash - gh pr edit --add-reviewer +4. Request a GitHub review from the assignee: + ``` + mcp__plugin_github_github__update_pull_request(owner=, repo=, pullNumber=, reviewers=[""]) ``` 5. Add a brief Jira comment with `mcp__jira__addCommentToJiraIssue`: "PR ready for human review — reviewer requested on GitHub." -## Step 7 — Output +## Step 8 — Output Write a concise summary: - List each prior comment and whether it was resolved or still needs work - List any new issues found in the modified files -Then output the JSON result as the final line: +Then output the following JSON object as the very last line of your response. +Write it as a bare JSON object — do not wrap it in a code block or add any text after it: -```json {"status": "approved|changes_requested"} -``` diff --git a/plugins/dev-team/scripts/dev_team.py b/plugins/dev-team/scripts/dev_team.py index 6ae55e1..27d31dc 100644 --- a/plugins/dev-team/scripts/dev_team.py +++ b/plugins/dev-team/scripts/dev_team.py @@ -202,6 +202,7 @@ class PipelineContext: signoff_review: str = "" signoff_research: str = "" signoff_build_result: str = "" + validate_result: str = "" # Counters tracked for troubleshooter trigger conditions signoff_cycle_count: int = 0 consecutive_failures: int = 0 @@ -264,6 +265,9 @@ def save(self, path: Path) -> None: if self.signoff_build_result: lines += ["", "", "", self.signoff_build_result.strip()] + if self.validate_result: + lines += ["", "", "", self.validate_result.strip()] + log_links: list[str] = [] if self.build_log: log_links.append(f"- Build: {self.build_log}") @@ -320,6 +324,7 @@ def load(cls, path: Path) -> "PipelineContext": ctx.signoff_review = sections.get("Signoff Review", "") ctx.signoff_research = sections.get("Signoff Research", "") ctx.signoff_build_result = sections.get("Signoff Build Result", "") + ctx.validate_result = sections.get("Validate Result", "") return ctx @@ -417,28 +422,37 @@ def parse_json_output(text: str) -> dict: def _researcher_validated(content: str) -> bool: """Return True if researcher-validate reported success. - The task-runner writes the one-line result indicator from result_format - ("validated" | "failed") to the Signoff Research section. Fall back to - JSON-array parsing for output written by older skill versions. + Expects a JSON object with a "status" field ("validated" | "failed"), + matching the standardized skill output format. """ - text = content.strip() - if text == "validated": + result = parse_json_output(content) + status = result.get("status", "") + if status == "validated": return True - if text == "failed": + if status == "failed": return False - # Legacy / raw skill output: look for a JSON array with fail/partial items. - fenced = re.search(r"```(?:json)?\s*([\s\S]*?)\s*```", text) - candidate = fenced.group(1).strip() if fenced else text - try: - data = json.loads(candidate) - if isinstance(data, list): - return not any(item.get("status") in ("fail", "partial") for item in data) - except (json.JSONDecodeError, TypeError): - pass # Unrecognised format — treat as not validated so signoff retries. return False +def _parse_approval_status(content: str) -> str: + """Extract approval status from agent output. + + Returns "approved" or "changes_requested". Falls back to scanning the + content for the word "approved" if JSON parsing fails, to guard against + minor output format deviations. + """ + result = parse_json_output(content) + status = result.get("status", "") + if status in ("approved", "changes_requested"): + return status + # Secondary heuristic: look for the keyword in the text + lower = content.lower() + if "approved" in lower and "changes_requested" not in lower: + return "approved" + return "changes_requested" + + def _troubleshooter_descriptor( trigger: str, context_path: Path, ctx: "PipelineContext" ) -> dict: @@ -605,6 +619,7 @@ def run(self, ctx: PipelineContext) -> str: "message": "Researcher has written the task brief. Developer is now implementing.", "agent": "developer", "skill": "developer-implement", + "args": ctx.work_item_id, "context_file": str(self._context_path), "read_sections": ["Researcher Brief"], "write_section": "Implementation Summary", @@ -615,49 +630,57 @@ def run(self, ctx: PipelineContext) -> str: class ValidateStep(Step): handles = "validating" - def __init__(self, log_dir: Path) -> None: + _PENDING_KEY = "validate" + + def __init__(self, context_path: Path, log_dir: Path) -> None: + self._context_path = context_path self._log_dir = log_dir def run(self, ctx: PipelineContext) -> str: - print("Running scripts/validate-build...", flush=True) - try: - build_ok, build_log, build_tail = run_validate_script("validate-build", self._log_dir) - except (FileNotFoundError, OSError) as e: - print(f"Error running validate-build:\n{e}", file=sys.stderr) - sys.exit(1) - - ctx.build_log = str(build_log) - if not build_ok: - print(f"Build FAILED. Log: {build_log}", flush=True) + if ctx.validate_result: + # Re-entry: script-runner has written the result. + result = ctx.validate_result.strip() + ctx.validate_result = "" + ctx.pending_agent = "" + if result == "passed": + print("Validation passed.", flush=True) + ctx.last_failure = "" + _commit_and_push(ctx.work_item_id) + return "clean" + print(f"Validation FAILED. Log: {ctx.build_log}", flush=True) ctx.last_failure = ( - f"Build failed.\n\n" - f"Full log (read this for details): {build_log}\n\n" - f"Last 30 lines:\n```\n{build_tail}\n```" + f"Build or test failures.\n\n" + f"Full log (read this for details): {ctx.build_log}" ) return "build_failed" - print(f"Build clean. Log: {build_log}", flush=True) - print("Running scripts/validate-tests...", flush=True) - try: - tests_ok, tests_log, tests_tail = run_validate_script("validate-tests", self._log_dir) - except (FileNotFoundError, OSError) as e: - print(f"Error running validate-tests:\n{e}", file=sys.stderr) - sys.exit(1) - - ctx.test_log = str(tests_log) - if not tests_ok: - print(f"Tests FAILED. Log: {tests_log}", flush=True) - ctx.last_failure = ( - f"Test failures.\n\n" - f"Full log (read this for details): {tests_log}\n\n" - f"Last 30 lines:\n```\n{tests_tail}\n```" + if ctx.pending_agent == self._PENDING_KEY: + # Re-entry with no result — script-runner failed to write outcome. + _handle_agent_failure(ctx) + _check_and_trigger_troubleshooter( + "consecutive_failures", CONSECUTIVE_FAILURES_THRESHOLD, + ctx.consecutive_failures, ctx, self._context_path, ) - return "tests_failed" - print(f"Tests clean. Log: {tests_log}", flush=True) - ctx.last_failure = "" - _commit_and_push(ctx.work_item_id) - return "clean" + self._log_dir.mkdir(parents=True, exist_ok=True) + timestamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S") + log_path = self._log_dir / f"{ctx.work_item_id}-validate-{timestamp}.log" + ctx.build_log = str(log_path) + ctx.pending_agent = self._PENDING_KEY + ctx.save(self._context_path) + + ext = ".cmd" if sys.platform == "win32" else ".sh" + validate_script = REPO_ROOT / "scripts" / f"validate{ext}" + command = f'cmd /c "{validate_script}"' if sys.platform == "win32" else f'bash "{validate_script}"' + print(f"Spawning script-runner to validate {ctx.work_item_id}...", flush=True) + exit_with_actions([{ + "action": "run_script", + "message": "Running build and test validation.", + "command": command, + "log_file": str(log_path), + "write_section": "Validate Result", + "result_format": "passed | failed", + }]) class ReviewStep(Step): @@ -703,6 +726,7 @@ def run(self, ctx: PipelineContext) -> str: "message": "Implementation complete. Developer is creating a pull request.", "agent": "developer", "skill": "developer-create-pr", + "args": ctx.work_item_id, "context_file": str(self._context_path), "read_sections": read_sections, "write_section": "PR URL", @@ -712,8 +736,7 @@ def run(self, ctx: PipelineContext) -> str: # Sub-step 2: review if ctx.review_notes: _handle_agent_success(ctx) - result = parse_json_output(ctx.review_notes) - status = result.get("status", "changes_requested") + status = _parse_approval_status(ctx.review_notes) if status == "approved": print("Review approved.", flush=True) return "approved" @@ -898,8 +921,7 @@ def run(self, ctx: PipelineContext) -> str: f"Script result: {ctx.signoff_build_result.strip()}" ) - reviewer_result = parse_json_output(ctx.signoff_review) - reviewer_approved = reviewer_result.get("status", "changes_requested") == "approved" + reviewer_approved = _parse_approval_status(ctx.signoff_review) == "approved" if not reviewer_approved: failures.append(f"Reviewer sign-off:\n{ctx.signoff_review}") @@ -978,6 +1000,7 @@ def run(self, ctx: PipelineContext) -> str: ), "agent": "developer", "skill": "developer-fix", + "args": ctx.work_item_id, "context_file": str(self._context_path), "read_sections": read_sections, "write_section": write_section, @@ -1046,6 +1069,7 @@ def run(self, ctx: PipelineContext) -> str: ), "agent": "developer", "skill": "developer-fix", + "args": ctx.work_item_id, "context_file": str(self._context_path), "read_sections": read_sections, "write_section": write_section, @@ -1078,7 +1102,7 @@ def __init__( "debugging": DebugStep(context_path), "researching": ResearchStep(research_skill, context_path), "implementing": ImplementStep(context_path), - "validating": ValidateStep(log_dir), + "validating": ValidateStep(context_path, log_dir), "fixing": FixStep(context_path), "reviewing": ReviewStep(context_path), "signoff": SignoffStep(context_path, log_dir), @@ -1213,49 +1237,6 @@ def _parse_frontmatter(text: str) -> tuple[dict, str]: return metadata, body -def run_validate_script(script_name: str, log_dir: Path) -> tuple[bool, Path, str]: - """Run a validate script from the scripts/ directory. - - Logs full output to a timestamped file under log_dir. - Returns (success, log_path, tail) where tail is the last 30 lines. - """ - scripts_dir = REPO_ROOT / "scripts" - ext = ".cmd" if sys.platform == "win32" else ".sh" - script_path = scripts_dir / f"{script_name}{ext}" - - if not script_path.exists(): - raise FileNotFoundError( - f"Validate script not found: scripts/{script_name}{ext}" - ) - - log_dir.mkdir(parents=True, exist_ok=True) - timestamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S") - log_path = log_dir / f"{timestamp}-{script_name}.txt" - - if sys.platform == "win32": - invoke = ["cmd", "/c", str(script_path)] - else: - invoke = [str(script_path)] - - proc = subprocess.Popen( - invoke, - stdout=subprocess.PIPE, - stderr=subprocess.STDOUT, - text=True, - encoding="utf-8", - errors="replace", - cwd=REPO_ROOT, - ) - - lines: list[str] = [] - with log_path.open("w", encoding="utf-8") as log_file: - for line in proc.stdout: # type: ignore[union-attr] - log_file.write(line) - lines.append(line) - - proc.wait() - tail = "".join(lines[-30:]) - return proc.returncode == 0, log_path, tail def find_spec_file(work_item_id: str) -> Path: diff --git a/plugins/dev-team/scripts/test_dev_team.py b/plugins/dev-team/scripts/test_dev_team.py index 290b240..5e1314c 100644 --- a/plugins/dev-team/scripts/test_dev_team.py +++ b/plugins/dev-team/scripts/test_dev_team.py @@ -304,52 +304,73 @@ def test_empty_string_roundtrip(self, tmp_path): # --------------------------------------------------------------------------- -# _researcher_validated +# _parse_approval_status # --------------------------------------------------------------------------- -class TestResearcherValidated: - def test_validated_indicator_returns_true(self): - from dev_team import _researcher_validated - assert _researcher_validated("validated") is True +class TestParseApprovalStatus: + def test_approved_json_returns_approved(self): + from dev_team import _parse_approval_status + assert _parse_approval_status('{"status": "approved"}') == "approved" - def test_failed_indicator_returns_false(self): - from dev_team import _researcher_validated - assert _researcher_validated("failed") is False + def test_changes_requested_json_returns_changes_requested(self): + from dev_team import _parse_approval_status + assert _parse_approval_status('{"status": "changes_requested"}') == "changes_requested" + + def test_bare_approved_word_returns_approved(self): + from dev_team import _parse_approval_status + # Secondary heuristic when JSON parsing fails: "approved" keyword present + assert _parse_approval_status("LGTM. Status: approved.") == "approved" - def test_validated_with_whitespace_returns_true(self): + def test_both_words_present_returns_changes_requested(self): + from dev_team import _parse_approval_status + # If both keywords appear, don't false-positive as approved + content = "Previously approved but now changes_requested." + assert _parse_approval_status(content) == "changes_requested" + + def test_unrecognised_content_defaults_to_changes_requested(self): + from dev_team import _parse_approval_status + assert _parse_approval_status("some random output") == "changes_requested" + + def test_json_with_pr_url_approved(self): + from dev_team import _parse_approval_status + content = '{"status": "approved", "pr_url": "https://github.com/org/repo/pull/1"}' + assert _parse_approval_status(content) == "approved" + + +# --------------------------------------------------------------------------- +# _researcher_validated +# --------------------------------------------------------------------------- + +class TestResearcherValidated: + def test_validated_json_object_returns_true(self): from dev_team import _researcher_validated - assert _researcher_validated(" validated\n") is True + assert _researcher_validated('{"status": "validated"}') is True - def test_failed_with_whitespace_returns_false(self): + def test_failed_json_object_returns_false(self): from dev_team import _researcher_validated - assert _researcher_validated(" failed\n") is False + assert _researcher_validated('{"status": "failed"}') is False - def test_json_array_all_pass_returns_true(self): + def test_validated_with_criteria_array_returns_true(self): from dev_team import _researcher_validated - content = '[{"status": "pass", "criterion": "Tests pass"}]' + content = '{"status": "validated", "criteria": [{"criterion": "Tests pass", "status": "pass"}]}' assert _researcher_validated(content) is True - def test_json_array_with_fail_returns_false(self): + def test_failed_with_criteria_array_returns_false(self): from dev_team import _researcher_validated - content = '[{"status": "fail", "criterion": "Tests pass"}]' + content = '{"status": "failed", "criteria": [{"criterion": "Tests pass", "status": "fail", "finding": "not met"}]}' assert _researcher_validated(content) is False - def test_json_array_with_partial_returns_false(self): + def test_validated_embedded_in_prose_returns_true(self): from dev_team import _researcher_validated - content = '[{"status": "partial", "criterion": "Tests pass"}]' - assert _researcher_validated(content) is False + # JSON on its own line within agent prose output + content = 'All criteria were met.\n{"status": "validated", "criteria": []}' + assert _researcher_validated(content) is True - def test_json_array_in_fenced_block_with_fail(self): + def test_failed_embedded_in_prose_returns_false(self): from dev_team import _researcher_validated - content = '```json\n[{"status": "fail", "criterion": "x"}]\n```' + content = 'Some criteria were not met.\n{"status": "failed", "criteria": []}' assert _researcher_validated(content) is False - def test_description_containing_never_failed_is_not_false_positive(self): - from dev_team import _researcher_validated - # "failed" substring in a description must not cause a false-positive - content = '[{"status": "pass", "criterion": "Tests never failed"}]' - assert _researcher_validated(content) is True - def test_unrecognised_content_returns_false(self): from dev_team import _researcher_validated assert _researcher_validated("some unexpected output") is False