Merged
12 changes: 12 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
@@ -49,6 +49,7 @@ Tier 4: Pause (human intervention required)
- `STORY_REWRITTEN` — manager rewrote story description/acceptance criteria
- `STORY_SPLIT` — tech lead decomposed into child stories
- `STORY_SLA_BREACHED` — story exceeded per-complexity duration limit (configurable via `sla.max_minutes_per_complexity`)
- `REQ_BLOCKED` — completion gate could not get the composed mainline green after its auto-fix budget; requirement status → `blocked` instead of `completed` (resume with `--godmode` after addressing `.vxd-fix-gaps.md`)

### Event Sourcing
- **Source of truth**: `events.jsonl` (append-only, fsync'd)
@@ -125,6 +126,8 @@ qa:
      value: "PASS"
    - kind: file_exists
      path: coverage.html
  disable_completion_gate: false   # default false = gate ON (verify composed mainline before REQ_COMPLETED)
  completion_fix_cycles: 2         # auto-fix attempts vs a red mainline before REQ_BLOCKED (0→2, negative→hard gate)
billing:
  default_rate: 150.0
  currency: USD
@@ -419,6 +422,15 @@ The doc loop ships the FULL software-factory documentation set, not just README
- Every backstop is best-effort — a model failure logs and skips, never blocking requirement completion. The **scribe story** (`buildScribeStory`) instructs the agent to produce the whole set up front (its `OwnedFiles` + acceptance criteria now include `docs/adr` + `docs/README.md`); these backstops guarantee it ships even when the agent doesn't.
- Tests: `factory_docs_test.go` (docs index determinism, humanize, training skip/generate, ADR parse/render/slug/skip/generate), the upgraded `TestGenerateDocumentation_ProducesSVGDiagrams` (README + 2 SVGs + training + ADRs + index all produced and committed), `planner_test.go::TestPlanner_EmitsScribeStory` (standards baked into the brief). **NXD port pending.**

### Requirement-completion verification gate (completion_gate.go, 2026-06-26)
Closes the long-standing caveat: **vxd reported `REQ_COMPLETED` on code that did not compile.** Per-story QA runs in isolated worktrees and cannot see cross-story drift (an unwired composition root, a missing interface method, an import mismatch). The composed mainline was verified by `RunVerificationLoop`, but the result was *advisory*: gaps were written to `.vxd-fix-gaps.md` and logged, then `REQ_COMPLETED` fired anyway. This is the bug that shipped pulsereview as "merged" while its `/reviews` and `/digest` endpoints returned 404 (the composition root was never assembled).
- **Where:** `internal/engine/completion_gate.go`. `CompletionGate.Run(ctx, reqID, repoDir) bool` runs in the requirement-completion path (`monitor_dispatch.go dispatchNextWave`, after `pullBaseAfterMerge` + `cleanupDanglingBranches`). Wired via `Monitor.SetCompletionGate` in `resume.go` next to `SetDocGenerator`/`SetTechLeadFixer` (`TestResume_WiresCompletionGate` guards the wire). Skipped in dry-run and when `qa.disable_completion_gate=true`.
- **Order matters:** the local checkout is pulled to the composed mainline (`pullBaseAfterMerge`) **before** the gate verifies, so cycle 1 verifies the true merged tree, not a stale pre-VXD checkout (this reordering is itself part of the fix).
- **Loop:** verify (`RunVerificationLoop` → `ShouldRunFixCycle`) → if green, emit `REQ_COMPLETED`. If red, run up to `completion_fix_cycles` (default 2) auto-fix cycles: dispatch a godmode fix agent (the same skip-permissions `llmClient` already used by the doc generator; runs `claude -p` in cwd, edits + commits + pushes the reconciliation), pull, re-verify. First green cycle → `REQ_COMPLETED`. Cycles exhausted → emit **`REQ_BLOCKED`** (new event → projects requirement status `"blocked"` in `sqlite.go`), leave `.vxd-fix-gaps.md`, and log `vxd resume <id> --godmode` guidance.
- **Graceful degradation as a safety property:** a nil client (no godmode / no LLM) makes the gate a **hard gate** — verify once, block on red, no auto-fix. The dangerous failure mode (silently completing on red) is impossible regardless of wiring, because the gate and the auto-fix are separate concerns. `completion_fix_cycles` < 0 forces hard-gate even with a client.
- **Config:** `qa.disable_completion_gate` (default false = ON), `qa.completion_fix_cycles` (0→2, negative→hard gate; `completionFixCycles` in resume.go pins the mapping).
- **Tests:** `completion_gate_test.go` (green-first→no-fix, red→green→auto-fix once, stays-red→block after maxCycles, nil-client→hard-gate, writes gaps file, `emitRequirementOutcome`→`blocked` status against real stores via injectable `verify`/`pull` seams), `projection_test.go::TestProject_ReqBlocked`, `resume_helpers_test.go::TestCompletionFixCycles`, `resume_wiring_test.go::TestResume_WiresCompletionGate`. **NXD port pending.**
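
As a rough illustration of the contract described above, the sketch below maps the gate's verdict to the event and projected status. The event names and the `completed`/`blocked` statuses come from this section; the `outcome` type and the `outcomeFor` helper are hypothetical names invented for the example and are not taken from the codebase.

```go
package main

import "fmt"

// outcome pairs the emitted event with the projected requirement status.
// Hypothetical type, used only for this illustration.
type outcome struct {
	Event  string
	Status string
}

// outcomeFor is an illustrative helper (an assumption, not a vxd function).
// gateEnabled is false in dry-run or when qa.disable_completion_gate is true;
// in that case verification stays advisory and the requirement completes.
func outcomeFor(gateEnabled, mainlineGreen bool) outcome {
	if !gateEnabled || mainlineGreen {
		return outcome{Event: "REQ_COMPLETED", Status: "completed"}
	}
	return outcome{Event: "REQ_BLOCKED", Status: "blocked"}
}

func main() {
	fmt.Println(outcomeFor(true, true))   // gate on, green mainline
	fmt.Println(outcomeFor(true, false))  // gate on, still red after the fix budget
	fmt.Println(outcomeFor(false, false)) // gate off: legacy advisory behaviour
}
```

The point of the sketch is that a red mainline can only produce `REQ_COMPLETED` when the gate has been switched off explicitly.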

### Model ID Compatibility
- **Use undated aliases, not dated snapshots.** Current defaults: `claude-opus-4-8` (tech_lead), `claude-sonnet-4-6` (senior/qa/manager), `claude-haiku-4-5` (cheapest). All three are verified working on the Claude CLI subscription tier.
- **Default execution tiers are all-Anthropic (2026-06-24 fix).** `DefaultConfig` previously set junior/intermediate/supervisor to `{google, gemma-4-27b-it}` — a model that 404s on the Google AI API (it does not exist on `v1beta`). Every low-complexity story spawned a gemini agent that died in ~10s producing no code, then limped forward by escalating to senior. Defaults are now `{anthropic, claude-haiku-4-5}` so a fresh install works with only the Claude CLI configured (no Google AI key/quota). `TestDefaultConfig_NoInvalidJuniorModel` pins this. **A model 404 in the agent runtime surfaces as "agent produced no code changes," NOT as a model error — if a whole tier silently produces nothing, validate the model ID with `gemini -m <id> -p OK` / `claude --model <id> -p OK` first.**
2 changes: 1 addition & 1 deletion README.md
@@ -390,7 +390,7 @@ Run `vxd init` to generate `vxd.yaml` with sensible defaults, then customize:
| `merge` | Auto-merge toggle, base branch, PR body template, and human review mode (`auto`/`plan_only`/`manual`) | `auto_merge: true`, `base_branch: main`, `review_mode: auto` |
| `runtimes` | Map of named CLI runtime definitions — command, args, supported models, and idle/permission detection patterns | Includes built-in entries for `claude-code`, `codex`, `gemini`, `swe-agent`; each supports optional `runner: docker\|ssh` |
| `billing` | Hourly consulting rate, currency, Fibonacci-to-hours range mapping, and LLM cost accounting mode | `default_rate: 150.0`, `currency: USD`, `llm_costs.mode: subscription` |
| `qa` | Declarative success criteria evaluated after each story (output_contains, file_exists, file_contains, exit_code_zero, etc.) | No criteria by default; standard lint/build/test always run |
| `qa` | Declarative success criteria evaluated after each story (output_contains, file_exists, file_contains, exit_code_zero, etc.); `disable_pre_merge_verify` (turn off the per-story pre-merge build/test gate); and the requirement-completion gate — `disable_completion_gate` (turn off) + `completion_fix_cycles` (auto-fix attempts against a red composed mainline before blocking; `0`→default 2, negative→hard gate). The completion gate verifies the merged mainline and emits `REQ_BLOCKED` instead of `REQ_COMPLETED` when it cannot make the build/tests green. | No criteria by default; standard lint/build/test always run; `disable_pre_merge_verify: false`, `disable_completion_gate: false`, `completion_fix_cycles: 2` |
| `sla` | Per-Fibonacci-point maximum story duration in minutes; `auto_escalate` promotes breached stories to the next tier | `1pt→60m`, `2pt→120m`, `3pt→240m`, `5pt→480m`, `8pt→960m`, `13pt→1920m`; `auto_escalate: false` |
| `secrets` | Secrets provider: `env` (default, reads from environment) or `vault` (HashiCorp Vault KV v2) | `provider: env`; Vault settings: `vault_mount: secret`, `vault_path: vxd` |
| `notify` | Outbound Slack webhook URL and per-event triggers (`notify_on_sla`, `notify_on_complete`) | Disabled by default (empty `slack_webhook_url`) |
32 changes: 32 additions & 0 deletions internal/cli/resume.go
@@ -497,6 +497,23 @@ func runResume(cmd *cobra.Command, args []string) error {
		))
	}

	// Enable the requirement-completion verification gate: after all stories
	// merge, verify the composed mainline (build + tests) and only emit
	// REQ_COMPLETED when it is green, auto-fixing a red build for a bounded
	// number of cycles (godmode client required to apply fixes); otherwise emit
	// REQ_BLOCKED. This closes the gap where a requirement was reported complete
	// on code that does not compile. Skipped in dry-run (no real toolchain) and
	// when explicitly disabled via qa.disable_completion_gate.
	if !dryRun && !s.Config.QA.DisableCompletionGate {
		fixCycles := completionFixCycles(s.Config.QA.CompletionFixCycles)
		senior := s.Config.Models.Senior
		monitor.SetCompletionGate(engine.NewCompletionGate(
			llmClient, senior.Model, senior.MaxTokens, fixCycles,
			s.Config.Merge.BaseBranch, s.Events, s.Proj,
		))
		log.Printf("[resume] completion gate enabled (auto-fix cycles=%d)", fixCycles)
	}

	rc := &engine.RunContext{
		ReqID:          reqID,
		PlannedStories: plannedStories,
@@ -656,6 +673,21 @@ func buildQAConfig(cfg config.Config, projectDir, repoDir string) engine.QAConfi
	return qaCfg
}

// completionFixCycles maps the configured qa.completion_fix_cycles value to the
// number of auto-fix cycles the completion gate should run: 0 selects the
// default of 2; a negative value disables auto-fix (hard gate: verify once,
// block on red); a positive value passes through verbatim.
func completionFixCycles(configured int) int {
	switch {
	case configured == 0:
		return 2
	case configured < 0:
		return 0
	default:
		return configured
	}
}

// resolveReviewerClient picks the client + model config for the post-execution
// code reviewer. When Models.Reviewer specifies a provider it is built
// independently (e.g. codex/gpt-5.5) — the reviewer is never spawned as a
20 changes: 20 additions & 0 deletions internal/cli/resume_helpers_test.go
@@ -140,3 +140,23 @@ func TestRunResume_FlagsParse(t *testing.T) {
	}
}

func TestCompletionFixCycles(t *testing.T) {
	cases := []struct {
		name     string
		input    int
		expected int
	}{
		{"zero uses default of 2", 0, 2},
		{"positive value passes through", 3, 3},
		{"one passes through", 1, 1},
		{"negative disables auto-fix (hard gate)", -1, 0},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := completionFixCycles(tc.input); got != tc.expected {
				t.Errorf("completionFixCycles(%d) = %d, want %d", tc.input, got, tc.expected)
			}
		})
	}
}
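
To make the mapping pinned by the test above easier to read as behaviour, here is a small standalone sketch. `completionFixCycles` is copied from the `resume.go` hunk in this diff; `describeGate` is a hypothetical helper added only for illustration, and its wording of the two modes is an assumption based on the gate's doc comments.

```go
package main

import "fmt"

// completionFixCycles mirrors the mapping from resume.go in this diff.
func completionFixCycles(configured int) int {
	switch {
	case configured == 0:
		return 2
	case configured < 0:
		return 0
	default:
		return configured
	}
}

// describeGate is a hypothetical helper (not part of the codebase) that turns
// a configured value plus client availability into the effective gate mode.
func describeGate(configured int, hasClient bool) string {
	cycles := completionFixCycles(configured)
	if cycles == 0 || !hasClient {
		return "hard gate: verify once, block on red"
	}
	return fmt.Sprintf("auto-fix: up to %d cycle(s) before blocking", cycles)
}

func main() {
	for _, v := range []int{-1, 0, 1, 3} {
		fmt.Printf("completion_fix_cycles=%d -> %s\n", v, describeGate(v, true))
	}
	fmt.Printf("no godmode client -> %s\n", describeGate(0, false))
}
```

Note that an omitted key and an explicit `0` are indistinguishable after YAML decoding, which is why `0` selects the default and a negative value is needed to request the hard gate.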
18 changes: 18 additions & 0 deletions internal/cli/resume_wiring_test.go
@@ -25,3 +25,21 @@ func TestResume_WiresTechLeadFixer(t *testing.T) {
		}
	}
}

// TestResume_WiresCompletionGate guards the requirement-completion gate against
// the same dead-wire class: the gate blocks REQ_COMPLETED on a red composed
// mainline, but only if runResume actually constructs and attaches it. This
// scans the resume source to confirm the gate is built and wired.
func TestResume_WiresCompletionGate(t *testing.T) {
	src, err := os.ReadFile("resume.go")
	if err != nil {
		t.Fatalf("read resume.go: %v", err)
	}
	code := string(src)

	for _, want := range []string{"NewCompletionGate(", "SetCompletionGate("} {
		if !strings.Contains(code, want) {
			t.Errorf("resume.go must wire the completion gate: missing %q", want)
		}
	}
}
12 changes: 12 additions & 0 deletions internal/config/config.go
@@ -200,6 +200,18 @@ type QAConfig struct {
	// a story that turns a green base branch red (keeping main always-green).
	// Default (false) = gate ON. It never blocks when the base is already red.
	DisablePreMergeVerify bool `yaml:"disable_pre_merge_verify,omitempty"`
	// DisableCompletionGate turns OFF the requirement-completion verification
	// gate. The gate verifies the composed mainline (build + tests) after all
	// stories merge and only emits REQ_COMPLETED when it is green; otherwise it
	// auto-fixes a red build (see CompletionFixCycles) and, failing that, emits
	// REQ_BLOCKED. Default (false) = gate ON. When disabled, the legacy advisory
	// verification runs and the requirement always completes.
	DisableCompletionGate bool `yaml:"disable_completion_gate,omitempty"`
	// CompletionFixCycles is the number of automatic fix cycles the completion
	// gate runs against a red composed mainline before blocking. 0 uses the
	// default of 2. Set to a negative value to disable auto-fix (hard gate only:
	// verify once, block on red).
	CompletionFixCycles int `yaml:"completion_fix_cycles,omitempty"`
}

// SuccessCriterion defines a declarative QA check.
182 changes: 182 additions & 0 deletions internal/engine/completion_gate.go
@@ -0,0 +1,182 @@
package engine

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"

	"github.com/tzone85/vortex-dispatch/internal/llm"
	"github.com/tzone85/vortex-dispatch/internal/state"
)

// verifyFunc runs a verification cycle against repoDir and returns the result.
// It is a seam so tests can script red/green sequences without a real toolchain.
type verifyFunc func(ctx context.Context, repoDir string, cycle int) VerificationResult

// completionFixTimeout bounds a single auto-fix agent invocation.
const completionFixTimeout = 15 * time.Minute

// CompletionGate guards the REQ_COMPLETED signal. When every story has merged,
// it verifies the composed mainline (build + tests + artifacts) and, when the
// build is red, runs a bounded auto-fix loop: dispatch a fix agent, re-verify,
// repeat up to maxCycles. The requirement is only safe to mark complete when
// verification passes; otherwise the caller emits REQ_BLOCKED.
//
// This closes the long-standing gap where per-story QA (run in isolated
// worktrees) could not see cross-story drift, so a requirement was reported
// complete on code that does not compile.
type CompletionGate struct {
	client     llm.Client // godmode agent that applies fixes; nil ⇒ hard gate only
	model      string
	maxTokens  int
	maxCycles  int
	baseBranch string
	eventStore state.EventStore
	projStore  state.ProjectionStore

	// Seams (default to real implementations; overridden in tests).
	verify verifyFunc
	pull   func(repoDir, baseBranch string)
}

// NewCompletionGate constructs a gate. maxCycles is the number of auto-fix
// attempts before giving up; 0 makes the gate a pure pass/block check with no
// auto-fix. A nil client also degrades the gate to hard-gate behaviour.
func NewCompletionGate(
	client llm.Client,
	model string,
	maxTokens, maxCycles int,
	baseBranch string,
	es state.EventStore,
	ps state.ProjectionStore,
) *CompletionGate {
	if baseBranch == "" {
		baseBranch = "main"
	}
	return &CompletionGate{
		client:     client,
		model:      model,
		maxTokens:  maxTokens,
		maxCycles:  maxCycles,
		baseBranch: baseBranch,
		eventStore: es,
		projStore:  ps,
		verify: func(ctx context.Context, repoDir string, cycle int) VerificationResult {
			return RunVerificationLoop(ctx, repoDir, cycle)
		},
		pull: func(repoDir, baseBranch string) {
			pullBaseAfterMerge(repoDir, baseBranch)
		},
	}
}

// Run verifies the composed mainline and auto-fixes a red build up to maxCycles
// times. It returns true when verification is green (safe to emit
// REQ_COMPLETED) and false when the mainline remains red after exhausting the
// auto-fix budget (caller should emit REQ_BLOCKED).
//
// The gaps from every red verification are persisted, including on the
// hard-gate path (maxCycles == 0 or nil client), so .vxd-fix-gaps.md always
// reflects the most recent failing result when the gate blocks.
func (g *CompletionGate) Run(ctx context.Context, reqID, repoDir string) bool {
	cycle := 1
	res := g.verify(ctx, repoDir, cycle)
	if !ShouldRunFixCycle(res) {
		log.Printf("[gate] %s: verification clean on first pass — completion permitted", reqID)
		return true
	}
	g.recordRedCycle(reqID, repoDir, res)

	// attempted counts fix cycles that actually ran, so the final log line is
	// accurate when the loop exits early.
	attempted := 0
	for attempt := 1; attempt <= g.maxCycles; attempt++ {
		if g.client == nil {
			log.Printf("[gate] %s: no auto-fix client configured — hard-gating on red build", reqID)
			break
		}

		log.Printf("[gate] %s: auto-fix cycle %d/%d — dispatching fix agent for %d gap(s)",
			reqID, attempt, g.maxCycles, len(res.Gaps))
		if err := g.applyFix(ctx, repoDir, res); err != nil {
			log.Printf("[gate] %s: auto-fix cycle %d failed to dispatch: %v", reqID, attempt, err)
			break
		}
		attempted = attempt

		g.pull(repoDir, g.baseBranch)

		cycle++
		res = g.verify(ctx, repoDir, cycle)
		if !ShouldRunFixCycle(res) {
			log.Printf("[gate] %s: verification clean after auto-fix cycle %d — completion permitted",
				reqID, attempt)
			return true
		}
		g.recordRedCycle(reqID, repoDir, res)
	}

	log.Printf("[gate] %s: mainline still red after %d auto-fix cycle(s) — BLOCKING completion",
		reqID, attempted)
	return false
}

// recordRedCycle persists the gap requirement to .vxd-fix-gaps.md for operator
// transparency. Best-effort: a write failure is logged, never fatal.
func (g *CompletionGate) recordRedCycle(reqID, repoDir string, res VerificationResult) {
	fixReq := GapsToRequirement(res.Gaps, filepath.Base(repoDir))
	if fixReq == "" {
		return
	}
	fixPath := filepath.Join(repoDir, ".vxd-fix-gaps.md")
	if err := os.WriteFile(fixPath, []byte(fixReq), 0o600); err != nil {
		log.Printf("[gate] %s: failed to write %s: %v", reqID, fixPath, err)
	}
}

// applyFix dispatches a single synchronous fix-agent run. The agent runs in
// godmode (skip-permissions) in the project's working directory, so it can
// read the codebase, edit files, run the build/tests, and commit + push the
// reconciliation to the base branch.
func (g *CompletionGate) applyFix(ctx context.Context, repoDir string, res VerificationResult) error {
	fixCtx, cancel := context.WithTimeout(ctx, completionFixTimeout)
	defer cancel()

	prompt := g.buildFixPrompt(repoDir, res)
	_, err := g.client.Complete(fixCtx, llm.CompletionRequest{
		Model:     g.model,
		MaxTokens: g.maxTokens,
		System: "You are a Tech Lead repairing a multi-story integration on the main branch. " +
			"The composed codebase does not build or its tests fail. Make the minimal changes " +
			"needed to turn the build and tests green, then commit and push to the base branch.",
		Messages: []llm.Message{{Role: llm.RoleUser, Content: prompt}},
	})
	return err
}

// buildFixPrompt describes the failing build/tests and the exact remediation
// contract (fix → build → test → commit → push).
func (g *CompletionGate) buildFixPrompt(repoDir string, res VerificationResult) string {
	var sb strings.Builder
	sb.WriteString("The main branch of this repository is the composed result of several merged stories ")
	sb.WriteString("and is currently failing verification.\n\n")

	fmt.Fprintf(&sb, "Build passes: %v\n", res.BuildPasses)
	fmt.Fprintf(&sb, "Tests: %d passing / %d failing / %d total\n\n", res.TestsPassing, res.TestsFailing, res.TestsTotal)

	if len(res.Gaps) > 0 {
		sb.WriteString("Gaps detected:\n")
		for _, gap := range res.Gaps {
			fmt.Fprintf(&sb, "  - [%s/%s] %s: %s\n", gap.Category, gap.Severity, gap.File, gap.Detail)
		}
		sb.WriteString("\n")
	}

	sb.WriteString("Working directory: ")
	sb.WriteString(repoDir)
	sb.WriteString("\n\nDo the following, in order:\n")
	sb.WriteString("1. Investigate the failing build/tests (read the affected files and error output).\n")
	sb.WriteString("2. Apply the MINIMAL change that reconciles the cross-story break — typically a missing ")
	sb.WriteString("interface method, an unwired entry point, an import mismatch, or a composition root that ")
	sb.WriteString("was never assembled. Do not rewrite working code.\n")
	sb.WriteString("3. Run the project's build and test commands and confirm they pass.\n")
	fmt.Fprintf(&sb, "4. Commit the fix with a clear message and push it to the '%s' branch.\n", g.baseBranch)
	sb.WriteString("Do NOT ask clarifying questions. Do NOT produce JSON. Apply the fix directly.")
	return sb.String()
}
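
The control flow of `Run` can be exercised without a toolchain by scripting the verify seam. The following is a reduced, self-contained model rather than the real type: `gate`, `scripted` behaviour, and `simulate` are hypothetical names, verification results are collapsed to a green/red boolean, and `fix` stands in for `applyFix` followed by `pull`. It is a sketch of the four scenarios listed for `completion_gate_test.go`, assuming those tests follow the same shape.

```go
package main

import "fmt"

// gate is a reduced model of CompletionGate (illustrative only).
type gate struct {
	maxCycles int
	canFix    bool                 // stands in for client != nil
	verify    func(cycle int) bool // true means the mainline is green
	fix       func() error         // stands in for applyFix + pull
}

// run follows the same control flow as CompletionGate.Run: verify once,
// then alternate fix and re-verify until green or the budget is spent.
func (g *gate) run() bool {
	cycle := 1
	if g.verify(cycle) {
		return true
	}
	for attempt := 1; attempt <= g.maxCycles; attempt++ {
		if !g.canFix {
			break // hard gate: no client, so block on red
		}
		if err := g.fix(); err != nil {
			break
		}
		cycle++
		if g.verify(cycle) {
			return true
		}
	}
	return false
}

// simulate replays outcomes (one per verification cycle; the last value
// repeats) and reports the verdict plus how often each seam was called.
// outcomes must be non-empty.
func simulate(outcomes []bool, maxCycles int, canFix bool) (green bool, verifies, fixes int) {
	g := &gate{
		maxCycles: maxCycles,
		canFix:    canFix,
		verify: func(cycle int) bool {
			verifies++
			idx := cycle - 1
			if idx >= len(outcomes) {
				idx = len(outcomes) - 1
			}
			return outcomes[idx]
		},
		fix: func() error {
			fixes++
			return nil
		},
	}
	green = g.run()
	return green, verifies, fixes
}

func main() {
	fmt.Println(simulate([]bool{true}, 2, true))         // green on first pass
	fmt.Println(simulate([]bool{false, true}, 2, true))  // one fix turns it green
	fmt.Println(simulate([]bool{false, false}, 2, true)) // stays red, budget spent
	fmt.Println(simulate([]bool{false}, 2, false))       // nil client: hard gate
}
```

In every red scenario the function returns false, which matches the safety property stated in the gate's documentation: wiring problems can remove the auto-fix, but they cannot turn a red mainline into a completion.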