diff --git a/CLAUDE.md b/CLAUDE.md index b309a0e..e63219a 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -453,6 +453,13 @@ A self-upskilling security agent embedded in vxd's core so every build is review - **Config:** `security.disable_gate` (default false=ON), `security.gate_severity`, `security.auto_learn` (default true), `security.kb_path`. **Events:** `STORY_SECURITY_PASSED/FAILED`, `SECURITY_SCAN_COMPLETED`, `SECURITY_RULE_LEARNED` (all in the projection switch; `TestProject_AllDeclaredEventsHandled` guards exhaustiveness). - **Tests:** `internal/security/*_test.go` (16: KB roundtrip/immutability/lang-filter/checklist/Covers, scanner applicability, all 5 parsers, report) + `engine/security_gate_test.go` (7: scan aggregation+event, block-on-critical, pass-below-threshold, self-upskill on new class, no-relearn known class, LLM-findings parse). Host scanners installed (gosec, govulncheck, gitleaks, semgrep); the `security_scanners` preflight check (`CheckSecurityScanners`, WARNING tier) reports any that go missing, with install hints from `security.InstallHint`. **NXD port pending.** +### Frontend design skill (agent/frontend.go + engine/detect.go, 2026-07-02) +vxd-built web UIs previously came out as generic "AI slop" (Inter font, purple gradients, template hero + three feature cards). The factory now carries an embedded frontend-design skill so UI-facing stories are both *planned* and *implemented* with design intent: +- **`agent.FrontendDesignBrief`** — a single-source design-standards block (adapted from Anthropic's frontend-design skill + current anti-slop research): two-pass token-first process (palette/type/layout/signature plan, self-critique for genericness, then code derived from the plan), named banned defaults (Inter/Roboto, purple-gradient-on-white, cream `#F4F1EA`+serif+terracotta, acid-green-on-black, the template feature-card page, scattered animations), a non-negotiable accessibility floor (responsive to 360px, visible keyboard focus, WCAG AA contrast, `prefers-reduced-motion`, 44px touch targets, designed empty/loading/error states, CSS specificity discipline), and copy-as-design-material rules (real product copy, "Save changes" never "Submit"). Size pinned by `TestFrontendDesignBrief_SizeBudget` (≤6 KB — it rides on every UI dispatch). +- **Detection:** `detectFrontend(title, description, ownedFiles)` (`engine/detect.go`) — owned-file extensions (`.tsx/.jsx/.vue/.svelte/.css/.html/...`) are the strongest signal, plus whole-word UI vocabulary (`frontendKeywordRe`; word boundaries so "pagination"/"performance"/"review" don't false-positive). Sets `PromptContext.IsFrontend`/`TemplateContext.IsFrontend` in the executor for BOTH the first dispatch and the retry path (`TestExecutor_WiresFrontendDetection` guards the wire). +- **Planning:** the Tech-Lead ENGINEERING STANDARDS block now requires the FIRST UI story to establish a design-token foundation (palette/typeface-pairing/spacing as CSS custom properties or the framework theme) with later UI stories consuming those tokens, and UI acceptance criteria to include the accessibility quality floor (`TestPlanner_PromptIncludesEngineeringStandards` pins it). +- **Tests:** `agent/frontend_test.go` (brief injected only when flagged, retry path carries it, size budget), `engine/detect_test.go::TestDetectFrontend` (21 cases incl. substring traps). **NXD port pending.** + ### Model ID Compatibility - **Use undated aliases, not dated snapshots.** Current defaults: `claude-opus-4-8` (tech_lead), `claude-sonnet-4-6` (senior/qa/manager), `claude-haiku-4-5` (cheapest). All three are verified working on the Claude CLI subscription tier. - **Default execution tiers are all-Anthropic (2026-06-24 fix).** `DefaultConfig` previously set junior/intermediate/supervisor to `{google, gemma-4-27b-it}` — a model that 404s on the Google AI API (it does not exist on `v1beta`). Every low-complexity story spawned a gemini agent that died in ~10s producing no code, then limped forward by escalating to senior. Defaults are now `{anthropic, claude-haiku-4-5}` so a fresh install works with only the Claude CLI configured (no Google AI key/quota). `TestDefaultConfig_NoInvalidJuniorModel` pins this. **A model 404 in the agent runtime surfaces as "agent produced no code changes," NOT as a model error — if a whole tier silently produces nothing, validate the model ID with `gemini -m -p OK` / `claude --model -p OK` first.** diff --git a/README.md b/README.md index 89ec247..2f2262b 100644 --- a/README.md +++ b/README.md @@ -193,11 +193,12 @@ vhs docs/demo.tape - **Smart retry with error analysis** -- 8 error categories with targeted fix suggestions passed to retry agents - **Human review gates** -- three modes (auto, plan_only, manual) for plan approval and PR review - **Crash recovery** -- lock files, checkpoints, and consistency checks for resuming after process death -- **Pre-flight validation** -- 12 environment checks across 3 severity tiers before pipeline execution +- **Pre-flight validation** -- 16 environment checks across 3 severity tiers before pipeline execution - **Cost estimation** -- quick heuristic and LLM-based estimation with Fibonacci-to-hours mapping - **Watchdog monitoring** -- stuck detection, permission bypass, context freshness checks - **Supervisor oversight** -- periodic drift detection and reprioritization - **Senior code review** -- automated review via LLM with approve/request-changes verdicts +- **Frontend design skill** -- UI-facing stories are detected (owned files + story text) and their agents receive an embedded design brief: token-first planning, one signature element, named anti-"AI slop" defaults banned, and a WCAG accessibility floor; the planner requires a design-token foundation story for web UIs - **Automated QA pipeline** -- lint, build, and test with declarative success criteria (6 kinds) - **Auto-merge with PR creation** -- stories flow from code to merged PR hands-free - **LLM-powered conflict resolution** -- rebase conflicts auto-resolved; binary files handled without LLM (deterministic policy); complex/multi-file conflicts escalate to Tech Lead with full requirement DAG context diff --git a/internal/agent/frontend.go b/internal/agent/frontend.go new file mode 100644 index 0000000..77fd885 --- /dev/null +++ b/internal/agent/frontend.go @@ -0,0 +1,67 @@ +package agent + +// FrontendDesignBrief is vxd's frontend design skill: the standards block +// injected into the goal prompt of every UI-facing story (PromptContext. +// IsFrontend, detected in the engine from owned files + story text). +// +// Synthesized from Anthropic's frontend-design skill and current guidance on +// avoiding generic AI-generated design ("AI slop"): token-first planning, one +// signature element, named anti-pattern looks, real copy, and a non-negotiable +// accessibility floor. Content lives in one const so the planner, the goal +// prompt, and tests share a single source of truth. Keep it under the size +// budget pinned by TestFrontendDesignBrief_SizeBudget — it rides on every +// UI-story dispatch. +const FrontendDesignBrief = ` +## FRONTEND DESIGN — MANDATORY STANDARDS + +You are also the design lead for this UI. The client rejects anything that +looks templated. Make deliberate, opinionated choices specific to THIS +product, its audience, and this page's single job. + +### Two-pass process (plan tokens BEFORE code) +1. Write a compact design-token plan first, as a comment block or DESIGN.md: + - Palette: 4-6 named hex values — one dominant color creating atmosphere, + one sharp accent. Not evenly-distributed timid pastels. + - Type: 2+ roles — a characterful display face used with restraint, a + complementary body face (never the same family you'd pick for any other + project), optional utility face for data. + - Layout: one-sentence concept. Structure must encode something true about + the content (numbered markers only if the content really is a sequence). + - Signature: the ONE element this page will be remembered by. Spend your + boldness there; keep everything around it quiet and disciplined. +2. Critique the plan before coding: if any part is what you would produce for + ANY similar brief, it is a default, not a choice — revise it. Then write + the code deriving every color and type decision from the plan. Encode the + tokens once (CSS custom properties or the Tailwind theme), never ad-hoc + per component. + +### Banned defaults (these read as AI-generated) +- Fonts: Inter, Roboto, Open Sans, Lato, Arial, bare system-ui as the design. +- Purple/blue-purple gradients on white; emerald or acid-green single accent + on near-black; warm cream #F4F1EA + serif display + terracotta accent; + broadsheet hairlines with zero border-radius EVERYWHERE. All four are + legitimate only if the brief explicitly asks for them. +- The template page: gradient hero → vague centered headline → three feature + cards with icons → testimonials → footer. Uniform 16px-radius cards. +- Scattered animations. One orchestrated moment (a page-load sequence or a + scroll reveal) beats effects everywhere; extra motion reads as generated. + +### Quality floor (non-negotiable, never announced in the UI) +- Responsive down to 360px wide; no horizontal scroll, no overlapping text. +- Visible keyboard focus on every interactive element (focus-visible ring). +- prefers-reduced-motion respected: gate every animation on it. +- WCAG AA contrast: 4.5:1 body text, 3:1 large text and UI components. +- Touch targets at least 44x44px. Semantic HTML (nav/main/button, alt text, + labels tied to inputs) — a div with onClick is not a button. +- Empty, loading, and error states designed, not defaulted. +- Watch CSS specificity: section-level and element-level spacing rules that + cancel each other are the classic generated-CSS failure. + +### Copy is design material +Write real copy for THIS product — never lorem ipsum or vague marketing +lines. Buttons say exactly what happens ("Save changes", never "Submit"); +the same action keeps the same name through the whole flow. Errors state +what went wrong and how to fix it, without apologizing. Name things by what +the user controls ("notifications"), not how the system is built ("webhook +config"). Active voice, sentence case, no filler. +` diff --git a/internal/agent/frontend_test.go b/internal/agent/frontend_test.go new file mode 100644 index 0000000..95a0d96 --- /dev/null +++ b/internal/agent/frontend_test.go @@ -0,0 +1,83 @@ +package agent + +import ( + "strings" + "testing" +) + +// The frontend design brief is injected into the goal prompt only for +// UI-facing stories (ctx.IsFrontend). It is the vxd factory's design skill: +// token-first planning, one signature element, the named anti-pattern looks, +// and a non-negotiable accessibility floor. +func TestGoalPrompt_FrontendBriefInjectedWhenFlagSet(t *testing.T) { + ctx := PromptContext{ + StoryID: "s-1", + StoryTitle: "Build the landing page", + StoryDescription: "Marketing page for the product", + AcceptanceCriteria: "- page renders", + IsFrontend: true, + } + got := GoalPrompt(RoleSenior, ctx) + + for _, want := range []string{ + "FRONTEND DESIGN", // section header + "Signature", // one memorable element + "token", // token-first plan before code + "prefers-reduced-motion", // quality floor + "focus", // visible keyboard focus + "Inter", // named anti-pattern font + "purple", // named anti-pattern palette + "#F4F1EA", // named second-generation cliché + "WCAG", // contrast floor + "Submit", // copy rule: never label a button Submit + } { + if !strings.Contains(got, want) { + t.Errorf("frontend brief missing %q", want) + } + } +} + +func TestGoalPrompt_FrontendBriefAbsentForBackendStories(t *testing.T) { + ctx := PromptContext{ + StoryID: "s-2", + StoryTitle: "Create REST API endpoints", + StoryDescription: "Express routes", + AcceptanceCriteria: "- routes tested", + IsFrontend: false, + } + got := GoalPrompt(RoleSenior, ctx) + if strings.Contains(got, "FRONTEND DESIGN") { + t.Error("backend story must not carry the frontend design brief") + } +} + +// Retry dispatches go through RenderGoalWithAttempts — the brief must survive +// the retry path too, or the second attempt regresses to default design. +func TestRenderGoalWithAttempts_CarriesFrontendBrief(t *testing.T) { + ctx := TemplateContext{ + StoryID: "s-1", + StoryTitle: "Build the landing page", + StoryDescription: "Marketing page", + AcceptanceCriteria: "- page renders", + IsFrontend: true, + IsRetry: true, + RetryNumber: 2, + ReviewFeedback: "colors are generic", + PriorAttempts: []AttemptSummary{{Number: 1, Role: "senior", Outcome: "review_failed"}}, + } + got := RenderGoalWithAttempts(ctx) + if !strings.Contains(got, "FRONTEND DESIGN") { + t.Error("retry path must carry the frontend design brief") + } +} + +// The brief itself must stay within a sane token budget — it rides on every +// UI story dispatch. ~6k chars ≈ 1.5k tokens is the ceiling. +func TestFrontendDesignBrief_SizeBudget(t *testing.T) { + if n := len(FrontendDesignBrief); n > 6000 { + t.Errorf("FrontendDesignBrief is %d chars — trim it below 6000 (prompt budget)", n) + } + if n := len(FrontendDesignBrief); n < 1500 { + t.Errorf("FrontendDesignBrief is %d chars — suspiciously small, did the content get lost?", n) + } +} diff --git a/internal/agent/prompts.go b/internal/agent/prompts.go index 68f0a7a..29265b6 100644 --- a/internal/agent/prompts.go +++ b/internal/agent/prompts.go @@ -22,6 +22,7 @@ type PromptContext struct { IsExistingCodebase bool // true when working on a client's existing repo IsBugFix bool // true when the story is about fixing a bug IsInfrastructure bool // true when the story involves Docker/CI/deployment + IsFrontend bool // true when the story builds/changes a user-facing web UI WaveContext string // summary of what prior stories built (from WAVE_CONTEXT.md) DesignApproach string // "ddd-tdd" (default), "tdd", "standard" } @@ -116,6 +117,10 @@ BUG FIX — MANDATORY WORKFLOW: 5. VERIFY: Failing test now passes. Full test suite still passes. No regressions.` } + if ctx.IsFrontend { + base += "\n" + FrontendDesignBrief + } + if ctx.IsInfrastructure { base += ` diff --git a/internal/agent/render.go b/internal/agent/render.go index f3fb3c2..e3ddade 100644 --- a/internal/agent/render.go +++ b/internal/agent/render.go @@ -33,6 +33,7 @@ type TemplateContext struct { IsExistingCodebase bool IsBugFix bool IsInfrastructure bool + IsFrontend bool // true when the story builds/changes a user-facing web UI IsRetry bool // true if this is not the first attempt RetryNumber int // which attempt this is (1-indexed) } @@ -78,6 +79,7 @@ func RenderGoalWithAttempts(ctx TemplateContext) string { IsExistingCodebase: ctx.IsExistingCodebase, IsBugFix: ctx.IsBugFix, IsInfrastructure: ctx.IsInfrastructure, + IsFrontend: ctx.IsFrontend, WaveContext: ctx.WaveContext, } diff --git a/internal/engine/detect.go b/internal/engine/detect.go index dff76bf..cce9930 100644 --- a/internal/engine/detect.go +++ b/internal/engine/detect.go @@ -4,6 +4,7 @@ import ( "os" "os/exec" "path/filepath" + "regexp" "strings" ) @@ -76,6 +77,44 @@ func detectBugFix(title, description string) bool { return false } +// frontendFileExts are file extensions that mark a story as UI-facing when they +// appear in its owned files. +var frontendFileExts = map[string]bool{ + ".tsx": true, ".jsx": true, ".vue": true, ".svelte": true, + ".css": true, ".scss": true, ".sass": true, ".less": true, + ".html": true, ".astro": true, +} + +// detectFrontend checks if the story builds or changes a user-facing web UI. +// This triggers the FrontendDesignBrief injection (agent.FrontendDesignBrief) +// so agents produce distinctive, accessible frontends instead of generic +// AI-default design. Detection combines owned-file extensions (strongest +// signal) with title/description keywords. +func detectFrontend(title, description string, ownedFiles []string) bool { + for _, f := range ownedFiles { + if frontendFileExts[strings.ToLower(filepath.Ext(f))] { + return true + } + } + return frontendKeywordRe.MatchString(strings.ToLower(title + " " + description)) +} + +// frontendKeywordRe matches UI vocabulary as whole words only — plain substring +// matching trips on "pagination" (page), "performance" (form), "review" (view). +// Deliberately absent: "html" (server-side HTML emails/reports are backend +// work; real UI stories carry .html in owned files or another keyword) and +// "responsive" ("responsive API gateway" means fast, not responsive design). +// Hoisted to package level so detection never recompiles it (perf convention). +var frontendKeywordRe = regexp.MustCompile(`\b(` + + `frontend|front-end|ui|ux|user interface|` + + `landing page|page|screen|view|component|widget|` + + `dashboard|layout|styling|stylesheet|css|` + + `tailwind|react|vue|svelte|next\.js|nextjs|astro|` + + `design system|web app|webapp|website|` + + `form|modal|navbar|navigation bar|sidebar|button|` + + `theme|dark mode|typography` + + `)\b`) + // detectInfrastructure checks if the story involves Docker, CI/CD, deployment, // or infrastructure concerns. This triggers the InfrastructureDebugging playbook. func detectInfrastructure(title, description string) bool { diff --git a/internal/engine/detect_test.go b/internal/engine/detect_test.go index 0ff1602..41f7de4 100644 --- a/internal/engine/detect_test.go +++ b/internal/engine/detect_test.go @@ -34,6 +34,53 @@ func TestDetectBugFix(t *testing.T) { } } +func TestDetectFrontend(t *testing.T) { + tests := []struct { + name string + title, desc string + ownedFiles []string + want bool + }{ + {"ui-keyword-title", "Build the landing page", "", nil, true}, + {"component-keyword", "Create reusable Button component", "", nil, true}, + {"dashboard-keyword", "Admin dashboard with charts", "", nil, true}, + {"frontend-in-desc", "", "Implement the frontend for task management", nil, true}, + {"tailwind-keyword", "Style the app", "Use Tailwind for the layout", nil, true}, + {"react-keyword", "Task list view", "React component rendering tasks", nil, true}, + {"owned-tsx", "Wire task state", "", []string{"src/App.tsx"}, true}, + {"owned-css", "Polish spacing", "", []string{"styles/main.css"}, true}, + {"owned-vue", "Item editor", "", []string{"src/Editor.vue"}, true}, + {"owned-svelte", "Item editor", "", []string{"src/Editor.svelte"}, true}, + {"owned-html", "Static page", "", []string{"public/index.html"}, true}, + {"backend-only", "Create REST API endpoints", "Express routes for tasks", nil, false}, + {"db-story", "Add database migrations", "Postgres schema for users", nil, false}, + {"go-files-only", "Implement parser", "", []string{"internal/parser/parser.go"}, false}, + {"cli-story", "Add --json flag to CLI", "", []string{"cmd/root.go"}, false}, + // "server-side rendering of the page" mentions page — still frontend work. + {"ssr", "Server-side rendering of the settings page", "", nil, true}, + // Substring traps: keywords must match whole words only. + {"pagination-is-not-page", "Add pagination to the tasks API", "cursor-based pagination in the repository layer", nil, false}, + {"performance-is-not-form", "Improve performance of the query planner", "", nil, false}, + {"review-is-not-view", "Code review automation for PRs", "LLM review of diffs", nil, false}, + {"format-is-not-form", "Format output as JSON", "", nil, false}, + {"build-is-not-ui", "Build the release pipeline", "artifact signing", nil, false}, + // Server-side HTML and "responsive" infrastructure are NOT UI work. + {"html-email-is-backend", "Generate HTML email report", "render the weekly digest as text/html", nil, false}, + {"responsive-gateway-is-backend", "Design a responsive API gateway", "low-latency request routing", nil, false}, + // But real UI stories that mention html carry the file or another keyword. + {"html-with-owned-file", "Static marketing site", "", []string{"public/index.html"}, true}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + got := detectFrontend(tt.title, tt.desc, tt.ownedFiles) + if got != tt.want { + t.Errorf("detectFrontend(%q, %q, %v) = %v, want %v", tt.title, tt.desc, tt.ownedFiles, got, tt.want) + } + }) + } +} + func TestDetectInfrastructure(t *testing.T) { tests := []struct { title, desc string diff --git a/internal/engine/executor.go b/internal/engine/executor.go index b58446f..831b610 100644 --- a/internal/engine/executor.go +++ b/internal/engine/executor.go @@ -204,6 +204,7 @@ func (e *Executor) spawn(repoDir string, a Assignment, story PlannedStory) Spawn isExisting := detectExistingCodebase(worktreePath) isBug := detectBugFix(story.Title, story.Description) isInfra := detectInfrastructure(story.Title, story.Description) + isFrontend := detectFrontend(story.Title, story.Description, story.OwnedFiles) // Load RepoProfile if available to enrich prompts with pre-learned knowledge. var techStackStr, lintCmd, buildCmd, testCmd string @@ -233,6 +234,7 @@ func (e *Executor) spawn(repoDir string, a Assignment, story PlannedStory) Spawn IsExistingCodebase: isExisting, IsBugFix: isBug, IsInfrastructure: isInfra, + IsFrontend: isFrontend, TechStack: techStackStr, LintCommand: lintCmd, BuildCommand: buildCmd, @@ -273,6 +275,7 @@ func (e *Executor) spawn(repoDir string, a Assignment, story PlannedStory) Spawn IsExistingCodebase: isExisting, IsBugFix: isBug, IsInfrastructure: isInfra, + IsFrontend: isFrontend, IsRetry: true, RetryNumber: len(attempts) + 1, PriorAttempts: priorAttempts, diff --git a/internal/engine/frontend_wiring_test.go b/internal/engine/frontend_wiring_test.go new file mode 100644 index 0000000..507e5c4 --- /dev/null +++ b/internal/engine/frontend_wiring_test.go @@ -0,0 +1,28 @@ +package engine + +import ( + "os" + "strings" + "testing" +) + +// TestExecutor_WiresFrontendDetection guards against a dead-wire regression: +// the frontend design brief (agent.FrontendDesignBrief + PromptContext. +// IsFrontend) only fires if the executor actually calls detectFrontend and +// threads the flag into BOTH prompt paths (first dispatch via PromptContext +// and retries via TemplateContext). The prompt-injection unit tests cannot +// catch the executor never setting the flag. +func TestExecutor_WiresFrontendDetection(t *testing.T) { + src, err := os.ReadFile("executor.go") + if err != nil { + t.Fatalf("read executor.go: %v", err) + } + code := string(src) + + if !strings.Contains(code, "detectFrontend(") { + t.Error("executor.go must call detectFrontend — the design brief is otherwise dead code") + } + if got := strings.Count(code, "IsFrontend:"); got < 2 { + t.Errorf("executor.go must thread IsFrontend into both the PromptContext and the retry TemplateContext, found %d assignment(s)", got) + } +} diff --git a/internal/engine/planner.go b/internal/engine/planner.go index c9d5f51..de12e81 100644 --- a/internal/engine/planner.go +++ b/internal/engine/planner.go @@ -163,6 +163,7 @@ ENGINEERING STANDARDS — every code story's description AND acceptance_criteria - Error handling: handle errors explicitly at every level; no silent failures; user-facing messages must not leak internals/stack traces. - Tests: every code story includes tests covering the happy path AND at least one failure/edge path. - Docs as you go: stories that add a public surface document it (in docs/ or the relevant module doc) so the final README/training can link real content. +- Frontend design (any story that builds or changes a user-facing web UI — vxd injects a full design brief into those agents, so plan for it): the FIRST UI story must establish a design-token foundation (palette of 4-6 named colors with one dominant + one accent, a display+body typeface pairing that is NOT Inter/Roboto/system defaults, spacing scale — as CSS custom properties or the framework theme) and every later UI story consumes those tokens, never ad-hoc values. UI acceptance criteria MUST include the quality floor: responsive to 360px, visible keyboard focus, WCAG AA contrast, prefers-reduced-motion respected, and designed empty/loading/error states. Copy is real product copy, never lorem ipsum or "Submit". CRITICAL RULES: - Respond ONLY with the JSON array. No prose, no explanations, no questions. diff --git a/internal/engine/planner_test.go b/internal/engine/planner_test.go index 86fa116..46993dd 100644 --- a/internal/engine/planner_test.go +++ b/internal/engine/planner_test.go @@ -246,6 +246,13 @@ func TestPlanner_PromptIncludesEngineeringStandards(t *testing.T) { t.Errorf("decomposition prompt missing engineering standard %q", must) } } + // Frontend design is part of the factory standards: token-first foundation + // story, no default fonts, and the accessibility quality floor in criteria. + for _, must := range []string{"design-token", "Inter", "WCAG", "prefers-reduced-motion", "360px"} { + if !strings.Contains(prompt, must) { + t.Errorf("decomposition prompt missing frontend design standard %q", must) + } + } } // TestPlanner_PromptMandatesAssembledAppEndpointTests pins the recurrence guard diff --git a/internal/engine/security_gate_llm_test.go b/internal/engine/security_gate_llm_test.go new file mode 100644 index 0000000..c8d5636 --- /dev/null +++ b/internal/engine/security_gate_llm_test.go @@ -0,0 +1,196 @@ +package engine + +import ( + "context" + "encoding/json" + "os" + "path/filepath" + "strings" + "testing" + + "github.com/tzone85/vortex-dispatch/internal/llm" + "github.com/tzone85/vortex-dispatch/internal/security" +) + +const llmCriticalFinding = `[{"severity":"critical","title":"SQL injection in user filter","file":"db/users.go","line":88,"rule_id":"CWE-89","detail":"query concatenates request input; use parameterised queries"}]` + +// ScanRepo with an LLM client merges the threat-model review's findings with +// the scanner findings — this drives llmReview + callLLM + parseLLMFindings. +func TestSecurityGate_ScanRepo_MergesLLMFindings(t *testing.T) { + kbPath := filepath.Join(t.TempDir(), "kb.json") + scannerFinding := security.Finding{Tool: "gosec", RuleID: "G101", Severity: security.SeverityHigh, File: "a.go", Line: 1, Title: "hardcoded cred", Source: "scanner"} + client := llm.NewReplayClient(llm.CompletionResponse{Content: llmCriticalFinding}) + g := newTestSecurityGate(t, client, kbPath, security.SeverityCritical, false, fakeScan(scannerFinding)) + + report, err := g.ScanRepo(context.Background(), t.TempDir()) + if err != nil { + t.Fatalf("ScanRepo: %v", err) + } + if report.Total() != 2 { + t.Fatalf("want scanner + llm findings merged (2), got %d: %+v", report.Total(), report.Findings) + } + var llmF *security.Finding + for i := range report.Findings { + if report.Findings[i].Source == "llm" { + llmF = &report.Findings[i] + } + } + if llmF == nil { + t.Fatal("LLM finding missing from report") + } + if llmF.Tool != "llm" || llmF.RuleID != "CWE-89" || llmF.Severity != security.SeverityCritical || llmF.Line != 88 { + t.Errorf("LLM finding mis-parsed: %+v", *llmF) + } +} + +// The LLM reviewing a story diff can block the story on its own — no scanner +// finding needed. Drives llmReviewDiff end-to-end through the block path. +func TestSecurityGate_ReviewStory_LLMDiffFindingBlocks(t *testing.T) { + kbPath := filepath.Join(t.TempDir(), "kb.json") + client := llm.NewReplayClient(llm.CompletionResponse{Content: llmCriticalFinding}) + g := newTestSecurityGate(t, client, kbPath, security.SeverityCritical, false, fakeScan()) + + passed, summary, err := g.ReviewStory(context.Background(), "s-1", "add user filter", "+ query := \"SELECT * FROM users WHERE name='\" + name + \"'\"", t.TempDir()) + if err != nil { + t.Fatalf("ReviewStory: %v", err) + } + if passed { + t.Fatal("critical LLM finding must block the story") + } + if !strings.Contains(summary, "SQL injection") { + t.Errorf("block summary must name the finding: %q", summary) + } + // The diff must have been sent to the model as data. + if client.CallCount() != 1 || !strings.Contains(client.CallAt(0).Messages[0].Content, "") { + t.Error("diff review prompt must wrap the diff in data tags") + } +} + +// An LLM failure must degrade to scanners-only, never abort the scan — the +// deterministic findings still make it into the report. +func TestSecurityGate_LLMFailureIsNonFatal(t *testing.T) { + kbPath := filepath.Join(t.TempDir(), "kb.json") + exhausted := llm.NewReplayClient() // zero responses ⇒ every call errors + scannerFinding := security.Finding{Tool: "gitleaks", RuleID: "aws", Severity: security.SeverityCritical, File: "x.env", Line: 3, Title: "AWS key", Source: "scanner"} + g := newTestSecurityGate(t, exhausted, kbPath, security.SeverityCritical, false, fakeScan(scannerFinding)) + + report, err := g.ScanRepo(context.Background(), t.TempDir()) + if err != nil { + t.Fatalf("LLM failure must not abort the scan: %v", err) + } + if report.Total() != 1 || report.Findings[0].Source != "scanner" { + t.Errorf("scanner findings must survive an LLM failure: %+v", report.Findings) + } +} + +// A prose-wrapped or garbage LLM response yields no findings rather than a +// crash or a phantom block. +func TestSecurityGate_ReviewStory_GarbageLLMResponsePasses(t *testing.T) { + kbPath := filepath.Join(t.TempDir(), "kb.json") + client := llm.NewReplayClient(llm.CompletionResponse{Content: "I could not review this code, sorry!"}) + g := newTestSecurityGate(t, client, kbPath, security.SeverityCritical, false, fakeScan()) + + passed, _, err := g.ReviewStory(context.Background(), "s-1", "t", "+ x", t.TempDir()) + if err != nil { + t.Fatalf("ReviewStory: %v", err) + } + if !passed { + t.Error("no parseable findings ⇒ story passes") + } +} + +// A corrupt knowledge base must abort both entry points loudly — silently +// falling back to the baseline would hide that learned rules were lost. +func TestSecurityGate_CorruptKBIsALoudError(t *testing.T) { + kbPath := filepath.Join(t.TempDir(), "kb.json") + if err := os.WriteFile(kbPath, []byte("{corrupt"), 0o600); err != nil { + t.Fatal(err) + } + g := newTestSecurityGate(t, nil, kbPath, security.SeverityCritical, false, fakeScan()) + + if _, err := g.ScanRepo(context.Background(), t.TempDir()); err == nil { + t.Error("ScanRepo must surface a corrupt KB") + } + if _, _, err := g.ReviewStory(context.Background(), "s-1", "t", "+x", t.TempDir()); err == nil { + t.Error("ReviewStory must surface a corrupt KB") + } +} + +// A KB that cannot be persisted must not break the scan — upskilling is +// best-effort; the findings and the report still stand. +func TestSecurityGate_UpskillPersistFailureIsNonFatal(t *testing.T) { + // A read-only KB file: LoadKnowledgeBase succeeds, Save's WriteFile fails. + kbPath := filepath.Join(t.TempDir(), "kb.json") + baseline, _ := json.Marshal(security.BaselineKnowledgeBase()) + if err := os.WriteFile(kbPath, baseline, 0o400); err != nil { + t.Fatal(err) + } + novel := security.Finding{Tool: "semgrep", RuleID: "custom.novel", Severity: security.SeverityHigh, File: "a.go", Line: 1, Title: "novel class", Source: "scanner"} + g := newTestSecurityGate(t, nil, kbPath, security.SeverityCritical, true, fakeScan(novel)) + + report, err := g.ScanRepo(context.Background(), t.TempDir()) + if err != nil { + t.Fatalf("persist failure must not abort the scan: %v", err) + } + if report.Total() != 1 { + t.Errorf("findings must survive a persist failure: %+v", report.Findings) + } +} + +func TestVulnClassID_FallbackChain(t *testing.T) { + cases := []struct { + name string + f security.Finding + want string + }{ + {"cwe-from-ruleid", security.Finding{RuleID: "CWE-89"}, "CWE-89"}, + {"cwe-from-detail", security.Finding{RuleID: "G304", Detail: "path traversal CWE-22 via input"}, "CWE-22"}, + {"category-when-no-cwe", security.Finding{RuleID: "", Category: "Insecure Design"}, "Insecure Design"}, + {"tool-rule-when-nothing-else", security.Finding{Tool: "semgrep", RuleID: "custom.rule"}, "semgrep:custom.rule"}, + {"empty-when-unclassifiable", security.Finding{}, ""}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + if got := vulnClassID(tc.f); got != tc.want { + t.Errorf("vulnClassID(%+v) = %q, want %q", tc.f, got, tc.want) + } + }) + } +} + +func TestCweOf_ExtractsAndBounds(t *testing.T) { + cases := []struct { + in security.Finding + want string + }{ + {security.Finding{RuleID: "CWE-798"}, "CWE-798"}, + {security.Finding{Detail: "maps to CWE-89: injection"}, "CWE-89"}, + {security.Finding{Category: "A03 CWE-79 XSS"}, "CWE-79"}, + {security.Finding{Detail: "CWE- with no digits"}, ""}, + {security.Finding{Title: "CWE-1 in title is ignored (title not scanned)"}, ""}, + } + for _, tc := range cases { + if got := cweOf(tc.in); got != tc.want { + t.Errorf("cweOf(%+v) = %q, want %q", tc.in, got, tc.want) + } + } +} + +func TestParseLLMFindings_WrappedAndInvalid(t *testing.T) { + if got := parseLLMFindings([]byte("")); got != nil { + t.Errorf("empty response ⇒ nil, got %v", got) + } + if got := parseLLMFindings([]byte("no json here at all")); got != nil { + t.Errorf("prose-only response ⇒ nil, got %v", got) + } + // Fence-wrapped array must still parse (extractJSON handles the wrapping). + wrapped := "Here are my findings:\n```json\n" + llmCriticalFinding + "\n```\nLet me know." + got := parseLLMFindings([]byte(wrapped)) + if len(got) != 1 || got[0].RuleID != "CWE-89" || got[0].Source != "llm" { + t.Errorf("fence-wrapped findings must parse: %+v", got) + } + // A JSON object (not array) is a shape mismatch ⇒ nil, logged, no crash. + if got := parseLLMFindings([]byte(`{"severity":"high"}`)); got != nil { + t.Errorf("non-array JSON ⇒ nil, got %v", got) + } +} diff --git a/internal/security/coverage_gaps_test.go b/internal/security/coverage_gaps_test.go new file mode 100644 index 0000000..2043e90 --- /dev/null +++ b/internal/security/coverage_gaps_test.go @@ -0,0 +1,241 @@ +package security + +import ( + "os" + "path/filepath" + "strings" + "testing" +) + +// Covers is the self-upskilling dedup: it decides whether the agent re-learns a +// vulnerability class it already ships guidance for. These cases pin the three +// match modes (rule ID, CWE alias, miss). +func TestCovers_RuleIDAndCWEAlias(t *testing.T) { + kb := BaselineKnowledgeBase() + + if !kb.Covers("A03:2021") { + t.Error("must cover by exact rule ID") + } + // A03's CWE field is CWE-89 — a finding classed as CWE-89 is already covered + // even though no rule has that ID... except the standalone CWE rules do. + if !kb.Covers("CWE-89") { + t.Error("must cover a class via a rule's CWE field") + } + if !kb.Covers("CWE-918") { + t.Error("SSRF is baseline (A10's CWE) and must be covered") + } + if kb.Covers("CWE-999999") { + t.Error("unknown class must NOT be covered — that is what triggers learning") + } + if kb.Covers("") { + t.Error("empty class id must not match anything (rules without CWE must not alias it)") + } +} + +func TestCovers_LearnedRuleExtendsCoverage(t *testing.T) { + kb := BaselineKnowledgeBase() + if kb.Covers("CWE-1333") { + t.Fatal("precondition: ReDoS not in baseline") + } + grown := kb.Add(VulnRule{ID: "CWE-1333", CWE: "CWE-1333", Title: "ReDoS", Detection: "x", Remediation: "y", Source: RuleLearned}) + if !grown.Covers("CWE-1333") { + t.Error("a learned rule must extend coverage") + } + if kb.Covers("CWE-1333") { + t.Error("Add must not mutate the receiver") + } +} + +func TestSave_ParentDirCreationFailure(t *testing.T) { + // A regular file where the parent dir should be makes MkdirAll fail. + obstruction := filepath.Join(t.TempDir(), "not-a-dir") + if err := os.WriteFile(obstruction, []byte("x"), 0o600); err != nil { + t.Fatal(err) + } + kb := BaselineKnowledgeBase() + err := kb.Save(filepath.Join(obstruction, "sub", "knowledge.json")) + if err == nil || !strings.Contains(err.Error(), "create knowledge dir") { + t.Errorf("want create-dir error, got %v", err) + } +} + +func TestSave_WriteFailure(t *testing.T) { + // Path IS a directory — WriteFile fails after MkdirAll succeeds. + dir := t.TempDir() + kb := BaselineKnowledgeBase() + err := kb.Save(dir) + if err == nil || !strings.Contains(err.Error(), "write knowledge base") { + t.Errorf("want write error, got %v", err) + } +} + +func TestLoadKnowledgeBase_CorruptJSON(t *testing.T) { + path := filepath.Join(t.TempDir(), "knowledge.json") + if err := os.WriteFile(path, []byte("{corrupt"), 0o600); err != nil { + t.Fatal(err) + } + if _, err := LoadKnowledgeBase(path); err == nil || !strings.Contains(err.Error(), "parse knowledge base") { + t.Errorf("corrupt KB must be a loud error, never silently replaced by the baseline: %v", err) + } +} + +func TestLoadKnowledgeBase_ReadError(t *testing.T) { + // Path is a directory: ReadFile errors but with something other than + // IsNotExist — must be surfaced, not treated as first-run. + if _, err := LoadKnowledgeBase(t.TempDir()); err == nil || !strings.Contains(err.Error(), "read knowledge base") { + t.Errorf("non-ENOENT read failure must be surfaced: %v", err) + } +} + +func TestDetectLanguages_ManifestSignals(t *testing.T) { + cases := []struct { + name string + files map[string]string + want []string + }{ + {"rust", map[string]string{"Cargo.toml": ""}, []string{"rust"}}, + {"php", map[string]string{"composer.json": "{}"}, []string{"php"}}, + {"ruby", map[string]string{"Gemfile": ""}, []string{"ruby"}}, + {"python", map[string]string{"pyproject.toml": ""}, []string{"python"}}, + {"js-without-tsconfig", map[string]string{"package.json": "{}"}, []string{"javascript"}}, + {"ts-beats-js", map[string]string{"package.json": "{}", "tsconfig.json": "{}"}, []string{"typescript"}}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + got := DetectLanguages(seedRepo(t, tc.files)) + if len(got) != len(tc.want) { + t.Fatalf("want %v, got %v", tc.want, got) + } + for i := range tc.want { + if got[i] != tc.want[i] { + t.Fatalf("want %v, got %v", tc.want, got) + } + } + }) + } +} + +func TestDetectLanguages_ExtensionFallbackAndSkips(t *testing.T) { + repo := t.TempDir() + // A .ts source with no manifest establishes typescript by extension; a .js + // file must then NOT add javascript (the ts/js distinction is respected). + // node_modules content must be skipped entirely. + mustWrite := func(rel, content string) { + t.Helper() + p := filepath.Join(repo, rel) + if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil { + t.Fatal(err) + } + if err := os.WriteFile(p, []byte(content), 0o600); err != nil { + t.Fatal(err) + } + } + mustWrite("tsconfig.json", "{}") + mustWrite("package.json", "{}") + mustWrite("src/app.js", "x") // must not add javascript (ts already set) + mustWrite("deploy.sh", "#!/bin/sh") // shell via extension + mustWrite("node_modules/dep/index.rb", "puts") // must be skipped, no ruby + + got := DetectLanguages(repo) + set := map[string]bool{} + for _, l := range got { + set[l] = true + } + if !set["typescript"] || !set["shell"] { + t.Errorf("want typescript+shell, got %v", got) + } + if set["javascript"] { + t.Errorf("javascript must not be added when typescript is established: %v", got) + } + if set["ruby"] { + t.Errorf("node_modules must be skipped: %v", got) + } +} + +func TestParsers_InvalidJSONIsAnError(t *testing.T) { + garbage := []byte("PANIC not json") + if _, err := parseGosec(garbage, "/repo"); err == nil { + t.Error("parseGosec must surface invalid JSON") + } + if _, err := parseGitleaks(garbage, "/repo"); err == nil { + t.Error("parseGitleaks must surface invalid JSON") + } + if _, err := parseSemgrep(garbage, "/repo"); err == nil { + t.Error("parseSemgrep must surface invalid JSON") + } + if _, err := parseNpmAudit(garbage); err == nil { + t.Error("parseNpmAudit must surface invalid JSON") + } +} + +func TestParseGosec_LineRangeAndMissingCWE(t *testing.T) { + out := []byte(`{"Issues":[{"severity":"MEDIUM","rule_id":"G304","details":"x","file":"a.go","line":"12-14","cwe":{"id":""}}]}`) + fs, err := parseGosec(out, "/repo") + if err != nil || len(fs) != 1 { + t.Fatalf("parse: %v %v", fs, err) + } + if fs[0].Line != 12 { + t.Errorf("range line must take the start, got %d", fs[0].Line) + } + if fs[0].Detail != "" { + t.Errorf("empty CWE id must not render as 'CWE-', got %q", fs[0].Detail) + } +} + +func TestParseNpmAudit_FallsBackToMapKeyForName(t *testing.T) { + out := []byte(`{"vulnerabilities":{"left-pad":{"name":"","severity":"low","range":"*","via":[]}}}`) + fs, err := parseNpmAudit(out) + if err != nil || len(fs) != 1 { + t.Fatalf("parse: %v %v", fs, err) + } + if fs[0].RuleID != "npm:left-pad" { + t.Errorf("empty name must fall back to the map key, got %q", fs[0].RuleID) + } +} + +func TestParseGovulncheck_MalformedLinesSkipped(t *testing.T) { + out := []byte("Vulnerability #1 without colon\nVulnerability #2: \nVulnerability #3: GO-2025-999\n") + fs, err := parseGovulncheck(out) + if err != nil { + t.Fatal(err) + } + if len(fs) != 1 || fs[0].RuleID != "GO-2025-999" { + t.Errorf("malformed lines must be skipped, valid ones kept: %+v", fs) + } +} + +func TestSeverityString_AllValues(t *testing.T) { + want := map[Severity]string{ + SeverityCritical: "critical", + SeverityHigh: "high", + SeverityMedium: "medium", + SeverityLow: "low", + SeverityInfo: "info", + Severity(42): "info", // unknown ranks must not invent labels + } + for sev, label := range want { + if got := sev.String(); got != label { + t.Errorf("Severity(%d).String() = %q, want %q", sev, got, label) + } + } +} + +func TestFormatMarkdown_EmptyReportSaysNone(t *testing.T) { + r := Report{RepoDir: "/x", Languages: []string{"go"}, KBVersion: 1} + md := r.FormatMarkdown() + if !strings.Contains(md, "Scanners run: none") { + t.Errorf("empty scanner lists must render as 'none':\n%s", md) + } + if !strings.Contains(md, "0 total") { + t.Errorf("zero findings must be stated:\n%s", md) + } +} + +func TestRelPath_OutsideRepoKeptVerbatim(t *testing.T) { + if got := relPath("/repo", "/elsewhere/file.go"); !strings.Contains(got, "elsewhere") { + t.Errorf("paths outside the repo must not be mangled, got %q", got) + } + if got := relPath("/repo", "/repo/pkg/a.go"); got != "pkg/a.go" { + t.Errorf("in-repo paths become repo-relative, got %q", got) + } +} diff --git a/internal/security/scanners_exec_test.go b/internal/security/scanners_exec_test.go new file mode 100644 index 0000000..e5919d7 --- /dev/null +++ b/internal/security/scanners_exec_test.go @@ -0,0 +1,269 @@ +package security + +import ( + "context" + "os" + "path/filepath" + "runtime" + "strconv" + "strings" + "testing" +) + +// fakeTool installs an executable shell script named bin into dir that prints +// output and exits with code (verbatim — scanners use non-zero exits to mean +// "findings present", and some distinguish exit 2). The heredoc sentinel is +// collision-resistant so canned output can never terminate it early. +func fakeTool(t *testing.T, dir, bin, output string, code int) { + t.Helper() + if runtime.GOOS == "windows" { + t.Skip("fake-tool harness is POSIX-shell based") + } + const sentinel = "FAKE_SCANNER_OUTPUT_BOUNDARY_9f2c1d" + if strings.Contains(output, sentinel) { + t.Fatalf("canned output collides with the heredoc sentinel %s", sentinel) + } + script := "#!/bin/sh\ncat <<'" + sentinel + "'\n" + output + "\n" + sentinel + "\nexit " + strconv.Itoa(code) + "\n" + if err := os.WriteFile(filepath.Join(dir, bin), []byte(script), 0o755); err != nil { + t.Fatal(err) + } +} + +// seedRepo creates a repo dir whose manifests establish the given languages. +func seedRepo(t *testing.T, files map[string]string) string { + t.Helper() + dir := t.TempDir() + for name, content := range files { + if err := os.WriteFile(filepath.Join(dir, name), []byte(content), 0o600); err != nil { + t.Fatal(err) + } + } + return dir +} + +const fakeGosecOut = `{"Issues":[{"severity":"HIGH","rule_id":"G101","details":"Potential hardcoded credentials","file":"main.go","line":"7","cwe":{"id":"798"}}]}` + +const fakeGitleaksOut = `[{"Description":"AWS access key","File":"config.env","StartLine":3,"RuleID":"aws-access-key-id"}]` + +const fakeSemgrepOut = `{"results":[{"check_id":"go.lang.security.audit.sqli","path":"db.go","start":{"line":42},"extra":{"message":"SQL built from input","severity":"ERROR","metadata":{"cwe":["CWE-89"],"owasp":["A03:2021 - Injection"]}}}]}` + +const fakeGovulncheckOut = `Scanning your code and 42 packages across 7 dependent modules for known vulnerabilities... + +Vulnerability #1: GO-2024-1234 + A bad thing in some module.` + +const fakeNpmAuditOut = `{"vulnerabilities":{"lodash":{"name":"lodash","severity":"high","range":"<4.17.21","via":[]}}}` + +// installAllFakeScanners provisions every registry binary as a fake and points +// PATH at only that directory. Returns the bin dir for per-test overrides. +func installAllFakeScanners(t *testing.T) string { + t.Helper() + bin := t.TempDir() + fakeTool(t, bin, "gosec", fakeGosecOut, 1) + fakeTool(t, bin, "gitleaks", fakeGitleaksOut, 1) + fakeTool(t, bin, "semgrep", fakeSemgrepOut, 0) + fakeTool(t, bin, "govulncheck", fakeGovulncheckOut, 1) + fakeTool(t, bin, "npm", fakeNpmAuditOut, 1) + // Keep /bin:/usr/bin so the fake scripts can find cat; no real security + // scanner is ever installed there, so LookPath still resolves only fakes. + t.Setenv("PATH", bin+":/bin:/usr/bin") + return bin +} + +func TestRunScanners_EndToEnd_AllToolsParse(t *testing.T) { + installAllFakeScanners(t) + repo := seedRepo(t, map[string]string{ + "go.mod": "module example.com/x\n", + "package.json": "{}", + "tsconfig.json": "{}", + }) + + findings, ran, skipped, failed := RunScanners(context.Background(), repo) + + if len(skipped) != 0 { + t.Errorf("all tools installed — skipped must be empty, got %v", skipped) + } + if len(failed) != 0 { + t.Errorf("all tools parse — failed must be empty, got %v", failed) + } + if len(ran) != 5 { + t.Errorf("go+typescript repo makes all 5 scanners applicable, ran=%v", ran) + } + + byTool := map[string]Finding{} + for _, f := range findings { + byTool[f.Tool] = f + } + if f := byTool["gosec"]; f.RuleID != "G101" || f.Severity != SeverityHigh || f.Line != 7 || f.Detail != "CWE-798" { + t.Errorf("gosec finding mis-parsed: %+v", f) + } + if f := byTool["gitleaks"]; f.Severity != SeverityCritical || f.File != "config.env" { + t.Errorf("gitleaks finding mis-parsed: %+v", f) + } + if f := byTool["semgrep"]; f.Severity != SeverityHigh || f.Detail != "CWE-89" || f.Category == "" { + t.Errorf("semgrep finding mis-parsed: %+v", f) + } + if f := byTool["govulncheck"]; f.RuleID != "GO-2024-1234" { + t.Errorf("govulncheck finding mis-parsed: %+v", f) + } + if f := byTool["npm-audit"]; f.RuleID != "npm:lodash" || f.Severity != SeverityHigh { + t.Errorf("npm-audit finding mis-parsed: %+v", f) + } +} + +func TestRunScanners_FailedToolIsReportedNotSwallowed(t *testing.T) { + bin := installAllFakeScanners(t) + // gosec now emits garbage — a parse failure must land in `failed`, never in + // `ran`: a tool that failed to inspect the code must be distinguishable + // from one that found nothing. + fakeTool(t, bin, "gosec", "PANIC: not json", 1) + repo := seedRepo(t, map[string]string{"go.mod": "module example.com/x\n"}) + + findings, ran, skipped, failed := RunScanners(context.Background(), repo) + + if len(failed) != 1 || failed[0] != ScannerGosec { + t.Fatalf("want failed=[gosec], got %v", failed) + } + for _, k := range ran { + if k == ScannerGosec { + t.Error("a failed scanner must not be counted as ran") + } + } + if len(skipped) != 0 { + t.Errorf("nothing should be skipped, got %v", skipped) + } + // The other applicable tools still contribute findings (graceful degradation). + found := map[string]bool{} + for _, f := range findings { + found[f.Tool] = true + } + if !found["gitleaks"] || !found["govulncheck"] { + t.Errorf("other tools must keep scanning after one fails, got %v", found) + } +} + +func TestRunScanners_MissingToolsSkippedVisibly(t *testing.T) { + bin := t.TempDir() + fakeTool(t, bin, "gitleaks", fakeGitleaksOut, 1) // only gitleaks installed + t.Setenv("PATH", bin+":/bin:/usr/bin") + repo := seedRepo(t, map[string]string{"go.mod": "module example.com/x\n"}) + + _, ran, skipped, failed := RunScanners(context.Background(), repo) + + if len(ran) != 1 || ran[0] != ScannerGitleaks { + t.Errorf("want ran=[gitleaks], got %v", ran) + } + skippedSet := map[ScannerKind]bool{} + for _, k := range skipped { + skippedSet[k] = true + } + // Applicable-but-missing for a Go repo: semgrep (all langs), gosec, govulncheck. + for _, want := range []ScannerKind{ScannerSemgrep, ScannerGosec, ScannerGovulncheck} { + if !skippedSet[want] { + t.Errorf("missing tool %s must be reported as skipped, got %v", want, skipped) + } + } + // npm-audit is NOT applicable (no js/ts) — it must not appear anywhere. + if skippedSet[ScannerNpmAudit] { + t.Errorf("inapplicable scanner must not be listed as skipped: %v", skipped) + } + if len(failed) != 0 { + t.Errorf("nothing ran and errored, failed must be empty: %v", failed) + } +} + +func TestRunScanners_DedupesIdenticalFindings(t *testing.T) { + bin := t.TempDir() + // gitleaks reports the same secret twice (same rule, file, line). + dup := `[{"Description":"AWS access key","File":"config.env","StartLine":3,"RuleID":"aws-access-key-id"}, + {"Description":"AWS access key","File":"config.env","StartLine":3,"RuleID":"aws-access-key-id"}]` + fakeTool(t, bin, "gitleaks", dup, 1) + t.Setenv("PATH", bin+":/bin:/usr/bin") + repo := seedRepo(t, map[string]string{"README.md": "x"}) + + findings, _, _, _ := RunScanners(context.Background(), repo) + if len(findings) != 1 { + t.Errorf("identical findings must be deduped, got %d", len(findings)) + } +} + +func TestScannerRun_UnknownKindIsNoOp(t *testing.T) { + s := Scanner{Kind: ScannerKind("bogus"), Bin: "bogus"} + findings, err := s.Run(context.Background(), t.TempDir()) + if findings != nil || err != nil { + t.Errorf("unknown kind must be a no-op, got %v / %v", findings, err) + } +} + +func TestScannerRun_PerKindDispatch(t *testing.T) { + installAllFakeScanners(t) + repo := seedRepo(t, map[string]string{"go.mod": "module x\n"}) + + // Note: Scanner.Run dispatches on Kind with hardcoded binary names and + // never reads Bin — Bin serves only the LookPath availability checks in + // RunScanners/DetectScanners. Bin values here mirror allScanners() (the + // npm-audit scanner execs "npm", not "npm-audit"). + cases := []struct { + kind ScannerKind + bin string + wantTool string + }{ + {ScannerGosec, "gosec", "gosec"}, + {ScannerGitleaks, "gitleaks", "gitleaks"}, + {ScannerSemgrep, "semgrep", "semgrep"}, + {ScannerGovulncheck, "govulncheck", "govulncheck"}, + {ScannerNpmAudit, "npm", "npm-audit"}, + } + for _, tc := range cases { + fs, err := Scanner{Kind: tc.kind, Bin: tc.bin}.Run(context.Background(), repo) + if err != nil { + t.Errorf("%s: %v", tc.kind, err) + continue + } + if len(fs) != 1 || fs[0].Tool != tc.wantTool { + t.Errorf("%s: want one %s finding, got %+v", tc.kind, tc.wantTool, fs) + } + } +} + +func TestDetectScanners_CombinesLanguageAndAvailability(t *testing.T) { + bin := t.TempDir() + fakeTool(t, bin, "gosec", "", 0) + fakeTool(t, bin, "npm", "", 0) + t.Setenv("PATH", bin+":/bin:/usr/bin") + + // Go-only repo: gosec applies and is installed; npm is installed but not + // applicable; gitleaks/semgrep apply but are missing. + repo := seedRepo(t, map[string]string{"go.mod": "module x\n"}) + got := DetectScanners(repo) + if len(got) != 1 || got[0].Kind != ScannerGosec { + t.Errorf("want [gosec], got %v", got) + } +} + +func TestKnownScanners_RegistryComplete(t *testing.T) { + known := KnownScanners() + if len(known) != 5 { + t.Fatalf("registry drifted: want 5 scanners, got %d", len(known)) + } + bins := map[string]bool{} + for _, s := range known { + bins[s.Bin] = true + } + for _, want := range []string{"gosec", "govulncheck", "gitleaks", "semgrep", "npm"} { + if !bins[want] { + t.Errorf("registry missing %s", want) + } + } +} + +func TestInstallHint_EveryRegistryBinHasOne(t *testing.T) { + for _, s := range KnownScanners() { + if InstallHint(s.Bin) == "" { + t.Errorf("no install hint for %s — the preflight message would be blank", s.Bin) + } + } + if InstallHint("made-up-tool") != "" { + t.Error("unknown binary must return empty hint") + } +}