Security coverage to 95%+ (real tests) + embedded frontend-design skill #112
Merged
… tests
internal/security: 73.0% → 98.6%. engine/security_gate.go: → 95.2% (statements).
The centerpiece is a fake-scanner harness (scanners_exec_test.go): executable shell scripts named gosec/gitleaks/semgrep/govulncheck/npm on a controlled PATH emit canned real-format output, so RunScanners' orchestration is exercised end-to-end: language applicability, PATH availability, per-tool parse, dedup, and the three-way ran/skipped/failed classification (a tool that emits garbage lands in failed, never in ran; missing tools are skipped visibly; inapplicable tools appear nowhere).
Previously-uncovered load-bearing logic now pinned:
- Covers (0% → 100%): the self-upskilling dedup. Rule-ID match, CWE-alias match, unknown class triggers learning, learned rules extend coverage, immutability of the receiver.
- Gate LLM paths (0% → 100%): llmReview/llmReviewDiff/callLLM via llm.ReplayClient. LLM findings merge with scanner findings, a critical diff finding blocks a story, the diff travels inside <diff> data tags, an LLM failure degrades to scanners-only, garbage responses pass cleanly.
- vulnClassID fallback chain (28.6% → 100%) + cweOf extraction bounds.
- Knowledge-base failure modes: corrupt KB is a loud error at both gate entry points (never silently replaced by the baseline); a read-only KB makes upskill persistence fail without aborting the scan.
- DetectLanguages manifest branches (rust/php/ruby/python/ts-beats-js), extension fallback, node_modules skipping.
- Parser error paths for all 5 tools, gosec line-range + missing-CWE, npm-audit map-key fallback, govulncheck malformed-line skipping.
Remaining uncovered lines are error-log statements requiring failing event/projection stores: tracked, not gamed.
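The three-way ran/skipped/failed classification described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `tool` and `report` types, `classify`, `applies`, and the injected `lookPath` and `run` functions are hypothetical names, and treating an empty language list as "language-agnostic" is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// tool is a hypothetical scanner descriptor; the real registry type may differ.
type tool struct {
	Name      string
	Bin       string
	Languages []string // empty means language-agnostic (an assumption of this sketch)
}

// report mirrors the ran/skipped/failed split from the commit message.
type report struct {
	Ran, Skipped, Failed []string
}

// applies reports whether the tool is relevant to any detected language.
func applies(t tool, langs map[string]bool) bool {
	if len(t.Languages) == 0 {
		return true
	}
	for _, l := range t.Languages {
		if langs[l] {
			return true
		}
	}
	return false
}

// classify puts every applicable tool into exactly one bucket.
// lookPath and run are injected so the logic can be tested without real binaries.
func classify(tools []tool, langs map[string]bool,
	lookPath func(string) (string, error), run func(tool) error) report {
	var r report
	for _, t := range tools {
		if !applies(t, langs) {
			continue // inapplicable tools appear nowhere
		}
		if _, err := lookPath(t.Bin); err != nil {
			r.Skipped = append(r.Skipped, t.Name) // missing tools are skipped visibly
			continue
		}
		if err := run(t); err != nil {
			r.Failed = append(r.Failed, t.Name) // garbage output is a failure, never a clean run
			continue
		}
		r.Ran = append(r.Ran, t.Name)
	}
	return r
}

// demo wires classify to stubbed lookups: semgrep is "missing", gitleaks "emits garbage".
func demo() report {
	tools := []tool{
		{Name: "gosec", Bin: "gosec", Languages: []string{"go"}},
		{Name: "gitleaks", Bin: "gitleaks"},
		{Name: "semgrep", Bin: "semgrep"},
		{Name: "npm-audit", Bin: "npm", Languages: []string{"javascript"}},
	}
	langs := map[string]bool{"go": true}
	lookPath := func(bin string) (string, error) {
		if bin == "semgrep" {
			return "", errors.New("not found on PATH")
		}
		return "/fake/bin/" + bin, nil
	}
	run := func(t tool) error {
		if t.Name == "gitleaks" {
			return errors.New("unparseable output")
		}
		return nil
	}
	return classify(tools, langs, lookPath, run)
}

func main() {
	fmt.Printf("%+v\n", demo())
}
```

In a real harness `lookPath` would presumably be `exec.LookPath`, which has the same signature; injecting it here only keeps the sketch runnable without any scanners installed.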
…n-intent planning + a distinctive-design brief
vxd-built web UIs came out as generic AI-default design. The factory now carries a frontend-design skill at both ends of the pipeline:
- agent.FrontendDesignBrief (internal/agent/frontend.go): single-source design-standards block synthesized from Anthropic's frontend-design skill and current anti-'AI slop' research. It covers a two-pass token-first process (palette/type/layout/signature plan, self-critique for genericness, code derived from the plan), named banned defaults (Inter/Roboto, purple-gradient-on-white, cream #F4F1EA + serif + terracotta, acid-green-on-black, the template feature-card page, scattered animation), a non-negotiable accessibility floor (responsive to 360px, visible keyboard focus, WCAG AA contrast, prefers-reduced-motion, 44px targets, designed empty/loading/error states, CSS specificity discipline), and copy-as-design-material rules. Size budget pinned (≤6 KB).
- detectFrontend (engine/detect.go): owned-file extensions + whole-word UI vocabulary regex (word boundaries: 'pagination'/'performance'/'review' must not false-positive; pinned by 21 table cases).
- Executor threads IsFrontend into BOTH the first-dispatch PromptContext and the retry TemplateContext; TestExecutor_WiresFrontendDetection guards the wire (dangling-wire pattern).
- Planner ENGINEERING STANDARDS now require the first UI story to establish a design-token foundation consumed by later UI stories, and UI acceptance criteria to carry the accessibility floor.
Docs: CLAUDE.md section + README feature bullet (+ stale '12 checks' corrected to 16). NXD port pending (offline-first mirror).
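The two detection signals (owned-file extensions first, then whole-word vocabulary) could look something like the sketch below. It is an illustration only: `looksLikeFrontend`, `uiKeywordRe`, `uiExts`, and the specific keywords and extensions are assumptions, not the contents of `frontendKeywordRe` or `detectFrontend`. It relies on Go's `regexp` package treating `\b` as an ASCII word boundary.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// uiKeywordRe is an illustrative vocabulary; the real keyword list is not shown in this PR text.
// \b anchors each keyword to word boundaries so "performance" does not match "form"
// and "review" does not match "view".
var uiKeywordRe = regexp.MustCompile(`(?i)\b(page|form|view|button|modal|dashboard)\b`)

// uiExts is a hypothetical set of owned-file extensions treated as UI work.
var uiExts = map[string]bool{
	".tsx": true, ".jsx": true, ".vue": true,
	".svelte": true, ".css": true, ".html": true,
}

// looksLikeFrontend checks owned files first (the stronger signal),
// then falls back to whole-word keyword matching on the story text.
func looksLikeFrontend(description string, ownedFiles []string) bool {
	for _, f := range ownedFiles {
		if uiExts[strings.ToLower(filepath.Ext(f))] {
			return true
		}
	}
	return uiKeywordRe.MatchString(description)
}

func main() {
	stories := []string{
		"Build the settings page",
		"Improve query performance",
		"Address review feedback on pagination",
	}
	for _, s := range stories {
		fmt.Printf("%q frontend=%v\n", s, looksLikeFrontend(s, nil))
	}
}
```

Table-driven tests fit this shape well: each row pairs a description and file list with the expected boolean, so a new false positive becomes one more row.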
…anch
- detect.go: drop 'html' and 'responsive' from frontendKeywordRe — server-side
HTML (email reports, text/html rendering) and 'responsive API gateway' are
backend work; real UI stories carry .html in owned files or another keyword.
Three new table cases pin the distinction.
- scanners_exec_test.go: fakeTool now honors the exit code verbatim
(strconv.Itoa, not a 0/1 collapse) and uses a collision-resistant heredoc
sentinel with an explicit guard; TestScannerRun_PerKindDispatch uses the
real registry Bin values ('npm' for npm-audit) and documents that Run
dispatches on Kind with hardcoded binaries — Bin serves only LookPath.
Summary
Two requested improvements, both TDD'd and independently reviewed.
1. Security coverage: 73.0% → 98.6% / 95.2% — real behavior pins, not checkbox tests
- internal/security: 73.0% → 98.6%. Centerpiece: a fake-scanner harness (scanners_exec_test.go). Executable shell scripts named gosec/gitleaks/semgrep/govulncheck/npm on a controlled PATH emit canned real-format tool output, driving RunScanners end-to-end with zero network: language applicability, PATH availability, per-tool parsing, dedup, and the three-way ran/skipped/failed classification (a tool emitting garbage lands in failed and never masquerades as a clean run; missing tools are skipped visibly; inapplicable tools appear nowhere).
- engine/security_gate.go → 95.2% (statements). All LLM review paths driven via llm.ReplayClient: LLM findings merge with scanner findings, a critical diff finding blocks a story, the diff is asserted to travel inside <diff> data tags (injection mitigation), an LLM failure degrades to scanners-only, garbage responses pass cleanly.
- Covers was at 0%: the self-upskilling dedup that decides whether the agent re-learns a vulnerability class (rule-ID match, CWE alias, learning trigger, receiver immutability). Plus corrupt-KB-is-a-loud-error at both gate entry points, read-only-KB persist failure (best-effort, never aborts), vulnClassID fallback chain, cweOf extraction bounds, language-manifest branches, and parser error paths for all 5 tools.
2. Embedded frontend-design skill: vxd-built UIs stop looking AI-generated
Research: Anthropic's frontend-design skill (read in full) + a web-research sweep (Anthropic's design-skills blog, anti-"AI slop" guides, WCAG 2026 checklists, LLM UI stack analyses).
- agent.FrontendDesignBrief: single-source design-standards block injected into every UI-facing story's goal prompt. It covers a two-pass token-first process (palette/type/layout/signature plan → self-critique for genericness → code derived from the plan), named banned defaults (Inter/Roboto, purple-gradient-on-white, cream #F4F1EA + serif + terracotta, acid-green-on-black, the template feature-card page, scattered animation), a non-negotiable accessibility floor (responsive to 360px, visible keyboard focus, WCAG AA contrast, prefers-reduced-motion, 44px touch targets, designed empty/loading/error states, CSS specificity discipline), and copy-as-design-material rules ("Save changes", never "Submit"). Size budget pinned at ≤6 KB.
- detectFrontend: owned-file extensions (strongest signal) + whole-word UI vocabulary regex. Word boundaries pinned by table tests: "pagination"≠page, "performance"≠form, "review"≠view; per review, "html" and "responsive" excluded (server-side HTML emails and "responsive API gateway" are backend work).
- Executor threads IsFrontend into both the first-dispatch PromptContext and the retry TemplateContext (guarded by TestExecutor_WiresFrontendDetection).
Review
everything-claude-code:go-reviewer: approve, no CRITICAL/HIGH; 2 MEDIUM + 2 LOW all applied in the final commit (keyword false-positives removed with new negative cases; fake-tool harness exit-code fidelity + collision-resistant sentinel + honest Bin values). The suggested "style" keyword was deliberately not added: it would trade a minor miss for "code style" false positives.
Test plan
- go test ./... -count=1: all packages pass
- go vet ./... clean; golangci-lint run: 0 issues
- internal/security 98.6% coverage; security_gate.go 95.2%