Skip to content

Security coverage to 95%+ (real tests) + embedded frontend-design skill#112

Merged
tzone85 merged 3 commits into
mainfrom
feat/security-coverage-frontend-skill
Jul 2, 2026
Merged

Security coverage to 95%+ (real tests) + embedded frontend-design skill#112
tzone85 merged 3 commits into
mainfrom
feat/security-coverage-frontend-skill

Conversation

@tzone85

@tzone85 tzone85 commented Jul 2, 2026

Copy link
Copy Markdown
Owner

Summary

Two requested improvements, both TDD'd and independently reviewed.

1. Security coverage: 73.0% → 98.6% / 95.2% — real behavior pins, not checkbox tests

  • internal/security: 73.0% → 98.6%. Centerpiece: a fake-scanner harness (scanners_exec_test.go) — executable shell scripts named gosec/gitleaks/semgrep/govulncheck/npm on a controlled PATH emit canned real-format tool output, driving RunScanners end-to-end with zero network: language applicability, PATH availability, per-tool parsing, dedup, and the three-way ran/skipped/failed classification (a tool emitting garbage lands in failed and never masquerades as a clean run; missing tools are skipped visibly; inapplicable tools appear nowhere).
  • engine/security_gate.go → 95.2% (statements). All LLM review paths driven via llm.ReplayClient: LLM findings merge with scanner findings, a critical diff finding blocks a story, the diff is asserted to travel inside <diff> data tags (injection mitigation), an LLM failure degrades to scanners-only, garbage responses pass cleanly.
  • Previously-uncovered load-bearing logic now pinned: Covers was at 0% — the self-upskilling dedup that decides whether the agent re-learns a vulnerability class (rule-ID match, CWE alias, learning trigger, receiver immutability). Plus corrupt-KB-is-a-loud-error at both gate entry points, read-only-KB persist failure (best-effort, never aborts), vulnClassID fallback chain, cweOf extraction bounds, language-manifest branches, and parser error paths for all 5 tools.
  • Remaining uncovered lines are error-log statements requiring failing event/projection stores — acknowledged, not gamed.

2. Embedded frontend-design skill — vxd-built UIs stop looking AI-generated

Research: Anthropic's frontend-design skill (read in full) + a web-research sweep (Anthropic's design-skills blog, anti-"AI slop" guides, WCAG 2026 checklists, LLM UI stack analyses).

  • agent.FrontendDesignBrief — single-source design-standards block injected into every UI-facing story's goal prompt: two-pass token-first process (palette/type/layout/signature plan → self-critique for genericness → code derived from the plan), named banned defaults (Inter/Roboto, purple-gradient-on-white, cream #F4F1EA+serif+terracotta, acid-green-on-black, the template feature-card page, scattered animation), a non-negotiable accessibility floor (responsive to 360px, visible keyboard focus, WCAG AA contrast, prefers-reduced-motion, 44px touch targets, designed empty/loading/error states, CSS specificity discipline), and copy-as-design-material rules ("Save changes", never "Submit"). Size budget pinned at ≤6 KB.
  • detectFrontend — owned-file extensions (strongest signal) + whole-word UI vocabulary regex. Word boundaries pinned by table tests: "pagination"≠page, "performance"≠form, "review"≠view; per review, "html" and "responsive" excluded (server-side HTML emails and "responsive API gateway" are backend work).
  • Threaded into both dispatch paths (first attempt + retry) with a dangling-wire source-scan guard (TestExecutor_WiresFrontendDetection).
  • Planner: ENGINEERING STANDARDS now require the first UI story to establish a design-token foundation (CSS custom properties / framework theme) consumed by later UI stories, and UI acceptance criteria to carry the accessibility floor.

Review

everything-claude-code:go-reviewer: approve, no CRITICAL/HIGH; 2 MEDIUM + 2 LOW all applied in the final commit (keyword false-positives removed with new negative cases; fake-tool harness exit-code fidelity + collision-resistant sentinel + honest Bin values). The suggested style keyword was deliberately not added — it would trade a minor miss for "code style" false positives.

Test plan

  • go test ./... -count=1 — all packages pass
  • go vet ./... clean; golangci-lint run — 0 issues
  • internal/security 98.6% coverage; security_gate.go 95.2%
  • Doc-coverage wiring tests pass (CLAUDE.md section + README bullet added; stale "12 checks" corrected to 16)
  • NXD port noted as pending in CLAUDE.md

tzone85 added 3 commits July 2, 2026 12:26
… tests

internal/security: 73.0% → 98.6%. engine/security_gate.go: → 95.2% (statements).

The centerpiece is a fake-scanner harness (scanners_exec_test.go): executable
shell scripts named gosec/gitleaks/semgrep/govulncheck/npm on a controlled
PATH emit canned real-format output, so RunScanners' orchestration is
exercised end-to-end — language applicability, PATH availability, per-tool
parse, dedup, and the three-way ran/skipped/failed classification (a tool
that emits garbage lands in failed, never in ran; missing tools are skipped
visibly; inapplicable tools appear nowhere).

Previously-uncovered load-bearing logic now pinned:
- Covers (0% → 100%): the self-upskilling dedup — rule-ID match, CWE-alias
  match, unknown class triggers learning, learned rules extend coverage,
  immutability of the receiver.
- Gate LLM paths (0% → 100%): llmReview/llmReviewDiff/callLLM via
  llm.ReplayClient — LLM findings merge with scanner findings, a critical
  diff finding blocks a story, the diff travels inside <diff> data tags,
  an LLM failure degrades to scanners-only, garbage responses pass cleanly.
- vulnClassID fallback chain (28.6% → 100%) + cweOf extraction bounds.
- Knowledge-base failure modes: corrupt KB is a loud error at both gate
  entry points (never silently replaced by the baseline); a read-only KB
  makes upskill persistence fail without aborting the scan.
- DetectLanguages manifest branches (rust/php/ruby/python/ts-beats-js),
  extension fallback, node_modules skipping.
- Parser error paths for all 5 tools, gosec line-range + missing-CWE,
  npm-audit map-key fallback, govulncheck malformed-line skipping.

Remaining uncovered lines are error-log statements requiring failing
event/projection stores — tracked, not gamed.
…n-intent planning + a distinctive-design brief

vxd-built web UIs came out as generic AI-default design. The factory now
carries a frontend-design skill at both ends of the pipeline:

- agent.FrontendDesignBrief (internal/agent/frontend.go): single-source
  design-standards block synthesized from Anthropic's frontend-design skill
  and current anti-'AI slop' research — two-pass token-first process
  (palette/type/layout/signature plan, self-critique for genericness, code
  derived from the plan), named banned defaults (Inter/Roboto,
  purple-gradient-on-white, cream #F4F1EA + serif + terracotta,
  acid-green-on-black, the template feature-card page, scattered animation),
  a non-negotiable accessibility floor (responsive to 360px, visible keyboard
  focus, WCAG AA contrast, prefers-reduced-motion, 44px targets, designed
  empty/loading/error states, CSS specificity discipline), and
  copy-as-design-material rules. Size budget pinned (≤6 KB).
- detectFrontend (engine/detect.go): owned-file extensions + whole-word UI
  vocabulary regex (word boundaries — 'pagination'/'performance'/'review'
  must not false-positive; pinned by 21 table cases).
- Executor threads IsFrontend into BOTH the first-dispatch PromptContext and
  the retry TemplateContext; TestExecutor_WiresFrontendDetection guards the
  wire (dangling-wire pattern).
- Planner ENGINEERING STANDARDS now require the first UI story to establish
  a design-token foundation consumed by later UI stories, and UI acceptance
  criteria to carry the accessibility floor.

Docs: CLAUDE.md section + README feature bullet (+ stale '12 checks'
corrected to 16). NXD port pending (offline-first mirror).
…anch

- detect.go: drop 'html' and 'responsive' from frontendKeywordRe — server-side
  HTML (email reports, text/html rendering) and 'responsive API gateway' are
  backend work; real UI stories carry .html in owned files or another keyword.
  Three new table cases pin the distinction.
- scanners_exec_test.go: fakeTool now honors the exit code verbatim
  (strconv.Itoa, not a 0/1 collapse) and uses a collision-resistant heredoc
  sentinel with an explicit guard; TestScannerRun_PerKindDispatch uses the
  real registry Bin values ('npm' for npm-audit) and documents that Run
  dispatches on Kind with hardcoded binaries — Bin serves only LookPath.
@tzone85 tzone85 merged commit e2f3218 into main Jul 2, 2026
5 checks passed
@tzone85 tzone85 deleted the feat/security-coverage-frontend-skill branch July 2, 2026 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant