Security agent: self-upskilling scanner + LLM review embedded in vxd core #109
Merged
…security) New package backing vxd's security agent:
- knowledge.go: growable, JSON-persisted KnowledgeBase seeded with OWASP Top 10 (2021) + high-value CWEs (hardcoded secrets, path traversal, XSS), each with detection + remediation guidance. Add() is immutable + version-bumping + dedup-by-ID (the self-upskilling store). Checklist() renders for prompts.
- scanners.go: orchestrates gosec/govulncheck/gitleaks/semgrep/npm-audit with language-aware applicability + PATH detection (graceful degrade). Pure parsers per tool turn real scanner output into Findings, so no hallucinated vulns.
- languages.go: manifest + extension language detection (ts vs js aware).
- finding.go/severity.go/report.go: findings model, severity ranking with scanner-synonym parsing, dedup, and an operator-facing markdown report.
TDD: 16 tests (KB roundtrip/immutability/lang-filter/checklist, scanner applicability, all 5 parsers against representative output, report counts/format). vet + golangci-lint clean.
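The "immutable + version-bumping + dedup-by-ID" behaviour of Add() can be pictured with a minimal sketch. The `Rule` and `KnowledgeBase` shapes below are hypothetical simplifications chosen for illustration, not the types in knowledge.go.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified knowledge-base entry.
type Rule struct {
	ID          string // e.g. an OWASP category or a CWE identifier
	Detection   string
	Remediation string
}

// KnowledgeBase is an illustrative stand-in for the real type.
type KnowledgeBase struct {
	Version int
	Rules   []Rule
}

// Add returns a new KnowledgeBase that contains r and has a bumped version.
// The receiver is never mutated. If a rule with the same ID already exists,
// the receiver is returned unchanged (dedup-by-ID, no version bump).
func (kb KnowledgeBase) Add(r Rule) KnowledgeBase {
	for _, existing := range kb.Rules {
		if existing.ID == r.ID {
			return kb
		}
	}
	// Copy into a fresh slice so the old value keeps its own backing array.
	rules := make([]Rule, len(kb.Rules), len(kb.Rules)+1)
	copy(rules, kb.Rules)
	rules = append(rules, r)
	return KnowledgeBase{Version: kb.Version + 1, Rules: rules}
}

func main() {
	base := KnowledgeBase{Version: 1, Rules: []Rule{{ID: "CWE-798"}}}
	next := base.Add(Rule{ID: "CWE-190"})
	same := next.Add(Rule{ID: "CWE-190"}) // duplicate ID: nothing changes

	fmt.Println(base.Version, len(base.Rules))
	fmt.Println(next.Version, len(next.Rules))
	fmt.Println(same.Version, len(same.Rules))
}
```

Returning a new value instead of mutating in place means a caller holding an older version keeps a consistent snapshot, and the version number only moves when the rule set actually grows.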
…lf-upskilling) engine/security_gate.go: vxd's security agent, two entry points:
- ScanRepo: standalone whole-repo scan (deterministic scanners ∪ LLM threat-model review against the KB checklist), emits SECURITY_SCAN_COMPLETED.
- ReviewStory: per-story pre-merge gate; blocks when any finding meets/exceeds the configured gate severity; emits STORY_SECURITY_PASSED/FAILED.
Continuous upskilling: confirmed high+ findings whose vuln CLASS (CWE, else OWASP category, else tool rule) isn't already covered are added to the knowledge base as learned rules (KnowledgeBase.Covers matches ID or CWE so OWASP-indexed baseline classes aren't re-learned), persisted, and announced via SECURITY_RULE_LEARNED, so every future build inherits classes found in past ones.
New events STORY_SECURITY_PASSED/FAILED + SECURITY_SCAN_COMPLETED/RULE_LEARNED wired into the projection switch (TestProject_AllDeclaredEventsHandled passes).
TDD: 7 tests (scan aggregation+event, block-on-critical, pass-below-threshold, self-upskill on new class, no-relearn known class, LLM findings parse). Injectable scan + now seams; nil client ⇒ scanner-only. vet + golangci-lint clean.
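The class-selection order (CWE, else OWASP category, else tool rule) and the "matches ID or CWE" coverage check could look roughly like the sketch below. The `Finding` and `Rule` fields, the helper names `classKey` and `covers`, and the `tool:rule` fallback format are all assumptions made for illustration; the real logic lives in engine/security_gate.go and KnowledgeBase.Covers.

```go
package main

import "fmt"

// Finding is a hypothetical, trimmed-down scanner finding.
type Finding struct {
	Tool   string
	RuleID string
	CWE    string // empty when the scanner reports none
	OWASP  string // empty when the scanner reports none
}

// Rule is a hypothetical knowledge-base entry indexed by ID, optionally tagged with a CWE.
type Rule struct {
	ID  string
	CWE string
}

// classKey picks the vulnerability class of a finding:
// the CWE if present, else the OWASP category, else the tool's own rule.
func classKey(f Finding) string {
	switch {
	case f.CWE != "":
		return f.CWE
	case f.OWASP != "":
		return f.OWASP
	default:
		return f.Tool + ":" + f.RuleID // fallback format is an assumption
	}
}

// covers reports whether any rule matches the class key by ID or by CWE,
// so a baseline rule indexed by OWASP ID still covers its CWE.
func covers(rules []Rule, key string) bool {
	for _, r := range rules {
		if r.ID == key || (r.CWE != "" && r.CWE == key) {
			return true
		}
	}
	return false
}

func main() {
	kb := []Rule{{ID: "A03:2021", CWE: "CWE-89"}}

	known := Finding{Tool: "semgrep", RuleID: "example-sqli", CWE: "CWE-89"}
	novel := Finding{Tool: "gosec", RuleID: "example-overflow", CWE: "CWE-190"}

	fmt.Println(covers(kb, classKey(known))) // already covered: not re-learned
	fmt.Println(covers(kb, classKey(novel))) // not covered: candidate for a learned rule
}
```

Matching on the CWE as well as the ID is what keeps an OWASP-indexed baseline entry from being duplicated when a scanner reports the same class under its CWE number.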
…, config, docs
- Pipeline: SecurityGate.ReviewStory runs per-story after QA, before merge
(monitor_post_execution.go). A finding >= gate severity PAUSES the requirement
(human decision) instead of escalating; a scanner failure never blocks merge.
Monitor.SetSecurityGate + resume.go wiring (TestResume_WiresSecurityGate).
- CLI: `vxd security scan [path]` (scanners + optional --llm review, --min for CI
exit code, --json) and `vxd security kb` (inspect baseline + learned rules).
- Forward-embedded: planner ENGINEERING STANDARDS now spells out the OWASP Top 10
so every planned story is designed secure; the live (growable) KB is enforced
at the per-story gate.
- Config: security.{disable_gate, gate_severity (default high), auto_learn
(default true), kb_path}; DefaultConfig seeds the defaults.
- Events STORY_SECURITY_PASSED/FAILED + SECURITY_SCAN_COMPLETED/RULE_LEARNED
projected (exhaustiveness guard passes).
- Docs: README config table + CLAUDE.md (CLI table, vxd.yaml block, events,
security-agent knowledge section). Doc-coverage tests pass.
Full suite (32 pkgs) + vet + golangci-lint (0 issues) green. Binary rebuilt.
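The per-story decision described in the Pipeline bullet above can be sketched as follows. The `Severity` ranking, the `Decision` values and the `reviewOutcome` helper are hypothetical names used only for illustration. Treating a failed scan as "proceed to merge" without inspecting partial findings is an assumption about how "a scanner failure never blocks merge" plays out.

```go
package main

import (
	"errors"
	"fmt"
)

// Severity is a hypothetical ordered ranking; higher means more severe.
type Severity int

const (
	Low Severity = iota + 1
	Medium
	High
	Critical
)

// Decision is what the pipeline does with a story after the security review.
type Decision string

const (
	Merge Decision = "merge"
	Pause Decision = "pause" // pause the requirement for a human decision
)

// errScanFailed stands in for any scanner failure (illustrative only).
var errScanFailed = errors.New("scanner failed")

// reviewOutcome pauses when any finding meets or exceeds the gate severity.
// Assumption: a scan error alone never holds the merge.
func reviewOutcome(findings []Severity, scanErr error, gate Severity) Decision {
	if scanErr != nil {
		return Merge
	}
	for _, s := range findings {
		if s >= gate {
			return Pause
		}
	}
	return Merge
}

func main() {
	fmt.Println(reviewOutcome([]Severity{Medium, Critical}, nil, High))
	fmt.Println(reviewOutcome([]Severity{Medium}, nil, High))
	fmt.Println(reviewOutcome(nil, errScanFailed, High))
}
```

Passing the gate severity in as a parameter keeps the decision independent of whatever default security.gate_severity is configured.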
…ld-usable) Validated the agent on real repos (Go/Python/TS): it surfaces real findings (gosec perms/path-traversal, semgrep CWE-89 SQLi patterns) and proved self-upskilling in production (KB grew v1→v5, learned CWE-190/338/367/400 from the vortex-dispatch scan). But gosec/semgrep HIGH severity is context-dependent (non-crypto rand in a Bayesian sampler, taint on operator-controlled $HOME paths, parameterized SQL flagged as concatenation), so gating builds on it would stall the pipeline on noise.
Default security.gate_severity: high → critical. The per-story gate now pauses a build only on CRITICAL findings (leaked secrets via gitleaks, LLM-confirmed injection/hardcoded credentials), which keeps it high-signal where it counts. The standalone `vxd security scan` still reports high/medium (default --min high) for thorough audits, and operators can tighten the gate to "high".
Docs updated. Full suite (32 pkgs) + vet + golangci-lint (0 issues) green. Binary rebuilt.
tzone85 added a commit that referenced this pull request on Jul 2, 2026
…r check (#111)
* chore(security): dogfood scan hardening: pin GH Actions to SHAs + annotate accepted findings
  Ran vxd security scan on vxd itself (346 findings) and closed the high-severity set:
  - Pin all 14 GitHub Actions references in ci.yml to full commit SHAs (mutable-tag supply-chain class, CWE-1357) with version comments.
  - Annotate the 29 accepted-by-design findings with #nosec / nosemgrep and a one-line rationale each: sampler seed conversion + math/rand (statistical sampling, crypto seed), server shutdown contexts (fresh ctx after parent cancel is the graceful-shutdown idiom), G703 path-taint sites (paths derive from $HOME/worktrees inside the operator trust boundary), and the 15 dangerous-exec-command sites that ARE the orchestrator's core function (each with its upstream validation named).
  vxd security scan . now reports 0 high+ findings on its own tree, so the scan is usable as a self-gate (--min high) once CI billing is restored.
* feat(preflight): security_scanners check: surface missing SAST/secret tools
  The per-story security gate degrades gracefully when a scanner binary is absent (skipped, never fatal), which left operators with no signal that scan coverage was reduced. New CheckSecurityScanners (WARNING tier) lists missing binaries from the security.KnownScanners registry with install hints (security.InstallHint) and joins AllChecks; vxd preflight now runs 16 checks. lookPath is injected for testability, matching CheckBinaryPath's pattern. 4 new tests including a dangling-wire guard (AllChecks must include the check). Docs updated (CLAUDE.md + README check counts, security-agent section).
* fix(watch): vxd watch silently dropped every event for real requirements + test coverage restore
  Two matcher bugs made `vxd watch` a silent no-op tail in production:
  1. Story events: eventMatchesReq compared evt.StoryID[:8] against reqID[:8], but story IDs are namespaced with storyIDPrefix(reqID), which is sha256(reqID)[:8] for any reqID longer than 8 chars, as every real ULID reqID is. The prefixes never matched, so no story event ever printed. The matcher now uses the exported engine.StoryIDPrefix (single source of truth).
  2. Requirement events: the code commented 'REQ_* events get routed via payload below' but no payload routing existed, so REQ_SUBMITTED/PLANNED/COMPLETED/BLOCKED never printed. Now matched via the payload req_id/id keys (the two keys real emitters use: planner uses 'id', the planning heartbeat 'req_id').
  The old TestEventMatchesReq_PrefixMatch pinned the broken raw-prefix behavior and was replaced (test was wrong, not the spec). New tests: hashed-prefix match, short-reqID verbatim match, payload routing for both key spellings, cross-requirement rejection, and an end-to-end tailRequirementEvents run against real file+sqlite stores that pins print-and-exit-on-terminal.
  Also restores the internal/cli coverage regression from PR #109 (68.0% → 72.9%): the security scan/kb commands, dashboard status/stop daemon commands, and watch were all untested. New: security_test.go (9 tests: pure helpers, kb text/json, scan with empty PATH pinning graceful degradation + skipped-scanner reporting), dashboard_daemon_test.go (7 tests: not-running status, idempotent stop, malformed/stale pidfiles, watch unknown-req error).
* fix(review): apply go-reviewer findings: complete G124 suppression, kill dead branch + inert assertions
  - auth.go: the cookie site had the nosemgrep half of the annotation but gosec G124 still fired; add the #nosec with the same rationale.
  - watch.go: drop the unreachable evt.StoryID == prefix branch; the planner always emits <prefix>-<suffix> IDs, so HasPrefix covers every real case.
  - checks_security_test.go: the installed-scanner negative assertions matched a substring ('missing: gitleaks') that could never occur in the real message format; assert on the bare scanner names so a regression can actually fire.
---------
Co-authored-by: Thando Mini <tzone85@users.noreply.github.com>
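The story-event bug fixed in the fix(watch) commit above comes down to comparing a raw reqID prefix against a hashed one. A minimal sketch of the corrected match follows. The helper names `storyIDPrefix` and `storyBelongsTo`, the hex encoding of the SHA-256 digest, the `-` separator check and the sample ULID are assumptions for illustration; the real source of truth is engine.StoryIDPrefix.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// storyIDPrefix mirrors the rule described in the commit message: short
// reqIDs are used verbatim, longer ones are replaced by the first 8
// characters of their SHA-256 digest (hex encoding is an assumption).
func storyIDPrefix(reqID string) string {
	if len(reqID) <= 8 {
		return reqID
	}
	sum := sha256.Sum256([]byte(reqID))
	return hex.EncodeToString(sum[:])[:8]
}

// storyBelongsTo reports whether a story ID of the form <prefix>-<suffix>
// was planned for the given requirement.
func storyBelongsTo(storyID, reqID string) bool {
	return strings.HasPrefix(storyID, storyIDPrefix(reqID)+"-")
}

func main() {
	reqID := "01J8ZQ4T9N3K5M7P2R6S8V0WXY" // sample 26-char ULID-style ID (made up)
	storyID := storyIDPrefix(reqID) + "-s1"

	// The old comparison looked at raw leading characters and missed.
	fmt.Println(storyID[:8] == reqID[:8])

	// The corrected comparison derives the prefix the same way the planner does.
	fmt.Println(storyBelongsTo(storyID, reqID))
}
```

Deriving the prefix through one shared function on both the emitting and the matching side is what removes this class of mismatch.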
What
A state-of-the-art, self-upskilling security agent built into vxd's core, so every build is reviewed for vulnerabilities and every future build inherits what past ones taught it.
Three layers
1. `internal/security/` (LLM-free, unit-tested): a growable `KnowledgeBase` seeded with the OWASP Top 10 (2021) + high-value CWEs (immutable/versioned `Add`, `Covers` matches ID or CWE, `Checklist` renders for prompts; persisted at `<state_dir>/security/knowledge.json`). A scanner runner orchestrating gosec, govulncheck, gitleaks, semgrep, npm audit with language-aware applicability + PATH detection + graceful degrade (skipped tools are listed, never silently dropped); pure per-tool parsers → real findings (no hallucinations).
2. `engine/security_gate.go` `SecurityGate`: `ScanRepo` (standalone) + `ReviewStory` (per-story pre-merge). Deterministic scanners ∪ optional LLM threat-model review against the KB. A finding ≥ gate severity pauses the requirement (human decision), never escalates. Self-upskilling: confirmed high+ findings of a new vuln class become learned KB rules (`SECURITY_RULE_LEARNED`).
3. `vxd security scan` / `vxd security kb` CLI; `security.*` config.
Calibration
Default `security.gate_severity` = critical: the build-pausing gate fires only on leaked secrets / LLM-confirmed injection, not context-dependent SAST noise. Standalone `vxd security scan` (default `--min high`) stays thorough for audits.
Proven live
Tests
`internal/security/*_test.go` (16) + `engine/security_gate_test.go` (7) + new events in the projection exhaustiveness guard + `TestResume_WiresSecurityGate`. `go vet` clean, `golangci-lint` 0 issues.
Follow-ups