Security agent: self-upskilling scanner + LLM review embedded in vxd core #109

Merged
tzone85 merged 4 commits into main from feat/security-agent on Jun 26, 2026

tzone85 (Owner) commented Jun 26, 2026

What

A self-upskilling security agent built into vxd's core, so every build is reviewed for vulnerabilities and every future build inherits the rules learned from past ones.

Three layers

  • internal/security/ (LLM-free, unit-tested) — a growable KnowledgeBase seeded with the OWASP Top 10 (2021) + high-value CWEs (immutable/versioned Add, Covers matches ID or CWE, Checklist renders for prompts; persisted at <state_dir>/security/knowledge.json). A scanner runner orchestrating gosec, govulncheck, gitleaks, semgrep, npm audit with language-aware applicability + PATH detection + graceful degrade (skipped tools are listed, never silently dropped); pure per-tool parsers → real findings (no hallucinations).
  • engine/security_gate.go: SecurityGate with two entry points, ScanRepo (standalone) + ReviewStory (per-story pre-merge). Deterministic scanners ∪ optional LLM threat-model review against the KB. A finding ≥ gate severity pauses the requirement for a human decision instead of escalating. Self-upskilling: confirmed high+ findings of a new vuln class become learned KB rules (SECURITY_RULE_LEARNED).
  • Forward-embedded in core — planner ENGINEERING STANDARDS now spell out the OWASP Top 10 (every story designed secure); per-story gate enforces the live KB; vxd security scan / vxd security kb CLI; security.* config.
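
The KnowledgeBase behaviour described in the first layer (immutable, version-bumping Add, dedup by ID, and Covers matching on ID or CWE) could be sketched roughly as follows. The `Rule` fields and the exact method shapes are assumptions made for illustration; they are not taken from `internal/security/knowledge.go`.

```go
package main

import "fmt"

// Rule is a hypothetical shape for one knowledge-base entry.
type Rule struct {
	ID    string // e.g. "A03:2021" or "CWE-190"
	CWE   string // optional CWE identifier, e.g. "CWE-89"
	Title string
}

// KnowledgeBase is treated as an immutable value: Add returns a new copy.
type KnowledgeBase struct {
	Version int
	Rules   []Rule
}

// Covers reports whether a vulnerability class is already known,
// matching against either a rule's ID or its CWE.
func (kb KnowledgeBase) Covers(class string) bool {
	for _, r := range kb.Rules {
		if r.ID == class || (r.CWE != "" && r.CWE == class) {
			return true
		}
	}
	return false
}

// Add returns a new KnowledgeBase with the rule appended and the version
// bumped. The receiver's rules are never modified; a duplicate ID is a no-op.
func (kb KnowledgeBase) Add(r Rule) KnowledgeBase {
	for _, existing := range kb.Rules {
		if existing.ID == r.ID {
			return kb
		}
	}
	rules := make([]Rule, len(kb.Rules), len(kb.Rules)+1)
	copy(rules, kb.Rules)
	return KnowledgeBase{Version: kb.Version + 1, Rules: append(rules, r)}
}

func main() {
	base := KnowledgeBase{Version: 1, Rules: []Rule{
		{ID: "A03:2021", CWE: "CWE-89", Title: "Injection"},
	}}
	grown := base.Add(Rule{ID: "CWE-190", CWE: "CWE-190", Title: "Integer overflow"})

	fmt.Println(base.Version, len(base.Rules))   // 1 1 (original untouched)
	fmt.Println(grown.Version, len(grown.Rules)) // 2 2
	fmt.Println(grown.Covers("CWE-89"))          // true (matched via the CWE field)
	fmt.Println(grown.Covers("CWE-367"))         // false
}
```

Matching on the CWE as well as the ID is what keeps an OWASP-indexed baseline rule from being learned a second time under its CWE number.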

Calibration

Default security.gate_severity = critical — the build-pausing gate fires only on leaked secrets / LLM-confirmed injection, not context-dependent SAST noise. Standalone vxd security scan (default --min high) stays thorough for audits.
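
To make the calibration concrete, here is a minimal sketch of a severity threshold applied at two levels: `critical` for the build-pausing gate and `high` for the standalone scan. The `Severity` type, `ParseSeverity`, and `AtOrAbove` are hypothetical names for illustration, not vxd's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

// Severity is an ordered ranking, so thresholds are plain integer comparisons.
type Severity int

const (
	Low Severity = iota
	Medium
	High
	Critical
)

// ParseSeverity converts a config or flag value such as "critical" to a Severity.
func ParseSeverity(s string) (Severity, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "low":
		return Low, nil
	case "medium":
		return Medium, nil
	case "high":
		return High, nil
	case "critical":
		return Critical, nil
	}
	return Low, fmt.Errorf("unknown severity %q", s)
}

// Finding is a hypothetical, trimmed-down finding record.
type Finding struct {
	Tool     string
	Rule     string
	Severity Severity
}

// AtOrAbove returns the findings whose severity meets or exceeds the threshold.
func AtOrAbove(findings []Finding, threshold Severity) []Finding {
	var out []Finding
	for _, f := range findings {
		if f.Severity >= threshold {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	findings := []Finding{
		{Tool: "gosec", Rule: "G404", Severity: High},                   // context-dependent SAST hit
		{Tool: "gitleaks", Rule: "generic-api-key", Severity: Critical}, // leaked secret
	}
	gate, _ := ParseSeverity("critical") // per-story gate default
	audit, _ := ParseSeverity("high")    // standalone scan default (--min high)

	fmt.Println("gate pauses on:", len(AtOrAbove(findings, gate))) // gate pauses on: 1
	fmt.Println("audit reports:", len(AtOrAbove(findings, audit))) // audit reports: 2
}
```

The same findings produce different outcomes only because the threshold differs, so the audit view stays thorough while the gate stays quiet on noise.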

Proven live

  • Self-upskilling fired in production — scanning this repo grew the KB v1→v5 (learned CWE-190/338/367/400).
  • Scanned the 20 most-recently-modified repos (Go/Python/TS/Rust/JS/PHP/Shell). crit=0 everywhere (no leaked secrets, no confirmed criticals).
  • Fixed what was real: 9 reachable Go stdlib CVEs across shiftsync (7) and bounty-dispatch (2) — toolchain bump to 1.26.4, govulncheck now clean (pushed to those repos). Every other "high" verified as a false positive (header-name G101, parameterized SQL, allowlisted property access) or gosec noise against already-hardened code.

Tests

  • internal/security/*_test.go (16) + engine/security_gate_test.go (7) + new events in the projection exhaustiveness guard + TestResume_WiresSecurityGate.
  • Full suite 32 pkgs green, go vet clean, golangci-lint 0 issues.

Follow-ups

  • NXD port (mirror to nexus-dispatch).
  • Optional: per-rule severity calibration map; LLM-triage on standalone scans by default.

tzone85 added 4 commits June 26, 2026 20:08
…security)

New package backing vxd's security agent:
- knowledge.go: growable, JSON-persisted KnowledgeBase seeded with OWASP Top 10
  (2021) + high-value CWEs (hardcoded secrets, path traversal, XSS), each with
  detection + remediation guidance. Add() is immutable + version-bumping +
  dedup-by-ID (the self-upskilling store). Checklist() renders for prompts.
- scanners.go: orchestrates gosec/govulncheck/gitleaks/semgrep/npm-audit with
  language-aware applicability + PATH detection (graceful degrade). Pure parsers
  per tool turn real scanner output into Findings — no hallucinated vulns.
- languages.go: manifest + extension language detection (ts vs js aware).
- finding.go/severity.go/report.go: findings model, severity ranking with
  scanner-synonym parsing, dedup, and an operator-facing markdown report.

TDD: 16 tests (KB roundtrip/immutability/lang-filter/checklist, scanner
applicability, all 5 parsers against representative output, report counts/format).
vet + golangci-lint clean.
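
As an illustration of language-aware applicability with PATH detection and graceful degrade, the following sketch plans which scanners to run and records the ones that apply but are missing. The `Scanner` and `Plan` types, the tool-to-language mapping, and the injected `lookPath` are assumptions for this example, not the contents of `scanners.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// Scanner describes one external tool. An empty Languages slice means the
// tool is assumed to apply to every repository.
type Scanner struct {
	Name      string
	Binary    string
	Languages []string
}

// Plan separates scanners that will run from those that apply to the repo
// but are skipped because their binary is not on PATH.
type Plan struct {
	Run     []string
	Skipped []string
}

func applies(s Scanner, repoLangs map[string]bool) bool {
	if len(s.Languages) == 0 {
		return true
	}
	for _, l := range s.Languages {
		if repoLangs[l] {
			return true
		}
	}
	return false
}

// PlanScanners filters by language, then probes PATH through lookPath.
// A missing binary is recorded in Skipped rather than silently dropped.
func PlanScanners(scanners []Scanner, repoLangs map[string]bool, lookPath func(string) (string, error)) Plan {
	var p Plan
	for _, s := range scanners {
		if !applies(s, repoLangs) {
			continue
		}
		if _, err := lookPath(s.Binary); err != nil {
			p.Skipped = append(p.Skipped, s.Name)
			continue
		}
		p.Run = append(p.Run, s.Name)
	}
	return p
}

// fakeLookPath builds a stub PATH lookup so the example is deterministic.
func fakeLookPath(installed map[string]bool) func(string) (string, error) {
	return func(bin string) (string, error) {
		if installed[bin] {
			return "/usr/local/bin/" + bin, nil
		}
		return "", errors.New("not found")
	}
}

func main() {
	scanners := []Scanner{
		{Name: "gosec", Binary: "gosec", Languages: []string{"go"}},
		{Name: "govulncheck", Binary: "govulncheck", Languages: []string{"go"}},
		{Name: "gitleaks", Binary: "gitleaks"},
		{Name: "semgrep", Binary: "semgrep"},
		{Name: "npm audit", Binary: "npm", Languages: []string{"javascript", "typescript"}},
	}
	lookPath := fakeLookPath(map[string]bool{"gosec": true, "gitleaks": true})
	plan := PlanScanners(scanners, map[string]bool{"go": true}, lookPath)

	fmt.Println("run:", plan.Run)         // run: [gosec gitleaks]
	fmt.Println("skipped:", plan.Skipped) // skipped: [govulncheck semgrep]
}
```

In real use, `exec.LookPath` from `os/exec` has the same signature as the injected function and can be passed directly.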
…lf-upskilling)

engine/security_gate.go — vxd's security agent, two entry points:
- ScanRepo: standalone whole-repo scan (deterministic scanners ∪ LLM threat-model
  review against the KB checklist), emits SECURITY_SCAN_COMPLETED.
- ReviewStory: per-story pre-merge gate; blocks when any finding meets/exceeds
  the configured gate severity; emits STORY_SECURITY_PASSED/FAILED.

Continuous upskilling: confirmed high+ findings whose vuln CLASS (CWE, else
OWASP category, else tool rule) isn't already covered are added to the knowledge
base as learned rules (KnowledgeBase.Covers matches ID or CWE so OWASP-indexed
baseline classes aren't re-learned), persisted, and announced via
SECURITY_RULE_LEARNED — so every future build inherits classes found in past ones.

New events STORY_SECURITY_PASSED/FAILED + SECURITY_SCAN_COMPLETED/RULE_LEARNED
wired into the projection switch (TestProject_AllDeclaredEventsHandled passes).

TDD: 7 tests (scan aggregation+event, block-on-critical, pass-below-threshold,
self-upskill on new class, no-relearn known class, LLM findings parse). Injectable
scan + now seams; nil client ⇒ scanner-only. vet + golangci-lint clean.
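
The class-selection rule in the upskilling paragraph (CWE, else OWASP category, else tool rule) can be sketched as below. The `Finding` fields, the `tool:rule` fallback key format, and the `covered` callback are hypothetical; only the precedence order and the "confirmed, high+, not already covered" filter come from the description above.

```go
package main

import "fmt"

// Finding is a hypothetical record carrying just the fields this sketch needs.
type Finding struct {
	Tool, Rule string
	CWE, OWASP string
	HighPlus   bool // severity is high or critical
	Confirmed  bool
}

// ClassKey picks the vulnerability class: CWE first, then OWASP category,
// then the tool-specific rule as a last resort.
func ClassKey(f Finding) string {
	switch {
	case f.CWE != "":
		return f.CWE
	case f.OWASP != "":
		return f.OWASP
	default:
		return f.Tool + ":" + f.Rule // assumed key format
	}
}

// NewClasses returns the classes worth learning: confirmed, high+ findings
// whose class is not already covered, deduplicated within the batch.
func NewClasses(findings []Finding, covered func(string) bool) []string {
	seen := map[string]bool{}
	var learned []string
	for _, f := range findings {
		if !f.Confirmed || !f.HighPlus {
			continue
		}
		key := ClassKey(f)
		if covered(key) || seen[key] {
			continue
		}
		seen[key] = true
		learned = append(learned, key)
	}
	return learned
}

func main() {
	baseline := map[string]bool{"CWE-89": true, "A03:2021": true}
	covered := func(class string) bool { return baseline[class] }

	findings := []Finding{
		{Tool: "semgrep", Rule: "sqli", CWE: "CWE-89", HighPlus: true, Confirmed: true},  // already known
		{Tool: "gosec", Rule: "G115", CWE: "CWE-190", HighPlus: true, Confirmed: true},   // new class
		{Tool: "gosec", Rule: "G115", CWE: "CWE-190", HighPlus: true, Confirmed: true},   // duplicate
		{Tool: "gosec", Rule: "G404", CWE: "CWE-338", HighPlus: false, Confirmed: true}, // below high
	}
	fmt.Println(NewClasses(findings, covered)) // [CWE-190]
}
```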
…, config, docs

- Pipeline: SecurityGate.ReviewStory runs per-story after QA, before merge
  (monitor_post_execution.go). A finding >= gate severity PAUSES the requirement
  (human decision) instead of escalating; a scanner failure never blocks merge.
  Monitor.SetSecurityGate + resume.go wiring (TestResume_WiresSecurityGate).
- CLI: `vxd security scan [path]` (scanners + optional --llm review, --min for CI
  exit code, --json) and `vxd security kb` (inspect baseline + learned rules).
- Forward-embedded: planner ENGINEERING STANDARDS now spells out the OWASP Top 10
  so every planned story is designed secure; the live (growable) KB is enforced
  at the per-story gate.
- Config: security.{disable_gate, gate_severity (default high), auto_learn
  (default true), kb_path}; DefaultConfig seeds the defaults.
- Events STORY_SECURITY_PASSED/FAILED + SECURITY_SCAN_COMPLETED/RULE_LEARNED
  projected (exhaustiveness guard passes).
- Docs: README config table + CLAUDE.md (CLI table, vxd.yaml block, events,
  security-agent knowledge section). Doc-coverage tests pass.

Full suite (32 pkgs) + vet + golangci-lint (0 issues) green. Binary rebuilt.
…ld-usable)

Validated the agent on real repos (Go/Python/TS): it surfaces real findings
(gosec perms/path-traversal, semgrep CWE-89 SQLi patterns) and proved
self-upskilling in production (KB grew v1→v5, learned CWE-190/338/367/400 from
the vortex-dispatch scan). But gosec/semgrep HIGH severity is context-dependent
(non-crypto rand in a Bayesian sampler, taint on operator-controlled $HOME
paths, parameterized SQL flagged as concatenation) — gating builds on it would
stall the pipeline on noise.

Default security.gate_severity: high → critical. The per-story gate now pauses a
build only on CRITICAL findings (leaked secrets via gitleaks, LLM-confirmed
injection/hardcoded credentials) — high-signal where it counts. The standalone
`vxd security scan` still reports high/medium (default --min high) for thorough
audits, and operators can tighten the gate to "high". Docs updated.

Full suite (32 pkgs) + vet + golangci-lint (0 issues) green. Binary rebuilt.
@tzone85 tzone85 merged commit ebc4157 into main Jun 26, 2026
6 checks passed
@tzone85 tzone85 deleted the feat/security-agent branch June 26, 2026 19:48
tzone85 added a commit that referenced this pull request Jul 2, 2026
…r check (#111)

* chore(security): dogfood scan hardening — pin GH Actions to SHAs + annotate accepted findings

Ran vxd security scan on vxd itself (346 findings) and closed the high-severity set:

- Pin all 14 GitHub Actions references in ci.yml to full commit SHAs
  (mutable-tag supply-chain class, CWE-1357) with version comments.
- Annotate the 29 accepted-by-design findings with #nosec / nosemgrep and a
  one-line rationale each: sampler seed conversion + math/rand (statistical
  sampling, crypto seed), server shutdown contexts (fresh ctx after parent
  cancel is the graceful-shutdown idiom), G703 path-taint sites (paths derive
  from $HOME/worktrees inside the operator trust boundary), and the 15
  dangerous-exec-command sites that ARE the orchestrator's core function
  (each with its upstream validation named).

vxd security scan . now reports 0 high+ findings on its own tree, so the
scan is usable as a self-gate (--min high) once CI billing is restored.

* feat(preflight): security_scanners check — surface missing SAST/secret tools

The per-story security gate degrades gracefully when a scanner binary is
absent (skipped, never fatal), which left operators with no signal that scan
coverage was reduced. New CheckSecurityScanners (WARNING tier) lists missing
binaries from the security.KnownScanners registry with install hints
(security.InstallHint) and joins AllChecks — vxd preflight now runs 16 checks.

lookPath is injected for testability, matching CheckBinaryPath's pattern.
4 new tests including a dangling-wire guard (AllChecks must include the check).
Docs updated (CLAUDE.md + README check counts, security-agent section).
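
A preflight check of this kind might look roughly like the sketch below: a WARNING-tier result that names each missing binary alongside an install hint, with `lookPath` injected so tests can simulate an empty PATH. The `CheckResult` type, the message wording, and the parameter list are assumptions; the real check reads from `security.KnownScanners` and `security.InstallHint`, whose shapes are not shown in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// CheckResult is a hypothetical preflight result.
type CheckResult struct {
	Name    string
	Tier    string // a WARNING tier is assumed to never fail preflight outright
	Passed  bool
	Message string
}

// CheckSecurityScanners reports which known scanner binaries are absent,
// pairing each with an install hint.
func CheckSecurityScanners(known []string, installHint func(string) string, lookPath func(string) (string, error)) CheckResult {
	var missing []string
	for _, bin := range known {
		if _, err := lookPath(bin); err != nil {
			missing = append(missing, fmt.Sprintf("%s (%s)", bin, installHint(bin)))
		}
	}
	res := CheckResult{Name: "security_scanners", Tier: "WARNING", Passed: len(missing) == 0}
	if res.Passed {
		res.Message = "all security scanners found on PATH"
	} else {
		res.Message = "scan coverage reduced, not found on PATH: " + strings.Join(missing, "; ")
	}
	return res
}

// onlyInstalled builds a stub lookPath that finds just the named binaries.
func onlyInstalled(bins ...string) func(string) (string, error) {
	set := map[string]bool{}
	for _, b := range bins {
		set[b] = true
	}
	return func(bin string) (string, error) {
		if set[bin] {
			return "/usr/local/bin/" + bin, nil
		}
		return "", errors.New("not found")
	}
}

func main() {
	hints := map[string]string{
		"gosec":       "go install github.com/securego/gosec/v2/cmd/gosec@latest",
		"govulncheck": "go install golang.org/x/vuln/cmd/govulncheck@latest",
	}
	hint := func(bin string) string { return hints[bin] }

	res := CheckSecurityScanners([]string{"gosec", "govulncheck"}, hint, onlyInstalled("gosec"))
	fmt.Printf("[%s] %s passed=%v\n%s\n", res.Tier, res.Name, res.Passed, res.Message)
}
```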

* fix(watch): vxd watch silently dropped every event for real requirements + test coverage restore

Two matcher bugs made `vxd watch` a silent no-op tail in production:

1. Story events: eventMatchesReq compared evt.StoryID[:8] against reqID[:8],
   but story IDs are namespaced with storyIDPrefix(reqID) — sha256(reqID)[:8]
   for any reqID longer than 8 chars, which every real ULID reqID is. The
   prefixes never matched, so no story event ever printed. The matcher now
   uses the exported engine.StoryIDPrefix (single source of truth).
2. Requirement events: the code commented 'REQ_* events get routed via payload
   below' but no payload routing existed — REQ_SUBMITTED/PLANNED/COMPLETED/
   BLOCKED never printed. Now matched via the payload req_id/id keys (the two
   keys real emitters use: planner uses 'id', the planning heartbeat 'req_id').

The old TestEventMatchesReq_PrefixMatch pinned the broken raw-prefix behavior
and was replaced (test was wrong, not the spec). New tests: hashed-prefix
match, short-reqID verbatim match, payload routing for both key spellings,
cross-requirement rejection, and an end-to-end tailRequirementEvents run
against real file+sqlite stores that pins print-and-exit-on-terminal.
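
The two matcher fixes can be illustrated with a small sketch. It assumes the prefix is the first 8 hex characters of the SHA-256 digest (the commit message only says `sha256(reqID)[:8]`), that story IDs look like `<prefix>-<suffix>`, and that payload routing applies only to `REQ_*` events; the `Event` type here is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// StoryIDPrefix returns the namespace prefix for a requirement's stories:
// the reqID verbatim when it is at most 8 characters, otherwise the first
// 8 hex characters of its SHA-256 digest (hex encoding is an assumption).
func StoryIDPrefix(reqID string) string {
	if len(reqID) <= 8 {
		return reqID
	}
	sum := sha256.Sum256([]byte(reqID))
	return hex.EncodeToString(sum[:])[:8]
}

// Event is a hypothetical, trimmed-down event record.
type Event struct {
	Type    string
	StoryID string
	Payload map[string]any
}

// eventMatchesReq routes story events by hashed prefix and requirement
// events by the payload keys "req_id" or "id".
func eventMatchesReq(evt Event, reqID string) bool {
	if evt.StoryID != "" {
		return strings.HasPrefix(evt.StoryID, StoryIDPrefix(reqID)+"-")
	}
	if strings.HasPrefix(evt.Type, "REQ_") {
		for _, key := range []string{"req_id", "id"} {
			if v, ok := evt.Payload[key].(string); ok && v == reqID {
				return true
			}
		}
	}
	return false
}

func main() {
	reqID := "01J9ZQ5T8K3M4N6P7R8S9T0VWX" // ULID-shaped example, 26 characters

	hashed := Event{Type: "STORY_SECURITY_PASSED", StoryID: StoryIDPrefix(reqID) + "-001"}
	raw := Event{Type: "STORY_SECURITY_PASSED", StoryID: reqID[:8] + "-001"}
	planned := Event{Type: "REQ_PLANNED", Payload: map[string]any{"id": reqID}}

	fmt.Println(eventMatchesReq(hashed, reqID))  // true
	fmt.Println(eventMatchesReq(raw, reqID))     // false: the old raw-prefix comparison
	fmt.Println(eventMatchesReq(planned, reqID)) // true
}
```

The `raw` case shows why the original matcher printed nothing: a raw 8-character slice of a ULID never equals the hashed prefix that story IDs actually carry.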

Also restores the internal/cli coverage regression from PR #109 (68.0% →
72.9%): the security scan/kb commands, dashboard status/stop daemon commands,
and watch were all untested. New: security_test.go (9 tests — pure helpers,
kb text/json, scan with empty PATH pinning graceful degradation + skipped-
scanner reporting), dashboard_daemon_test.go (7 tests — not-running status,
idempotent stop, malformed/stale pidfiles, watch unknown-req error).

* fix(review): apply go-reviewer findings — complete G124 suppression, kill dead branch + inert assertions

- auth.go: the cookie site had the nosemgrep half of the annotation but gosec
  G124 still fired; add the #nosec with the same rationale.
- watch.go: drop the unreachable evt.StoryID == prefix branch — the planner
  always emits <prefix>-<suffix> IDs, so HasPrefix covers every real case.
- checks_security_test.go: the installed-scanner negative assertions matched
  a substring ('missing: gitleaks') that could never occur in the real message
  format; assert on the bare scanner names so a regression can actually fire.

---------

Co-authored-by: Thando Mini <tzone85@users.noreply.github.com>