Merged
7 changes: 7 additions & 0 deletions CLAUDE.md
@@ -453,6 +453,13 @@ A self-upskilling security agent embedded in vxd's core so every build is review
- **Config:** `security.disable_gate` (default false=ON), `security.gate_severity`, `security.auto_learn` (default true), `security.kb_path`. **Events:** `STORY_SECURITY_PASSED/FAILED`, `SECURITY_SCAN_COMPLETED`, `SECURITY_RULE_LEARNED` (all in the projection switch; `TestProject_AllDeclaredEventsHandled` guards exhaustiveness).
- **Tests:** `internal/security/*_test.go` (16: KB roundtrip/immutability/lang-filter/checklist/Covers, scanner applicability, all 5 parsers, report) + `engine/security_gate_test.go` (7: scan aggregation+event, block-on-critical, pass-below-threshold, self-upskill on new class, no-relearn known class, LLM-findings parse). Host scanners installed (gosec, govulncheck, gitleaks, semgrep); the `security_scanners` preflight check (`CheckSecurityScanners`, WARNING tier) reports any that go missing, with install hints from `security.InstallHint`. **NXD port pending.**

### Frontend design skill (agent/frontend.go + engine/detect.go, 2026-07-02)
vxd-built web UIs previously came out as generic "AI slop" (Inter font, purple gradients, template hero + three feature cards). The factory now carries an embedded frontend-design skill so UI-facing stories are both *planned* and *implemented* with design intent:
- **`agent.FrontendDesignBrief`** — a single-source design-standards block (adapted from Anthropic's frontend-design skill + current anti-slop research): two-pass token-first process (palette/type/layout/signature plan, self-critique for genericness, then code derived from the plan), named banned defaults (Inter/Roboto, purple-gradient-on-white, cream `#F4F1EA`+serif+terracotta, acid-green-on-black, the template feature-card page, scattered animations), a non-negotiable accessibility floor (responsive to 360px, visible keyboard focus, WCAG AA contrast, `prefers-reduced-motion`, 44px touch targets, designed empty/loading/error states, CSS specificity discipline), and copy-as-design-material rules (real product copy, "Save changes" never "Submit"). Size pinned by `TestFrontendDesignBrief_SizeBudget` (≤6 KB — it rides on every UI dispatch).
- **Detection:** `detectFrontend(title, description, ownedFiles)` (`engine/detect.go`) — owned-file extensions (`.tsx/.jsx/.vue/.svelte/.css/.html/...`) are the strongest signal, plus whole-word UI vocabulary (`frontendKeywordRe`; word boundaries so "pagination"/"performance"/"review" don't false-positive). Sets `PromptContext.IsFrontend`/`TemplateContext.IsFrontend` in the executor for BOTH the first dispatch and the retry path (`TestExecutor_WiresFrontendDetection` guards the wire).
- **Planning:** the Tech-Lead ENGINEERING STANDARDS block now requires the FIRST UI story to establish a design-token foundation (palette/typeface-pairing/spacing as CSS custom properties or the framework theme) with later UI stories consuming those tokens, and UI acceptance criteria to include the accessibility quality floor (`TestPlanner_PromptIncludesEngineeringStandards` pins it).
- **Tests:** `agent/frontend_test.go` (brief injected only when flagged, retry path carries it, size budget), `engine/detect_test.go::TestDetectFrontend` (24 cases incl. substring traps). **NXD port pending.**

### Model ID Compatibility
- **Use undated aliases, not dated snapshots.** Current defaults: `claude-opus-4-8` (tech_lead), `claude-sonnet-4-6` (senior/qa/manager), `claude-haiku-4-5` (cheapest). All three are verified working on the Claude CLI subscription tier.
- **Default execution tiers are all-Anthropic (2026-06-24 fix).** `DefaultConfig` previously set junior/intermediate/supervisor to `{google, gemma-4-27b-it}` — a model that 404s on the Google AI API (it does not exist on `v1beta`). Every low-complexity story spawned a gemini agent that died in ~10s producing no code, then limped forward by escalating to senior. Defaults are now `{anthropic, claude-haiku-4-5}` so a fresh install works with only the Claude CLI configured (no Google AI key/quota). `TestDefaultConfig_NoInvalidJuniorModel` pins this. **A model 404 in the agent runtime surfaces as "agent produced no code changes," NOT as a model error — if a whole tier silently produces nothing, validate the model ID with `gemini -m <id> -p OK` / `claude --model <id> -p OK` first.**
3 changes: 2 additions & 1 deletion README.md
@@ -193,11 +193,12 @@ vhs docs/demo.tape
- **Smart retry with error analysis** -- 8 error categories with targeted fix suggestions passed to retry agents
- **Human review gates** -- three modes (auto, plan_only, manual) for plan approval and PR review
- **Crash recovery** -- lock files, checkpoints, and consistency checks for resuming after process death
- **Pre-flight validation** -- 12 environment checks across 3 severity tiers before pipeline execution
- **Pre-flight validation** -- 16 environment checks across 3 severity tiers before pipeline execution
- **Cost estimation** -- quick heuristic and LLM-based estimation with Fibonacci-to-hours mapping
- **Watchdog monitoring** -- stuck detection, permission bypass, context freshness checks
- **Supervisor oversight** -- periodic drift detection and reprioritization
- **Senior code review** -- automated review via LLM with approve/request-changes verdicts
- **Frontend design skill** -- UI-facing stories are detected (owned files + story text) and their agents receive an embedded design brief: token-first planning, one signature element, named anti-"AI slop" defaults banned, and a WCAG accessibility floor; the planner requires a design-token foundation story for web UIs
- **Automated QA pipeline** -- lint, build, and test with declarative success criteria (6 kinds)
- **Auto-merge with PR creation** -- stories flow from code to merged PR hands-free
- **LLM-powered conflict resolution** -- rebase conflicts auto-resolved; binary files handled without LLM (deterministic policy); complex/multi-file conflicts escalate to Tech Lead with full requirement DAG context
67 changes: 67 additions & 0 deletions internal/agent/frontend.go
@@ -0,0 +1,67 @@
package agent

// FrontendDesignBrief is vxd's frontend design skill: the standards block
// injected into the goal prompt of every UI-facing story (PromptContext.
// IsFrontend, detected in the engine from owned files + story text).
//
// Synthesized from Anthropic's frontend-design skill and current guidance on
// avoiding generic AI-generated design ("AI slop"): token-first planning, one
// signature element, named anti-pattern looks, real copy, and a non-negotiable
// accessibility floor. Content lives in one const so the planner, the goal
// prompt, and tests share a single source of truth. Keep it under the size
// budget pinned by TestFrontendDesignBrief_SizeBudget — it rides on every
// UI-story dispatch.
const FrontendDesignBrief = `
## FRONTEND DESIGN — MANDATORY STANDARDS

You are also the design lead for this UI. The client rejects anything that
looks templated. Make deliberate, opinionated choices specific to THIS
product, its audience, and this page's single job.

### Two-pass process (plan tokens BEFORE code)
1. Write a compact design-token plan first, as a comment block or DESIGN.md:
- Palette: 4-6 named hex values — one dominant color creating atmosphere,
one sharp accent. Not evenly-distributed timid pastels.
- Type: 2+ roles — a characterful display face used with restraint, a
complementary body face (never the same family you'd pick for any other
project), optional utility face for data.
- Layout: one-sentence concept. Structure must encode something true about
the content (numbered markers only if the content really is a sequence).
- Signature: the ONE element this page will be remembered by. Spend your
boldness there; keep everything around it quiet and disciplined.
2. Critique the plan before coding: if any part is what you would produce for
ANY similar brief, it is a default, not a choice — revise it. Then write
the code deriving every color and type decision from the plan. Encode the
tokens once (CSS custom properties or the Tailwind theme), never ad-hoc
per component.

### Banned defaults (these read as AI-generated)
- Fonts: Inter, Roboto, Open Sans, Lato, Arial, bare system-ui as the design.
- Purple/blue-purple gradients on white; emerald or acid-green single accent
on near-black; warm cream #F4F1EA + serif display + terracotta accent;
broadsheet hairlines with zero border-radius EVERYWHERE. All four are
legitimate only if the brief explicitly asks for them.
- The template page: gradient hero → vague centered headline → three feature
cards with icons → testimonials → footer. Uniform 16px-radius cards.
- Scattered animations. One orchestrated moment (a page-load sequence or a
scroll reveal) beats effects everywhere; extra motion reads as generated.

### Quality floor (non-negotiable, never announced in the UI)
- Responsive down to 360px wide; no horizontal scroll, no overlapping text.
- Visible keyboard focus on every interactive element (focus-visible ring).
- prefers-reduced-motion respected: gate every animation on it.
- WCAG AA contrast: 4.5:1 body text, 3:1 large text and UI components.
- Touch targets at least 44x44px. Semantic HTML (nav/main/button, alt text,
labels tied to inputs) — a div with onClick is not a button.
- Empty, loading, and error states designed, not defaulted.
- Watch CSS specificity: section-level and element-level spacing rules that
cancel each other are the classic generated-CSS failure.

### Copy is design material
Write real copy for THIS product — never lorem ipsum or vague marketing
lines. Buttons say exactly what happens ("Save changes", never "Submit");
the same action keeps the same name through the whole flow. Errors state
what went wrong and how to fix it, without apologizing. Name things by what
the user controls ("notifications"), not how the system is built ("webhook
config"). Active voice, sentence case, no filler.
`
83 changes: 83 additions & 0 deletions internal/agent/frontend_test.go
@@ -0,0 +1,83 @@
package agent

import (
	"strings"
	"testing"
)

// The frontend design brief is injected into the goal prompt only for
// UI-facing stories (ctx.IsFrontend). It is the vxd factory's design skill:
// token-first planning, one signature element, the named anti-pattern looks,
// and a non-negotiable accessibility floor.
func TestGoalPrompt_FrontendBriefInjectedWhenFlagSet(t *testing.T) {
	ctx := PromptContext{
		StoryID:            "s-1",
		StoryTitle:         "Build the landing page",
		StoryDescription:   "Marketing page for the product",
		AcceptanceCriteria: "- page renders",
		IsFrontend:         true,
	}
	got := GoalPrompt(RoleSenior, ctx)

	for _, want := range []string{
		"FRONTEND DESIGN",        // section header
		"Signature",              // one memorable element
		"token",                  // token-first plan before code
		"prefers-reduced-motion", // quality floor
		"focus",                  // visible keyboard focus
		"Inter",                  // named anti-pattern font
		"purple",                 // named anti-pattern palette
		"#F4F1EA",                // named second-generation cliché
		"WCAG",                   // contrast floor
		"Submit",                 // copy rule: never label a button Submit
	} {
		if !strings.Contains(got, want) {
			t.Errorf("frontend brief missing %q", want)
		}
	}
}

func TestGoalPrompt_FrontendBriefAbsentForBackendStories(t *testing.T) {
	ctx := PromptContext{
		StoryID:            "s-2",
		StoryTitle:         "Create REST API endpoints",
		StoryDescription:   "Express routes",
		AcceptanceCriteria: "- routes tested",
		IsFrontend:         false,
	}
	got := GoalPrompt(RoleSenior, ctx)
	if strings.Contains(got, "FRONTEND DESIGN") {
		t.Error("backend story must not carry the frontend design brief")
	}
}

// Retry dispatches go through RenderGoalWithAttempts — the brief must survive
// the retry path too, or the second attempt regresses to default design.
func TestRenderGoalWithAttempts_CarriesFrontendBrief(t *testing.T) {
	ctx := TemplateContext{
		StoryID:            "s-1",
		StoryTitle:         "Build the landing page",
		StoryDescription:   "Marketing page",
		AcceptanceCriteria: "- page renders",
		IsFrontend:         true,
		IsRetry:            true,
		RetryNumber:        2,
		ReviewFeedback:     "colors are generic",
		PriorAttempts:      []AttemptSummary{{Number: 1, Role: "senior", Outcome: "review_failed"}},
	}
	got := RenderGoalWithAttempts(ctx)
	if !strings.Contains(got, "FRONTEND DESIGN") {
		t.Error("retry path must carry the frontend design brief")
	}
}

// The brief itself must stay within a sane token budget — it rides on every
// UI story dispatch. ~6k chars ≈ 1.5k tokens is the ceiling.
func TestFrontendDesignBrief_SizeBudget(t *testing.T) {
	if n := len(FrontendDesignBrief); n > 6000 {
		t.Errorf("FrontendDesignBrief is %d chars — trim it below 6000 (prompt budget)", n)
	}
	if n := len(FrontendDesignBrief); n < 1500 {
		t.Errorf("FrontendDesignBrief is %d chars — suspiciously small, did the content get lost?", n)
	}
}
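One detail worth keeping in mind when reading the size-budget test: Go's `len` on a string counts bytes, not characters, and the brief contains multi-byte characters such as arrows. The sketch below (the helper `sizes` is hypothetical, not part of this PR) shows the difference; for a budget that approximates token cost, either measure is probably adequate, but the two numbers will not match.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes reports both measurements for a prompt fragment: len counts bytes,
// utf8.RuneCountInString counts Unicode code points.
func sizes(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// The arrow is a single code point but takes three bytes in UTF-8.
	b, r := sizes("hero → cards")
	fmt.Printf("bytes=%d runes=%d\n", b, r)
}
```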
5 changes: 5 additions & 0 deletions internal/agent/prompts.go
@@ -22,6 +22,7 @@ type PromptContext struct {
	IsExistingCodebase bool   // true when working on a client's existing repo
	IsBugFix           bool   // true when the story is about fixing a bug
	IsInfrastructure   bool   // true when the story involves Docker/CI/deployment
	IsFrontend         bool   // true when the story builds/changes a user-facing web UI
	WaveContext        string // summary of what prior stories built (from WAVE_CONTEXT.md)
	DesignApproach     string // "ddd-tdd" (default), "tdd", "standard"
}
@@ -116,6 +117,10 @@ BUG FIX — MANDATORY WORKFLOW:
5. VERIFY: Failing test now passes. Full test suite still passes. No regressions.`
	}

	if ctx.IsFrontend {
		base += "\n" + FrontendDesignBrief
	}

	if ctx.IsInfrastructure {
		base += `

2 changes: 2 additions & 0 deletions internal/agent/render.go
@@ -33,6 +33,7 @@ type TemplateContext struct {
	IsExistingCodebase bool
	IsBugFix           bool
	IsInfrastructure   bool
	IsFrontend         bool // true when the story builds/changes a user-facing web UI
	IsRetry            bool // true if this is not the first attempt
	RetryNumber        int  // which attempt this is (1-indexed)
}
@@ -78,6 +79,7 @@ func RenderGoalWithAttempts(ctx TemplateContext) string {
		IsExistingCodebase: ctx.IsExistingCodebase,
		IsBugFix:           ctx.IsBugFix,
		IsInfrastructure:   ctx.IsInfrastructure,
		IsFrontend:         ctx.IsFrontend,
		WaveContext:        ctx.WaveContext,
	}

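The retry path copies flags from `TemplateContext` to `PromptContext` field by field, so a new flag is easy to forget in one place. A reflection-based guard is one possible way to catch that; the sketch below uses hypothetical miniature types (`templateCtx`, `promptCtx`, `toPrompt`, `droppedFlags`) rather than the real structs, and is an illustration of the idea, not the check this PR ships.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical stand-ins for the two context structs; the real ones carry
// many more fields.
type templateCtx struct {
	StoryID    string
	IsBugFix   bool
	IsFrontend bool
	IsRetry    bool // retry-only, has no counterpart in promptCtx
}

type promptCtx struct {
	StoryID    string
	IsBugFix   bool
	IsFrontend bool
}

// toPrompt mirrors a hand-written field-by-field copy.
func toPrompt(t templateCtx) promptCtx {
	return promptCtx{
		StoryID:    t.StoryID,
		IsBugFix:   t.IsBugFix,
		IsFrontend: t.IsFrontend,
	}
}

// droppedFlags sets each bool field that exists in both structs, one at a
// time, and returns the names that do not survive the conversion.
func droppedFlags(convert func(templateCtx) promptCtx) []string {
	var dropped []string
	tt := reflect.TypeOf(templateCtx{})
	pt := reflect.TypeOf(promptCtx{})
	for i := 0; i < tt.NumField(); i++ {
		f := tt.Field(i)
		if f.Type.Kind() != reflect.Bool {
			continue
		}
		if _, shared := pt.FieldByName(f.Name); !shared {
			continue
		}
		var in templateCtx
		reflect.ValueOf(&in).Elem().Field(i).SetBool(true)
		out := reflect.ValueOf(convert(in))
		if !out.FieldByName(f.Name).Bool() {
			dropped = append(dropped, f.Name)
		}
	}
	return dropped
}

func main() {
	fmt.Println("dropped flags:", droppedFlags(toPrompt))
}
```

Compared with a string-match test on the rendered prompt, a guard of this shape covers every shared flag at once, at the cost of depending on matching field names.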
39 changes: 39 additions & 0 deletions internal/engine/detect.go
@@ -4,6 +4,7 @@ import (
	"os"
	"os/exec"
	"path/filepath"
	"regexp"
	"strings"
)

Expand Down Expand Up @@ -76,6 +77,44 @@ func detectBugFix(title, description string) bool {
return false
}

// frontendFileExts are file extensions that mark a story as UI-facing when they
// appear in its owned files.
var frontendFileExts = map[string]bool{
	".tsx": true, ".jsx": true, ".vue": true, ".svelte": true,
	".css": true, ".scss": true, ".sass": true, ".less": true,
	".html": true, ".astro": true,
}

// detectFrontend checks if the story builds or changes a user-facing web UI.
// This triggers the FrontendDesignBrief injection (agent.FrontendDesignBrief)
// so agents produce distinctive, accessible frontends instead of generic
// AI-default design. Detection combines owned-file extensions (strongest
// signal) with title/description keywords.
func detectFrontend(title, description string, ownedFiles []string) bool {
	for _, f := range ownedFiles {
		if frontendFileExts[strings.ToLower(filepath.Ext(f))] {
			return true
		}
	}
	return frontendKeywordRe.MatchString(strings.ToLower(title + " " + description))
}

// frontendKeywordRe matches UI vocabulary as whole words only; plain substring
// matching trips on "build" (ui), "performance" (form), "review" (view).
// Deliberately absent: "html" (server-side HTML emails/reports are backend
// work; real UI stories carry .html in owned files or another keyword) and
// "responsive" ("responsive API gateway" means fast, not responsive design).
// Hoisted to package level so detection never recompiles it (perf convention).
var frontendKeywordRe = regexp.MustCompile(`\b(` +
	`frontend|front-end|ui|ux|user interface|` +
	`landing page|page|screen|view|component|widget|` +
	`dashboard|layout|styling|stylesheet|css|` +
	`tailwind|react|vue|svelte|next\.js|nextjs|astro|` +
	`design system|web app|webapp|website|` +
	`form|modal|navbar|navigation bar|sidebar|button|` +
	`theme|dark mode|typography` +
	`)\b`)
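To see what the `\b` anchors buy, the sketch below compares naive substring matching with a whole-word pattern on a few lowercased story titles. `wordRe` is a shortened, hypothetical version of the keyword pattern and `substringHit` is the naive alternative; neither is code from this PR.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wordRe is a shortened, hypothetical version of the keyword pattern.
var wordRe = regexp.MustCompile(`\b(page|form|view|ui)\b`)

// substringHit is the naive alternative that whole-word matching replaces.
func substringHit(s string) bool {
	for _, kw := range []string{"page", "form", "view", "ui"} {
		if strings.Contains(s, kw) {
			return true
		}
	}
	return false
}

func main() {
	for _, s := range []string{
		"improve performance of the query planner",   // contains "form"
		"code review automation for prs",             // contains "view"
		"build the release pipeline",                 // contains "ui"
		"server-side rendering of the settings page", // real keyword
		"page-level caching for the api",             // hyphen is a word boundary
		"add database migrations",                    // no keyword at all
	} {
		fmt.Printf("%-45s substring=%v word=%v\n", s, substringHit(s), wordRe.MatchString(s))
	}
}
```

Note the fifth input: a hyphen counts as a word boundary, so a backend phrase like "page-level caching" still matches the whole-word pattern. Word boundaries remove embedded-substring false positives but not every ambiguous use of a keyword.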

// detectInfrastructure checks if the story involves Docker, CI/CD, deployment,
// or infrastructure concerns. This triggers the InfrastructureDebugging playbook.
func detectInfrastructure(title, description string) bool {
47 changes: 47 additions & 0 deletions internal/engine/detect_test.go
@@ -34,6 +34,53 @@ func TestDetectBugFix(t *testing.T) {
	}
}

func TestDetectFrontend(t *testing.T) {
	tests := []struct {
		name        string
		title, desc string
		ownedFiles  []string
		want        bool
	}{
		{"ui-keyword-title", "Build the landing page", "", nil, true},
		{"component-keyword", "Create reusable Button component", "", nil, true},
		{"dashboard-keyword", "Admin dashboard with charts", "", nil, true},
		{"frontend-in-desc", "", "Implement the frontend for task management", nil, true},
		{"tailwind-keyword", "Style the app", "Use Tailwind for the layout", nil, true},
		{"react-keyword", "Task list view", "React component rendering tasks", nil, true},
		{"owned-tsx", "Wire task state", "", []string{"src/App.tsx"}, true},
		{"owned-css", "Polish spacing", "", []string{"styles/main.css"}, true},
		{"owned-vue", "Item editor", "", []string{"src/Editor.vue"}, true},
		{"owned-svelte", "Item editor", "", []string{"src/Editor.svelte"}, true},
		{"owned-html", "Static page", "", []string{"public/index.html"}, true},
		{"backend-only", "Create REST API endpoints", "Express routes for tasks", nil, false},
		{"db-story", "Add database migrations", "Postgres schema for users", nil, false},
		{"go-files-only", "Implement parser", "", []string{"internal/parser/parser.go"}, false},
		{"cli-story", "Add --json flag to CLI", "", []string{"cmd/root.go"}, false},
		// "server-side rendering of the page" mentions page — still frontend work.
		{"ssr", "Server-side rendering of the settings page", "", nil, true},
		// Substring traps: keywords must match whole words only.
		{"pagination-is-not-page", "Add pagination to the tasks API", "cursor-based pagination in the repository layer", nil, false},
		{"performance-is-not-form", "Improve performance of the query planner", "", nil, false},
		{"review-is-not-view", "Code review automation for PRs", "LLM review of diffs", nil, false},
		{"format-is-not-form", "Format output as JSON", "", nil, false},
		{"build-is-not-ui", "Build the release pipeline", "artifact signing", nil, false},
		// Server-side HTML and "responsive" infrastructure are NOT UI work.
		{"html-email-is-backend", "Generate HTML email report", "render the weekly digest as text/html", nil, false},
		{"responsive-gateway-is-backend", "Design a responsive API gateway", "low-latency request routing", nil, false},
		// But real UI stories that mention html carry the file or another keyword.
		{"html-with-owned-file", "Static marketing site", "", []string{"public/index.html"}, true},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got := detectFrontend(tt.title, tt.desc, tt.ownedFiles)
			if got != tt.want {
				t.Errorf("detectFrontend(%q, %q, %v) = %v, want %v", tt.title, tt.desc, tt.ownedFiles, got, tt.want)
			}
		})
	}
}

func TestDetectInfrastructure(t *testing.T) {
	tests := []struct {
		title, desc string