chore(audit): weekly cloud audit — capacity-content guard + visible scanner failures by tzone85 · Pull Request #110 · tzone85/vortex-dispatch

tzone85 · 2026-06-29T04:22:08Z

Weekly cloud audit — 2026-06-29

Fan-out adversarial audit (concurrency, silent failures, event-sourcing wiring, security) plus a focused second round on the newest code (security agent, completion gate, conflict-resolution JSON merge, integration build). Round 1 was dry across all four priority categories; round 2 on the freshest code surfaced the two verified defects below. Each was adversarially confirmed against the source before fixing and is pinned by a regression test.

Findings fixed

1. internal/engine/conflict_resolver.go · resolveFile — senior conflict path could write a capacity notice into a source file
The tech-lead path (resolveFileTechLead) guards against a session-limit / overloaded notice that some CLI versions return as successful content rather than an error envelope (if llm.ContainsCapacitySignature(resp.Content) { return 429 }). The senior fast path — used for the common single-file conflict — had no such guard. A Claude Max session-limit notice returned as content has no conflict markers and matches no chatter marker, so it passed every check, was returned as a "clean" resolution, and was written verbatim into the conflicted file via os.WriteFile while the rebase reported success — corrupting the file under a transient outage instead of pausing. This is exactly the failure mode the documented "429 / session-limit clean pause" design exists to prevent.
Fix: mirror the tech-lead guard — surface a 429 *llm.APIError (capacity, not errUnmergeable) so the caller aborts the rebase and the pipeline pauses-and-resumes after the limit resets.

2. internal/security/scanners.go · RunScanners — a failed scanner masqueraded as a clean run, with no log
A scanner was appended to ran before s.Run, and on error the failure was continued with a comment claiming "log handled by caller." The callers (ScanRepo / ReviewStory) receive only (findings, ran, skipped) — no error — so nothing was ever logged, and a scanner that crashed / timed out / failed to parse was indistinguishable from one that ran clean and found nothing. For the per-story pre-merge gate this means a build whose scan partly failed was reported as scanned-clean with no audit trail of the lost coverage.
Fix: failed scanners are now logged at the failure site and returned in a new Report.Failed bucket (and excluded from ScannersRun); FormatMarkdown renders "Failed (ran but errored — coverage lost)". ScanRepo/ReviewStory log when coverage is lost. The documented graceful-degradation policy (a scanner failure never blocks the merge) is preserved — the change restores visibility, not blocking.

Audit areas that came back clean (no changes)

Concurrency — no lock held across I/O, no unsynchronized shared-map writes, no new unbounded contexts.
Silent failures / safety gates — completion gate, premerge QA, reviewer, conflict errUnmergeable classification all surface failure correctly.
Event-sourcing wiring — all 60 declared event types are handled in Project(); TestProject_AllDeclaredEventsHandled is sound.
Security — shell/path/SQL injection, dashboard auth, secret redaction, and prompt-injection boundaries all remain guarded.

Verification

go build ./... — clean
go vet ./... — clean
golangci-lint run ./... — 0 issues
go test ./internal/security/ ./internal/engine/ -count=1 — pass (including the two new tests + extended report_test)
go test ./... -count=1 — pass except three pre-existing environment-only failures unrelated to this diff, reproduced on unmodified main: cli/TestApproveStory_* (the gh CLI binary is not installed in the audit container), engine/TestWriteMetadata_InvalidDir and repolearn/TestCountFilesInDir_NonexistentDir (the container runs as root, so writes to /nonexistent/... succeed where the tests assume an unwritable path; the repolearn case is collateral pollution from the engine one). All three pass under a normal non-root CI user with init.defaultBranch=main.
make doc-coverage (TestDocCoverage_*) — pass. No new CLI command or config field; Report.Failed is an additive omitempty JSON field on existing vxd security scan --json output, not covered by the doc-coverage gate.
make vuln (govulncheck) — could not run in this environment: the audit container's network policy blocks vuln.go.dev:443 (gateway returns 403), so the vulnerability DB cannot be fetched. No dependency changes were made in this PR.

🤖 Generated with Claude Code

Generated by Claude Code

…canner failures Two verified defects found in the newest (security agent + conflict resolution) code and fixed with regression tests. 1. conflict_resolver.go · resolveFile — the senior conflict-resolution path lacked the capacity-signature guard that resolveFileTechLead already has. A Claude Max session-limit notice returned as SUCCESSFUL content (some CLI versions do this instead of an error envelope) passed the marker/chatter checks and was written verbatim into the conflicted source file while the rebase "succeeded" — corrupting the file under a transient outage. Now surfaced as a 429 capacity error so the caller aborts and the pipeline pauses-and-resumes after the limit resets. 2. security/scanners.go · RunScanners — a scanner that ran but errored (exec crash, timeout, parse failure) was appended to `ran` and its error swallowed with a comment claiming the caller logs it; the callers never receive the error, so a failed scan was indistinguishable from a clean "ran · 0 findings" with no audit trail. Failed scanners are now logged and reported in a new Report.Failed bucket (and excluded from ScannersRun), restoring visibility of lost coverage. Graceful degradation (never blocks the merge) is preserved. Tests: TestRebaseWithResolution_SeniorCapacityContentDoesNotCorruptFile, TestSecurityGate_ScanRepo_SurfacesFailedScanners, extended report_test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01VzEm2jAYUXJqRi9YhWXdXK

tzone85 merged commit 72eacb7 into main Jun 29, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(audit): weekly cloud audit — capacity-content guard + visible scanner failures#110

chore(audit): weekly cloud audit — capacity-content guard + visible scanner failures#110
tzone85 merged 1 commit into
mainfrom
fix/audit-cloud-2026-06-29

tzone85 commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tzone85 commented Jun 29, 2026

Weekly cloud audit — 2026-06-29

Findings fixed

Audit areas that came back clean (no changes)

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants