chore(audit): weekly cloud audit — capacity-content guard + visible scanner failures#110
Merged
Merged
Conversation
…canner failures Two verified defects found in the newest (security agent + conflict resolution) code and fixed with regression tests. 1. conflict_resolver.go · resolveFile — the senior conflict-resolution path lacked the capacity-signature guard that resolveFileTechLead already has. A Claude Max session-limit notice returned as SUCCESSFUL content (some CLI versions do this instead of an error envelope) passed the marker/chatter checks and was written verbatim into the conflicted source file while the rebase "succeeded" — corrupting the file under a transient outage. Now surfaced as a 429 capacity error so the caller aborts and the pipeline pauses-and-resumes after the limit resets. 2. security/scanners.go · RunScanners — a scanner that ran but errored (exec crash, timeout, parse failure) was appended to `ran` and its error swallowed with a comment claiming the caller logs it; the callers never receive the error, so a failed scan was indistinguishable from a clean "ran · 0 findings" with no audit trail. Failed scanners are now logged and reported in a new Report.Failed bucket (and excluded from ScannersRun), restoring visibility of lost coverage. Graceful degradation (never blocks the merge) is preserved. Tests: TestRebaseWithResolution_SeniorCapacityContentDoesNotCorruptFile, TestSecurityGate_ScanRepo_SurfacesFailedScanners, extended report_test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01VzEm2jAYUXJqRi9YhWXdXK
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Weekly cloud audit — 2026-06-29
Fan-out adversarial audit (concurrency, silent failures, event-sourcing wiring, security) plus a focused second round on the newest code (security agent, completion gate, conflict-resolution JSON merge, integration build). Round 1 was dry across all four priority categories; round 2 on the freshest code surfaced the two verified defects below. Each was adversarially confirmed against the source before fixing and is pinned by a regression test.
Findings fixed
1.
internal/engine/conflict_resolver.go·resolveFile— senior conflict path could write a capacity notice into a source fileThe tech-lead path (
resolveFileTechLead) guards against a session-limit / overloaded notice that some CLI versions return as successful content rather than an error envelope (if llm.ContainsCapacitySignature(resp.Content) { return 429 }). The senior fast path — used for the common single-file conflict — had no such guard. A Claude Max session-limit notice returned as content has no conflict markers and matches no chatter marker, so it passed every check, was returned as a "clean" resolution, and was written verbatim into the conflicted file viaos.WriteFilewhile the rebase reported success — corrupting the file under a transient outage instead of pausing. This is exactly the failure mode the documented "429 / session-limit clean pause" design exists to prevent.Fix: mirror the tech-lead guard — surface a
429*llm.APIError(capacity, noterrUnmergeable) so the caller aborts the rebase and the pipeline pauses-and-resumes after the limit resets.2.
internal/security/scanners.go·RunScanners— a failed scanner masqueraded as a clean run, with no logA scanner was appended to
ranbefores.Run, and on error the failure wascontinued with a comment claiming "log handled by caller." The callers (ScanRepo/ReviewStory) receive only(findings, ran, skipped)— no error — so nothing was ever logged, and a scanner that crashed / timed out / failed to parse was indistinguishable from one that ran clean and found nothing. For the per-story pre-merge gate this means a build whose scan partly failed was reported as scanned-clean with no audit trail of the lost coverage.Fix: failed scanners are now logged at the failure site and returned in a new
Report.Failedbucket (and excluded fromScannersRun);FormatMarkdownrenders "Failed (ran but errored — coverage lost)".ScanRepo/ReviewStorylog when coverage is lost. The documented graceful-degradation policy (a scanner failure never blocks the merge) is preserved — the change restores visibility, not blocking.Audit areas that came back clean (no changes)
errUnmergeableclassification all surface failure correctly.Project();TestProject_AllDeclaredEventsHandledis sound.Verification
go build ./...— cleango vet ./...— cleangolangci-lint run ./...— 0 issuesgo test ./internal/security/ ./internal/engine/ -count=1— pass (including the two new tests + extendedreport_test)go test ./... -count=1— pass except three pre-existing environment-only failures unrelated to this diff, reproduced on unmodifiedmain:cli/TestApproveStory_*(theghCLI binary is not installed in the audit container),engine/TestWriteMetadata_InvalidDirandrepolearn/TestCountFilesInDir_NonexistentDir(the container runs as root, so writes to/nonexistent/...succeed where the tests assume an unwritable path; the repolearn case is collateral pollution from the engine one). All three pass under a normal non-root CI user withinit.defaultBranch=main.make doc-coverage(TestDocCoverage_*) — pass. No new CLI command or config field;Report.Failedis an additiveomitemptyJSON field on existingvxd security scan --jsonoutput, not covered by the doc-coverage gate.make vuln(govulncheck) — could not run in this environment: the audit container's network policy blocksvuln.go.dev:443(gateway returns 403), so the vulnerability DB cannot be fetched. No dependency changes were made in this PR.🤖 Generated with Claude Code
Generated by Claude Code