
feat: add test-quality-sentinel agentic workflow #25353

Merged

pelikhan merged 7 commits into main from copilot/create-test-quality-sentinel-workflow on Apr 8, 2026
Conversation

Contributor

Copilot AI commented Apr 8, 2026

High test counts create false safety — 500+ tests feel reassuring until vibe-coding reveals the design was wrong. This workflow analyzes what tests actually enforce, not just that they exist.

Workflow: test-quality-sentinel

Triggers on ready_for_review PRs. Analyzes new/modified Go (*_test.go) and JavaScript (*.test.cjs, *.test.js) tests; detects but does not score Python/Rust tests.

Analysis pipeline

  • Step 1–2: Identifies changed test files; checks new Go test files for mandatory //go:build !integration / //go:build integration build tags; extracts up to 50 new/modified test functions (newly added prioritized over modified; sampling noted in PR comment)

  • Step 3: Structural analysis via awk grep patterns on the diff:

    • Go: detects both stdlib (t.Errorf/t.Fatalf) and testify (assert.*/require.*) assertions, t.Run() table-driven tests, and forbidden mock library calls (gomock, testify/mock, .EXPECT(), .On())
    • JavaScript: detects expect() assertions, .toThrow()/.rejects error paths, and vitest mocking (vi.mock, vi.spyOn, vi.fn)
  • Step 4: AI reviews each test against 3 questions:

    • What design invariant does this test enforce?
    • What would break if this test were deleted?
    • Behavioral contract or implementation detail?
  • Step 5: Test inflation check — flags files where new test lines > 2× new production lines (foo_test.go → foo.go, foo.test.cjs → foo.cjs)

  • Step 6: Scores 0–100 across four weighted dimensions:

    | Dimension | Weight |
    | --- | --- |
    | Behavioral coverage (% design tests) | 40 pts |
    | Error/edge case coverage | 30 pts |
    | Low duplication | 20 pts |
    | Proportional growth (no inflation) | 10 pts |
  • Step 7: Posts PR comment with score, per-test classification table, flagged tests with improvement suggestions, and a 🚨 Coding-guideline violations row

  • Step 8: Submits REQUEST_CHANGES review if >30% of new tests are implementation tests or if any coding-guideline violation is detected; APPROVE otherwise
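The Step 5 ratio above can be sketched against `git diff --numstat` output. This is an illustrative approximation, not the workflow's actual script; the sample file names and the pairing logic are assumptions:

```shell
# Hypothetical sketch of the Step 5 inflation check: pair each *_test.go
# with its production counterpart (foo_test.go -> foo.go) and flag files
# where added test lines exceed 2x added production lines.
# Input mimics `git diff --numstat`: added<TAB>deleted<TAB>path.
numstat=$(printf '12\t0\tpkg/foo.go\n30\t0\tpkg/foo_test.go\n5\t0\tpkg/bar.go\n4\t0\tpkg/bar_test.go\n')

inflation_report=$(echo "$numstat" | awk -F'\t' '
  # Map foo_test.go -> foo.go and record added test lines.
  $3 ~ /_test\.go$/ { prod = $3; sub(/_test\.go$/, ".go", prod); test_added[prod] = $1; next }
  # Everything else is treated as a production file.
  { prod_added[$3] = $1 }
  END {
    for (p in test_added) {
      verdict = (test_added[p] > 2 * prod_added[p]) ? "INFLATED" : "ok"
      printf "%s: test=+%s prod=+%s %s\n", p, test_added[p], prod_added[p], verdict
    }
  }')
echo "$inflation_report"
```

Here `pkg/foo.go` (30 test lines vs. 12 production lines) exceeds the 2:1 threshold, while `pkg/bar.go` does not.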

Red flags detected

  1. Mock-heavy (vitest) with no observable-behavior assertion — external I/O mocks (fs, process.stderr, global.core) are exempt
  2. Go mock libraries used (hard violation): gomock, testify/mock, .EXPECT(), .On() — project guideline is "no mocks or test suites; test real component interactions"
  3. Missing //go:build tag on new Go test file (hard violation): every *_test.go must start with //go:build !integration or //go:build integration
  4. Happy-path only (no error/edge cases)
  5. Test inflation ratio >2:1 (test lines vs. production lines added)
  6. Duplicated assertion patterns across 3+ test functions
  7. Zero assertions in test body
  8. Go assertions missing descriptive message arguments (guideline requires them)
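Flags 4 and 7 can be approximated with a small awk pass over added vitest lines. This is a hedged sketch: the sample diff, the output format, and the per-test classification rule are illustrative, not the workflow's real detector:

```shell
# Illustrative sketch of red flags 4 and 7: flag added tests with zero
# expect() calls, or with assertions but no error-path checks.
sample_diff=$(printf '%s\n' \
  '+it("parses valid input", () => {' \
  '+  expect(parse("a")).toEqual(["a"]);' \
  '+});' \
  '+it("exercises parse without checking", () => {' \
  '+  parse("b");' \
  '+});')

flag_report=$(echo "$sample_diff" | awk '
  function report() {
    verdict = (asserts == 0) ? "ZERO-ASSERTIONS" : ((errors == 0) ? "happy-path-only" : "ok")
    printf "%s: %s\n", name, verdict
  }
  /^\+it\(/ {
    if (name) report()
    # Extract the double-quoted test name (RSTART/RLENGTH span the quotes).
    match($0, /"[^"]+"/); name = substr($0, RSTART + 1, RLENGTH - 2)
    asserts = 0; errors = 0
  }
  name && /expect\(/ { asserts++ }
  name && /\.toThrow|\.rejects/ { errors++ }
  END { if (name) report() }')
echo "$flag_report"
```

The first sample test asserts but never exercises an error path; the second calls the code under test without asserting anything.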

.gitignore change

Added negation exception for test-quality-sentinel.* — the existing test-* glob pattern was treating this production workflow as a temporary test artifact.
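The resulting .gitignore stanza presumably looks something like the following. Only the `!test-quality-sentinel.*` negation is confirmed by this PR; the surrounding comment and the exact position of the `test-*` glob are assumptions:

```gitignore
# Temporary test artifacts
test-*
# ...but the test-quality-sentinel workflow is production code, not an artifact
!test-quality-sentinel.*
```

In .gitignore, order matters: a later `!pattern` re-includes files that an earlier pattern excluded, provided no parent directory is itself excluded.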

Copilot AI linked an issue Apr 8, 2026 that may be closed by this pull request
Copilot AI and others added 3 commits April 8, 2026 17:41
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1fbdf81c-aa8b-4755-923c-c922412fb447

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Create test-quality-sentinel workflow that analyzes test quality beyond
code coverage percentages on every PR. The workflow:

- Triggers on pull_request (opened, synchronize, ready_for_review)
- Detects implementation-detail tests, happy-path-only tests, test
  inflation, and duplication via AST + regex analysis
- AI agent answers 3 quality questions per test (design invariant, value
  if deleted, contract vs. implementation)
- Posts Test Quality Score (0-100) comment with per-test feedback
- Distinguishes design tests vs. implementation tests
- Fails check (REQUEST_CHANGES review) if >30% of new tests are low-value
- Supports Python (pytest) and Rust (#[test]) as primary languages
- Add .gitignore exception since workflow name starts with 'test-'

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1fbdf81c-aa8b-4755-923c-c922412fb447

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Clarify Step 1: mark Python/Rust as 'analyzed', Go/JS as 'detected but not scored'
- Remove Go-only test extraction command from Step 2
- Replace Go mock libraries (mockery/testify/mock) with Rust equivalents (mockall)
- Clarify inflation penalty formula as binary (deduct all 10 pts if ANY file exceeds 2:1)
- Remove Go-specific t.Errorf reference from calibration guidelines
- Add explicit sampling instructions (Step 2 + PR comment note) for the 50-test limit
- Scope test inflation mapping to Python and Rust only

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1fbdf81c-aa8b-4755-923c-c922412fb447

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Create test-quality-sentinel agentic workflow" to "feat: add test-quality-sentinel agentic workflow" on Apr 8, 2026
Copilot AI requested a review from pelikhan April 8, 2026 17:59
@pelikhan
Collaborator

pelikhan commented Apr 8, 2026

@copilot this codebase operates on Go and JavaScript

The codebase operates on Go and JavaScript, not Python and Rust.

- Replace Python (pytest) and Rust (#[test]) with Go (*_test.go) and
  JavaScript/TypeScript (*.test.*, *.spec.*) as primary analysis targets
- Remove python3/rustfmt bash tool allowances; add node
- Rewrite Step 3 with Go awk analysis (testify/gomock signals, t.Run
  table-driven test credit) and JavaScript awk analysis (jest/sinon)
- Update Step 2 diff commands to target _test.go and *.test.{js,ts,cjs}
- Update Red Flags section: reference jest.mock/jest.spyOn and gomock
- Update Step 5 inflation mapping: foo_test.go→foo.go, foo.test.ts→foo.ts
- Update PR comment language section: Go/JS counts
- Update Guidelines: language support, calibration (assert.Error, toThrow,
  t.Run table-driven test credit), Analysis Scope
- Fix illustrative example rows in comment template to use Go paths

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/180da1af-6c69-40ea-bcf1-4a3424cfeba7

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 8, 2026

@copilot this codebase operates on Go and JavaScript

Updated in commit 2674b42. The primary analysis targets are now Go (*_test.go — using testify/gomock signal detection, with credit for table-driven t.Run tests) and JavaScript/TypeScript (*.test.*, *.spec.* — using jest/sinon signal detection). Python and Rust are now detected-only (noted in the report but not scored).

@pelikhan
Collaborator

pelikhan commented Apr 8, 2026

@copilot review the coding guidelines and habits of this report to update the agentic workflows

Reviewed actual test files in pkg/ and actions/setup/js/ to update the
workflow with accurate patterns:

Go tests:
- Both t.Errorf/t.Fatalf (stdlib) and assert./require. (testify) detected
- Track t.Run() for table-driven tests (the preferred pattern)
- Track forbidden_mocks (gomock/testify/mock/.EXPECT()/.On()) as hard red flag
- Add build tag check: every *_test.go must start with //go:build !integration
  or //go:build integration
- No mock libraries used in Go (project guideline: "no mocks or test suites")

JavaScript tests:
- Test framework is vitest, NOT jest: vi.fn(), vi.spyOn(), vi.mock()
- Primary test extension is .test.cjs (245 files), not .spec.* or .test.ts
- scripts/ uses .test.js; drop all .spec.* and .test.ts references
- Mocking external I/O (fs, process.stderr, global.core) is legitimate;
  flag only internal business-logic mocking without behavioral assertions

New/updated red flags:
- Mock libraries in Go = hard coding-guideline violation (always REQUEST_CHANGES)
- Missing //go:build tag on new Go test file = hard violation
- No assertion messages in Go tests (guideline: always include context)

Scoring / Failure Condition:
- Fail on: >30% implementation tests OR any coding-guideline violation
- Step 8: three distinct review outcomes (pass, quality-fail, guideline-fail)
- PR comment table: add 🚨 Coding-guideline violations row

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/a5773904-75eb-4e0c-9ebe-a61f6ffd5e14

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 8, 2026

@copilot review the coding guidelines and habits of this report to update the agentic workflows

Updated in commit 167a9ef. Changes based on reviewing the actual test files and coding guidelines:

Go patterns (from pkg/):

  • Awk now tracks both stdlib (t.Errorf/t.Fatalf) and testify (assert.*/require.*) assertions — both styles are used
  • t.Run() table-driven tests get per-row credit (this is the preferred pattern per guidelines)
  • gomock/testify/mock/.EXPECT()/.On() are now a hard red flag — the project guideline is "no mocks or test suites; test real component interactions"
  • New: build tag check — every *_test.go must start with //go:build !integration or //go:build integration; missing tag = hard violation → REQUEST_CHANGES
  • New: flags assertion calls missing descriptive message arguments (guidelines require them)
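The missing-message check could be approximated by counting top-level arguments, e.g. for `assert.Equal`. A sketch under stated assumptions: comma counting ignores nested calls, so this is only a heuristic, not necessarily what the workflow does:

```shell
# Heuristic sketch: assert.Equal(t, want, got) has two commas (three
# arguments); a fourth argument supplies the descriptive message.
go_lines=$(printf '%s\n' \
  '+  assert.Equal(t, want, got)' \
  '+  assert.Equal(t, want, got, "parsed value should round-trip")')

msg_report=$(echo "$go_lines" | awk '
  /assert\.Equal\(/ {
    n = gsub(/,/, ",")              # gsub returns the number of commas found
    if (n < 3) print "missing message:", $0
  }')
echo "$msg_report"
```

Only the bare three-argument call is flagged; the call carrying a message passes.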

JavaScript patterns (from actions/setup/js/):

  • vitest throughout (not jest): mock primitives updated to vi.fn(), vi.spyOn(), vi.mock()
  • Primary test extension is .test.cjs (245 files); .test.js for scripts; removed all .spec.*/.test.ts references
  • External I/O mocking (fs, process.stderr, global.core) is explicitly noted as legitimate; only flags internal business-logic mocking

Scoring / review logic:

  • Failure now triggers on >30% implementation tests or any coding-guideline violation (mock library in Go, missing build tag)
  • Three distinct review outcomes in Step 8: pass (APPROVE), quality fail, and guideline fail (both failures submit REQUEST_CHANGES with distinct messages)

@github-actions github-actions bot mentioned this pull request Apr 8, 2026
```yaml
description: Analyzes test quality beyond code coverage percentages on every PR, detecting implementation-detail tests, happy-path-only tests, test inflation, and duplication
on:
  pull_request:
    types: [opened, synchronize, ready_for_review]
```
Collaborator

@copilot remove opened, synchronize

Contributor Author

Done in 9901039 — trigger is now ready_for_review only.
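After that change, the trigger block in test-quality-sentinel.md presumably reads:

```yaml
on:
  pull_request:
    types: [ready_for_review]
```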

@pelikhan pelikhan marked this pull request as ready for review April 8, 2026 19:14
Copilot AI review requested due to automatic review settings April 8, 2026 19:14
@pelikhan pelikhan merged commit cd97145 into main Apr 8, 2026
@pelikhan pelikhan deleted the copilot/create-test-quality-sentinel-workflow branch April 8, 2026 19:14
Contributor

Copilot AI left a comment


Pull request overview

Adds a new agentic workflow (test-quality-sentinel) intended to assess the quality of newly added/modified tests (Go + JavaScript) on ready_for_review PRs, and adjusts ignore rules so the workflow artifacts are committed.

Changes:

  • Added Test Quality Sentinel workflow definition (.md) and generated lockfile (.lock.yml).
  • Updated .gitignore to stop ignoring the new workflow artifacts.
  • Updated architecture-guardian.lock.yml to use the same Copilot driver invocation pattern.
Summary per file:

| File | Description |
| --- | --- |
| .gitignore | Adds negation rules so test-quality-sentinel workflow files aren't ignored by test-* globs. |
| .github/workflows/test-quality-sentinel.md | New workflow prompt/instructions for analyzing changed test diffs and producing a PR comment + review. |
| .github/workflows/test-quality-sentinel.lock.yml | Compiled workflow output for the new workflow. |
| .github/workflows/architecture-guardian.lock.yml | Updates the Copilot CLI execution command to run via copilot_driver.cjs. |

Copilot's findings


Comments suppressed due to low confidence (1)

.github/workflows/test-quality-sentinel.md:161

  • The vitest awk scanner for JS tests is unlikely to work on real diffs because it (1) requires it( / test( to be flush-left after + (no indentation), and (2) ends a test block only on a line matching +}), which won’t match the common + }); form (semicolon + indentation). This can cause missing/incorrect per-test counts. Loosen the regexes to allow whitespace and optional ; on the closing line.
```shell
git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*.test.cjs' '*.test.js' | awk '
/^\+(it|test)\(/ {
  if (test_name) print test_name, "assertions=" assertions, "errors=" errors, "mocks=" mocks
  match($0, /(it|test)\(["'"'"']([^"'"'"']+)/, arr); test_name=arr[2]; assertions=0; errors=0; mocks=0
}
test_name && /^\+.*expect\(/ { assertions++ }
test_name && /^\+.*(\.toThrow|\.rejects|\.toThrowError)/ { errors++ }
test_name && /^\+.*(vi\.mock|vi\.spyOn|vi\.fn)/ { mocks++ }
test_name && /^\+\}\)/ { print test_name, "assertions=" assertions, "errors=" errors, "mocks=" mocks; test_name="" }
END { if (test_name) print test_name, "assertions=" assertions, "errors=" errors, "mocks=" mocks }
```
  • Files reviewed: 3/4 changed files
  • Comments generated: 2
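A loosened scanner along the lines suggested above might look like this. It is a sketch under the review's assumptions (indentation allowed after the diff `+`, closers matched with optional whitespace and an optional semicolon); only double-quoted test names are handled:

```shell
# Sketch of the suggested regex loosening (illustrative, not the
# workflow's actual script): tolerate indentation after '+' and a
# '});' closer with leading whitespace and an optional semicolon.
loosened_diff=$(printf '%s\n' \
  '+  it("reads config", () => {' \
  '+    expect(load()).toBeDefined();' \
  '+  });')

scanner_out=$(echo "$loosened_diff" | awk '
  /^\+[[:space:]]*(it|test)\(/ {
    match($0, /"[^"]+"/); test_name = substr($0, RSTART + 1, RLENGTH - 2)
    assertions = 0
  }
  test_name && /expect\(/ { assertions++ }
  /^\+[[:space:]]*\}\);?[[:space:]]*$/ && test_name {
    print test_name, "assertions=" assertions; test_name = ""
  }')
echo "$scanner_out"
```

Unlike the original, this variant matches the indented `+  it(` opener and the `+  });` closer in the sample diff.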

```shell
# For Go
git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*_test.go' | grep -E "^\+func Test"

# For JavaScript (.test.cjs is the primary format; .test.js used in scripts/)
git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*.test.cjs' '*.test.js' | grep -E "^\+(it|test|describe)\("
```
Copy link

Copilot AI Apr 8, 2026


The JS test extraction grep only matches lines that start with +(it|test|describe)( immediately after the +. In typical diffs, test blocks are indented (e.g. + it(...)), so this will miss most JS tests. Consider allowing leading whitespace after + (e.g. ^\+\s*(it|test|describe)\().

This issue also appears on line 152 of the same file.

Suggested change:

```diff
-git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*.test.cjs' '*.test.js' | grep -E "^\+(it|test|describe)\("
+git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*.test.cjs' '*.test.js' | grep -E "^\+\s*(it|test|describe)\("
```

Comment on lines +131 to +145
```shell
test_name && /^\+.*(gomock\.|testify\/mock|\.EXPECT\(\)|\.On\(|\.Return\()/ { forbidden_mocks++ }
test_name && /^\+\}$/ { print test_name, "assertions=" assertions, "errors=" errors, "table_driven=" table_driven, "forbidden_mocks=" forbidden_mocks; test_name="" }
END { if (test_name) print test_name, "assertions=" assertions, "errors=" errors, "table_driven=" table_driven, "forbidden_mocks=" forbidden_mocks }
'
```

Key signals for Go tests in this codebase:
- **Assertions (accepted forms)**:
  - testify: `assert.Equal`, `assert.NoError`, `assert.Error`, `require.Equal`, `require.NoError`, etc.
  - stdlib: `t.Errorf(...)`, `t.Fatalf(...)`, `t.Error(...)`
- **Error coverage**: `assert.Error` / `require.Error`, `assert.NoError` / `require.NoError`, `t.Fatalf` / `t.Errorf` explicitly checking error returns
- **Table-driven tests**: `t.Run()` over a `tests []struct{...}` slice — the preferred pattern in this codebase; a single table-driven test covers all its sub-cases, so credit all included error / edge-case rows
- **Assertion messages**: guidelines require a descriptive message argument on every assertion call — e.g. `assert.Equal(t, expected, actual, "descriptive context")`, not bare `assert.Equal(t, expected, actual)`
- **Forbidden**: any use of `gomock`, `testify/mock`, `.EXPECT()`, `.On()`, `.Return()` in Go tests violates the project's "no mocks" guideline; flag immediately


Copilot AI Apr 8, 2026


The forbidden Go mock detection treats any .Return( as a hard-violation signal (forbidden_mocks++) and documents .Return() as forbidden, but the PR description’s forbidden list is gomock, testify/mock, .EXPECT(), .On() (no .Return()). As written, .Return( could also match unrelated fluent APIs and create false positives. Recommend aligning the forbidden list with the PR description and/or only counting .Return( when it’s clearly part of a mock chain (e.g., alongside gomock/.EXPECT()), rather than matching any .Return( occurrence.

Development

Successfully merging this pull request may close this issue: [plan] Create "test-quality-sentinel" agentic workflow

3 participants