| 1 | +--- |
| 2 | +name: CI Doctor |
| 3 | +description: Automated CI failure investigator that analyzes logs, identifies root causes, and creates investigation issues. |
| 4 | + |
| 5 | +on: |
| 6 | +  workflow_run: |
| 7 | +    # NOTE: GitHub Actions doesn't support wildcards for workflow_run. |
| 8 | +    # When adding new workflows, add them to this list to monitor for failures. |
| 9 | +    workflows: |
| 10 | +      - "Build Verification" |
| 11 | +      - "CI/CD Pipelines and Integration Tests Gap Assessment" |
| 12 | +      - "CodeQL" |
| 13 | +      - "Container Security Scan" |
| 14 | +      - "Copilot Setup Steps" |
| 15 | +      - "Daily Security Review and Threat Modeling" |
| 16 | +      - "Dependency Vulnerability Audit" |
| 17 | +      - "Deploy Documentation" |
| 18 | +      - "Examples Test" |
| 19 | +      - "Issue Duplication Detector" |
| 20 | +      - "Issue Monster" |
| 21 | +      - "Lint" |
| 22 | +      - "Pelis Agent Factory Advisor" |
| 23 | +      - "Plan Command" |
| 24 | +      - "PR Title Check" |
| 25 | +      - "Release" |
| 26 | +      - "Security Guard" |
| 27 | +      - "Smoke Claude" |
| 28 | +      - "Smoke Copilot" |
| 29 | +      - "Test Coverage" |
| 30 | +      - "Test Setup Action" |
| 31 | +      - "TypeScript Type Check" |
| 32 | +      - "Update Release Notes" |
| 33 | +    types: |
| 34 | +      - completed |
| 35 | +    branches: |
| 36 | +      - main |
| 37 | + |
| 38 | +if: ${{ github.event.workflow_run.conclusion == 'failure' }} |
| 39 | + |
| 40 | +permissions: |
| 41 | +  contents: read |
| 42 | +  actions: read |
| 43 | +  issues: write |
| 44 | +  pull-requests: read |
| 45 | + |
| 46 | +imports: |
| 47 | +  - shared/mcp-pagination.md |
| 48 | + |
| 49 | +tools: |
| 50 | +  github: |
| 51 | +    toolsets: [default, actions] |
| 52 | +  cache-memory: true |
| 53 | + |
| 54 | +network: |
| 55 | +  allowed: |
| 56 | +    - github |
| 57 | + |
| 58 | +safe-outputs: |
| 59 | +  create-issue: |
| 60 | +    title-prefix: "🏥 CI Failure" |
| 61 | +  add-comment: |
| 62 | +    max: 1 |
| 63 | + |
| 64 | +timeout-minutes: 10 |
| 65 | +--- |
| 66 | + |
| 67 | +# CI Failure Doctor |
| 68 | + |
| 69 | +You are the CI Failure Doctor. When a workflow fails, investigate the root cause and create an actionable investigation report. |
| 70 | + |
| 71 | +## Context |
| 72 | + |
| 73 | +- **Repository**: ${{ github.repository }} |
| 74 | +- **Run**: [${{ github.event.workflow_run.id }}](${{ github.event.workflow_run.html_url }}) |
| 75 | +- **Workflow**: ${{ github.event.workflow_run.name }} |
| 76 | +- **Conclusion**: ${{ github.event.workflow_run.conclusion }} |
| 77 | +- **Commit**: ${{ github.event.workflow_run.head_sha }} |
| 78 | +- **Branch**: ${{ github.event.workflow_run.head_branch }} |
| 79 | + |
| 80 | +## Your Mission |
| 81 | + |
| 82 | +1. **Fetch logs** from failed jobs using the GitHub Actions tools |
| 83 | +2. **Analyze the failure** - look for error patterns, stack traces, and root causes |
| 84 | +3. **Search cache-memory** for similar past failures |
| 85 | +4. **Check for existing issues** that match this failure |
| 86 | +5. **Create an investigation issue** if no duplicate exists |
| 87 | + |
| 88 | +## Key Patterns for This Repository |
| 89 | + |
| 90 | +This is the AWF (Agentic Workflow Firewall) repository with Docker/networking tests. Common failures: |
| 91 | +- Docker network conflicts (`Pool overlaps`, orphaned `awf-net`) |
| 92 | +- Container cleanup issues (`timeout` kills leaving orphaned resources) |
| 93 | +- iptables/NET_ADMIN capability problems |
| 94 | +- Squid proxy healthcheck failures |
| 95 | + |
| 96 | +## Output |
| 97 | + |
| 98 | +Create an issue with: |
| 99 | +- Summary of what failed |
| 100 | +- Root cause analysis |
| 101 | +- Recommended actions |
| 102 | +- Labels: `bug`, `ci` |
| 103 | + |
| 104 | +If a duplicate issue exists, comment on it instead. |
| 105 | + |
| 106 | +--- |
| 107 | +*🏥 Automatically investigated by CI Doctor* |
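
The "Key Patterns for This Repository" list and the duplicate check in "Your Mission" could be approximated in code roughly as follows. This is only a sketch under stated assumptions: the category names, the regular expressions, and the helpers `classify_log` and `failure_signature` are hypothetical and are not part of the workflow above, which leaves this analysis to the agent. Only the keywords (`Pool overlaps`, `awf-net`, `timeout`, iptables, NET_ADMIN, Squid healthcheck) come from the prompt.

```python
from __future__ import annotations

import hashlib
import re

# Hypothetical pattern table. The category names and regexes are illustrative
# assumptions; only the keywords themselves come from the workflow prompt.
FAILURE_PATTERNS = {
    "docker-network-conflict": re.compile(r"Pool overlaps|awf-net", re.IGNORECASE),
    "container-cleanup": re.compile(r"\btimeout\b", re.IGNORECASE),
    "iptables-capability": re.compile(r"iptables|NET_ADMIN", re.IGNORECASE),
    "squid-healthcheck": re.compile(r"squid.*(unhealthy|healthcheck)", re.IGNORECASE),
}


def classify_log(log_text: str) -> list[str]:
    """Return the sorted categories whose pattern matches at least one log line."""
    matched = set()
    for line in log_text.splitlines():
        for category, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                matched.add(category)
    return sorted(matched)


def failure_signature(workflow_name: str, categories: list[str]) -> str:
    """Build a short, stable key for comparing a failure with earlier ones."""
    raw = workflow_name + "|" + ",".join(sorted(categories))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    # Hypothetical log excerpt, written for illustration only.
    sample_log = (
        "step 4: creating network awf-net\n"
        "Error response from daemon: Pool overlaps with other one on this address space\n"
    )
    categories = classify_log(sample_log)
    print(categories, failure_signature("Examples Test", categories))
```

A signature like this could serve as the key when searching cache-memory or existing issues for a matching failure, so that repeated failures of the same kind lead to a comment on the existing issue rather than a new one. Keyword matching is deliberately coarse; a bare `timeout` match, for example, would still need the surrounding log lines to confirm a cleanup problem.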