Feature Description
Add PR (Pull Request) support as a skill source type, integrated into the existing GitHub scraper pipeline. PRs contain high-quality, concentrated knowledge (diffs, descriptions, review comments, and linked issues) that can produce architectural guidance skills.
Integration with GitHub Source (not separate)
PRs are part of the GitHub source, not a standalone scraper. When using skill-seekers create owner/repo:
- PRs are included as part of the GitHub analysis (alongside code, docs, community streams)
- PR content enriches the codebase skill with architectural guidance
- User can opt to generate PR-only skills with a flag (e.g., --prs-only)
PR Collection with Optional Max Count
- Default: no cap; process all PRs (or all matching a filter)
- Optional --max-prs N to limit count
- Filter by label, author, date range, or state:
skill-seekers create owner/repo --max-prs 50
skill-seekers create owner/repo --pr-filter "label:architecture"
skill-seekers create owner/repo --pr-filter "label:breaking-change"
skill-seekers create owner/repo --prs-only --pr-filter "state:merged"
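The filter and cap options above could be handled roughly as follows. This is a minimal sketch, not the project's implementation: `parse_pr_filter`, `matches`, and `select_prs` are hypothetical names, the PR dictionaries are an assumed simplified shape rather than the GitHub API payload, and only the `key:value` form shown in the commands is covered (date ranges are left out).

```python
from typing import Optional

# Keys accepted in this sketch; date-range filtering is intentionally omitted.
ALLOWED_KEYS = {"label", "author", "state"}


def parse_pr_filter(expr: str) -> tuple[str, str]:
    """Split a 'key:value' filter such as 'label:architecture'."""
    key, sep, value = expr.partition(":")
    key, value = key.strip(), value.strip()
    if not sep or key not in ALLOWED_KEYS or not value:
        raise ValueError(f"unsupported --pr-filter expression: {expr!r}")
    return key, value


def matches(pr: dict, key: str, value: str) -> bool:
    """Check one PR (hypothetical simplified dict) against one filter."""
    if key == "label":
        return value in pr.get("labels", [])
    if key == "author":
        return pr.get("author") == value
    # key == "state": 'merged' is assumed to mean the PR has a merge timestamp.
    if value == "merged":
        return pr.get("merged_at") is not None
    return pr.get("state") == value


def select_prs(prs: list[dict], pr_filter: Optional[str] = None,
               max_prs: Optional[int] = None) -> list[dict]:
    """Apply the optional filter first, then the optional cap (default: no cap)."""
    if pr_filter is not None:
        key, value = parse_pr_filter(pr_filter)
        prs = [pr for pr in prs if matches(pr, key, value)]
    return prs if max_prs is None else prs[:max_prs]
```

Filtering before capping means --max-prs 50 combined with a filter yields up to 50 matching PRs, rather than whichever of the first 50 happen to match. That ordering is a design assumption the issue does not spell out.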
What to Extract from Each PR
- Title and description: the "why" behind the change
- Diff with context: what actually changed (summarized for large diffs)
- Review comments: reviewer feedback, edge cases caught, design discussions
- Linked issues: the problem being solved
- Labels: categorization (bug, feature, breaking-change, etc.)
- File change summary: which areas of the codebase were touched
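The fields listed above could be carried in a single record per PR. The sketch below is only an illustration: `PRRecord` and `touched_areas` are hypothetical names, and treating the top-level directory as the "area of the codebase" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PRRecord:
    """Hypothetical container for the per-PR fields listed above."""
    number: int
    title: str
    description: str
    diff_summary: str = ""
    review_comments: list[str] = field(default_factory=list)
    linked_issues: list[int] = field(default_factory=list)
    labels: list[str] = field(default_factory=list)
    files_changed: list[str] = field(default_factory=list)

    def touched_areas(self) -> list[str]:
        """Top-level directories touched; root-level files are reported as '.'."""
        areas = {
            path.split("/", 1)[0] if "/" in path else "."
            for path in self.files_changed
        }
        return sorted(areas)
```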
Output: PR Skill + Architectural Guidance
- PR content produces skills focused on architectural decisions and patterns
- Supplements codebase analysis with "how and why the code evolved"
- When used with GitHub source: enriches the codebase skill's architecture section
- When used standalone (--prs-only): produces a focused "architectural guidance" skill
Freshness Warning
- PRs are point-in-time snapshots; code may have changed since
- Include metadata: PR date, merge status, whether files still exist
- Flag stale PRs (e.g., files deleted or heavily modified since merge)
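One way to flag stale PRs is sketched below. Everything here is an assumption made for illustration: the names `Freshness` and `check_freshness`, the inputs (a set of paths that currently exist and a per-file count of commits since the merge), and the threshold of 10 commits for "heavily modified".

```python
from dataclasses import dataclass


@dataclass
class Freshness:
    """Hypothetical staleness report for one PR."""
    missing_files: list[str]
    heavily_modified: list[str]

    @property
    def stale(self) -> bool:
        # A PR is flagged if any touched file is gone or has changed a lot.
        return bool(self.missing_files or self.heavily_modified)


def check_freshness(pr_files: list[str], current_files: set[str],
                    commits_since_merge: dict[str, int],
                    heavy_threshold: int = 10) -> Freshness:
    """Compare the files a PR touched against the current state of the repo."""
    missing = [f for f in pr_files if f not in current_files]
    heavy = [
        f for f in pr_files
        if f in current_files and commits_since_merge.get(f, 0) >= heavy_threshold
    ]
    return Freshness(missing, heavy)
```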
Source Detection
- source_detector.py should recognize owner/repo#123 for single PR
- Collection mode activated by --pr-filter or --max-prs flags
- Default GitHub source includes PRs automatically (opt-out with --skip-prs)
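The detection rules above might look something like this. It is a sketch under stated assumptions, not the actual source_detector.py: `detect_source`, the returned dictionary shape, and the mode names are hypothetical, and the character classes in the pattern only approximate valid GitHub owner and repository names.

```python
import re
from typing import Optional

# owner/repo with an optional '#<number>' suffix (approximate character classes).
_GITHUB_SOURCE = re.compile(
    r"^(?P<owner>[A-Za-z0-9-]+)/(?P<repo>[A-Za-z0-9._-]+)(?:#(?P<number>\d+))?$"
)


def detect_source(source: str, pr_filter: Optional[str] = None,
                  max_prs: Optional[int] = None,
                  skip_prs: bool = False) -> Optional[dict]:
    """Classify a source string; returns None if it is not owner/repo[#N]."""
    m = _GITHUB_SOURCE.match(source.strip())
    if m is None:
        return None
    base = {"owner": m["owner"], "repo": m["repo"]}
    if m["number"] is not None:
        # owner/repo#123 selects a single PR.
        return {**base, "mode": "single_pr", "number": int(m["number"])}
    if pr_filter is not None or max_prs is not None:
        # Either flag activates collection mode.
        return {**base, "mode": "pr_collection"}
    # Plain GitHub source: PRs are included unless --skip-prs is given.
    return {**base, "mode": "github", "include_prs": not skip_prs}
```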
Implementation Notes
- Extend github_scraper.py or github_fetcher.py to fetch PR data via GitHub API
- Use gh api repos/{owner}/{repo}/pulls for PR list and diffs
- Review comments via pulls/{number}/comments and pulls/{number}/reviews
- Large diffs should be summarized (configurable threshold)
- Add to the Three-Stream architecture: PRs are part of the Community stream
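The endpoint layout and the diff threshold from these notes can be sketched as below. The function names `pr_endpoints` and `summarize_diff`, the default of 400 lines, and the summary wording are all assumptions. The line counting is a rough heuristic over unified-diff prefixes and does not handle every case (for example, deleted files whose new path is /dev/null).

```python
def pr_endpoints(owner: str, repo: str, number: int) -> dict[str, str]:
    """Build the REST paths named in the notes, suitable for `gh api <path>`."""
    base = f"repos/{owner}/{repo}/pulls"
    return {
        "list": base,
        "detail": f"{base}/{number}",
        "comments": f"{base}/{number}/comments",
        "reviews": f"{base}/{number}/reviews",
    }


def summarize_diff(diff: str, max_lines: int = 400) -> str:
    """Return the diff unchanged if small, else a one-line summary."""
    lines = diff.splitlines()
    if len(lines) <= max_lines:
        return diff
    # '+++ b/<path>' marks the new side of each file in a unified diff.
    files = [line[len("+++ b/"):] for line in lines if line.startswith("+++ b/")]
    added = sum(1 for line in lines
                if line.startswith("+") and not line.startswith("+++"))
    removed = sum(1 for line in lines
                  if line.startswith("-") and not line.startswith("---"))
    return (f"Large diff summarized: {len(files)} file(s) changed, "
            f"+{added}/-{removed} lines: " + ", ".join(files))
```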
Example
# GitHub source with PRs included (default)
skill-seekers create owner/repo
# Limit PR count
skill-seekers create owner/repo --max-prs 100
# PR-only skill (architectural guidance)
skill-seekers create owner/repo --prs-only --pr-filter "label:architecture"
# Single PR (future: source detection)
skill-seekers create owner/repo#456