[FEATURE] Pull Requests as skill source: extract architectural guidance from PRs #328

@yusufkaraaslan

🚀 Feature Description

Add PR (Pull Request) support as a skill source type, integrated into the existing GitHub scraper pipeline. PRs contain concentrated, high-quality knowledge (diffs, descriptions, review comments, and linked issues) that can produce architectural guidance skills.

Integration with GitHub Source (not separate)

PRs are part of the GitHub source, not a standalone scraper. When using skill-seekers create owner/repo:

  • PRs are included as part of the GitHub analysis (alongside code, docs, community streams)
  • PR content enriches the codebase skill with architectural guidance
  • Users can opt to generate PR-only skills with a flag (e.g., --prs-only)

PR Collection with Optional Max Count

  • Default: no cap; process all PRs (or all matching a filter)
  • Optional --max-prs N to limit count
  • Filter by label, author, date range, or state:
    skill-seekers create owner/repo --max-prs 50
    skill-seekers create owner/repo --pr-filter "label:architecture"
    skill-seekers create owner/repo --pr-filter "label:breaking-change"
    skill-seekers create owner/repo --prs-only --pr-filter "state:merged"
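A minimal sketch of how the filter and cap described above could combine, assuming filters are simple key:value strings. The names PullRequest, parse_pr_filter, and select_prs are hypothetical and not part of the existing codebase; date-range filtering is left out because the issue does not specify its syntax.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    # Hypothetical in-memory shape; the real scraper may store more fields.
    number: int
    state: str  # assumed values: "open", "closed", or "merged"
    author: str
    labels: list[str] = field(default_factory=list)


def parse_pr_filter(expr: str) -> tuple[str, str]:
    """Split a 'key:value' filter such as 'label:architecture'."""
    key, sep, value = expr.partition(":")
    if not sep or not key.strip() or not value.strip():
        raise ValueError(f"expected key:value, got {expr!r}")
    return key.strip().lower(), value.strip()


def select_prs(prs, pr_filter=None, max_prs=None):
    """Apply an optional filter first, then an optional cap (None = no cap)."""
    selected = list(prs)
    if pr_filter is not None:
        key, value = parse_pr_filter(pr_filter)
        if key == "label":
            selected = [p for p in selected if value in p.labels]
        elif key == "state":
            selected = [p for p in selected if p.state == value]
        elif key == "author":
            selected = [p for p in selected if p.author == value]
        else:
            raise ValueError(f"unsupported filter key: {key!r}")
    if max_prs is not None:
        selected = selected[:max_prs]
    return selected
```

Filtering before capping means --max-prs limits the number of matching PRs rather than the number of PRs inspected, which seems closer to the intent of the examples above.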

What to Extract from Each PR

  1. Title and description: the "why" behind the change
  2. Diff with context: what actually changed (summarized for large diffs)
  3. Review comments: reviewer feedback, edge cases caught, design discussions
  4. Linked issues: the problem being solved
  5. Labels: categorization (bug, feature, breaking-change, etc.)
  6. File change summary: which areas of the codebase were touched
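One possible shape for the extracted record is sketched below. PRRecord and build_record are hypothetical names. The input dicts are assumed to follow the GitHub REST API JSON layout (number, title, body, labels[].name for a PR; filename for changed files; body for review comments). Linked issues are approximated here by scanning the description for closing keywords such as "Fixes #12", which is an assumption about how linkage would be detected; the diff itself (item 2) is omitted.

```python
import re
from dataclasses import dataclass, field

# Matches close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved
# followed by an issue reference such as "#12".
CLOSING_KEYWORD = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)


@dataclass
class PRRecord:
    number: int
    title: str
    description: str
    labels: list[str]
    linked_issues: list[int]
    review_comments: list[str] = field(default_factory=list)
    # Top-level path component -> number of changed files under it.
    areas_touched: dict[str, int] = field(default_factory=dict)


def build_record(pr: dict, files: list[dict], comments: list[dict]) -> PRRecord:
    body = pr.get("body") or ""  # the API returns null for an empty description
    areas: dict[str, int] = {}
    for changed in files:
        top = changed["filename"].split("/", 1)[0]
        areas[top] = areas.get(top, 0) + 1
    return PRRecord(
        number=pr["number"],
        title=pr["title"],
        description=body,
        labels=[label["name"] for label in pr.get("labels", [])],
        linked_issues=sorted({int(n) for n in CLOSING_KEYWORD.findall(body)}),
        review_comments=[c["body"] for c in comments],
        areas_touched=areas,
    )
```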

Output: PR Skill + Architectural Guidance

  • PR content produces skills focused on architectural decisions and patterns
  • Supplements codebase analysis with "how and why the code evolved"
  • When used with GitHub source: enriches the codebase skill's architecture section
  • When used standalone (--prs-only): produces a focused "architectural guidance" skill

Freshness Warning

  • PRs are point-in-time snapshots; the code may have changed since
  • Include metadata: PR date, merge status, whether files still exist
  • Flag stale PRs (e.g., files deleted or heavily modified since merge)
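A rough sketch of the staleness flag, assuming the scraper already knows which paths a PR touched and which paths exist at the current HEAD. Freshness, check_freshness, and the 0.5 default threshold are illustrative assumptions. Only deleted files are covered; detecting "heavily modified" files would need history data that this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class Freshness:
    missing_files: list[str]  # touched by the PR but absent from the current tree
    missing_ratio: float      # share of the PR's files that no longer exist
    stale: bool


def check_freshness(pr_files, current_files, stale_ratio=0.5):
    """Flag a PR as stale when enough of its files are gone."""
    current = set(current_files)
    missing = sorted(path for path in pr_files if path not in current)
    ratio = len(missing) / len(pr_files) if pr_files else 0.0
    return Freshness(missing, ratio, ratio >= stale_ratio)
```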

Source Detection

  • source_detector.py should recognize owner/repo#123 as a single-PR source
  • Collection mode activated by --pr-filter or --max-prs flags
  • Default GitHub source includes PRs automatically (opt-out with --skip-prs)
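The single-PR pattern could be recognized along these lines. This is a sketch, not the existing source_detector.py logic: PRRef and detect_single_pr are hypothetical names, and the character classes are a simplified guess at valid owner and repository names. Since GitHub issues and PRs share one number space, a real detector would still need to confirm through the API that the number refers to a PR.

```python
import re
from typing import NamedTuple, Optional


class PRRef(NamedTuple):
    owner: str
    repo: str
    number: int


# owner/repo#123, with nothing before or after it.
_PR_REF = re.compile(
    r"^(?P<owner>[A-Za-z0-9][A-Za-z0-9-]*)/"
    r"(?P<repo>[A-Za-z0-9._-]+)#(?P<number>[1-9]\d*)$"
)


def detect_single_pr(source: str) -> Optional[PRRef]:
    """Return a PRRef for 'owner/repo#N', or None for any other source string."""
    match = _PR_REF.match(source.strip())
    if match is None:
        return None
    return PRRef(match["owner"], match["repo"], int(match["number"]))
```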

Implementation Notes

  • Extend github_scraper.py or github_fetcher.py to fetch PR data via GitHub API
  • Use gh api repos/{owner}/{repo}/pulls for PR list and diffs
  • Review comments via pulls/{number}/comments and pulls/{number}/reviews
  • Large diffs should be summarized (configurable threshold)
  • Add to the Three-Stream architecture: PRs are part of the Community stream
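For the diff threshold, one simple option is to pass small diffs through unchanged and reduce large ones to per-file added and removed line counts. The sketch below assumes unified diff text in git format; summarize_diff and the 400-line default are hypothetical. The line classification is deliberately naive: a content line that itself begins with "+++" or "---" would be skipped.

```python
def summarize_diff(diff_text: str, max_lines: int = 400) -> str:
    """Return the diff unchanged if small, else a per-file +/- line summary."""
    lines = diff_text.splitlines()
    if len(lines) <= max_lines:
        return diff_text

    stats: dict[str, list[int]] = {}  # path -> [added, removed]
    current = None
    for line in lines:
        if line.startswith("diff --git "):
            # Header looks like: diff --git a/path b/path
            current = line.split(" b/", 1)[-1]
            stats[current] = [0, 0]
        elif current is None:
            continue  # ignore anything before the first file header
        elif line.startswith("+") and not line.startswith("+++"):
            stats[current][0] += 1
        elif line.startswith("-") and not line.startswith("---"):
            stats[current][1] += 1

    header = f"[diff summarized: {len(lines)} lines, {len(stats)} file(s)]"
    body = [f"{path}: +{added} -{removed}" for path, (added, removed) in stats.items()]
    return "\n".join([header, *body])
```

A count-based summary loses the actual change content, so a fuller implementation might keep the first hunk of each file or hand large diffs to a separate summarization step.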

Example

# GitHub source with PRs included (default)
skill-seekers create owner/repo

# Limit PR count
skill-seekers create owner/repo --max-prs 100

# PR-only skill (architectural guidance)
skill-seekers create owner/repo --prs-only --pr-filter "label:architecture"

# Single PR (future: source detection)
skill-seekers create owner/repo#456

Metadata

Labels: enhancement (New feature or request)
