Skip to content

fix(github): drop per-file patch in getPullRequestFiles to bound memory#111

Open
loks0n wants to merge 1 commit into
mainfrom
fix/bound-pull-request-files-memory
Open

fix(github): drop per-file patch in getPullRequestFiles to bound memory#111
loks0n wants to merge 1 commit into
mainfrom
fix/bound-pull-request-files-memory

Conversation

@loks0n

@loks0n loks0n commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Problem

GitHub::getPullRequestFiles() paginates a pull request's files and accumulates the full GitHub file objects — including each file's patch (the complete unified diff) — into one array before returning:

$files = $response['body'] ?? [];
$allFiles = array_merge($allFiles, $files);

patch is the only unbounded field (KBs–MBs per file) and grows with PR size. A large PR (thousands of files — generated code, lockfiles, vendored deps) materialises every diff in memory at once — hundreds of MB in a single request. In a long-running Swoole worker this drives the process toward its memory limit and OOM.

No caller uses patch: both consumers (appwrite/server-ce's GitHub events handler and external-PR authorize handler) only read filename / previous_filename via array_column().

Fix

Strip patch per page while paginating:

$files = $response['body'] ?? [];
foreach ($files as $file) {
    unset($file['patch']);
    $allFiles[] = $file;
}

The return shape is preserved (array of file objects keyed by filename, status, etc.), so existing behaviour and the array_column($result, 'filename') assertions in the GitLab/Gitea adapter tests are unaffected. Memory is now bounded to lightweight per-file metadata.

Test plan

  • php -l clean.
  • No caller reads patch; filename / previous_filename and all other metadata fields remain present.

🤖 Generated with Claude Code

getPullRequestFiles() paginated a pull request's files and accumulated
the full GitHub file objects -- including each file's `patch` (the
complete unified diff) -- into one array before returning. `patch` is
the only unbounded field (KBs-MBs per file) and grows with PR size, so a
large PR (thousands of files, e.g. generated code / lockfiles / vendored
deps) materialised every diff in memory at once -- hundreds of MB in a
single request.

No caller uses `patch`; both consumers only read `filename` /
`previous_filename`. Strip `patch` per page while paginating. The return
shape is preserved (array of file objects keyed by `filename`, etc.), so
memory is bounded to lightweight per-file metadata.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR removes the patch field (the full unified diff per file) from the objects accumulated by GitHub::getPullRequestFiles(). Previously, every diff was held in memory simultaneously across all pages; now only the lightweight metadata fields are retained.

  • The foreach copy-then-unset pattern is correct PHP: $file is a value copy, unset($file['patch']) strips the key from that copy before it is appended to $allFiles, and the original response data is not mutated.
  • The return shape is unchanged — all other fields (filename, status, previous_filename, etc.) are preserved — so callers using array_column() on those fields are unaffected.

Confidence Score: 5/5

Safe to merge — a minimal, well-scoped change that drops an unused field during pagination without altering the return contract.

The change touches a single loop in one method. The unset on a foreach value copy is idiomatic and correct PHP, no callers consume patch, and all other metadata fields remain intact. The fix is straightforward with no side effects.

No files require special attention.

Important Files Changed

Filename Overview
src/VCS/Adapter/Git/GitHub.php Strips the patch field from each file object during pagination in getPullRequestFiles, reducing peak memory from unbounded (full diffs accumulated) to lightweight per-file metadata only.

Reviews (1): Last reviewed commit: "fix(github): drop per-file patch in getP..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants