ci: fix Actions cache quota thrashing behind the slow Servo builds#104
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
💤 Files with no reviewable changes (1)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughUpdates the rust-build-cache action to prefer ccache and change cache input defaults; makes cache saving conditional (auto on main), integrates clippy into native and Servo test jobs, reuses shared caches for downstream jobs, trims servo cache-warm matrix, and simplifies some job dependencies. ChangesCI/CD Cache Strategy and Clippy Integration
🎯 4 (Complex) | ⏱️ ~40 minutes Suggested labels:
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.github/workflows/ci.yml (2)
483-535:⚠️ Potential issue | 🟠 MajorFix same-workflow cache reuse: add proper job dependencies or artifact handoff
In
.github/workflows/ci.yml(lines 483-535 and 631-691),python-generatedande2e-buildrestore Rust caches usingshared-key: rust-test-daemon/servo-daemon, but they onlyneeds: [changes, rust-check-shared]andneeds: changesrespectively—neither waits for the cache-producing jobsrust-test(daemon variant usesshared_key: rust-test-daemon) orrust-test-servo(daemon-servo variant usesshared_key: servo-daemon). Sincerust-cacheis implemented viaSwatinem/rust-cache(which saves/updates cache entries at the end of the producer job lifecycle), these consumers can’t rely on cache entries generated earlier in the same workflow run unless you introduce explicit producer→consumer ordering (vianeeds) or pass build outputs via artifacts.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci.yml around lines 483 - 535, The python-generated and e2e-build jobs restore caches using shared-key values (rust-test-daemon and servo-daemon) but do not depend on the producer jobs that actually save those caches (rust-test and rust-test-servo), so add explicit job ordering or artifact handoff: update the python-generated job to include needs: [changes, rust-check-shared, rust-test] (or make it consume an artifact produced by rust-test), and update e2e-build to include needs: [changes, rust-test-servo] (or fetch/save artifacts from rust-test-servo), ensuring the rust-cache/Swatinem producer jobs run before consumers so their shared-key caches are available in the same workflow run.
245-249:⚠️ Potential issue | 🟠 MajorEnable
cache-workspace-cratesfor both saving and restoring workspace build outputs
./.github/actions/rust-build-cache/action.ymldefaultscache-workspace-cratesto"false", and therust-test/rust-test-servoproducer steps at lines 245-249 and 362-366 don’t override it—so the saved caches contain dependencies only (workspace crate artifacts excluded). Sincepython-generatedande2e-buildalso restore withoutcache-workspace-crates: "true", they won’t reuse workspace build outputs even if the producers start saving them.Suggested fix
- uses: ./.github/actions/rust-build-cache with: shared-key: ${{ matrix.shared_key }} workspaces: . -> ${{ matrix.target_dir }} + cache-workspace-crates: "true" cache-on-failure: "false"Apply the same change in
rust-test-servo.Also add
cache-workspace-crates: "true"to the./.github/actions/rust-build-cachesteps inpython-generatedande2e-build(their current calls omit it).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci.yml around lines 245 - 249, The rust build cache action is not preserving workspace crate artifacts because the parameter cache-workspace-crates defaults to "false"; update the rust-build-cache action invocations in the rust-test and rust-test-servo producer steps to include cache-workspace-crates: "true" so workspace crate outputs are saved, and also add cache-workspace-crates: "true" to the rust-build-cache uses in the python-generated and e2e-build consumer steps so they restore workspace artifacts; locate the action usage lines (the uses: ./.github/actions/rust-build-cache blocks) in those job definitions and add the cache-workspace-crates: "true" key alongside shared-key, workspaces, and cache-on-failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 483-535: The python-generated and e2e-build jobs restore caches
using shared-key values (rust-test-daemon and servo-daemon) but do not depend on
the producer jobs that actually save those caches (rust-test and
rust-test-servo), so add explicit job ordering or artifact handoff: update the
python-generated job to include needs: [changes, rust-check-shared, rust-test]
(or make it consume an artifact produced by rust-test), and update e2e-build to
include needs: [changes, rust-test-servo] (or fetch/save artifacts from
rust-test-servo), ensuring the rust-cache/Swatinem producer jobs run before
consumers so their shared-key caches are available in the same workflow run.
- Around line 245-249: The rust build cache action is not preserving workspace
crate artifacts because the parameter cache-workspace-crates defaults to
"false"; update the rust-build-cache action invocations in the rust-test and
rust-test-servo producer steps to include cache-workspace-crates: "true" so
workspace crate outputs are saved, and also add cache-workspace-crates: "true"
to the rust-build-cache uses in the python-generated and e2e-build consumer
steps so they restore workspace artifacts; locate the action usage lines (the
uses: ./.github/actions/rust-build-cache blocks) in those job definitions and
add the cache-workspace-crates: "true" key alongside shared-key, workspaces, and
cache-on-failure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: eda6c6a1-992b-4352-99a7-997c4bda50cb
📒 Files selected for processing (3)
.github/actions/rust-build-cache/action.yml.github/workflows/ci.yml.github/workflows/servo-cache-warm.yml
💤 Files with no reviewable changes (1)
- .github/workflows/servo-cache-warm.yml
|
Validation run is in. Everything this PR changes came back clean:
The two red checks are inherited from main, not from this PR: main's latest run fails Rust Check / Shared (stale One nice surprise: dropping This run's jobs were all cold for the same reason — that's the one-time key-rotation cost, not the steady state. |
8474220 to
a83eb8a
Compare
|
Update on the e2e cancellations: it was never the CDN. Node 24.16.0 (what the floating Pushed Full board is re-running on the new head; merging when green per Bliss. |
The repo's 10GB Actions cache quota was holding a ~15GB working set: sccache's GHA backend stored ~1200 blob entries (~1GB) with a measured 0% Rust hit rate, and every PR run saved multi-GB rust-cache entries scoped to its own ref, duplicating main's. LRU eviction then dropped the Servo dependency caches between runs, forcing 20-34 minute cold rebuilds of the ~1100-crate Servo tree. Drop the sccache GHA backend entirely (local sccache via the build wrapper is unaffected), default save-if to "auto" so only main saves cache entries while PRs and tags restore them, stop caching unused registry crates, and bound ccache now that mozjs links from prebuilt archives. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
rust-check-servo and rust-test-servo shared a cache key per Servo shape, but Swatinem/rust-cache only saves on a primary-key miss, so whichever job saved first won permanently and the other rebuilt its artifacts on every main push (Rust Test / Core Servo paid ~20 minutes per run). Clippy now runs inside the test job for each Servo shape and for the native app suite, so one entry per shape carries lint and test artifacts together. E2E builds and the Python generated-client check now restore the matching test shape (servo-daemon, rust-test-daemon) with saves disabled instead of maintaining their own multi-GB entries; the python-generated job was silently cold-building the daemon against an evicted "workspace" key every run. This removes three cache shapes (~4GB) and two runner spins per run, bringing the cache working set under the 10GB quota so the hot Servo entries survive between runs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The e2e suite now restores the servo-daemon union entry saved by the merged clippy+test job, so warming a separate e2e-dev-v1 keyed entry would just recreate the quota pressure the rest of this change removes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Node 24.16.0 (which the floating node-version: 24 now resolves to) hangs `npx playwright install` indefinitely in extract-zip with Playwright < 1.60 (microsoft/playwright#41000). Both e2e suites hit the 30-minute job timeout on three consecutive attempts, on main as well as on PRs, with the download starting and never completing. Pin the e2e runner to Node 22 LTS, the documented workaround, until the e2e Playwright dependency moves past 1.60 and the pin can float again. The e2e-build job stays on Node 24: it never runs playwright install, so it is unaffected. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The push_effect closure borrows the registry immutably and takes the names Vec as an explicit parameter, so the mut binding is dead in every feature configuration. Under -D warnings it fails the featureless daemon check that Rust Check / Shared performs through hypercolor-cli's dev-dependency, which currently breaks CI on main. Verified with: cargo check -p hypercolor-daemon --no-default-features --lib under RUSTFLAGS="-D warnings". This rides along in the CI-cache PR so the rebase-merge heals main in the same push that needs it green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
526619c to
e5bcb4c
Compare
What's going on
CI has been paying 20-34 minute Servo builds on most runs, and the culprit isn't Servo itself — the mozjs prebuilt archive path already works. The repo's GitHub Actions cache was pinned at 9.8GB of the hard 10GB quota while the actual working set was ~15GB, so LRU eviction was deleting the multi-GB Servo dependency caches between runs. Whichever jobs lost the eviction lottery rebuilt the ~1100-crate Servo tree from scratch on a 4-core runner.
Receipts from recent runs:
e2e-cpu,rust-test-shared, andrust-test-native— about 4.5GB of pure duplication.rust-check-servoandrust-test-servoshared a cache key per shape, but Swatinem/rust-cache never saves when the primary key already exists. Whichever job saved first won permanently, so Rust Test / Core Servo rebuilt its test artifacts every single main push (~20 min, forever).python-generatedjob restored a long-evictedworkspacekey and silently cold-built the daemon on every run.What this changes
rust-build-cacheaction:save-ifnow defaults toauto: only main saves cache entries; PRs and tag builds restore them (cross-ref restore from the default branch already works). This ends the ref-duplication and stops PR runs from evicting main's good entries. Tag-scoped saves were unreachable by later runs anyway, so nothing of value is lost.cargo-cache-build.shis untouched — the script detects whatever is on PATH, and both wrappers degrade gracefully when sccache is absent.cache-all-cratesdefaults to false and ccache is bounded to 500M, shrinking every entry now that mozjs links from prebuilt archives.CI workflow:
rust-check-sharedjob stays as the fast PR lint gate.servo-daemonfor Servo,rust-test-daemonfor the CPU smoke) with saves disabled, instead of maintaining their own multi-GB entries.python-generatedrestoresrust-test-daemontoo, so its daemon build finally hits a warm cache.Net effect: three fewer cache shapes (~4GB), no more sccache blobs (~1GB+), no more PR-ref duplicates (~4.5GB), which puts the working set under the 10GB quota so the hot Servo entries actually survive between runs. PR critical path should drop from 20-34 minutes to roughly 8-12, and main pushes from ~34 to ~15.
After merging
The existing
servo-core,servo-daemon, andrust-test-nativecache entries predate the merge and already occupy their primary keys, so the merged jobs would restore them and never save the union. Those stale entries (plus the orphanede2e-cpuones) need a one-timegh cache deleteafter this lands; the first main push then pays one cold build per shape and saves the union entries. I'll run that cleanup once this merges.Worth considering separately: GitHub now supports raising the per-repo cache limit in Settings → Actions (billed beyond the included 10GB). A bump to ~15-20GB would buy comfortable headroom on top of this restructuring.
🤖 Generated with Claude Code
Summary by CodeRabbit
Chores
save-ifnow "auto" (save only on default branch) andcache-all-cratesnow false.Style