Skip to content

ci: fix Actions cache quota thrashing behind the slow Servo builds#104

Merged
hyperb1iss merged 5 commits into
mainfrom
nova/ci-cache-quota
Jun 12, 2026
Merged

ci: fix Actions cache quota thrashing behind the slow Servo builds#104
hyperb1iss merged 5 commits into
mainfrom
nova/ci-cache-quota

Conversation

@hyperb1iss

@hyperb1iss hyperb1iss commented Jun 12, 2026

Copy link
Copy Markdown
Owner

What's going on

CI has been paying 20-34 minute Servo builds on most runs, and the culprit isn't Servo itself — the mozjs prebuilt archive path already works. The repo's GitHub Actions cache was pinned at 9.8GB of the hard 10GB quota while the actual working set was ~15GB, so LRU eviction was deleting the multi-GB Servo dependency caches between runs. Whichever jobs lost the eviction lottery rebuilt the ~1100-crate Servo tree from scratch on a 4-core runner.

Receipts from recent runs:

  • E2E Build / Servo: 8.2 min warm vs 33.8 min cold, and it ran cold in most of the last 25 runs. A cold run even re-downloads the mozjs crates, proving the rust-cache entry was gone.
  • sccache's GHA backend had a measured 0.00% Rust hit rate (1116 misses) while its ~1200 tiny blob entries consumed ~1GB of the same quota that was evicting the useful caches.
  • Identical cache keys appeared twice (PR-ref scope vs main-ref scope) for e2e-cpu, rust-test-shared, and rust-test-native — about 4.5GB of pure duplication.
  • rust-check-servo and rust-test-servo shared a cache key per shape, but Swatinem/rust-cache never saves when the primary key already exists. Whichever job saved first won permanently, so Rust Test / Core Servo rebuilt its test artifacts every single main push (~20 min, forever).
  • The python-generated job restored a long-evicted workspace key and silently cold-built the daemon on every run.

What this changes

rust-build-cache action:

  • save-if now defaults to auto: only main saves cache entries; PRs and tag builds restore them (cross-ref restore from the default branch already works). This ends the ref-duplication and stops PR runs from evicting main's good entries. Tag-scoped saves were unreachable by later runs anyway, so nothing of value is lost.
  • The sccache GHA backend is removed entirely. Local development via cargo-cache-build.sh is untouched — the script detects whatever is on PATH, and both wrappers degrade gracefully when sccache is absent.
  • cache-all-crates defaults to false and ccache is bounded to 500M, shrinking every entry now that mozjs links from prebuilt archives.

CI workflow:

  • Clippy folds into the test job for both Servo shapes and the native app suite, so each shape's cache entry holds the union of lint and test artifacts. This fixes the first-save-wins collision and drops two runner spins per run. The standalone rust-check-shared job stays as the fast PR lint gate.
  • E2E builds restore the matching test shape (servo-daemon for Servo, rust-test-daemon for the CPU smoke) with saves disabled, instead of maintaining their own multi-GB entries.
  • python-generated restores rust-test-daemon too, so its daemon build finally hits a warm cache.
  • The Servo cache warm workflow loses its now-redundant e2e leg.

Net effect: three fewer cache shapes (~4GB), no more sccache blobs (~1GB+), no more PR-ref duplicates (~4.5GB), which puts the working set under the 10GB quota so the hot Servo entries actually survive between runs. PR critical path should drop from 20-34 minutes to roughly 8-12, and main pushes from ~34 to ~15.

After merging

The existing servo-core, servo-daemon, and rust-test-native cache entries predate the merge and already occupy their primary keys, so the merged jobs would restore them and never save the union. Those stale entries (plus the orphaned e2e-cpu ones) need a one-time gh cache delete after this lands; the first main push then pays one cold build per shape and saves the union entries. I'll run that cleanup once this merges.

Worth considering separately: GitHub now supports raising the per-repo cache limit in Settings → Actions (billed beyond the included 10GB). A bump to ~15-20GB would buy comfortable headroom on top of this restructuring.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores

    • Changed cache defaults: save-if now "auto" (save only on default branch) and cache-all-crates now false.
    • Made cache save behavior conditional for restores and disabled some workflow cache writes.
    • Adjusted compiler-cache: removed dedicated installer, stopped exporting sccache envs, enabled smaller ccache and disabled incremental builds.
    • Limited cache warming to core/daemon artifacts, unified shared cache keys, and reused caches across jobs.
    • Improved CI linting: added clippy steps, simplified job dependencies, and downgraded Node for Playwright.
  • Style

    • Minor closure refactor in rendering code.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ac434c5c-67e2-4e31-a30b-09bf342c573b

📥 Commits

Reviewing files that changed from the base of the PR and between 526619c and e5bcb4c.

📒 Files selected for processing (4)
  • .github/actions/rust-build-cache/action.yml
  • .github/workflows/ci.yml
  • .github/workflows/servo-cache-warm.yml
  • crates/hypercolor-daemon/src/render_thread/lighting_feed.rs
💤 Files with no reviewable changes (1)
  • .github/workflows/servo-cache-warm.yml
✅ Files skipped from review due to trivial changes (1)
  • crates/hypercolor-daemon/src/render_thread/lighting_feed.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/actions/rust-build-cache/action.yml
  • .github/workflows/ci.yml

📝 Walkthrough

Walkthrough

Updates the rust-build-cache action to prefer ccache and change cache input defaults; makes cache saving conditional (auto on main), integrates clippy into native and Servo test jobs, reuses shared caches for downstream jobs, trims servo cache-warm matrix, and simplifies some job dependencies.

Changes

CI/CD Cache Strategy and Clippy Integration

Layer / File(s) Summary
Compiler cache backend migration
.github/actions/rust-build-cache/action.yml
Removes sccache installation and SCCACHE/RUSTC_WRAPPER env wiring; configures ccache with CCACHE_COMPRESS=true and CCACHE_MAXSIZE=500M.
Smart cache save-if behavior
.github/actions/rust-build-cache/action.yml
Changes inputs.save-if default to "auto" and inputs.cache-all-crates to "false"; Swatinem/rust-cache restore uses an inline conditional to save only when inputs.save-if == 'true' or when inputs.save-if == 'auto' on refs/heads/main.
Servo cache-warm workflow reduction
.github/workflows/servo-cache-warm.yml
Reduces matrix to core-servo and daemon-servo, passes shared-key and target_dir, and removes the e2e-servo warm step.
Clippy integration in native test job
.github/workflows/ci.yml
Formatting tweak in shared clippy block, ensures components: clippy in rust-test toolchain, and adds a clippy step for native matrix that denies warnings.
Clippy integration in servo test job
.github/workflows/ci.yml
Reworks rust-test-servo header/matrix and timeout, ensures clippy component installed, and inserts core/daemon clippy steps running -- -D warnings before Servo tests/doc-tests.
Cache reuse in E2E and daemon test jobs
.github/workflows/ci.yml
Sets CARGO_TARGET_DIR for daemon tests, switches python-generated and e2e-build to reuse shared test caches via shared-key/workspaces, and sets save-if: "false" to avoid writing new caches.
Simplified job dependencies and e2e Node change
.github/workflows/ci.yml, .github/workflows/servo-cache-warm.yml
Removes explicit rust-check-native/rust-check-servo needs from build-native-app/build-release; changes Node for e2e job from 24→22 and adds Playwright workaround comments.
Daemon lighting_feed minor reformat
crates/hypercolor-daemon/src/render_thread/lighting_feed.rs
Reformats a local push_effect closure from multi-line to inline syntax without behavior change.

🎯 4 (Complex) | ⏱️ ~40 minutes

Suggested labels: codex, aardvark

🐰 From sccache to ccache I hop and sing,
Clippy hops in to tighten each wing.
Main branch keeps the cozy cache hoard,
Downstream jobs reuse what we stored.
Simpler builds — a joyful spring!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: restructuring GitHub Actions caching to prevent cache quota thrashing caused by slow Servo builds.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.github/workflows/ci.yml (2)

483-535: ⚠️ Potential issue | 🟠 Major

Fix same-workflow cache reuse: add proper job dependencies or artifact handoff

In .github/workflows/ci.yml (lines 483-535 and 631-691), python-generated and e2e-build restore Rust caches using shared-key: rust-test-daemon / servo-daemon, but they only needs: [changes, rust-check-shared] and needs: changes respectively—neither waits for the cache-producing jobs rust-test (daemon variant uses shared_key: rust-test-daemon) or rust-test-servo (daemon-servo variant uses shared_key: servo-daemon). Since rust-cache is implemented via Swatinem/rust-cache (which saves/updates cache entries at the end of the producer job lifecycle), these consumers can’t rely on cache entries generated earlier in the same workflow run unless you introduce explicit producer→consumer ordering (via needs) or pass build outputs via artifacts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yml around lines 483 - 535, The python-generated and
e2e-build jobs restore caches using shared-key values (rust-test-daemon and
servo-daemon) but do not depend on the producer jobs that actually save those
caches (rust-test and rust-test-servo), so add explicit job ordering or artifact
handoff: update the python-generated job to include needs: [changes,
rust-check-shared, rust-test] (or make it consume an artifact produced by
rust-test), and update e2e-build to include needs: [changes, rust-test-servo]
(or fetch/save artifacts from rust-test-servo), ensuring the rust-cache/Swatinem
producer jobs run before consumers so their shared-key caches are available in
the same workflow run.

245-249: ⚠️ Potential issue | 🟠 Major

Enable cache-workspace-crates for both saving and restoring workspace build outputs

./.github/actions/rust-build-cache/action.yml defaults cache-workspace-crates to "false", and the rust-test / rust-test-servo producer steps at lines 245-249 and 362-366 don’t override it—so the saved caches contain dependencies only (workspace crate artifacts excluded). Since python-generated and e2e-build also restore without cache-workspace-crates: "true", they won’t reuse workspace build outputs even if the producers start saving them.

Suggested fix
      - uses: ./.github/actions/rust-build-cache
        with:
          shared-key: ${{ matrix.shared_key }}
          workspaces: . -> ${{ matrix.target_dir }}
+         cache-workspace-crates: "true"
          cache-on-failure: "false"

Apply the same change in rust-test-servo.

Also add cache-workspace-crates: "true" to the ./.github/actions/rust-build-cache steps in python-generated and e2e-build (their current calls omit it).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yml around lines 245 - 249, The rust build cache action
is not preserving workspace crate artifacts because the parameter
cache-workspace-crates defaults to "false"; update the rust-build-cache action
invocations in the rust-test and rust-test-servo producer steps to include
cache-workspace-crates: "true" so workspace crate outputs are saved, and also
add cache-workspace-crates: "true" to the rust-build-cache uses in the
python-generated and e2e-build consumer steps so they restore workspace
artifacts; locate the action usage lines (the uses:
./.github/actions/rust-build-cache blocks) in those job definitions and add the
cache-workspace-crates: "true" key alongside shared-key, workspaces, and
cache-on-failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 483-535: The python-generated and e2e-build jobs restore caches
using shared-key values (rust-test-daemon and servo-daemon) but do not depend on
the producer jobs that actually save those caches (rust-test and
rust-test-servo), so add explicit job ordering or artifact handoff: update the
python-generated job to include needs: [changes, rust-check-shared, rust-test]
(or make it consume an artifact produced by rust-test), and update e2e-build to
include needs: [changes, rust-test-servo] (or fetch/save artifacts from
rust-test-servo), ensuring the rust-cache/Swatinem producer jobs run before
consumers so their shared-key caches are available in the same workflow run.
- Around line 245-249: The rust build cache action is not preserving workspace
crate artifacts because the parameter cache-workspace-crates defaults to
"false"; update the rust-build-cache action invocations in the rust-test and
rust-test-servo producer steps to include cache-workspace-crates: "true" so
workspace crate outputs are saved, and also add cache-workspace-crates: "true"
to the rust-build-cache uses in the python-generated and e2e-build consumer
steps so they restore workspace artifacts; locate the action usage lines (the
uses: ./.github/actions/rust-build-cache blocks) in those job definitions and
add the cache-workspace-crates: "true" key alongside shared-key, workspaces, and
cache-on-failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eda6c6a1-992b-4352-99a7-997c4bda50cb

📥 Commits

Reviewing files that changed from the base of the PR and between 582f656 and 8474220.

📒 Files selected for processing (3)
  • .github/actions/rust-build-cache/action.yml
  • .github/workflows/ci.yml
  • .github/workflows/servo-cache-warm.yml
💤 Files with no reviewable changes (1)
  • .github/workflows/servo-cache-warm.yml

@hyperb1iss

Copy link
Copy Markdown
Owner Author

Validation run is in. Everything this PR changes came back clean:

  • Workflow Lint passed (actionlint on the restructured workflows)
  • Rust Test / Shared, Native App (now with clippy folded in), UI, and both E2E builds: green
  • sccache is confirmed gone from the jobs, and the PR run saved zero cache entries — restore-only, as designed

The two red checks are inherited from main, not from this PR: main's latest run fails Rust Check / Shared (stale #[expect(clippy::unnecessary_wraps)] in display_lane.rs) and Rust Test / Daemon (ProducerFrame not feature-gated, breaking the featureless build) the exact same way. #101 is the fix for that class of breakage — once it lands, a re-run here goes green.

One nice surprise: dropping RUSTC_WRAPPER rotates every rust-cache key, so the stale-primary-key concern in the description resolves itself. The first main push after merge cold-builds once and saves fresh union entries automatically. The old cache entries just become dead weight to clean up (or they expire on their own in 7 days).

This run's jobs were all cold for the same reason — that's the one-time key-rotation cost, not the steady state.

@hyperb1iss hyperb1iss force-pushed the nova/ci-cache-quota branch from 8474220 to a83eb8a Compare June 12, 2026 08:00
@hyperb1iss

Copy link
Copy Markdown
Owner Author

Update on the e2e cancellations: it was never the CDN. Node 24.16.0 (what the floating node-version: 24 resolves to as of today) deadlocks playwright install in extract-zip with Playwright < 1.60 — microsoft/playwright#41000. That's why all three attempts died at exactly the 30-minute job timeout, and why main's 08:06 run got hit too.

Pushed 526619c9: the e2e suite is pinned to Node 22 LTS (the documented workaround) with a comment marking when the pin can float again. The real fix is bumping @playwright/test past 1.60 — dependabot's open e2e bump (#93) is the natural vehicle for that, worth checking it lands ≥ 1.60.

Full board is re-running on the new head; merging when green per Bliss.

hyperb1iss and others added 5 commits June 12, 2026 03:50
The repo's 10GB Actions cache quota was holding a ~15GB working set:
sccache's GHA backend stored ~1200 blob entries (~1GB) with a measured
0% Rust hit rate, and every PR run saved multi-GB rust-cache entries
scoped to its own ref, duplicating main's. LRU eviction then dropped
the Servo dependency caches between runs, forcing 20-34 minute cold
rebuilds of the ~1100-crate Servo tree.

Drop the sccache GHA backend entirely (local sccache via the build
wrapper is unaffected), default save-if to "auto" so only main saves
cache entries while PRs and tags restore them, stop caching unused
registry crates, and bound ccache now that mozjs links from prebuilt
archives.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
rust-check-servo and rust-test-servo shared a cache key per Servo
shape, but Swatinem/rust-cache only saves on a primary-key miss, so
whichever job saved first won permanently and the other rebuilt its
artifacts on every main push (Rust Test / Core Servo paid ~20 minutes
per run). Clippy now runs inside the test job for each Servo shape and
for the native app suite, so one entry per shape carries lint and test
artifacts together.

E2E builds and the Python generated-client check now restore the
matching test shape (servo-daemon, rust-test-daemon) with saves
disabled instead of maintaining their own multi-GB entries; the
python-generated job was silently cold-building the daemon against an
evicted "workspace" key every run.

This removes three cache shapes (~4GB) and two runner spins per run,
bringing the cache working set under the 10GB quota so the hot Servo
entries survive between runs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The e2e suite now restores the servo-daemon union entry saved by the
merged clippy+test job, so warming a separate e2e-dev-v1 keyed entry
would just recreate the quota pressure the rest of this change
removes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Node 24.16.0 (which the floating node-version: 24 now resolves to)
hangs `npx playwright install` indefinitely in extract-zip with
Playwright < 1.60 (microsoft/playwright#41000). Both e2e suites hit
the 30-minute job timeout on three consecutive attempts, on main as
well as on PRs, with the download starting and never completing.

Pin the e2e runner to Node 22 LTS, the documented workaround, until
the e2e Playwright dependency moves past 1.60 and the pin can float
again. The e2e-build job stays on Node 24: it never runs playwright
install, so it is unaffected.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The push_effect closure borrows the registry immutably and takes the
names Vec as an explicit parameter, so the mut binding is dead in
every feature configuration. Under -D warnings it fails the
featureless daemon check that Rust Check / Shared performs through
hypercolor-cli's dev-dependency, which currently breaks CI on main.

Verified with: cargo check -p hypercolor-daemon --no-default-features
--lib under RUSTFLAGS="-D warnings".

This rides along in the CI-cache PR so the rebase-merge heals main in
the same push that needs it green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@hyperb1iss hyperb1iss force-pushed the nova/ci-cache-quota branch from 526619c to e5bcb4c Compare June 12, 2026 10:52
@hyperb1iss hyperb1iss merged commit 8bfd0d8 into main Jun 12, 2026
28 checks passed
@hyperb1iss hyperb1iss deleted the nova/ci-cache-quota branch June 12, 2026 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant