Harden app-store broker + supervisor (grants, backoff, rotation, RLIMIT_AS, TOCTOU, extend DoS) #19

Merged
TeoSlayer merged 9 commits into main from feat/broker-grant-and-supervisor-hardening on Jun 15, 2026

@TeoSlayer

Contributor

Summary

Hardens the app-store broker and supervisor against several abuse/DoS
vectors and closes a TOCTOU gap, plus tightens the dynamic-extension
surface. Six self-contained changes, each with tests.

Changes

  • Broker authorization (deny-by-default). supervisor.callFrom
    enforces, before any socket dial: target installed → method present in
    the target's exposes (every caller) → for cross-app ipc.call, the
    caller's manifest holds a matching ipc.call grant. New
    Service.CallFrom(callerID, …) is the caller-aware entry point; Call
    stays the trusted daemon/pilotctl path. New manifest helpers
    ExposesMethod / HasGrant (exact, prefix.*, and * target forms).
    Errors: ErrMethodNotExposed, ErrGrantMissing.

  • Exponential verify-fail backoff. The fixed 30s verify-fail retry is
    now a capped exponential ramp (1s→2s→…→30s) that resets on success,
    consistent with crash-loop restart handling. Shared nextBackoff
    helper, unit-tested for growth + saturation.

  • Multi-generation audit log rotation. Single-step supervisor.log → .1
    becomes an N-generation shift (.1..N, oldest discarded). New
    Config.AuditLogMaxBackups (default 3).

  • Address-space cap (Linux). applyChildResourceLimits now sets
    RLIMIT_AS alongside RLIMIT_NOFILE for spawned apps, configurable via
    Config.ChildMemoryLimitBytes (default 4 GiB — generous for
    Go/Node/Python virtual reservations, still a runaway guard). No-op on
    non-Linux (documented). Tests read limits back via prlimit64.

  • Spawn-time TOCTOU re-verify. verifyAtSpawn re-checks (symlink
    rejection + sha256) immediately before execve, closing the gap
    between the one-time scan and launch.

  • Extension DoS guards. pkg/extend gains a per-app token-bucket
    rate limiter on hook dispatch (Registry.SetRateLimit, off by default,
    ErrRateLimited) and a cap of 32 dynamic registrations per app in
    DaemonHandler.Register (ErrTooManyRegistrations).

Testing

  • go build ./..., go vet ./... clean.
  • go test -race -parallel 4 -count=1 ./... — all packages green.
  • New tests: manifest grant matching, broker gate branches, backoff
    growth, N-generation rotation, RLIMIT_AS read-back, TOCTOU symlink +
    content swaps, rate-limit refill + per-app isolation + registration cap.

Notes

  • The broker exposes gate now applies to all callers (it is the app's
    IPC surface); existing call-path tests were updated to declare the
    methods they invoke.
  • Pre-existing gofmt drift in untouched files is left as-is to keep the
    diff focused.

teovl added 7 commits June 15, 2026 14:20
Add a deny-by-default authorization gate to the app-store broker:

- manifest.ExposesMethod / HasGrant helpers (cap+target matching with
  exact, prefix-wildcard, and blanket forms).
- supervisor.callFrom enforces, before any dial: target installed,
  method in exposes (all callers), and a matching ipc.call grant for
  cross-app callers. New typed errors ErrMethodNotExposed/ErrGrantMissing.
- Service.CallFrom exposes the caller-aware entry point; Call stays the
  trusted daemon/pilotctl path.

Update existing call-path tests to declare exposed methods.
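
The gate order described above (target installed, method exposed, then grant for cross-app callers) can be sketched roughly as follows. This is a minimal illustration, not the merged code: the `Grant` and `Manifest` shapes, the `authorize` function, the `ErrNotInstalled` name, and the `"app.method"` target format are all assumptions made for the example. Only `ExposesMethod`, `HasGrant`, `ErrMethodNotExposed`, and `ErrGrantMissing` are names taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Grant and Manifest are hypothetical shapes for illustration only.
type Grant struct {
	Cap    string // capability name, e.g. "ipc.call"
	Target string // "app.method", "app.*", or "*" (format assumed)
}

type Manifest struct {
	Exposes []string
	Grants  []Grant
}

var (
	ErrNotInstalled     = errors.New("target not installed") // hypothetical name
	ErrMethodNotExposed = errors.New("method not exposed")
	ErrGrantMissing     = errors.New("ipc.call grant missing")
)

// ExposesMethod reports whether method is listed in the manifest's exposes.
func (m *Manifest) ExposesMethod(method string) bool {
	for _, e := range m.Exposes {
		if e == method {
			return true
		}
	}
	return false
}

// HasGrant matches exact, "prefix.*", and "*" targets for one capability.
func (m *Manifest) HasGrant(capability, target string) bool {
	for _, g := range m.Grants {
		if g.Cap != capability {
			continue
		}
		switch {
		case g.Target == "*":
			return true
		case strings.HasSuffix(g.Target, ".*"):
			// "notes.*" becomes the prefix "notes." so "notesx.list" does not match.
			if strings.HasPrefix(target, strings.TrimSuffix(g.Target, "*")) {
				return true
			}
		case g.Target == target:
			return true
		}
	}
	return false
}

// authorize runs the three checks in order; a real broker would dial only on nil.
func authorize(installed map[string]*Manifest, callerID, targetID, method string) error {
	target, ok := installed[targetID]
	if !ok {
		return ErrNotInstalled
	}
	if !target.ExposesMethod(method) {
		return ErrMethodNotExposed
	}
	if callerID == targetID {
		return nil // same-app call: no grant required (assumption)
	}
	caller, ok := installed[callerID]
	if !ok || !caller.HasGrant("ipc.call", targetID+"."+method) {
		return ErrGrantMissing
	}
	return nil
}

func main() {
	apps := map[string]*Manifest{
		"notes": {Exposes: []string{"list"}},
		"mail":  {Grants: []Grant{{Cap: "ipc.call", Target: "notes.*"}}},
		"game":  {},
	}
	fmt.Println(authorize(apps, "mail", "notes", "list"))   // granted via prefix wildcard
	fmt.Println(authorize(apps, "game", "notes", "list"))   // no grant
	fmt.Println(authorize(apps, "mail", "notes", "delete")) // not exposed
}
```

Checking exposes before the grant means an unexposed method is refused for every caller, whatever grants it holds.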

Replace the fixed 30s verify-fail retry sleep with a capped exponential
ramp (1s→2s→…→30s) that resets on a successful verify, consistent with
crash-loop restart handling. Extract the shared nextBackoff helper and
unit-test that it grows and saturates at the cap.
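
A capped doubling helper of the kind described might look like the sketch below. The signature and the constant names are assumptions; only the name `nextBackoff` and the 1s to 30s ramp come from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

const (
	backoffBase = time.Second      // first retry delay
	backoffCap  = 30 * time.Second // saturation point
)

// nextBackoff doubles the current delay and saturates at backoffCap.
// A zero or negative input starts the ramp at backoffBase; callers
// would reset their stored delay to 0 after a successful verify.
func nextBackoff(cur time.Duration) time.Duration {
	if cur <= 0 {
		return backoffBase
	}
	next := cur * 2
	if next > backoffCap || next < cur { // second clause guards against overflow
		return backoffCap
	}
	return next
}

func main() {
	var d time.Duration
	for i := 0; i < 7; i++ {
		d = nextBackoff(d)
		fmt.Println(d)
	}
}
```

Because doubling from 16s overshoots the cap, the ramp goes 1s, 2s, 4s, 8s, 16s, 30s and then stays at 30s.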

Replace single-step supervisor.log->.1 rotation with an N-generation
shift (.1..N), discarding the oldest. Add Config.AuditLogMaxBackups
(default 3); worst-case footprint is (backups+1) x AuditLogMaxBytes per
app. Cover generation count and shift semantics with tests.
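
The shift semantics can be illustrated with a small sketch. The function signature, the floor value of 1 for `keep`, and the `runDemo` helper are assumptions for the example; the PR only names `rotateGenerations` and states that a keep<1 floor exists.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// rotateGenerations shifts path.1..path.(keep-1) up by one generation,
// discards the oldest (path.keep), then moves the live log to path.1.
// Missing files are skipped so the first few rotations work on a fresh directory.
func rotateGenerations(path string, keep int) error {
	if keep < 1 {
		keep = 1 // floor value assumed for this sketch
	}
	oldest := fmt.Sprintf("%s.%d", path, keep)
	if err := os.Remove(oldest); err != nil && !errors.Is(err, os.ErrNotExist) {
		return err
	}
	for i := keep - 1; i >= 1; i-- {
		from := fmt.Sprintf("%s.%d", path, i)
		to := fmt.Sprintf("%s.%d", path, i+1)
		if err := os.Rename(from, to); err != nil && !errors.Is(err, os.ErrNotExist) {
			return err
		}
	}
	if err := os.Rename(path, path+".1"); err != nil && !errors.Is(err, os.ErrNotExist) {
		return err
	}
	return nil
}

// runDemo writes and rotates five log generations with keep=3 and
// returns the surviving file names mapped to their contents.
func runDemo() (map[string]string, error) {
	dir, err := os.MkdirTemp("", "rotate-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	logPath := filepath.Join(dir, "supervisor.log")
	for i := 1; i <= 5; i++ {
		if err := os.WriteFile(logPath, []byte(fmt.Sprintf("gen%d", i)), 0o600); err != nil {
			return nil, err
		}
		if err := rotateGenerations(logPath, 3); err != nil {
			return nil, err
		}
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	out := map[string]string{}
	for _, e := range entries {
		b, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		out[e.Name()] = string(b)
	}
	return out, nil
}

func main() {
	files, err := runDemo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	for name, content := range files {
		fmt.Println(name, content)
	}
}
```

With the default of 3 backups, the footprint bound from the commit message works out to (3+1) x AuditLogMaxBytes per app.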

Extend applyChildResourceLimits to set RLIMIT_AS alongside RLIMIT_NOFILE
for spawned children, configurable via Config.ChildMemoryLimitBytes
(default 4 GiB — generous enough for Go/Node/Python virtual reservations,
still a runaway guard). No-op on non-Linux, documented. Tests read the
limits back via prlimit64 to confirm enforcement.

Add verifyAtSpawn, run immediately before execve in spawn(): Lstat
symlink rejection plus a sha256 re-check against the pinned hash. Closes
the time-of-check/time-of-use gap between scanInstalled (runs once at
discovery) and the actual launch — a binary swapped for a symlink or
different bytes after the scan is now refused. Tests cover both swap
cases and the valid-binary pass.
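
A re-check along these lines could be sketched as below. The signature, the two error names, and the `runDemo` helper are hypothetical; the PR confirms only the function name `verifyAtSpawn`, the Lstat symlink rejection, and the sha256 comparison against a pinned hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

var (
	ErrSymlink      = errors.New("binary is a symlink") // hypothetical name
	ErrHashMismatch = errors.New("sha256 mismatch")     // hypothetical name
)

// verifyAtSpawn refuses symlinks (Lstat does not follow them) and
// compares the file's sha256 against the hex digest pinned at scan time.
func verifyAtSpawn(path, pinnedHex string) error {
	info, err := os.Lstat(path)
	if err != nil {
		return err
	}
	if info.Mode()&os.ModeSymlink != 0 {
		return ErrSymlink
	}
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return err
	}
	if hex.EncodeToString(h.Sum(nil)) != pinnedHex {
		return ErrHashMismatch
	}
	return nil
}

func must(err error) {
	if err != nil {
		panic(err)
	}
}

// runDemo exercises the valid binary, a content swap, and a symlink swap.
func runDemo() (valid, swapped, linked error) {
	dir, err := os.MkdirTemp("", "spawn-demo")
	must(err)
	defer os.RemoveAll(dir)

	bin := filepath.Join(dir, "app")
	original := []byte("original bytes")
	must(os.WriteFile(bin, original, 0o700))
	sum := sha256.Sum256(original)
	pinned := hex.EncodeToString(sum[:])
	valid = verifyAtSpawn(bin, pinned)

	// Content swap after the scan: same path, different bytes.
	must(os.WriteFile(bin, []byte("tampered bytes"), 0o700))
	swapped = verifyAtSpawn(bin, pinned)

	// Symlink swap: the link target even carries the pinned bytes,
	// so only the Lstat check catches it.
	decoy := filepath.Join(dir, "decoy")
	must(os.WriteFile(decoy, original, 0o700))
	must(os.Remove(bin))
	must(os.Symlink(decoy, bin))
	linked = verifyAtSpawn(bin, pinned)
	return valid, swapped, linked
}

func main() {
	fmt.Println(runDemo())
}
```

The symlink case shows why the hash check alone is not enough: a link to a file with matching bytes would pass the sha256 comparison but could be retargeted afterwards.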

Add two DoS guards to pkg/extend:

- Per-app token-bucket rate limiter on hook dispatch (Registry.Run),
  enabled via SetRateLimit; aborts with ErrRateLimited when an app's
  budget is spent. Off by default (back-compat); injectable clock for
  deterministic tests.
- Cap of maxDynamicRegistrationsPerApp (32) on runtime hook
  registrations per app in DaemonHandler.Register, via Registry.CountForApp;
  refuses growth with ErrTooManyRegistrations.
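
For readers unfamiliar with the pattern, a per-app token bucket with an injectable clock might be sketched like this. The `Limiter` type, its constructor, `Allow`, and `fakeClock` are illustrative assumptions and are not the `Registry.SetRateLimit` API from this PR; only `ErrRateLimited` is a name taken from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var ErrRateLimited = errors.New("rate limited")

type bucket struct {
	tokens float64
	last   time.Time
}

// Limiter keeps one token bucket per app ID (hypothetical type).
type Limiter struct {
	mu      sync.Mutex
	rate    float64          // tokens refilled per second
	burst   float64          // bucket capacity
	now     func() time.Time // injectable clock for deterministic tests
	buckets map[string]*bucket
}

func NewLimiter(rate float64, burst int, now func() time.Time) *Limiter {
	return &Limiter{rate: rate, burst: float64(burst), now: now, buckets: map[string]*bucket{}}
}

// Allow spends one token from appID's bucket or returns ErrRateLimited.
func (l *Limiter) Allow(appID string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := l.now()
	b, ok := l.buckets[appID]
	if !ok {
		b = &bucket{tokens: l.burst, last: now} // new apps start with a full bucket
		l.buckets[appID] = b
	}
	// Refill in proportion to elapsed time, never above capacity.
	b.tokens += now.Sub(b.last).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now
	if b.tokens < 1 {
		return ErrRateLimited
	}
	b.tokens--
	return nil
}

// fakeClock is a manually advanced clock for the demo and tests.
type fakeClock struct{ t time.Time }

func newFakeClock() *fakeClock          { return &fakeClock{t: time.Unix(0, 0)} }
func (c *fakeClock) Now() time.Time     { return c.t }
func (c *fakeClock) AdvanceSeconds(n int) { c.t = c.t.Add(time.Duration(n) * time.Second) }

func main() {
	clk := newFakeClock()
	l := NewLimiter(1, 2, clk.Now) // 1 token/s, burst of 2
	fmt.Println(l.Allow("app-a"), l.Allow("app-a"), l.Allow("app-a"))
	fmt.Println(l.Allow("app-b")) // separate bucket per app
	clk.AdvanceSeconds(1)
	fmt.Println(l.Allow("app-a")) // one token refilled
}
```

Keying buckets by app ID is what gives the per-app isolation the tests cover: one app exhausting its budget does not affect another.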
@codecov

codecov Bot commented Jun 15, 2026

Codecov Report

❌ Patch coverage is 89.93289% with 15 lines in your changes missing coverage. Please review.

Files with missing lines          Patch %   Lines
plugin/appstore/supervisor.go     83.05%    7 Missing and 3 partials ⚠️
plugin/appstore/rlimit_linux.go   82.35%    2 Missing and 1 partial ⚠️
pkg/extend/ratelimit.go           91.30%    1 Missing and 1 partial ⚠️

teovl added 2 commits June 15, 2026 15:04
Close the codecov patch gap: Service.CallFrom (started + not-started),
SetRateLimit disable branch, and rotateGenerations keep<1 floor.
@TeoSlayer TeoSlayer merged commit cdba2f7 into main Jun 15, 2026
5 checks passed