Cf gvisor deferred start by vlast3k · Pull Request #504 · cloudfoundry/guardian

vlast3k · 2026-05-27T19:18:38Z


Summary

Backward Compatibility

Breaking Change? Yes/No

With gVisor network=sandbox, the netns must be configured BEFORE task.Start(), because setupNetwork() reads the netns at start time and mirrors it into netstack. Previously, Start was called before silk-cni configured networking, which left gVisor with an empty netstack. Split the lifecycle: Create only boots the sandbox (runsc create), and Start is called after Networker.Network() has configured the netns.
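The ordering described above can be sketched as follows. This is a minimal illustration, not the PR's code: the Runtime and Networker interfaces, the recorder type, and createContainer are hypothetical names that only mirror the Create -> Network -> Start sequence.

```go
package main

import (
	"errors"
	"fmt"
)

// Runtime and Networker are hypothetical stand-ins for the OCI runtime
// and the networker; the real interfaces in guardian differ.
type Runtime interface {
	Create(handle string) error
	Start(handle string) error
}

type Networker interface {
	Network(handle string) error
}

// recorder implements both interfaces and records the call order.
type recorder struct {
	calls       []string
	failNetwork bool
}

func (r *recorder) Create(handle string) error {
	r.calls = append(r.calls, "create")
	return nil
}

func (r *recorder) Start(handle string) error {
	r.calls = append(r.calls, "start")
	return nil
}

func (r *recorder) Network(handle string) error {
	r.calls = append(r.calls, "network")
	if r.failNetwork {
		return errors.New("cni failed")
	}
	return nil
}

// createContainer boots the sandbox, configures the netns, and only then
// starts the task, so the runtime sees a fully configured netns at start.
func createContainer(rt Runtime, nw Networker, handle string) error {
	if err := rt.Create(handle); err != nil {
		return fmt.Errorf("create: %w", err)
	}
	if err := nw.Network(handle); err != nil {
		return fmt.Errorf("network: %w", err)
	}
	if err := rt.Start(handle); err != nil {
		return fmt.Errorf("start: %w", err)
	}
	return nil
}

func main() {
	rec := &recorder{}
	if err := createContainer(rec, rec, "demo"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rec.calls)
}
```

In this sketch a networking failure returns before Start is reached, so a sandbox is never started against a half-configured netns.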

Wire a --containerd-runtime-type flag through to the containerd client and Nerd layer. When set to a non-runc runtime (e.g. io.containerd.runsc.v1), Create skips the runc-specific task options (WithNoNewKeyring, WithUIDAndGID) that runsc rejects, and sets spec.Windows to nil because gVisor does not support it. Also adds a no-op Start() to RunRunc so that it still satisfies the OCIRuntime interface after the deferred-start split.
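A rough sketch of the option gating, under stated assumptions: the spec and windowsSpec types are simplified placeholders, the task options are represented by their names as strings rather than real containerd options, and treating an empty runtime type as runc is an assumption of this example.

```go
package main

import "fmt"

// defaultRuncRuntime is the runc shim runtime type named in this PR.
const defaultRuncRuntime = "io.containerd.runc.v2"

// windowsSpec and spec are simplified placeholders for the OCI spec types.
type windowsSpec struct{}

type spec struct {
	Windows *windowsSpec
}

// isNonRuncRuntime reports whether a runtime other than runc is configured.
// Assumption: an unset (empty) runtime type means the runc default.
func isNonRuncRuntime(runtimeType string) bool {
	return runtimeType != "" && runtimeType != defaultRuncRuntime
}

// prepareCreate returns the names of the task options to apply and clears
// the Windows section of the spec for non-runc runtimes.
func prepareCreate(runtimeType string, s *spec) []string {
	if isNonRuncRuntime(runtimeType) {
		s.Windows = nil // not supported by gVisor
		return nil      // skip runc-specific task options
	}
	return []string{"WithNoNewKeyring", "WithUIDAndGID"}
}

func main() {
	for _, rt := range []string{"", defaultRuncRuntime, "io.containerd.runsc.v1"} {
		s := &spec{Windows: &windowsSpec{}}
		opts := prepareCreate(rt, s)
		fmt.Printf("%q: opts=%v windows-cleared=%t\n", rt, opts, s.Windows == nil)
	}
}
```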

For gVisor containers, the sentry process PID on the host does not expose the container filesystem via /proc/<pid>/root (the sentry runs in its own mount namespace). Look up users against bundle.Spec.Root.Path (the host-side rootfs) which is always accessible regardless of runtime type.
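To illustrate the host-side lookup, here is a minimal sketch that resolves a user from etc/passwd under a rootfs directory. The lookupUser and demoLookup functions are hypothetical helpers written for this example; guardian's own user lookup code is not reproduced here.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// lookupUser resolves uid and gid for name from <rootfs>/etc/passwd.
// It never touches /proc/<pid>/root, so it works for any runtime.
func lookupUser(rootfs, name string) (uid, gid int, err error) {
	f, err := os.Open(filepath.Join(rootfs, "etc", "passwd"))
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// passwd format: name:password:uid:gid:gecos:home:shell
		fields := strings.Split(sc.Text(), ":")
		if len(fields) < 4 || fields[0] != name {
			continue
		}
		if uid, err = strconv.Atoi(fields[2]); err != nil {
			return 0, 0, err
		}
		if gid, err = strconv.Atoi(fields[3]); err != nil {
			return 0, 0, err
		}
		return uid, gid, nil
	}
	if err := sc.Err(); err != nil {
		return 0, 0, err
	}
	return 0, 0, fmt.Errorf("user %q not found under %s", name, rootfs)
}

// demoLookup builds a throwaway rootfs with a passwd file and looks up vcap.
func demoLookup() (int, int, error) {
	rootfs, err := os.MkdirTemp("", "rootfs")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(rootfs)

	if err := os.MkdirAll(filepath.Join(rootfs, "etc"), 0o755); err != nil {
		return 0, 0, err
	}
	passwd := "root:x:0:0:root:/root:/bin/sh\nvcap:x:1000:1000::/home/vcap:/bin/sh\n"
	if err := os.WriteFile(filepath.Join(rootfs, "etc", "passwd"), []byte(passwd), 0o644); err != nil {
		return 0, 0, err
	}
	return lookupUser(rootfs, "vcap")
}

func main() {
	uid, gid, err := demoLookup()
	fmt.Println(uid, gid, err)
}
```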

Non-runc runtimes (e.g. runsc/gVisor) do not write state.json to the runc root directory. On cgroups v2 the memory.use_hierarchy knob does not exist either. Treat a missing state.json as a no-op instead of returning an error that would block container creation.
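The "missing file is a no-op" behaviour could look roughly like this. The loadState and demoMissingState functions are hypothetical, and the init_process_pid field name is an assumption about the state.json layout used only to give the example something to parse.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// runcState holds the one field this sketch reads from state.json.
// The JSON field name is an assumption for illustration.
type runcState struct {
	InitProcessPid int `json:"init_process_pid"`
}

// loadState returns (nil, nil) when state.json does not exist, so callers
// can skip cgroup tweaks for runtimes that never write it.
func loadState(runcRoot, handle string) (*runcState, error) {
	data, err := os.ReadFile(filepath.Join(runcRoot, handle, "state.json"))
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil // non-runc runtime: nothing to do
	}
	if err != nil {
		return nil, err
	}
	var st runcState
	if err := json.Unmarshal(data, &st); err != nil {
		return nil, fmt.Errorf("parse state.json: %w", err)
	}
	return &st, nil
}

// demoMissingState calls loadState against an empty temporary runc root.
func demoMissingState() (*runcState, error) {
	root, err := os.MkdirTemp("", "runc-root")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	return loadState(root, "no-such-handle")
}

func main() {
	st, err := demoMissingState()
	fmt.Println(st == nil, err)
}
```

Other read errors (for example, permission denied) are still returned, so only the "file absent" case is silenced.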

nstar uses nsenter to enter the container mount namespace, which does not work with gVisor because the sentry PID on the host has a different mount namespace that does not expose the container filesystem. For non-runc runtimes, use runtime.Exec to run tar inside the container through the containerd -> shim -> runsc exec path. This goes through the sentry and keeps its dentry cache consistent with the filesystem state.

- StreamIn: exec /bin/tar -xf - -C <path> with TarStream on stdin.
- StreamOut: exec /bin/tar -cf - -C <source> <path> with stdout piped back.

Detection is based on the configured runtimeType (from the --containerd-runtime-type flag) rather than runtime probing.

The exec-based StreamIn runs tar -xf inside the container, but the target directory (e.g. /tmp/app) may not exist in a freshly created container. The nstar-based approach creates it implicitly. Use mkdir -p before tar to ensure the path exists.


… errors, add constants

- Replace shell-based streamInViaExec (vulnerable to command injection via spec.Path) with two direct exec calls: /bin/mkdir then /bin/tar
- Capture stderr in streamOutViaExec and log tar failures instead of silently discarding errors
- Extract hardcoded "io.containerd.runc.v2" into defaultRuncRuntime constant
- Add debug log to RunRunc.Start no-op for observability
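A short sketch of why direct exec calls avoid the injection problem: each path travels as a single argv element and is never interpreted by a shell. The command type and the two builder functions are hypothetical; only the binaries and flags come from the commit messages above.

```go
package main

import "fmt"

// command is a hypothetical argv description for an exec inside a container.
type command struct {
	Path string
	Args []string
}

// streamInCommands returns the two direct exec calls used for StreamIn:
// create the destination directory, then unpack the tar stream from stdin.
func streamInCommands(dest string) []command {
	return []command{
		{Path: "/bin/mkdir", Args: []string{"-p", dest}},
		{Path: "/bin/tar", Args: []string{"-xf", "-", "-C", dest}},
	}
}

// streamOutCommand returns the exec call used for StreamOut: pack path,
// relative to source, and write the archive to stdout.
func streamOutCommand(source, path string) command {
	return command{Path: "/bin/tar", Args: []string{"-cf", "-", "-C", source, path}}
}

func main() {
	// A hostile destination stays one literal argument; no shell parses it.
	for _, c := range streamInCommands("/tmp/app; rm -rf /") {
		fmt.Printf("%s %q\n", c.Path, c.Args)
	}
	out := streamOutCommand("/home/vcap", "app")
	fmt.Printf("%s %q\n", out.Path, out.Args)
}
```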

…ting

- Factor exec→wait→check-exit into execAndWait helper (eliminates repetition)
- Set user once at top of each streaming function (was duplicated 3x)
- Add stderr capture to streamInViaExec (symmetric with streamOut)
- Replace magic constant with isNonRuncRuntime() method
- Remove noisy debug log from RunRunc.Start no-op
- Add one-line comments for spec.Windows and taskOpts skipping in nerd.go
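The exec→wait→check-exit helper might be shaped like the sketch below. The Process and Execer interfaces and the fake implementations are hypothetical simplifications; the helper's real signature in the PR is not shown on this page.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// Process and Execer are hypothetical, simplified runtime interfaces.
type Process interface {
	Wait() (int, error)
}

type Execer interface {
	Exec(path string, args []string, stdin io.Reader, stdout, stderr io.Writer) (Process, error)
}

// execAndWait runs one command, waits for it, and turns a non-zero exit
// status into an error that carries the captured stderr.
func execAndWait(e Execer, path string, args []string, stdin io.Reader, stdout io.Writer) error {
	var stderr bytes.Buffer
	proc, err := e.Exec(path, args, stdin, stdout, &stderr)
	if err != nil {
		return fmt.Errorf("exec %s: %w", path, err)
	}
	code, err := proc.Wait()
	if err != nil {
		return fmt.Errorf("wait %s: %w", path, err)
	}
	if code != 0 {
		return fmt.Errorf("%s exited with status %d: %s", path, code, strings.TrimSpace(stderr.String()))
	}
	return nil
}

// fakeExecer simulates a process that writes to stderr and exits with code.
type fakeExecer struct {
	code   int
	stderr string
}

type fakeProcess struct{ code int }

func (p fakeProcess) Wait() (int, error) { return p.code, nil }

func (f fakeExecer) Exec(path string, args []string, stdin io.Reader, stdout, stderr io.Writer) (Process, error) {
	io.WriteString(stderr, f.stderr)
	return fakeProcess{code: f.code}, nil
}

func main() {
	ok := execAndWait(fakeExecer{}, "/bin/mkdir", []string{"-p", "/tmp/app"}, nil, nil)
	fmt.Println("mkdir:", ok)
	bad := execAndWait(fakeExecer{code: 2, stderr: "tar: short read\n"}, "/bin/tar", nil, nil, nil)
	fmt.Println("tar:", bad)
}
```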

Move exec-based and nstar-based streaming into separate implementations of a new Streamer interface. Containerizer delegates to the injected streamer without knowing which runtime is in use.

- NstarStreamer: existing nstar/nsenter approach for runc containers
- ExecStreamer: exec-based tar for runtimes where /proc/pid/ns/mnt is not accessible (e.g. gVisor)

Selection happens once at construction time based on --containerd-runtime-type. Containerizer no longer carries runtimeType or any streaming logic.
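A minimal sketch of this shape follows. The method signatures, the newStreamer constructor, and the stub bodies are assumptions made for illustration; only the type names Streamer, NstarStreamer, ExecStreamer, and Containerizer come from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

const defaultRuncRuntime = "io.containerd.runc.v2"

var errSketch = errors.New("not implemented in this sketch")

// Streamer moves tar streams in and out of a container.
// The signatures here are hypothetical.
type Streamer interface {
	StreamIn(handle, path string, tar io.Reader) error
	StreamOut(handle, path string) (io.ReadCloser, error)
}

// NstarStreamer stands in for the nstar/nsenter approach (runc).
type NstarStreamer struct{}

func (NstarStreamer) StreamIn(handle, path string, tar io.Reader) error { return errSketch }
func (NstarStreamer) StreamOut(handle, path string) (io.ReadCloser, error) {
	return nil, errSketch
}

// ExecStreamer stands in for the exec-based tar approach (e.g. gVisor).
type ExecStreamer struct{}

func (ExecStreamer) StreamIn(handle, path string, tar io.Reader) error { return errSketch }
func (ExecStreamer) StreamOut(handle, path string) (io.ReadCloser, error) {
	return nil, errSketch
}

// newStreamer picks the implementation once, at construction time.
// Assumption: an empty runtime type means the runc default.
func newStreamer(runtimeType string) Streamer {
	if runtimeType != "" && runtimeType != defaultRuncRuntime {
		return ExecStreamer{}
	}
	return NstarStreamer{}
}

// Containerizer only knows about the injected Streamer.
type Containerizer struct {
	streamer Streamer
}

func (c Containerizer) StreamIn(handle, path string, tar io.Reader) error {
	return c.streamer.StreamIn(handle, path, tar)
}

func main() {
	for _, rt := range []string{defaultRuncRuntime, "io.containerd.runsc.v1"} {
		c := Containerizer{streamer: newStreamer(rt)}
		fmt.Printf("%s -> %T\n", rt, c.streamer)
	}
}
```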

Three patches for full Docker container support on gVisor:

1. execPeas (containerizer.go): When gVisor runtime detected, convert pea requests to exec inside existing container. Peas create separate sandboxes that can't share gVisor's per-sandbox netstack. The healthcheck/envoy binaries are already bind-mounted, so exec works directly.
2. SkipUserNamespace (pea_creator.go): Defense-in-depth: skip userns join for gVisor since setns(CLONE_NEWUSER) is rejected with EINVAL.
3. setupLoopback (external_networker.go): After silk CNI 'up', bring up the loopback interface in the container's network namespace. Silk only creates a veth pair; gVisor's netstack scrapes the netns and needs lo present to support 127.0.0.1 binding (required by envoy).

Validated end-to-end on lod-aws-0515:

- Docker python:3.12-slim with port healthcheck: PASS
- Envoy container proxy (mTLS): PASS
- HTTP routing through gorouter: PASS
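One possible way to bring up lo in a container's network namespace is to shell out to nsenter and ip, as sketched below. This is an assumption for illustration only: the mechanism setupLoopback actually uses is not shown on this page, and loopbackUpCommand is a hypothetical helper. The sketch only builds the command; running it requires root and a real netns path.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// loopbackUpCommand builds (but does not run) a command that enters the
// network namespace at netnsPath and sets the lo interface up.
func loopbackUpCommand(netnsPath string) *exec.Cmd {
	return exec.Command("nsenter", "--net="+netnsPath, "ip", "link", "set", "lo", "up")
}

func main() {
	// /proc/<pid>/ns/net of the container's init process is a typical path;
	// the pid used here is a placeholder.
	cmd := loopbackUpCommand("/proc/12345/ns/net")
	fmt.Println(strings.Join(cmd.Args, " "))
}
```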

vlast3k added 13 commits May 27, 2026 10:18

Regenerate fakes for Start() interface addition

4afd8d9

Fix containerizer_test: pass runtimeType parameter to New()

6468411

Fix runcontainerd tests for bundle-rootfs user lookup change

d974506

Update cgroup_manager_test: missing state.json now returns nil (gVisor compat)

11ad841

cf-foundation-community-automation Bot added this to Application Runtime Platform Working Group May 27, 2026

cf-foundation-community-automation Bot moved this to Inbox in Application Runtime Platform Working Group May 27, 2026

vlast3k wants to merge 14 commits into cloudfoundry:main from vlast3k:cf-gvisor-deferred-start
