feat(egress-proxy): deploy-time egress-enforcement pre-flight (non-blocking) [client-runtime#104]#253
Merged
Merged
Conversation
…ocking) [#104] When the §8.2 lockdown is enabled (networkPolicy.training.allowExternalHttps=false) but the cluster's CNI doesn't actually enforce egress NetworkPolicy, the lockdown is a silent no-op (false sense of security — hit on EKS VPC-CNI during the #102 dev validation). This adds a post-install/post-upgrade Helm hook that, in that case, runs a tracebloc.io/workload=training-labelled probe which curls a canary host DIRECTLY: reachable => the CNI isn't enforcing => logs a loud WARNING. Always exits 0 (non-blocking), so it never fails an upgrade or the hourly auto-upgrade. - gated on enabled && !allowExternalHttps && enforcementProbeHost -> ships DORMANT (does nothing until a fleet flips the lockdown). - new values networkPolicy.training.enforcementProbeHost (default 1.1.1.1; "" disables, e.g. air-gapped); registered in values.schema.json. - curl probe image digest-pinned multi-arch, PSA-restricted, runAsUser 100. - helm-unittest egress_enforcement_check_test.yaml (gating both ways + shape); 248/248. - Chart 1.7.0 -> 1.7.1 (version + appVersion lockstep). Chart-only; no backend / client-runtime changes (enforcement treated as a per-fleet prerequisite, so no central audit needed). Refs tracebloc/client-runtime#104, #102. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 824adac. Configure here.
…blocking helm test [#104] - Gate default for enforcementProbeHost "" -> "1.1.1.1" (matches values.yaml + $host), so `helm upgrade --reuse-values` from a release predating the key still runs the check instead of silently skipping it. Explicit "" still disables it. - Convert the post-install/post-upgrade hook to a `helm test` hook. A post-upgrade hook fails the release if the probe pod can't run (image pull / PSA / OOM) regardless of the script's exit 0 — which would block the hourly auto-upgrade (not truly non-blocking). As a test hook it never runs during install/upgrade, so it can't block them, and it FAILS (exit 1) on non-enforcement — run `helm test <release>` after flipping the lockdown for a clear pass/fail. helm-unittest 248/248; lint clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
shujaatTracebloc
approved these changes
Jun 12, 2026
aptracebloc
approved these changes
Jun 12, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Closes part of tracebloc/client-runtime#104. Chart-only.
Why
The §8.2 lockdown (
networkPolicy.training.allowExternalHttps=false) only blocks egress on a CNI that enforces egress NetworkPolicy. On a non-enforcing CNI (EKS VPC-CNI withoutenableNetworkPolicy— hit live during the #102 dev validation) the flip is a silent no-op: the policy renders, training pods still reach the internet, the operator believes they're protected.What
A
helm testhook (templates/egress-enforcement-check.yaml, annotatedhelm.sh/hook: test) that renders only when the lockdown is enabled (networkPolicy.training.enabled && !allowExternalHttps && enforcementProbeHost):tracebloc.io/workload: training-labelled probe pod (so the real training-egress NetworkPolicy governs it) curls a canary host directly (--noproxy '*', 5s timeout).OK egress lockdown verified, test passes (exit 0). Reachable → not enforcing → loudWARNING EGRESS LOCKDOWN NOT ENFORCED …and the test fails (exit 1).helm test <release>after flipping the lockdown for a clear pass/fail.Scope / design
allowExternalHttps=false.networkPolicy.training.enforcementProbeHost(default1.1.1.1;""disables — e.g. air-gapped) + schema. Gatedig-defaults the host to1.1.1.1, so a--reuse-valuesupgrade from a release predating the key still renders the check.runAsUser: 100(curlimages/curl's non-numeric user needs an explicit uid — learned during the docs: fix README Deploy section (Helm not docker), surface in-repo docs #102 EKS probe).1.7.0 → 1.7.1(version + appVersion lockstep).Tests
helm lint+helm unittest248/248 — newegress_enforcement_check_test.yamlasserts the hook renders only when the lockdown is on (+ host set), is absent otherwise (default / training disabled / empty host), carrieshelm.sh/hook: test, and that the probe pod is training-labelled, PSA-restricted, digest-pinned, exits 1 on non-enforcement, and curls the configured host with--noproxy.e2e (manual, per-fleet)
After enabling
allowExternalHttps=false, runhelm test <release>: on a non-enforcing CNI the probe reaches the host and the test FAILS with theWARNINGblock; on an enforcing CNI egress is blocked and the test PASSES (OK).kubectl logs job/<release>-egress-enforcement-check.🤖 Generated with Claude Code
Note
Low Risk
Chart-only, dormant by default (
allowExternalHttpsstays true); the probe runs only on explicithelm testafter lockdown is enabled.Overview
Adds an optional
helm testJob that verifies the §8.2 training egress lockdown is actually enforced by the CNI whennetworkPolicy.training.allowExternalHttpsis false.The new template renders only when training NetworkPolicy is enabled, lockdown is on, and
enforcementProbeHostis non-empty (default1.1.1.1;""disables for air-gapped). Atracebloc.io/workload: trainingprobe curls the host with--noproxy; blocked egress passes the test, reachable HTTPS fails with loud warnings. The Job is annotatedhelm.sh/hook: test, so it does not run on install/upgrade—operators runhelm testafter flipping the lockdown.Also documents
networkPolicy.training.enforcementProbeHostinvalues.yaml/values.schema.json, addsegress_enforcement_check_test.yaml(render guards, PSA-hardened probe, digest-pinned curl), and bumps the chart to 1.7.1.Reviewed by Cursor Bugbot for commit 00104c2. Bugbot is set up for automated code reviews on this repo. Configure here.