Skip to content

fix(egress-proxy): enforcement check probes TCP reachability, not HTTP code [#104]#255

Open
saadqbal wants to merge 1 commit into
developfrom
fix/egress-enforcement-probe-tcp
Open

fix(egress-proxy): enforcement check probes TCP reachability, not HTTP code [#104]#255
saadqbal wants to merge 1 commit into
developfrom
fix/egress-enforcement-probe-tcp

Conversation

@saadqbal

@saadqbal saadqbal commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Bugbot follow-up on the v1.7.1 release PR #254 (rolls under the §8.2 egress epic, client-runtime#104). Chart-only.

Bug

The egress-enforcement helm test check (added in #253) treated curl HTTP code 000 as proof that egress is blocked. But a TCP connection that succeeds (egress is open) and then fails TLS verification also returns 000 — typical when probing the default 1.1.1.1 IP without -k (cert name mismatch). So on a non-enforcing CNI the test could falsely pass, reporting the lockdown enforced when training pods can still reach the internet — exactly the false sense of security the check exists to prevent.

Fix

Key the verdict off TCP reachability (curl's exit code), not the HTTP status, and add -k so TLS is explicitly irrelevant:

curl --noproxy '*' -k -sS -m 5 -o /dev/null "https://$HOST"; rc=$?
# 28 (timeout) / 7 (connect failed)  => TCP never established => egress BLOCKED => PASS (exit 0)
# anything else (0, or TLS/HTTP-layer errors 35/52/60 — all require the TCP connect to have succeeded)
#                                     => host REACHED => NOT enforced => FAIL (exit 1)

This is also the safer bias: common CNIs drop denied egress (→ timeout 28); reject-based ones give connect-failed (7). Any outcome where the TCP handshake completed — including a cert error — now correctly counts as "reached / not blocked" and fails the test loudly.

Tests

helm unittest 248/248, helm lint clean. The enforcement-check suite now asserts the verdict keys on the exit code (rc=$?, curl … -k) and guards against regression to the old logic (notMatchRegex on http_code and "000").

Chart stays at 1.7.1 (unreleased) — this corrects 1.7.1 before it ships. After merge to develop I'll re-sync the release branch so PR #254 carries the fix.

🤖 Generated with Claude Code


Note

Low Risk
Chart-only change to a helm test hook script and unit tests; no runtime workload or install/upgrade behavior changes.

Overview
Fixes a false pass in the §8.2 egress-lockdown helm test hook: the probe no longer treats curl HTTP 000 (e.g. TLS cert mismatch on 1.1.1.1) as proof that egress is blocked.

The enforcement Job now treats direct TCP reachability to :443 as the signal—curl --noproxy '*' -k with the verdict from curl’s exit code (7 / 28 = blocked → pass; any outcome that implies the TCP handshake succeeded → fail). Comments and warning text describe TCP/TLS behavior accordingly.

helm unittest coverage is tightened to require -k, rc=$?, and forbid the old http_code / "000" logic. values.yaml comments for enforcementProbeHost are aligned with the TCP probe semantics.

Reviewed by Cursor Bugbot for commit 028e984. Bugbot is set up for automated code reviews on this repo. Configure here.

…P code [#104]

Bugbot (PR #254): the check treated curl HTTP code 000 as proof of a block, but a
TCP connection that succeeds (egress OPEN) then fails TLS verification also yields
000 — so `helm test` could pass on a non-enforcing CNI when probing an IP whose
cert doesn't validate (the default 1.1.1.1 without -k). It conflated "TLS failed"
with "egress blocked", defeating the check.

Key the verdict off TCP reachability via curl's exit code instead of the HTTP
status, and add -k so TLS is explicitly irrelevant: exit 28 (timeout) / 7 (connect
failed) => egress blocked => test PASSES (exit 0); any other outcome (0 success, or
a TLS-/HTTP-layer error such as 35/52/60 — all of which require the TCP connect to
have already succeeded) => egress reached => NOT enforced => test FAILS (exit 1).

Tests assert the verdict keys on the exit code (rc=$?, -k) and guard against a
regression to the old http_code/000 logic. helm-unittest 248/248; lint clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@saadqbal saadqbal self-assigned this Jun 12, 2026
@LukasWodka

Copy link
Copy Markdown
Contributor

👋 Heads-up — Code review queue is at 9 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants