
fix(proxy): make all egress workloads proxy-aware (wire the dead HTTP_PROXY_* values) #229

Merged: saadqbal merged 2 commits into develop from fix/proxy-env-all-workloads on Jun 9, 2026



@LukasWodka (Contributor)

Summary

Behind a corporate proxy, the chart's workload pods are proxy-blind: no pod receives HTTP(S)_PROXY/NO_PROXY environment variables, so any pod making a direct external call (e.g. jobs-manager → api.tracebloc.io) fails with [Errno 111] Connection refused and ends up in CrashLoopBackOff. The env.HTTP_PROXY_HOST/PORT/USERNAME/PASSWORD values are declared in values.yaml and values.schema.json but consumed by no template, a promise the chart never kept. The installer's proxy hardening (scripts/lib/cluster.sh) only covers the k3s node (image pulls), not the application pods.

This makes those values real: a new tracebloc.proxyEnv helper derives HTTP_PROXY/HTTPS_PROXY/http_proxy/https_proxy (http://[user:pass@]host[:port]) plus an auto-augmented NO_PROXY that always carries the cluster-internal ranges from cluster.sh, so in-cluster and MySQL traffic never traverses the proxy. The helper is referenced on every external-egress workload and renders nothing when no proxy is set, so non-proxy installs are byte-for-byte unchanged.
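A minimal Python sketch of what such a helper computes, assuming the four env.HTTP_PROXY_* values as inputs plus an optional operator-supplied NO_PROXY string. The chart itself does this in a Helm template; the names proxy_env, merge_no_proxy and CLUSTER_INTERNAL are hypothetical, and CLUSTER_INTERNAL holds only the four entries the test plan checks, not necessarily the full list from cluster.sh. Credentials are percent-encoded here, which the real helper may or may not do.

```python
from urllib.parse import quote

# Assumed subset of the cluster-internal NO_PROXY entries; the chart mirrors
# the list from scripts/lib/cluster.sh, which may contain more.
CLUSTER_INTERNAL = ["10.0.0.0/8", "172.16.0.0/12", ".svc", ".cluster.local"]


def merge_no_proxy(user_no_proxy, internal=CLUSTER_INTERNAL):
    """Combine an operator NO_PROXY with the internal ranges, deduplicated, order kept."""
    out = []
    for entry in user_no_proxy.split(",") + list(internal):
        entry = entry.strip()
        if entry and entry not in out:
            out.append(entry)
    return ",".join(out)


def proxy_env(host, port="", username="", password="", no_proxy=""):
    """Hypothetical mirror of tracebloc.proxyEnv: env vars to inject, or {} when no proxy."""
    if not host:
        return {}  # no proxy configured: render nothing
    auth = ""
    if username:
        # Percent-encode credentials so '@' or ':' cannot break the URL (an assumption).
        auth = quote(username, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    url = f"http://{auth}{host}" + (f":{port}" if port else "")
    env = {name: url for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy")}
    env["NO_PROXY"] = merge_no_proxy(no_proxy)
    return env
```

For example, proxy_env("proxy.example.com", "8080") yields HTTP_PROXY=http://proxy.example.com:8080 and a NO_PROXY that already exempts the cluster ranges, while proxy_env("") yields nothing, matching the "renders nothing" behaviour.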

Root-cause + egress analysis: tracebloc/backend#768.

Workloads covered (and excluded)

| Workload | External call | Proxy env |
|---|---|---|
| jobs-manager (api + pods-monitor-container) | api.tracebloc.io | yes |
| requests-proxy | Service Bus / backend | yes |
| image-refresh CronJob | docker.io / ghcr.io | yes |
| auto-upgrade CronJob | helm repo + registries | yes |
| mysql-client / resource-monitor | none / in-cluster | excluded (no egress) |
| ingestor sub-chart (dataset push) | jobs-manager.<ns>.svc only | excluded (in-cluster) |

Going forward, this removes the need for the per-pod kubectl set env bridge patches that were used to recover the affected deployment.

Type

  • Bug fix (regression — corporate-proxy deployments)

Test plan (helm template, ci/bm-values.yaml)

  • With proxy (--set-string env.HTTP_PROXY_HOST=proxy.example.com --set-string env.HTTP_PROXY_PORT=8080): all 5 egress containers render HTTP_PROXY=http://proxy.example.com:8080; NO_PROXY includes 172.16.0.0/12, 10.0.0.0/8, .svc, .cluster.local; full manifest parses as valid YAML.
  • Without proxy: 0 HTTP_PROXY entries — non-proxy installs unchanged; parses cleanly.
  • Note: env.HTTP_PROXY_PORT is a string in the schema, so set it with --set-string; plain --set would pass 8080 as an integer and fail schema validation.
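The checks in this test plan can be scripted roughly as follows. This is a sketch that assumes the helm template output has already been parsed into a list of dicts (for example with PyYAML's yaml.safe_load_all, not shown) and only inspects the regular containers of Deployments, StatefulSets, DaemonSets, Jobs and CronJobs; the function names are hypothetical.

```python
REQUIRED_NO_PROXY = {"172.16.0.0/12", "10.0.0.0/8", ".svc", ".cluster.local"}
WORKLOAD_KINDS = {"Deployment", "StatefulSet", "DaemonSet", "Job"}


def pod_spec(doc):
    """Return the pod spec of a workload manifest, or {} for non-workload kinds."""
    spec = doc.get("spec") or {}
    if doc.get("kind") == "CronJob":
        spec = (spec.get("jobTemplate") or {}).get("spec") or {}
    elif doc.get("kind") not in WORKLOAD_KINDS:
        return {}
    return (spec.get("template") or {}).get("spec") or {}


def proxy_containers(docs):
    """Map 'kind/name/container' -> {env name: value} for containers carrying HTTP_PROXY."""
    found = {}
    for doc in docs:
        if not doc:  # rendered output can contain empty documents
            continue
        workload = f"{doc.get('kind')}/{(doc.get('metadata') or {}).get('name')}"
        for container in pod_spec(doc).get("containers") or []:
            env = {e["name"]: e.get("value") for e in container.get("env") or []}
            if "HTTP_PROXY" in env:
                found[f"{workload}/{container['name']}"] = env
    return found


def check_proxy_render(docs, proxy_url):
    """Assert each proxied container uses proxy_url and exempts the internal ranges; return the count."""
    found = proxy_containers(docs)
    for key, env in found.items():
        assert env["HTTP_PROXY"] == proxy_url, f"{key}: unexpected HTTP_PROXY"
        missing = REQUIRED_NO_PROXY - set((env.get("NO_PROXY") or "").split(","))
        assert not missing, f"{key}: NO_PROXY missing {sorted(missing)}"
    return len(found)
```

Per the results reported in the test plan, the count would be 5 for a render with the proxy set and 0 without it.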

Checklist

  • Targets develop
  • No customer identifiers
  • Reviewer to confirm CI (helm-ci, installer-tests) is green

Follow-ups (in backend#768, not this PR)

  • Installer auto-pass: have the installer set env.HTTP_PROXY_HOST/PORT from the proxy it already detects, so a proxied install is zero-config (no manual values).
  • Extend e2e-proxy.sh: assert a workload pod (not just the node) reaches an allowlisted external host through the proxy — the test that would have caught this.
  • A helm-unittest assertion for the proxy env (couldn't add here without confirming the repo's unittest layout — flag for the reviewer).
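One way the installer auto-pass follow-up could work, sketched in Python under the assumption that the installer has the detected proxy as a URL string (with or without a scheme): split it into the four env.HTTP_PROXY_* values and emit them as --set-string flags, since the port is a string in the schema. helm_proxy_flags is a hypothetical name; the installer's existing proxy handling lives in scripts/lib/cluster.sh and would need its own equivalent.

```python
from urllib.parse import unquote, urlsplit


def helm_proxy_flags(detected_proxy):
    """Turn a detected proxy URL into --set-string flags for the env.HTTP_PROXY_* values."""
    if not detected_proxy:
        return []  # nothing detected: pass nothing, so the chart renders no proxy env
    if "://" not in detected_proxy:
        detected_proxy = "http://" + detected_proxy  # urlsplit needs a scheme to find the host
    parts = urlsplit(detected_proxy)
    values = {
        "env.HTTP_PROXY_HOST": parts.hostname or "",
        "env.HTTP_PROXY_PORT": str(parts.port) if parts.port else "",  # string in the schema
        "env.HTTP_PROXY_USERNAME": unquote(parts.username or ""),
        "env.HTTP_PROXY_PASSWORD": unquote(parts.password or ""),
    }
    flags = []
    for key, value in values.items():
        if value:
            # Helm splits --set-string on unescaped commas, so escape them with a backslash.
            escaped = value.replace(",", "\\,")
            flags += ["--set-string", f"{key}={escaped}"]
    return flags
```

Empty values are skipped rather than passed as empty strings, so an unauthenticated proxy only sets HOST and PORT.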

…_PROXY_* values)

The env.HTTP_PROXY_HOST/PORT/USERNAME/PASSWORD values were declared in
values.yaml + values.schema.json but consumed by no template, and no workload
pod received HTTP(S)_PROXY/NO_PROXY env. Behind a corporate proxy the installer's
node-level proxy (scripts/lib/cluster.sh) handles image pulls, but the application
pods make direct external calls (jobs-manager -> api.tracebloc.io) the network
refuses -> CrashLoopBackOff.

Add a tracebloc.proxyEnv helper that derives HTTP(S)_PROXY + an auto-augmented
NO_PROXY (cluster-internal ranges always included, mirroring cluster.sh, so
in-cluster + MySQL traffic never traverses the proxy) from the env.HTTP_PROXY_*
values, and reference it on every external-egress workload: jobs-manager
(api + pods-monitor), requests-proxy, image-refresh CronJob, auto-upgrade CronJob.
Renders nothing when no proxy is set, so non-proxy installs are unchanged.
Excludes mysql-client and resource-monitor (no external egress) and the ingestor
sub-chart (talks only to jobs-manager.<ns>.svc, in-cluster).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@LukasWodka (Contributor, Author)

👋 Heads-up — Code review queue is at 9 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

…red, none when not (backend#768)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
saadqbal merged commit 877eb20 into develop on Jun 9, 2026
14 checks passed
saadqbal added a commit that referenced this pull request Jun 9, 2026
fix(#229): dedupe NO_PROXY — exclude proxy keys from generic env passthrough
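The idea behind this follow-up fix can be pictured with a small Python sketch: when generic env values are also passed through to a container, proxy keys in that passthrough are dropped so each proxy variable appears once, taken from the proxy helper. The names and the exact key set are assumptions, as is the premise that an operator's own NO_PROXY is merged into the helper's output rather than passed through separately.

```python
# Hypothetical key set; which keys the chart actually filters is an assumption.
PROXY_KEYS = {"HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY", "http_proxy", "https_proxy", "no_proxy"}


def container_env(generic_env, proxy_env):
    """Build a container env list in which proxy variables come only from the proxy helper."""
    env = [{"name": k, "value": v} for k, v in generic_env.items() if k not in PROXY_KEYS]
    env += [{"name": k, "value": v} for k, v in proxy_env.items()]
    return env
```

Filtering the passthrough, rather than the helper output, keeps the helper the single source of the merged NO_PROXY list.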
