test: cross-repo fresh-shell last-mile E2E harness (install -> CLI -> cluster info -> push) #214

Draft
LukasWodka wants to merge 1 commit into develop from test/install-journey-737-fresh-shell-e2e

@LukasWodka

Contributor

Part of #737

Summary

Adds the two-leg install-journey harness that closes the gap which let a PATH-persistence regression ship green: no existing test opens a fresh shell after install and asserts the documented next command works. distro-prereqs and e2e-cluster both assert in the same shell that ran the installer, so a binary that lands somewhere a new terminal can't see is invisible to them. This harness reproduces "customer opens a new terminal and types the next command."

Related

  • Part of tracebloc/backend#737 (keystone last-mile E2E)
  • Guards the PATH-persistence class fixed on cli's fix/install-path-persist branch
  • Leg 2 reproduces the namespace/context incident addressed by client#208 (installer sets the kube context — already merged)
  • A thin cli-side caller (separate PR) points Leg 1 at a cli PR's install.sh for pre-merge cross-repo coverage

What's in here

Leg 1 — scripts/tests/path-persist.sh (cheap, wide, no cluster, no creds)
Runs in a plain distro container. Installs the tracebloc CLI via cli's install.sh (ref configurable via TRACEBLOC_CLI_REF, which accepts a URL or a local path), then for each shell present among bash / zsh / fish spawns a fresh login AND non-login shell and asserts that command -v tracebloc resolves and tracebloc version runs. A fresh interactive non-login bash reads ~/.bashrc, not ~/.profile / ~/.bash_profile, so this is the cell that goes red on the pre-fix installer and green on the fixed one. Prints a PASS/FAIL line per shell x mode cell.
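The shell x mode matrix could be driven roughly as in the sketch below. This is not the contents of path-persist.sh: `check_cell` and `run_matrix` are hypothetical names, and using `-lc` for a login shell and `-ic` for an interactive non-login shell is an assumption about how best to approximate "a new terminal". bash only reads ~/.bashrc in a non-login shell when that shell is interactive, which is why the non-login cell passes `-i`.

```shell
#!/bin/sh
# Sketch only: check_cell and run_matrix are hypothetical, not copied from path-persist.sh.

# check_cell SHELL MODE BIN
# Spawn a fresh SHELL with a scrubbed environment and report whether BIN resolves.
check_cell() {
  cell_shell=$1 cell_mode=$2 cell_bin=$3
  # Resolve the shell first; skip the cell if the shell is not installed.
  shell_path=$(command -v "$cell_shell" 2>/dev/null) || {
    echo "SKIP $cell_shell/$cell_mode (shell not installed)"
    return 0
  }
  case "$cell_mode" in
    login)    flags=-lc ;;   # reads the login startup files
    nonlogin) flags=-ic ;;   # interactive non-login: reads the rc file
    *) echo "unknown mode: $cell_mode" >&2; return 2 ;;
  esac
  # env -i drops the caller's PATH, so only what the startup files export
  # can make BIN visible inside the new shell.
  if env -i HOME="$HOME" TERM=dumb "$shell_path" "$flags" "command -v $cell_bin" >/dev/null 2>&1; then
    echo "PASS $cell_shell/$cell_mode"
  else
    echo "FAIL $cell_shell/$cell_mode"
    return 1
  fi
}

# run_matrix BIN
# Walk every shell x mode cell; return non-zero if any cell failed.
run_matrix() {
  matrix_bin=$1 matrix_failed=0
  for s in bash zsh fish; do
    for m in login nonlogin; do
      check_cell "$s" "$m" "$matrix_bin" || matrix_failed=1
    done
  done
  return "$matrix_failed"
}
```

A run would then be `run_matrix tracebloc`, intended for a CI container without a controlling terminal. Scrubbing the environment with `env -i` is the design choice that matters here: without it the child shell inherits the installer shell's PATH and the test passes for the wrong reason.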

Leg 2 — scripts/tests/e2e-journey.sh (full journey on a real cluster, amd64, no secrets)
Extends the e2e-cluster.sh pattern (sources common.sh / setup-linux.sh / cluster.sh, isolated CLUSTER_NAME, TRACEBLOC_NO_AUTOSTART=1):

  1. create_cluster() + wait nodes Ready.
  2. Install the CLI via install.sh.
  3. Apply a credential-free stub matching the CLI's real discovery contract (internal/cluster/discover.go): a *-jobs-manager Deployment carrying app.kubernetes.io/name=client + app.kubernetes.io/managed-by=Helm + instance/version/chart labels (image registry.k8s.io/pause:3.9), plus an ingestor ServiceAccount (so the cluster info TokenRequest path doesn't exit 5). Point the kubeconfig context's namespace at it and assert tracebloc cluster info (a) succeeds and (b) succeeds from a fresh login + non-login shell.
  4. tracebloc dataset push --dry-run smoke on a tiny sample CSV (offline-validatable; no creds).
  5. Teardown via EXIT trap.
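For orientation, the credential-free stub from step 3 might look like the manifest rendered by the sketch below. It is an illustration, not the manifest in this PR: `render_stub`, the `STUB_NS` default, the `stub` release name, the version strings, and the exact keys used for the instance/version/chart labels are all assumptions. The name and managed-by labels, the `-jobs-manager` suffix, the pause image, and the ingestor ServiceAccount come from the description above.

```shell
#!/bin/sh
# Sketch only: render_stub and STUB_NS are hypothetical names.
STUB_NS=${STUB_NS:-tracebloc-stub}

# Print a namespace, an ingestor ServiceAccount, and a *-jobs-manager
# Deployment that carries the labels the CLI is described as discovering on.
render_stub() {
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ${STUB_NS}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ingestor
  namespace: ${STUB_NS}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stub-jobs-manager
  namespace: ${STUB_NS}
  labels:
    app.kubernetes.io/name: client
    app.kubernetes.io/managed-by: Helm
    # The three label keys below are assumed; the PR only says
    # "instance/version/chart labels".
    app.kubernetes.io/instance: stub
    app.kubernetes.io/version: "0.0.0"
    helm.sh/chart: client-0.0.0
    app: manager
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: client
  template:
    metadata:
      labels:
        app.kubernetes.io/name: client
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
EOF
}
```

Applying it and pointing the current context at the stub namespace would then be `render_stub | kubectl apply -f -` followed by `kubectl config set-context --current --namespace="$STUB_NS"`.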

Every long step runs under a watchdog timeout so a hang FAILS (exit 124 -> hard error) instead of spinning to the GitHub ceiling.
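One way to get that behaviour is a thin wrapper around coreutils `timeout`, which exits 124 when the limit is hit. The sketch below assumes `timeout` is available on the runner; `run_step` is a hypothetical name and the harness's own watchdog may be implemented differently.

```shell
#!/bin/sh
# Sketch only: run_step is a hypothetical helper.

# run_step SECONDS CMD [ARGS...]
# Run CMD under a time limit and surface a timeout as an explicit failure.
run_step() {
  limit=$1
  shift
  rc=0
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    # 124 is the exit status timeout uses when the command ran out of time.
    echo "FAIL: timed out after ${limit}s: $*" >&2
  fi
  return "$rc"
}
```

For example, `run_step 300 kubectl wait --for=condition=Ready nodes --all` would fail after five minutes instead of waiting for the job-level ceiling. The 300-second limit is illustrative.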

Note on the stub labels: the issue's shorthand says app: manager; the CLI actually discovers on app.kubernetes.io/name=client,app.kubernetes.io/managed-by=Helm + a *-jobs-manager Deployment name. The manifest uses the real selector so the assertion exercises the actual code path, and keeps app: manager as a cosmetic extra label. Documented inline.
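A quick way to check by hand that a stub satisfies that selector is the kubectl approximation below. It is a shell stand-in for the lookup described above, not the Go code in internal/cluster/discover.go; `filter_jobs_manager` and `find_jobs_manager` are hypothetical names.

```shell
#!/bin/sh
# Sketch only: a kubectl approximation of the discovery rule, not the CLI's code.
SELECTOR='app.kubernetes.io/name=client,app.kubernetes.io/managed-by=Helm'

# Keep only resource names that end in -jobs-manager.
filter_jobs_manager() {
  grep -E -- '-jobs-manager$'
}

# find_jobs_manager NAMESPACE
# List Deployments matching the label selector, then apply the name filter.
find_jobs_manager() {
  kubectl get deployments -n "$1" -l "$SELECTOR" -o name | filter_jobs_manager
}
```

A Deployment labelled only `app: manager` would produce no output from `find_jobs_manager`, which is the reason the manifest carries the real selector labels.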

Note on context-on-default: client#208 (installer sets the kube context) is merged, so the supported state is context-namespace == workspace namespace — that's the core assertion. The opposite case (context left on default, CLI auto-discovers across namespaces) needs a CLI change that is not merged yet, so it runs as a non-fatal pending probe, flippable to a hard assertion via TB_EXPECT_NS_AUTODISCOVER=1 once that lands.
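The pending-probe pattern can be expressed in a few lines. The sketch below shows the gating idea only: `pending_probe` is a hypothetical name and the message wording is invented, while TB_EXPECT_NS_AUTODISCOVER is the switch named above.

```shell
#!/bin/sh
# Sketch only: pending_probe is a hypothetical helper.

# pending_probe LABEL CMD [ARGS...]
# Run CMD. A failure is reported as PENDING and tolerated unless
# TB_EXPECT_NS_AUTODISCOVER=1, in which case it becomes a hard failure.
pending_probe() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $label"
    return 0
  fi
  if [ "${TB_EXPECT_NS_AUTODISCOVER:-0}" = "1" ]; then
    echo "FAIL: $label"
    return 1
  fi
  echo "PENDING: $label (non-fatal)"
  return 0
}
```

With the context left on default, a call such as `pending_probe "cluster info auto-discovers the namespace" tracebloc cluster info` reports PENDING today and would start failing the job once the variable is set to 1.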

CI (.github/workflows/installer-tests.yaml)

  • path-persist: distro matrix (ubuntu:22.04 / 24.04, debian:12, fedora:latest, almalinux:9, opensuse/leap:15.6, alpine:3), fail-fast: false, one fresh container per distro; the script iterates shell x mode inside. Runs on the same scripts/** paths as the rest of the file.
  • e2e-journey — amd64, gated to nightly schedule + workflow_dispatch + the e2e PR label (mirrors cli's e2e.yml label gating), to control cluster cost.
  • Both new scripts added to the static job's shellcheck list (error + advisory passes).

Type of change

  • Tech-debt / refactor (test infrastructure)
  • Feature
  • Bug fix
  • Docs
  • Security / hardening
  • Breaking change

Test plan — Verified locally / Needs CI

Verified locally (green):

  • bash -n on path-persist.sh and e2e-journey.sh — both parse.
  • shellcheck --severity=error (the CI gate) on both new scripts — no findings; --severity=warning advisory pass — also clean.
  • actionlint on installer-tests.yaml: clean (also runs shellcheck on the inline run blocks).
  • yaml.safe_load on the workflow — parses.
  • Isolated sanity-check of the fresh-shell assertion mechanism (capture command -v output, check non-empty + version exit code): positive case resolves a real on-PATH binary, negative control (PATH stripped) correctly yields empty -> the guard would fail. Confirms the assert logic.
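The assertion mechanism from the last bullet, together with its positive and negative controls, can be reproduced with a sketch like the one below. `assert_cli` is a hypothetical name and the fake binary in the usage note is a stand-in; only the shape of the check (non-empty `command -v` output plus a zero exit from `version`) is taken from the description.

```shell
#!/bin/sh
# Sketch only: assert_cli is a hypothetical helper.

# assert_cli BIN
# Pass only if BIN resolves on PATH and "BIN version" exits 0.
assert_cli() {
  cli=$1
  resolved=$(command -v "$cli" 2>/dev/null) || resolved=
  if [ -z "$resolved" ]; then
    echo "FAIL: $cli does not resolve on PATH"
    return 1
  fi
  if ! "$cli" version >/dev/null 2>&1; then
    echo "FAIL: $cli version exited non-zero"
    return 1
  fi
  echo "PASS: $cli -> $resolved"
}
```

For the positive control, put an executable stand-in named like the CLI in a temporary directory and run `( PATH="$dir"; assert_cli tracebloc )`. For the negative control, run the same call with PATH pointing at an empty directory and confirm it fails.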

Needs CI (requires GitHub runners / Docker — NOT run locally, not claimed to pass):

  • The full path-persist distro x shell x mode matrix (needs per-distro containers + zsh/fish install).
  • e2e-journey end to end (needs a real k3d cluster on a Linux runner with Docker, plus a reachable cli install.sh).
  • The "red on the pre-fix installer / green on the fixed one" proof — needs the matrix to actually run against both installer refs.

The default TRACEBLOC_CLI_REF currently points at cli's fix/install-path-persist raw install.sh (with an inline TODO(cli#61) to switch to releases/latest/download/install.sh once the fix ships in a public release — otherwise the guard would test the old installer from the latest release and report a false red).
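Because TRACEBLOC_CLI_REF may be either a URL or a local path, the scripts need a small resolution step before running the installer. The sketch below shows one way to do that; `fetch_installer` is a hypothetical name, and treating anything that does not start with http:// or https:// as a local path is an assumption rather than the scripts' documented behaviour.

```shell
#!/bin/sh
# Sketch only: fetch_installer is a hypothetical helper.

# fetch_installer REF DEST
# Copy the installer named by REF (URL or local path) to DEST.
fetch_installer() {
  ref=$1 dest=$2
  case "$ref" in
    http://*|https://*)
      # -f makes curl fail on HTTP errors instead of saving an error page.
      curl -fsSL "$ref" -o "$dest"
      ;;
    *)
      if [ ! -f "$ref" ]; then
        echo "installer not found: $ref" >&2
        return 1
      fi
      cp "$ref" "$dest"
      ;;
  esac
}
```

A caller would run `fetch_installer "$TRACEBLOC_CLI_REF" "$workdir/install.sh"` and then execute the downloaded file. A cli-side job can pass a checked-out install.sh by path, which is what makes pre-merge cross-repo coverage possible without publishing anything.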

Checklist

  • Tests added (this PR is the tests)
  • No secrets / credentials in the diff (stub is credential-free)
  • Customer identifiers scrubbed — generic phrasing only in added lines
  • Docs updated if behavior or config changed (N/A — test-only)

@LukasWodka

Contributor Author

👋 Heads-up — Code review queue is at 17 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

Adds the two-leg install-journey harness that closes the gap which let a
PATH-persistence regression ship green: no existing test opens a fresh
shell after install and asserts the documented next command works.

Leg 1 — scripts/tests/path-persist.sh: runs in a plain distro container,
installs the tracebloc CLI via cli/install.sh (configurable ref via
TRACEBLOC_CLI_REF), then for each shell among bash/zsh/fish spawns a fresh
login AND non-login shell and asserts `command -v tracebloc` resolves and
`tracebloc version` runs. A fresh non-login bash reads ~/.bashrc (not
~/.profile), so this catches the whole PATH-persistence class — red on the
pre-fix installer, green on the fixed one.

Leg 2 — scripts/tests/e2e-journey.sh: extends the e2e-cluster pattern.
Brings the cluster up via create_cluster(), installs the CLI, applies a
credential-free stub matching the CLI's real discovery contract (a
*-jobs-manager Deployment with the chart's hallmark labels + an `ingestor`
ServiceAccount), points the kubeconfig context's namespace at it, and
asserts `tracebloc cluster info` succeeds AND resolves from a fresh shell,
then `dataset push --dry-run` on a tiny sample CSV. Long steps run under a
watchdog timeout so a hang fails instead of spinning. The context-on-default
namespace auto-discover sub-assertion is gated pending the CLI change.

CI: new path-persist job (distro matrix, like distro-prereqs, fail-fast
false) and e2e-journey job (amd64, nightly + `e2e` label only, mirroring
cli's e2e.yml gating). Both new scripts added to the static shellcheck list.

Part of #737.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

LukasWodka force-pushed the test/install-journey-737-fresh-shell-e2e branch from 7274175 to 3fa1a8f on June 8, 2026 18:15