Skip to content

feat: SeiNode validator.operatorKeyring CRD surface (closes #219)#220

Open
bdchatham wants to merge 3 commits into
mainfrom
feat/operator-keyring-crd
Open

feat: SeiNode validator.operatorKeyring CRD surface (closes #219)#220
bdchatham wants to merge 3 commits into
mainfrom
feat/operator-keyring-crd

Conversation

@bdchatham
Copy link
Copy Markdown
Collaborator

@bdchatham bdchatham commented May 12, 2026

Summary

  • Implements Phase 1 step 1 of the in-pod governance signing workstream (design: docs(design): in-pod governance transaction signing for Sei validators seictl#166 / docs/design/in-pod-governance-signing.md — Component A).
  • Adds validator.operatorKeyring to SeiNode — discriminated-union surface (Secret variant; future TMKMS/Vault/KMS reserved as comments and an additive CEL exactly-one rule) declaring the operator-account keyring used by the seictl sidecar for governance, MsgEditValidator, withdraw-rewards, and other operator-account transactions.
  • Trust boundary held at admission, build-time, AND runtime: four CEL invariants on ValidatorSpec; pod-spec mount + env wired into the sidecar container only (never seid main, never init, never bootstrap); bootstrap-pod isolation guard generalized to walk both Secret volumes AND valueFrom.secretKeyRef env entries.
  • Sidecar container security context tightened in the same workstream (the keyring mount is precisely the reason): RunAsNonRoot, RunAsUser/Group 65532, ReadOnlyRootFilesystem, AllowPrivilegeEscalation: false, Capabilities.Drop: [ALL], SeccompProfile: RuntimeDefault, pod-level FSGroup: 65532 (required for the non-root UID to read the 0o400 Secret mount).
  • Pre-flight validate-operator-keyring task mirrors validate-signing-key.go. Deliberately does NOT decrypt the keyring — that's the sidecar's startup smoke test (issue feat(controller): wire Spec.Genesis.Accounts via sidecar typed wrappers #162 in seictl), not the controller's TCB.

CRD invariants

Four CEL rules on ValidatorSpec enforce the trust boundary at admission:

  1. operatorKeyring.secret.secretName != signingKey.secret.secretName (data-keyring ≠ consensus-key Secret)
  2. operatorKeyring.secret.secretName != nodeKey.secret.secretName (data-keyring ≠ P2P-key Secret)
  3. operatorKeyring.secret.secretName != passphraseSecretRef.secretName (data-keyring ≠ passphrase Secret)
  4. secretName and passphraseSecretRef.secretName immutable via self == oldSelf

keyName is intentionally NOT immutable (allows key rotation within a Secret).

Discretionary decisions flagged for review

These weren't fully prescribed by the design — reviewers should sanity-check:

  1. Error wording uses Cosmos SDK terminology (e.g., "file-keyring requires at least one encrypted key blob"). Functional, not load-bearing.
  2. defaultStr helper inlined, not extracted. Design referenced a helper that doesn't exist in this repo; YAGNI says don't add a 4-line helper for two call sites. Defaults: KeyName→\"node_admin\", passphraseRef.Key→\"passphrase\" (matches the kubebuilder defaults).
  3. ptr.To(true) / ptr.To(int64(65532)) workaround: modernize/newexpr linter false-positively suggests new(true) (invalid Go). Assigned to local vars before address-taking. Functionally identical; less direct.
  4. Extra test for env-var leakage in init container (PassphraseEnvInInitContainer_Rejects) — not in the design but the bootstrap pod's sidecar runs as InitContainer, so this is the realistic env-leakage path. Cheap belt-and-suspenders.

Test plan

Verification on the implementer side:

  • make manifests generate succeeds and produces the expected new types/CRD fields
  • go build ./... clean
  • go test ./... all packages pass
  • golangci-lint run --new-from-rev=origin/main ./... reports 0 issues

For human reviewers:

  • Cross-check that the new mount + env appear on the sidecar container only (buildSidecarContainer) — NOT on buildNodeMainContainer, NOT on the init container, NOT on the bootstrap pod
  • Verify the four CEL invariants on ValidatorSpec via kubectl apply --dry-run=server on edge cases (collision Secret names, missing fields, immutability-on-update)
  • Confirm validate-operator-keyring does NOT attempt to decrypt the keyring (controller TCB stays small)
  • Spot-check assertNoValidatorSecretsOnBootstrapPod rejects each of the three forbidden Secret types and accepts NodeKey

Cross-review

This PR is being reviewed in parallel by platform-engineer, kubernetes-specialist, and security-specialist before being marked ready for merge. Their findings will be posted as PR review comments.

🤖 Generated with Claude Code


Note

Medium Risk
Introduces new validator secret handling (operator keyring + passphrase) and tightens pod security settings; mistakes could leak credentials or break validator deployments. Changes are scoped and include admission/planner/runtime guards plus tests, reducing but not eliminating risk.

Overview
Adds a new optional validator.operatorKeyring field to the SeiNode/SeiNodeDeployment CRDs (Secret-backed keyring + separate passphrase Secret), including defaults and CEL rules enforcing secret distinctness/immutability.

Wires the operator keyring into generated pod specs sidecar-only (volume mount under keyring-file/ plus SEI_KEYRING_PASSPHRASE env), adds sidecar hardening (RunAsNonRoot, read-only rootfs, dropped caps, runtime seccomp) and pod-level fsGroup//tmp emptyDir, and adds a fail-closed guard to prevent accidental keyring mounts on seid/non-sidecar init containers.

Extends planning/execution with a new validate-operator-keyring task and condition (OperatorKeyringReady) to preflight Secret shape, inserts it into bootstrap/running plans, and strengthens bootstrap-job isolation checks to forbid validator-owned secrets via both volume mounts and env references.

Reviewed by Cursor Bugbot for commit 1792267. Bugbot is set up for automated code reviews on this repo. Configure here.

Implements Phase 1 step 1 of the in-pod governance signing workstream
(design: sei-protocol/seictl#166).

Adds validator.operatorKeyring to the SeiNode CRD — a discriminated
union (Secret variant; future TMKMS/Vault/KMS reserved) declaring the
operator-account keyring used by the sidecar for governance,
MsgEditValidator, withdraw-rewards, and other operator-account txs.

The keyring data Secret (mounted as a volume at $SEI_HOME/keyring-file/)
and a separate passphrase Secret (projected as an env var) are mounted
exclusively on the sidecar container — never on the seid main container,
init container, or bootstrap pod. Four CEL invariants on ValidatorSpec
enforce the trust boundary at admission:
  - operatorKeyring.secret.secretName != signingKey.secret.secretName
  - operatorKeyring.secret.secretName != nodeKey.secret.secretName
  - operatorKeyring.secret.secretName != passphraseSecretRef.secretName
  - secretName / passphraseSecretRef.secretName immutable post-create

Pre-flight task validate-operator-keyring mirrors validate-signing-key:
checks both Secrets exist + non-empty, validates the file-keyring's
*.info and *.address index files, and confirms the named key entry is
present if KeyName is specified. Does NOT decrypt — that sits with the
sidecar's startup smoke test, not the controller's TCB.

Sidecar container security context tightened in the same workstream:
RunAsNonRoot, RunAsUser/Group 65532, ReadOnlyRootFilesystem,
AllowPrivilegeEscalation false, Capabilities.Drop ALL, SeccompProfile
RuntimeDefault, pod-level FSGroup 65532 (required for the non-root UID
to read the 0o400 Secret mount).

Bootstrap-pod isolation guard generalized — assertNoSigningKeyOnBootstrapPod
becomes assertNoValidatorSecretsOnBootstrapPod, walking volumes for any
of {signing-key, operator-keyring, passphrase} Secrets and env vars for
the passphrase Secret. NodeKey deliberately excluded (no signing authority).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 6e4b468. Configure here.

Comment thread internal/noderesource/noderesource.go
Comment thread internal/noderesource/noderesource.go
Synthesizes platform-engineer / kubernetes-specialist / security-specialist
cross-review of #220. Addresses 9 review items; 1 deferred with explicit
documentation.

CEL invariants tightened (kubernetes-specialist finding):
  Adds two ValidatorSpec rules forbidding the passphrase Secret from
  colliding with signingKey.secret.secretName or nodeKey.secret.secretName.
  Closes a trust-boundary gap where a malicious config could project a
  Secret used by seid main into the sidecar's env-resolution path.

Planner-side defense-in-depth (security-specialist finding):
  validatorPlanner.Validate now mirrors all five operator-keyring
  distinctness rules. CEL alone requires K8s 1.25+ with the feature
  flag; mis-configured webhooks or --validate=false bypass CEL.

Production pod-spec runtime guard (security-specialist finding):
  GenerateStatefulSet asserts operatorKeyringVolumeName never appears
  on the seid main container or init containers. CI catches regressions
  the way assertNoValidatorSecretsOnBootstrapPod does for bootstrap pods.
  Signature changes to (*appsv1.StatefulSet, error); 42 call sites updated.

Bootstrap-pod guard extended (security-specialist finding):
  assertNoValidatorSecretsOnBootstrapPod now walks c.EnvFrom[].SecretRef
  in addition to c.Env[].ValueFrom.SecretKeyRef. Future-proofs against
  envFrom-shaped leakage.

Default constants exported (platform-engineer finding):
  api/v1alpha1.DefaultOperatorKeyName and DefaultPassphraseSecretKey
  replace magic strings in planner + noderesource. Kubebuilder defaults
  retain literal strings (markers can't reference Go consts) but carry
  comments tying them to the constants.

Operator-facing error wording reworded (platform-engineer finding):
  Conditions point at Secret data-key shape (*.info, *.address) instead
  of internal Cosmos SDK backend terminology.

Explicit AutomountServiceAccountToken (security-specialist finding):
  Pod spec now sets ptr.To(true) explicitly with #165 reference comment.
  Cluster-default flips that disable SA tokens would otherwise silently
  break the future TokenReview path.

ptr.To linter situation cleaned (platform-engineer finding):
  Restored ptr.To(true)/ptr.To(false) with //nolint:modernize and a
  reason string. CLAUDE.md "fix lint, don't suppress" applies to real
  issues; new(true) is invalid Go, so the modernize suggestion is a
  false-positive and explicit nolint is the idiomatic answer.

shareProcessNamespace = true documented as deferred (security blocker
   downgraded to deferred follow-up per orchestrator decision):
  A compromised seid container can today read /proc/<sidecar>/environ
  and /proc/<sidecar>/mem, exposing the passphrase and unlocked keyring.
  Not a one-way door — the broader sidecar/seid hardening (drop
  shareProcessNamespace, harden seid main SecurityContext, separate
  sidecar SA) is tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bdchatham
Copy link
Copy Markdown
Collaborator Author

Cross-review complete — findings integrated

3-specialist review (platform-engineer, kubernetes-specialist, security-specialist).

Verdicts: platform-engineer LGTM-with-fixes • kubernetes-specialist LGTM-with-fixes • security-specialist Block-pending-changes (one item).

Addressed in commit 6622404

# Source Item Fix
1 k8s-specialist Trust-boundary CEL gap — passphrase Secret could collide with signing/node-key Secret Two new XValidation rules on ValidatorSpec
2 security Planner-side CEL mirror — CEL alone bypassable on K8s <1.25 or with --validate=false validatorPlanner.Validate now enforces all 5 distinctness rules
3 security Production pod-spec runtime guard GenerateStatefulSet asserts operatorKeyringVolumeName never on seid main / init; signature changes to (*StatefulSet, error)
4 security Bootstrap guard missed EnvFrom assertNoValidatorSecretsOnBootstrapPod now walks both Env.ValueFrom.SecretKeyRef and EnvFrom.SecretRef
5 platform Magic-string defaults in 3 places Exported DefaultOperatorKeyName / DefaultPassphraseSecretKey constants from api/v1alpha1; planner + noderesource reference them; kubebuilder markers carry "must-match" comment
6 platform Cosmos-SDK terminology in operator-facing conditions Reworded to point at Secret data-key shape (*.info, *.address)
7 security AutomountServiceAccountToken left at kubelet-default Explicit ptr.To(true) with sei-protocol/seictl#165 reference comment
8 platform ptr.To(true) workaround inconsistent with rest of file Restored ptr.To(...) everywhere; //nolint:modernize with reason on the affected lines
9 security shareProcessNamespace decision Deferred with explicit code comment + follow-up issue

Deferred — tracked as follow-up

Issue filed: see "Harden sidecar/seid trust boundary against /proc-based leakage" in this repo for the broader sidecar/seid container hardening workstream.

The shareProcessNamespace gap, distinct UIDs for sidecar vs seid main, seid SecurityContext hardening, and separating the sidecar's SA from the controller-manager SA are coherent as a single workstream — orchestrator decision was to defer this PR's resolution rather than partial-fix here.

Gates

  • make manifests generate — idempotent (only the two new CEL rules show up in regenerated CRDs)
  • go test ./... — all packages pass
  • golangci-lint run --new-from-rev=origin/main ./... — 0 issues

Discretionary decisions during cleanup

  • GenerateStatefulSet signature change rippled to 42 test call sites + 1 real caller. Added mustGenerateStatefulSet test helper to keep ergonomics flat. Alternative considered: keep signature and call assertion from apply_statefulset.go — rejected because pushing the invariant out of the builder weakens the "every production pod-spec is checked" guarantee.
  • modernize/newexpr false-positive class also flagged ptr.To(sidecarNonRootUID) — same //nolint:modernize pattern applied; calling new() on a value is invalid Go.
  • goconst-driven container name constants (containerNameSeid, containerNameSidecar) extracted at package level because the new assertion code pushed occurrence counts past the linter threshold. Idiomatic.

This PR addresses all blockers identified by the cross-review. The deferred shareProcessNamespace hardening is tracked in the follow-up issue and is not a one-way door — the v1 trust posture documented in the design doc remains intact (operator keyring on sidecar only, mount-list discipline, CEL invariants), with the /proc leakage path being the explicit remaining work.

Cursor bugbot flagged two real regressions introduced by the sidecar
SecurityContext + pod-level FSGroup additions:

1. ReadOnlyRootFilesystem: true breaks any Go stdlib path that writes
   to /tmp (os.CreateTemp, net/http multipart, encoding/json large
   buffers). Adds an emptyDir volume mounted at /tmp on the sidecar
   container only. Tmpfs-backed, isolated from the seid container's
   TMPDIR=/sei/tmp.

2. Pod-level FSGroup without FSGroupChangePolicy defaults to "Always",
   causing kubelet to recursively chown every file on every mounted
   PVC at every pod start. For Sei archive PVCs (TBs of chain data)
   this adds minutes of startup latency on every restart. Setting
   FSGroupChangePolicy: OnRootMismatch chowns on first creation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bdchatham
Copy link
Copy Markdown
Collaborator Author

Addressed Cursor bugbot findings (commit 1792267)

Both were real regressions from the sidecar hardening in 6622404. Fixed:

  1. Missing TMPDIR / writable /tmp — added an emptyDir volume mounted at /tmp on the sidecar container only. Tmpfs-backed, isolated from seid's TMPDIR=/sei/tmp. Resolves the Go-stdlib-temp-file failure mode introduced by ReadOnlyRootFilesystem: true.

  2. FSGroupChangePolicy: OnRootMismatch — added alongside FSGroup: 65532 on the pod SecurityContext. Avoids the kubelet's default Always policy recursively chowning every PVC file on every pod start — substantial savings for archive PVCs.

Tests + lint green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant