feat: SeiNode validator.operatorKeyring CRD surface (closes #219)#220
feat: SeiNode validator.operatorKeyring CRD surface (closes #219)#220bdchatham wants to merge 3 commits into
Conversation
Implements Phase 1 step 1 of the in-pod governance signing workstream (design: sei-protocol/seictl#166). Adds validator.operatorKeyring to the SeiNode CRD — a discriminated union (Secret variant; future TMKMS/Vault/KMS reserved) declaring the operator-account keyring used by the sidecar for governance, MsgEditValidator, withdraw-rewards, and other operator-account txs. The keyring data Secret (mounted as a volume at $SEI_HOME/keyring-file/) and a separate passphrase Secret (projected as an env var) are mounted exclusively on the sidecar container — never on the seid main container, init container, or bootstrap pod. Four CEL invariants on ValidatorSpec enforce the trust boundary at admission: - operatorKeyring.secret.secretName != signingKey.secret.secretName - operatorKeyring.secret.secretName != nodeKey.secret.secretName - operatorKeyring.secret.secretName != passphraseSecretRef.secretName - secretName / passphraseSecretRef.secretName immutable post-create Pre-flight task validate-operator-keyring mirrors validate-signing-key: checks both Secrets exist + non-empty, validates the file-keyring's *.info and *.address index files, and confirms the named key entry is present if KeyName is specified. Does NOT decrypt — that sits with the sidecar's startup smoke test, not the controller's TCB. Sidecar container security context tightened in the same workstream: RunAsNonRoot, RunAsUser/Group 65532, ReadOnlyRootFilesystem, AllowPrivilegeEscalation false, Capabilities.Drop ALL, SeccompProfile RuntimeDefault, pod-level FSGroup 65532 (required for the non-root UID to read the 0o400 Secret mount). Bootstrap-pod isolation guard generalized — assertNoSigningKeyOnBootstrapPod becomes assertNoValidatorSecretsOnBootstrapPod, walking volumes for any of {signing-key, operator-keyring, passphrase} Secrets and env vars for the passphrase Secret. NodeKey deliberately excluded (no signing authority). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 6e4b468. Configure here.
Synthesizes platform-engineer / kubernetes-specialist / security-specialist cross-review of #220. Addresses 9 review items; 1 deferred with explicit documentation. CEL invariants tightened (kubernetes-specialist finding): Adds two ValidatorSpec rules forbidding the passphrase Secret from colliding with signingKey.secret.secretName or nodeKey.secret.secretName. Closes a trust-boundary gap where a malicious config could project a Secret used by seid main into the sidecar's env-resolution path. Planner-side defense-in-depth (security-specialist finding): validatorPlanner.Validate now mirrors all five operator-keyring distinctness rules. CEL alone requires K8s 1.25+ with the feature flag; mis-configured webhooks or --validate=false bypass CEL. Production pod-spec runtime guard (security-specialist finding): GenerateStatefulSet asserts operatorKeyringVolumeName never appears on the seid main container or init containers. CI catches regressions the way assertNoValidatorSecretsOnBootstrapPod does for bootstrap pods. Signature changes to (*appsv1.StatefulSet, error); 42 call sites updated. Bootstrap-pod guard extended (security-specialist finding): assertNoValidatorSecretsOnBootstrapPod now walks c.EnvFrom[].SecretRef in addition to c.Env[].ValueFrom.SecretKeyRef. Future-proofs against envFrom-shaped leakage. Default constants exported (platform-engineer finding): api/v1alpha1.DefaultOperatorKeyName and DefaultPassphraseSecretKey replace magic strings in planner + noderesource. Kubebuilder defaults retain literal strings (markers can't reference Go consts) but carry comments tying them to the constants. Operator-facing error wording reworded (platform-engineer finding): Conditions point at Secret data-key shape (*.info, *.address) instead of internal Cosmos SDK backend terminology. Explicit AutomountServiceAccountToken (security-specialist finding): Pod spec now sets ptr.To(true) explicitly with #165 reference comment. Cluster-default flips that disable SA tokens would otherwise silently break the future TokenReview path. ptr.To linter situation cleaned (platform-engineer finding): Restored ptr.To(true)/ptr.To(false) with //nolint:modernize and a reason string. CLAUDE.md "fix lint, don't suppress" applies to real issues; new(true) is invalid Go, so the modernize suggestion is a false-positive and explicit nolint is the idiomatic answer. shareProcessNamespace = true documented as deferred (security blocker downgraded to deferred follow-up per orchestrator decision): A compromised seid container can today read /proc/<sidecar>/environ and /proc/<sidecar>/mem, exposing the passphrase and unlocked keyring. Not a one-way door — the broader sidecar/seid hardening (drop shareProcessNamespace, harden seid main SecurityContext, separate sidecar SA) is tracked separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-review complete — findings integrated3-specialist review (platform-engineer, kubernetes-specialist, security-specialist). Verdicts: platform-engineer LGTM-with-fixes • kubernetes-specialist LGTM-with-fixes • security-specialist Block-pending-changes (one item). Addressed in commit
|
| # | Source | Item | Fix |
|---|---|---|---|
| 1 | k8s-specialist | Trust-boundary CEL gap — passphrase Secret could collide with signing/node-key Secret | Two new XValidation rules on ValidatorSpec |
| 2 | security | Planner-side CEL mirror — CEL alone bypassable on K8s <1.25 or with --validate=false |
validatorPlanner.Validate now enforces all 5 distinctness rules |
| 3 | security | Production pod-spec runtime guard | GenerateStatefulSet asserts operatorKeyringVolumeName never on seid main / init; signature changes to (*StatefulSet, error) |
| 4 | security | Bootstrap guard missed EnvFrom |
assertNoValidatorSecretsOnBootstrapPod now walks both Env.ValueFrom.SecretKeyRef and EnvFrom.SecretRef |
| 5 | platform | Magic-string defaults in 3 places | Exported DefaultOperatorKeyName / DefaultPassphraseSecretKey constants from api/v1alpha1; planner + noderesource reference them; kubebuilder markers carry "must-match" comment |
| 6 | platform | Cosmos-SDK terminology in operator-facing conditions | Reworded to point at Secret data-key shape (*.info, *.address) |
| 7 | security | AutomountServiceAccountToken left at kubelet-default |
Explicit ptr.To(true) with sei-protocol/seictl#165 reference comment |
| 8 | platform | ptr.To(true) workaround inconsistent with rest of file |
Restored ptr.To(...) everywhere; //nolint:modernize with reason on the affected lines |
| 9 | security | shareProcessNamespace decision | Deferred with explicit code comment + follow-up issue |
Deferred — tracked as follow-up
Issue filed: see "Harden sidecar/seid trust boundary against /proc-based leakage" in this repo for the broader sidecar/seid container hardening workstream.
The shareProcessNamespace gap, distinct UIDs for sidecar vs seid main, seid SecurityContext hardening, and separating the sidecar's SA from the controller-manager SA are coherent as a single workstream — orchestrator decision was to defer this PR's resolution rather than partial-fix here.
Gates
make manifests generate— idempotent (only the two new CEL rules show up in regenerated CRDs)go test ./...— all packages passgolangci-lint run --new-from-rev=origin/main ./...— 0 issues
Discretionary decisions during cleanup
GenerateStatefulSetsignature change rippled to 42 test call sites + 1 real caller. AddedmustGenerateStatefulSettest helper to keep ergonomics flat. Alternative considered: keep signature and call assertion fromapply_statefulset.go— rejected because pushing the invariant out of the builder weakens the "every production pod-spec is checked" guarantee.modernize/newexprfalse-positive class also flaggedptr.To(sidecarNonRootUID)— same//nolint:modernizepattern applied; callingnew()on a value is invalid Go.goconst-driven container name constants (containerNameSeid,containerNameSidecar) extracted at package level because the new assertion code pushed occurrence counts past the linter threshold. Idiomatic.
This PR addresses all blockers identified by the cross-review. The deferred shareProcessNamespace hardening is tracked in the follow-up issue and is not a one-way door — the v1 trust posture documented in the design doc remains intact (operator keyring on sidecar only, mount-list discipline, CEL invariants), with the /proc leakage path being the explicit remaining work.
Cursor bugbot flagged two real regressions introduced by the sidecar SecurityContext + pod-level FSGroup additions: 1. ReadOnlyRootFilesystem: true breaks any Go stdlib path that writes to /tmp (os.CreateTemp, net/http multipart, encoding/json large buffers). Adds an emptyDir volume mounted at /tmp on the sidecar container only. Tmpfs-backed, isolated from the seid container's TMPDIR=/sei/tmp. 2. Pod-level FSGroup without FSGroupChangePolicy defaults to "Always", causing kubelet to recursively chown every file on every mounted PVC at every pod start. For Sei archive PVCs (TBs of chain data) this adds minutes of startup latency on every restart. Setting FSGroupChangePolicy: OnRootMismatch chowns on first creation only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addressed Cursor bugbot findings (commit
|

Summary
docs/design/in-pod-governance-signing.md— Component A).validator.operatorKeyringtoSeiNode— discriminated-union surface (Secretvariant; futureTMKMS/Vault/KMSreserved as comments and an additive CELexactly-onerule) declaring the operator-account keyring used by the seictl sidecar for governance,MsgEditValidator, withdraw-rewards, and other operator-account transactions.ValidatorSpec; pod-spec mount + env wired into the sidecar container only (never seid main, never init, never bootstrap); bootstrap-pod isolation guard generalized to walk both Secret volumes ANDvalueFrom.secretKeyRefenv entries.RunAsNonRoot,RunAsUser/Group 65532,ReadOnlyRootFilesystem,AllowPrivilegeEscalation: false,Capabilities.Drop: [ALL],SeccompProfile: RuntimeDefault, pod-levelFSGroup: 65532(required for the non-root UID to read the 0o400 Secret mount).validate-operator-keyringtask mirrorsvalidate-signing-key.go. Deliberately does NOT decrypt the keyring — that's the sidecar's startup smoke test (issue feat(controller): wire Spec.Genesis.Accounts via sidecar typed wrappers #162 in seictl), not the controller's TCB.CRD invariants
Four CEL rules on
ValidatorSpecenforce the trust boundary at admission:operatorKeyring.secret.secretName != signingKey.secret.secretName(data-keyring ≠ consensus-key Secret)operatorKeyring.secret.secretName != nodeKey.secret.secretName(data-keyring ≠ P2P-key Secret)operatorKeyring.secret.secretName != passphraseSecretRef.secretName(data-keyring ≠ passphrase Secret)secretNameandpassphraseSecretRef.secretNameimmutable viaself == oldSelfkeyNameis intentionally NOT immutable (allows key rotation within a Secret).Discretionary decisions flagged for review
These weren't fully prescribed by the design — reviewers should sanity-check:
"file-keyring requires at least one encrypted key blob"). Functional, not load-bearing.defaultStrhelper inlined, not extracted. Design referenced a helper that doesn't exist in this repo; YAGNI says don't add a 4-line helper for two call sites. Defaults:KeyName→\"node_admin\",passphraseRef.Key→\"passphrase\"(matches the kubebuilder defaults).ptr.To(true)/ptr.To(int64(65532))workaround:modernize/newexprlinter false-positively suggestsnew(true)(invalid Go). Assigned to local vars before address-taking. Functionally identical; less direct.PassphraseEnvInInitContainer_Rejects) — not in the design but the bootstrap pod's sidecar runs asInitContainer, so this is the realistic env-leakage path. Cheap belt-and-suspenders.Test plan
Verification on the implementer side:
make manifests generatesucceeds and produces the expected new types/CRD fieldsgo build ./...cleango test ./...all packages passgolangci-lint run --new-from-rev=origin/main ./...reports 0 issuesFor human reviewers:
buildSidecarContainer) — NOT onbuildNodeMainContainer, NOT on the init container, NOT on the bootstrap podValidatorSpecviakubectl apply --dry-run=serveron edge cases (collision Secret names, missing fields, immutability-on-update)validate-operator-keyringdoes NOT attempt to decrypt the keyring (controller TCB stays small)assertNoValidatorSecretsOnBootstrapPodrejects each of the three forbidden Secret types and accepts NodeKeyCross-review
This PR is being reviewed in parallel by platform-engineer, kubernetes-specialist, and security-specialist before being marked ready for merge. Their findings will be posted as PR review comments.
🤖 Generated with Claude Code
Note
Medium Risk
Introduces new validator secret handling (operator keyring + passphrase) and tightens pod security settings; mistakes could leak credentials or break validator deployments. Changes are scoped and include admission/planner/runtime guards plus tests, reducing but not eliminating risk.
Overview
Adds a new optional
validator.operatorKeyringfield to theSeiNode/SeiNodeDeploymentCRDs (Secret-backed keyring + separate passphrase Secret), including defaults and CEL rules enforcing secret distinctness/immutability.Wires the operator keyring into generated pod specs sidecar-only (volume mount under
keyring-file/plusSEI_KEYRING_PASSPHRASEenv), adds sidecar hardening (RunAsNonRoot, read-only rootfs, dropped caps, runtime seccomp) and pod-levelfsGroup//tmpemptyDir, and adds a fail-closed guard to prevent accidental keyring mounts onseid/non-sidecar init containers.Extends planning/execution with a new
validate-operator-keyringtask and condition (OperatorKeyringReady) to preflight Secret shape, inserts it into bootstrap/running plans, and strengthens bootstrap-job isolation checks to forbid validator-owned secrets via both volume mounts and env references.Reviewed by Cursor Bugbot for commit 1792267. Bugbot is set up for automated code reviews on this repo. Configure here.