
subcommand to summarise gateway state to json#20

Merged
solsson merged 7 commits into main from gateway-state-export
May 7, 2026


@solsson
Contributor

@solsson solsson commented May 6, 2026

No description provided.

Yolean k8s-qa and others added 2 commits May 6, 2026 13:39
…ntegration

Adds `y-cluster gateway state` -- a JSON snapshot of the
cluster's GatewayClass, Gateway, HTTPRoute, GRPCRoute,
ClientTrafficPolicy, and BackendTrafficPolicy resources --
and wires it into the appliance export pipeline so the
reconciled snapshot ships alongside the qcow2/OVA/gcp-tar
deliverables.

Subcommand:

  - `y-cluster gateway state [--context=NAME]` prints JSON to
    stdout. Each kind carries spec AND status, so consumers
    can answer maintenance-relevant questions deterministically:
      - Is HTTPS ready? Walk gateways[].status.listeners[] for
        port==443, programmed==true, attachedRoutes>0.
      - Is port 80 redirect-only? Walk httpRoutes[].rules for
        filters of type RequestRedirect.
      - Are ClientTrafficPolicy settings actually in effect? Walk
        clientTrafficPolicies[].status.ancestors[] for Accepted=True
        alongside spec.clientIPDetection.xForwardedFor.numTrustedHops.
  - `y-cluster gateway clear-dns-hint-ip [--context=NAME]
    [--gateway-class=y-cluster]` removes the
    yolean.se/dns-hint-ip annotation from the GatewayClass.
    Idempotent; used by prepare-export.

  The shape is documented as a generated JSON Schema at
  pkg/provision/schema/gateway-state.schema.json (added to
  schemagen alongside the provider config schemas).
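
A minimal consumer-side sketch of the "redirect-only" question, in Go. It assumes the snapshot passes the upstream Gateway API `filters[].type` field through unchanged under `httpRoutes[].rules[]`; the type and function names (`snapshot`, `redirectOnlyRoutes`) are hypothetical and not part of y-cluster. Correlating routes with the port 80 listener is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snapshot mirrors only the fields this check needs. The filters shape
// follows the upstream HTTPRoute spec; its presence in the snapshot is
// an assumption.
type snapshot struct {
	HTTPRoutes []struct {
		Name  string `json:"name"`
		Rules []struct {
			Filters []struct {
				Type string `json:"type"`
			} `json:"filters"`
		} `json:"rules"`
	} `json:"httpRoutes"`
}

// redirectOnlyRoutes returns the names of routes in which every rule
// carries a RequestRedirect filter. Routes without rules are skipped.
func redirectOnlyRoutes(doc []byte) ([]string, error) {
	var s snapshot
	if err := json.Unmarshal(doc, &s); err != nil {
		return nil, err
	}
	names := []string{}
	for _, r := range s.HTTPRoutes {
		all := len(r.Rules) > 0
		for _, rule := range r.Rules {
			found := false
			for _, f := range rule.Filters {
				if f.Type == "RequestRedirect" {
					found = true
					break
				}
			}
			if !found {
				all = false
				break
			}
		}
		if all {
			names = append(names, r.Name)
		}
	}
	return names, nil
}

func main() {
	doc := []byte(`{"httpRoutes":[
	  {"name":"to-https","rules":[{"filters":[{"type":"RequestRedirect"}]}]},
	  {"name":"echo","rules":[{"filters":[]}]}
	]}`)
	names, err := redirectOnlyRoutes(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```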

prepare-export reshape:

  - Now requires a RUNNING cluster, inverting the earlier
    behavior (require a stopped cluster, error otherwise). The
    new live phase runs `gateway clear-dns-hint-ip` (so the
    per-deploy LB IP doesn't bake into the customer snapshot)
    followed by `gateway state` (dumping to
    <cacheDir>/<name>-gateway-state.json); both need the
    apiserver up. After the live phase, prepare-export stops
    the VM internally before the existing offline
    virt-customize phase. The previous explicit `y-cluster stop`
    step in callers becomes redundant.
  - Preflight reordering: virt-customize + kubectl LookPath
    checks fire first, so missing-tool errors surface before
    the running-state check.

Export changes:

  - `pkg/provision/qemu/export.go` copies the gateway-state.json
    sibling into BUNDLE_DIR. Best-effort: a build that skipped
    prepare-export (or one that ran before this change) won't
    have the file -- log + skip rather than fail the export.
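
The best-effort copy described above could look roughly like the following Go sketch. `copyIfPresent` is a hypothetical helper, not the code in `export.go`; it treats a missing source file as "log and skip" and every other error as a real failure.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"io/fs"
	"log"
	"os"
	"path/filepath"
)

// copyIfPresent copies src into dstDir and reports whether it did.
// A missing src is not an error: it is logged and skipped.
func copyIfPresent(src, dstDir string) (bool, error) {
	in, err := os.Open(src)
	if errors.Is(err, fs.ErrNotExist) {
		log.Printf("skipping %s: not found", src)
		return false, nil
	}
	if err != nil {
		return false, err
	}
	defer in.Close()

	out, err := os.Create(filepath.Join(dstDir, filepath.Base(src)))
	if err != nil {
		return false, err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return false, err
	}
	// Close explicitly so a failed flush surfaces as an error.
	return true, out.Close()
}

func main() {
	dir, err := os.MkdirTemp("", "bundle")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// No sibling file was written here, so the copy is skipped.
	copied, err := copyIfPresent(filepath.Join(dir, "vm-gateway-state.json"), dir)
	fmt.Println(copied, err)
}
```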

Script update:

  - `scripts/appliance-qemu-to-gcp.sh` drops the explicit
    `y-cluster stop` line before `y-cluster prepare-export`.
    With the new live phase that step is wrong (would bring
    down the cluster prepare-export needs up).

Schema generation:

  - `cmd/internal/schemagen/main.go` gains a writeOutputSchema
    helper for non-provider-config schemas. Generates
    `gateway-state.schema.json` from gateway.State{} via the
    same invopop reflector, but with FieldNameTag=json (output
    is JSON, not YAML) and no provider-narrowing post-process.
  - `pkg/gateway.SchemaID` is the canonical $id; a fresh Fetch
    embeds it as `$schema` in the produced JSON so consumers
    can validate by URL.

Smoke-tested against the live appliance-gcp-build VM:
gateway state returns 1 Gateway (programmed listener on
port 80, attachedRoutes=3), 3 HTTPRoutes (external-http,
keycloak-admin, echo), 1 ClientTrafficPolicy (trust-lb-xff
with numTrustedHops=1, Accepted=True). The currently-set
dns-hint-ip annotation (`127.0.0.1` from the local-provision
default) is what prepare-export will clear before snapshot.

Tests:

  - `pkg/gateway/state_test.go` covers the targetRefs-shape
    flatten (singular vs plural), the case-insensitive
    Programmed-condition check, the SchemaID surfacing in
    the JSON output, and the zero-value-no-null-slices
    invariant.
  - `pkg/provision/qemu/prepare_export_test.go` updates the
    VM-state assertion (was: "expects stopped"; now: "expects
    running") and trims the unused filepath import.

E2e against /dev/kvm not run in this commit; the existing
qemu e2e suite's prepare-export coverage will now exercise
the live phase automatically when re-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `schemaVersion` field to the gateway state JSON (currently
"1") and constrains it via a single-value enum on the generated
schema. Lets a consumer reading the JSON detect a snapshot
shape they don't recognise -- the enum value at fetch time
must match what the consumer's copy of the schema doc expects,
or validation fails.

The schema URL stays UNVERSIONED (the `$id` /
`https://yolean.se/y-cluster/schema/gateway-state.schema.json`
always points at the current schema doc). Per-document version
comes from the new schemaVersion field. Backward-incompatible
shape changes (renames, removals) require: bump
`gateway.SchemaVersion`, regenerate the schema (so the enum
catches up), update consumers in lockstep. Old snapshots
remain identifiable by their schemaVersion field; they would
validate against an archived copy of the previous schema doc
once we need to serve one. Additive changes (new omitempty
fields) do NOT require a bump.
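
As an illustration of the consumer side of this contract, a reader of the snapshot might check `schemaVersion` before decoding anything else. This is a sketch only: `supportedSchemaVersion`, `header`, and `checkVersion` are hypothetical names, and full validation against the schema doc is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// supportedSchemaVersion is the snapshot shape this hypothetical
// consumer understands.
const supportedSchemaVersion = "1"

// header reads only the versioning fields and ignores the rest.
type header struct {
	Schema        string `json:"$schema"`
	SchemaVersion string `json:"schemaVersion"`
}

// checkVersion rejects snapshots whose schemaVersion the consumer
// does not recognise.
func checkVersion(doc []byte) error {
	var h header
	if err := json.Unmarshal(doc, &h); err != nil {
		return fmt.Errorf("decode snapshot header: %w", err)
	}
	if h.SchemaVersion != supportedSchemaVersion {
		return fmt.Errorf("unsupported schemaVersion %q (want %q)",
			h.SchemaVersion, supportedSchemaVersion)
	}
	return nil
}

func main() {
	fmt.Println(checkVersion([]byte(`{"schemaVersion":"1"}`)))
	fmt.Println(checkVersion([]byte(`{"schemaVersion":"2"}`)))
}
```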

Implementation:

  - `gateway.SchemaVersion = "1"` exported constant.
  - `State.SchemaVersion` field (json:"schemaVersion"),
    populated by Fetch() from the constant.
  - `cmd/internal/schemagen` gains an `enumPin` post-process
    helper -- a small (DefName, PropName, Values) tuple --
    plumbed through `writeOutputSchema` as variadic. The
    gateway-state schema is the only consumer today; the
    helper generalises cleanly for future single-value enums
    on other output schemas.
  - `pkg/gateway/state.schema.json` regenerated with the
    `schemaVersion: { enum: ["1"], type: "string" }` constraint.
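
A rough sketch of what an enum-pinning post-process can look like, working on a decoded schema document rather than on the reflector's own types. The (DefName, PropName, Values) fields come from the description above; `applyEnumPins` and the `$defs` layout are assumptions for illustration, not the schemagen code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// enumPin names a property inside a schema definition and the values
// its enum should be pinned to.
type enumPin struct {
	DefName  string
	PropName string
	Values   []any
}

// applyEnumPins sets "enum" on $defs.<DefName>.properties.<PropName>
// for each pin. It assumes the schema keeps definitions under "$defs".
func applyEnumPins(schema map[string]any, pins ...enumPin) error {
	for _, p := range pins {
		// Failed assertions yield nil maps; indexing a nil map is safe.
		defs, _ := schema["$defs"].(map[string]any)
		def, _ := defs[p.DefName].(map[string]any)
		props, _ := def["properties"].(map[string]any)
		prop, ok := props[p.PropName].(map[string]any)
		if !ok {
			return fmt.Errorf("no property %s.%s in schema", p.DefName, p.PropName)
		}
		prop["enum"] = p.Values
	}
	return nil
}

func main() {
	raw := []byte(`{"$defs":{"State":{"properties":{"schemaVersion":{"type":"string"}}}}}`)
	var schema map[string]any
	if err := json.Unmarshal(raw, &schema); err != nil {
		panic(err)
	}
	pin := enumPin{DefName: "State", PropName: "schemaVersion", Values: []any{"1"}}
	if err := applyEnumPins(schema, pin); err != nil {
		panic(err)
	}
	out, err := json.Marshal(schema)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```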

Tests:

  - TestStateZeroValueMarshals updated to assert
    `"schemaVersion":"1"` in the marshalled output.
  - TestSchemaVersionMatchesEnum reads back the regenerated
    schema doc and asserts its schemaVersion enum equals
    [SchemaVersion]. Fails fast in CI if the constant gets
    bumped without a `go generate` follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@solsson
Contributor Author

solsson commented May 6, 2026

v1 payload:

{
  "gatewayClass": {
    "name": "y-cluster",
    "controllerName": "gateway.envoyproxy.io/gatewayclass-controller",
    "conditions": [
      {
        "type": "Accepted",
        "status": "True",
        "reason": "Accepted",
        "message": "Valid GatewayClass"
      }
    ]
  },
  "gateways": [
    {
      "namespace": "y-cluster",
      "name": "y-cluster",
      "gatewayClassName": "y-cluster",
      "listeners": [
        {
          "name": "http",
          "port": 80,
          "protocol": "HTTP",
          "allowedRoutes": {
            "namespaces": {
              "from": "All"
            }
          }
        }
      ],
      "status": {
        "conditions": [
          {
            "type": "Accepted",
            "status": "True",
            "reason": "Accepted",
            "message": "The Gateway has been scheduled by Envoy Gateway"
          },
          {
            "type": "Programmed",
            "status": "True",
            "reason": "Programmed",
            "message": "Address assigned to the Gateway, 1/1 envoy replicas available"
          }
        ],
        "listeners": [
          {
            "name": "http",
            "attachedRoutes": 3,
            "conditions": [
              {
                "type": "Programmed",
                "status": "True",
                "reason": "Programmed",
                "message": "Sending translated listener configuration to the data plane"
              },
              {
                "type": "Accepted",
                "status": "True",
                "reason": "Accepted",
                "message": "Listener has been successfully translated"
              },
              {
                "type": "ResolvedRefs",
                "status": "True",
                "reason": "ResolvedRefs",
                "message": "Listener references have been resolved"
              }
            ],
            "programmed": true
          }
        ]
      }
    }
  ],
  "httpRoutes": [
    {
      "namespace": "my-app",
      "name": "external-http",
      "parentRefs": [
        {
          "group": "gateway.networking.k8s.io",
          "kind": "Gateway",
          "name": "y-cluster",
          "namespace": "y-cluster"
        }
      ],
      "hostnames": [
        "my-app.example.net"
      ],
      "rules": [
        {
          "backendRefs": [
            {
              "group": "",
              "kind": "Service",
              "name": "gateway-v4-cluster",
              "port": 8080,
              "weight": 1
            }
          ],
          "matches": [
            {
              "path": {
                "type": "PathPrefix",
                "value": "/"
              }
            }
          ]
        }
      ],
      "status": {
        "parents": [
          {
            "parentRef": {
              "group": "gateway.networking.k8s.io",
              "kind": "Gateway",
              "name": "y-cluster",
              "namespace": "y-cluster"
            },
            "controllerName": "gateway.envoyproxy.io/gatewayclass-controller",
            "conditions": [
              {
                "type": "Accepted",
                "status": "True",
                "reason": "Accepted",
                "message": "Route is accepted"
              },
              {
                "type": "ResolvedRefs",
                "status": "True",
                "reason": "ResolvedRefs",
                "message": "Resolved all the Object references for the Route"
              }
            ]
          }
        ]
      }
    },
    {
      "namespace": "keycloak-v3",
      "name": "keycloak-admin",
      "parentRefs": [
        {
          "group": "gateway.networking.k8s.io",
          "kind": "Gateway",
          "name": "y-cluster",
          "namespace": "y-cluster"
        }
      ],
      "hostnames": [
        "keycloak-admin"
      ],
      "rules": [
        {
          "backendRefs": [
            {
              "group": "",
              "kind": "Service",
              "name": "keycloak-proxied",
              "port": 8080,
              "weight": 1
            }
          ],
          "matches": [
            {
              "path": {
                "type": "PathPrefix",
                "value": "/"
              }
            }
          ],
          "timeouts": {
            "backendRequest": "120s",
            "request": "120s"
          }
        }
      ],
      "status": {
        "parents": [
          {
            "parentRef": {
              "group": "gateway.networking.k8s.io",
              "kind": "Gateway",
              "name": "y-cluster",
              "namespace": "y-cluster"
            },
            "controllerName": "gateway.envoyproxy.io/gatewayclass-controller",
            "conditions": [
              {
                "type": "Accepted",
                "status": "True",
                "reason": "Accepted",
                "message": "Route is accepted"
              },
              {
                "type": "ResolvedRefs",
                "status": "True",
                "reason": "ResolvedRefs",
                "message": "Resolved all the Object references for the Route"
              }
            ]
          }
        ]
      }
    },
    {
      "namespace": "y-cluster",
      "name": "echo",
      "parentRefs": [
        {
          "group": "gateway.networking.k8s.io",
          "kind": "Gateway",
          "name": "y-cluster"
        }
      ],
      "rules": [
        {
          "backendRefs": [
            {
              "group": "",
              "kind": "Service",
              "name": "echo",
              "port": 80,
              "weight": 1
            }
          ],
          "matches": [
            {
              "path": {
                "type": "PathPrefix",
                "value": "/q/envoy/echo"
              }
            }
          ]
        }
      ],
      "status": {
        "parents": [
          {
            "parentRef": {
              "group": "gateway.networking.k8s.io",
              "kind": "Gateway",
              "name": "y-cluster"
            },
            "controllerName": "gateway.envoyproxy.io/gatewayclass-controller",
            "conditions": [
              {
                "type": "Accepted",
                "status": "True",
                "reason": "Accepted",
                "message": "Route is accepted"
              },
              {
                "type": "ResolvedRefs",
                "status": "True",
                "reason": "ResolvedRefs",
                "message": "Resolved all the Object references for the Route"
              }
            ]
          }
        ]
      }
    }
  ],
  "grpcRoutes": [],
  "clientTrafficPolicies": [
    {
      "namespace": "y-cluster",
      "name": "trust-lb-xff",
      "targetRefs": [
        {
          "group": "gateway.networking.k8s.io",
          "kind": "Gateway",
          "name": "y-cluster"
        }
      ],
      "spec": {
        "clientIPDetection": {
          "xForwardedFor": {
            "numTrustedHops": 1
          }
        },
        "targetRefs": [
          {
            "group": "gateway.networking.k8s.io",
            "kind": "Gateway",
            "name": "y-cluster"
          }
        ]
      },
      "status": {
        "ancestors": [
          {
            "ancestorRef": {
              "group": "gateway.networking.k8s.io",
              "kind": "Gateway",
              "name": "y-cluster",
              "namespace": "y-cluster"
            },
            "controllerName": "gateway.envoyproxy.io/gatewayclass-controller",
            "conditions": [
              {
                "type": "Accepted",
                "status": "True",
                "reason": "Accepted",
                "message": "Policy has been accepted."
              }
            ]
          }
        ]
      }
    }
  ],
  "backendTrafficPolicies": [],
  "fetchedAt": "2026-05-06T11:21:21Z",
  "$schema": "https://yolean.se/y-cluster/schema/gateway-state.schema.json",
  "schemaVersion": "1"
}
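
Against a payload shaped like the one above, a consumer could answer "is a listener on port N ready?" roughly as follows. In this payload the port sits on the spec listener while `programmed` and `attachedRoutes` sit on the status listener, so the sketch joins the two by listener name. The names `snapshot` and `portReady` are hypothetical, not part of y-cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snapshot mirrors only the gateway fields used by the readiness check.
type snapshot struct {
	Gateways []struct {
		Name      string `json:"name"`
		Listeners []struct {
			Name string `json:"name"`
			Port int    `json:"port"`
		} `json:"listeners"`
		Status struct {
			Listeners []struct {
				Name           string `json:"name"`
				AttachedRoutes int    `json:"attachedRoutes"`
				Programmed     bool   `json:"programmed"`
			} `json:"listeners"`
		} `json:"status"`
	} `json:"gateways"`
}

// portReady reports whether any gateway has a listener on the given
// port that is programmed and has at least one attached route.
func portReady(doc []byte, port int) (bool, error) {
	var s snapshot
	if err := json.Unmarshal(doc, &s); err != nil {
		return false, err
	}
	for _, gw := range s.Gateways {
		// Names of spec listeners bound to the requested port.
		onPort := map[string]bool{}
		for _, l := range gw.Listeners {
			if l.Port == port {
				onPort[l.Name] = true
			}
		}
		for _, st := range gw.Status.Listeners {
			if onPort[st.Name] && st.Programmed && st.AttachedRoutes > 0 {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	doc := []byte(`{"gateways":[{"name":"y-cluster",
	  "listeners":[{"name":"http","port":80}],
	  "status":{"listeners":[{"name":"http","attachedRoutes":3,"programmed":true}]}}]}`)
	for _, port := range []int{80, 443} {
		ready, err := portReady(doc, port)
		if err != nil {
			panic(err)
		}
		fmt.Printf("port %d ready: %v\n", port, ready)
	}
}
```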

solsson and others added 5 commits May 6, 2026 14:59
Two CI failures on this PR (run 25433079702):

- lint (staticcheck S1016) on pkg/gateway/fetch.go:
  toConditions and the listener projection in fetchGateways
  used full struct literals to copy from raw* types into the
  exported types they shadow field-for-field. Identical-shape
  conversions are clearer and what staticcheck flags. Replaced
  with `Condition(c)` and `Listener(l)`.

- pkg/provision/qemu test failures on ubuntu-latest:
  TestPrepareExport_NoSavedState and TestPrepareExport_VMNotRunning
  passed locally on hosts with libguestfs-tools installed but
  failed on stock GHA runners because PrepareExport's first step
  is a virt-customize LookPath guard. The tests want to assert
  the saved-state and not-running error paths that come AFTER
  the LookPath guards, so we stub virt-customize + kubectl on
  PATH (empty shims; the assertion-target branches return before
  invoking either binary). The existing
  TestPrepareExport_MissingVirtCustomize keeps its explicit
  PATH="" override and still proves the LookPath hint fires when
  the binary is genuinely absent.
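
For the S1016 item above, the shape of the fix can be sketched as below. The field set is taken from the conditions in the v1 payload; the type definitions themselves are stand-ins, not the ones in pkg/gateway. Go permits a direct conversion between struct types whose fields match in name, type, and order.

```go
package main

import "fmt"

// rawCondition stands in for the kubectl-JSON unmarshal target.
type rawCondition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// Condition stands in for the exported output type. Its fields match
// rawCondition one for one, which is what makes the conversion legal.
type Condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// toConditions converts each element with a type conversion instead of
// a field-by-field struct literal (the pattern staticcheck S1016 flags).
// It always returns a non-nil slice so the JSON output is [] and not null.
func toConditions(in []rawCondition) []Condition {
	out := make([]Condition, 0, len(in))
	for _, c := range in {
		out = append(out, Condition(c))
	}
	return out
}

func main() {
	raw := []rawCondition{{Type: "Accepted", Status: "True", Reason: "Accepted"}}
	fmt.Printf("%+v\n", toConditions(raw))
}
```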

Other notes from reviewing the PR (no changes needed, just
flagging things I confirmed are sound):

- The pkg/gateway split between rawCondition / Condition (and
  the parallel Listener / rawListener pair) is intentional --
  rawCondition is the kubectl-JSON unmarshal target, Condition
  is the public output type. They happen to be identical today;
  keeping them separate gives room to project / filter without
  breaking consumers when the kubectl shape evolves.

- schemagen now writes two distinct kinds of schema:
  provision-config schemas under pkg/provision/schema/ and
  output schemas alongside the Go type that produces them
  (e.g. pkg/gateway/state.schema.json). The split is documented
  in the package doc and the gen path is symmetric with the
  per-provider one.

- The unversioned $id + enum-of-one schemaVersion pattern on
  gateway.State (SchemaVersion = "1") is the right shape for
  forward compatibility: the canonical URL stays stable, the
  version stamp identifies any given snapshot, and a future
  bump means versioning the schema doc URL while leaving the
  unversioned pointer at the latest.

- gateway.Fetch issues one kubectl invocation per kind. That's
  fine for the volume here (~6 kinds) but a single
  `kubectl get gatewayclass,gateway,httproute,... -A -o json`
  is a non-blocking follow-up if the round-trip count ever
  matters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The raw reconciled-resource dump is hard to consume directly.
Add two top-level fields on the state JSON:

- summary: a fully-typed routing-tree projection in
  industry-neutral terms (listener -> host -> route ->
  match/backend). numTrustedHops + trustedCIDRs surface at
  listener level, where ClientTrafficPolicy actually attaches.
  Routes without a hostname bucket under "*" (sorted last).
  GRPC method matches render as "Method=Type:Service/Method"
  in the same Path field as HTTP path matches.
- envoy: a sample of dataplane state (version + verbatim
  /config_dump) from any one envoy-gateway proxy pod.
  envoy admin binds 127.0.0.1:19000 inside a distroless
  container, so we kubectl port-forward (the apiserver's
  /pods/<n>:19000/proxy path can't reach localhost-bound ports).
  Best-effort: skipped silently when no proxy pod runs yet.

Summary is unit-tested with Gateway API payloads + an empty
envoy object as input. Envoy.config is schema-typed as
type=object via a jsonschema struct tag.
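
The hostname bucketing described for the summary (routes without a hostname land under "*", which sorts last) can be sketched in Go as follows. The `route` and `host` types and `bucketByHost` are hypothetical simplifications; the real Summary also carries listeners, matches, and backends.

```go
package main

import (
	"fmt"
	"sort"
)

// catchAll is the bucket for routes that declare no hostnames.
const catchAll = "*"

type route struct {
	Name      string
	Hostnames []string
}

type host struct {
	Hostname string
	Routes   []string
}

// bucketByHost groups route names by hostname and sorts the buckets
// alphabetically, with the catch-all bucket always last.
func bucketByHost(routes []route) []host {
	byHost := map[string][]string{}
	for _, r := range routes {
		names := r.Hostnames
		if len(names) == 0 {
			names = []string{catchAll}
		}
		for _, h := range names {
			byHost[h] = append(byHost[h], r.Name)
		}
	}
	hosts := make([]host, 0, len(byHost))
	for h, rs := range byHost {
		sort.Strings(rs)
		hosts = append(hosts, host{Hostname: h, Routes: rs})
	}
	sort.Slice(hosts, func(i, j int) bool {
		a, b := hosts[i].Hostname, hosts[j].Hostname
		if (a == catchAll) != (b == catchAll) {
			return b == catchAll // the non-catch-all side sorts first
		}
		return a < b
	})
	return hosts
}

func main() {
	hosts := bucketByHost([]route{
		{Name: "external-http", Hostnames: []string{"my-app.example.net"}},
		{Name: "keycloak-admin", Hostnames: []string{"keycloak-admin"}},
		{Name: "echo"},
	})
	for _, h := range hosts {
		fmt.Println(h.Hostname, h.Routes)
	}
}
```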

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prepare-export's live phase writes the reconciled Gateway
snapshot to <cacheDir>/<name>-gateway-state.json, but the
teardown artefact list didn't include it. The JSON survived
teardown, and the next prepare-export bundle picked up a stale
dump from the prior cluster.

Add the path to perVMArtefacts and update both the
explicit-list teardown test and the TestPerVMArtefacts pin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The appliance build flow's external HTTPS LoadBalancer stage
needs a SAN list for its self-signed cert. Today the operator
declares it twice -- once in HTTPRoute manifests, once in
TLS_DOMAINS=foo,bar -- and drift between the two means either
the cert covers hostnames the cluster doesn't serve, or the
cluster serves hostnames the cert doesn't cover.

Add `y-cluster gateway hostnames` that reads the existing
`gateway state` snapshot and projects unique non-wildcard
hostnames from the typed Summary
(.summary.listeners[].hosts[].hostname). Default output is one
hostname per line; --csv joins with `,` -- exactly the format
TLS_DOMAINS / do_tls_frontend expect.

Implementation is a small pure-Go helper (`Hostnames(*State)
[]string`) plus cobra wiring next to `gateway state`. The
filter logic (skip "" and "*", dedupe across listeners) is
unit-tested.

The "*" sentinel from Summary is the catch-all bucket for
routes that declare no `.spec.hostnames` -- not a hostname
suitable for a cert SAN. Wildcard support (e.g. *.example.com
literal SANs) is out of scope.
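
A minimal sketch of the filter logic, assuming a cut-down `State` with only the `.summary.listeners[].hosts[].hostname` path. The `Hostnames(*State) []string` signature is from the description above; the struct definitions and the first-seen output order are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// The types below model only the path the projection walks; the real
// State carries far more.
type Host struct {
	Hostname string `json:"hostname"`
}

type Listener struct {
	Hosts []Host `json:"hosts"`
}

type Summary struct {
	Listeners []Listener `json:"listeners"`
}

type State struct {
	Summary Summary `json:"summary"`
}

// Hostnames returns the unique hostnames across all listeners,
// skipping "" and the "*" catch-all bucket, in first-seen order.
func Hostnames(s *State) []string {
	out := []string{}
	if s == nil {
		return out
	}
	seen := map[string]bool{}
	for _, l := range s.Summary.Listeners {
		for _, h := range l.Hosts {
			if h.Hostname == "" || h.Hostname == "*" || seen[h.Hostname] {
				continue
			}
			seen[h.Hostname] = true
			out = append(out, h.Hostname)
		}
	}
	return out
}

func main() {
	s := &State{Summary: Summary{Listeners: []Listener{
		{Hosts: []Host{{Hostname: "my-app.example.net"}, {Hostname: "*"}}},
		{Hosts: []Host{{Hostname: "my-app.example.net"}, {Hostname: "keycloak-admin"}}},
	}}}
	// Comma-joined form, as a TLS_DOMAINS-style consumer would take it.
	fmt.Println(strings.Join(Hostnames(s), ","))
}
```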

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two loose props at the listener root (numTrustedHops,
trustedCIDRs) didn't carry their context: a consumer reading
the JSON saw the values but had to know they were XFF settings,
not generic listener tuning.

Group them under a `xForwardedFor` wrapper that mirrors the
source CRD shape (`spec.clientIPDetection.xForwardedFor` on a
ClientTrafficPolicy), at one wrapping level. A single-level wrap
fits because xForwardedFor is the only detection mechanism the
summary surfaces today; if `customHeader` (the alternate
clientIPDetection mechanism) becomes relevant, it lands as a
sibling at the same level.

Schema regenerated; tests updated to walk the new path
(`l.XForwardedFor.NumTrustedHops` / `.TrustedCIDRs`).
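
For illustration, the grouped shape might look like the Go types below. The Go field names follow the path quoted above (`l.XForwardedFor.NumTrustedHops` / `.TrustedCIDRs`); the JSON tag spellings, the pointer-with-omitempty choice, and the extra `Name`/`Port` fields are assumptions, not the generated schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// XForwardedFor groups the XFF-specific settings under one key so a
// reader of the JSON sees their context.
type XForwardedFor struct {
	NumTrustedHops int      `json:"numTrustedHops,omitempty"`
	TrustedCIDRs   []string `json:"trustedCIDRs,omitempty"`
}

// Listener is a cut-down summary listener. The wrapper is a pointer so
// listeners without a ClientTrafficPolicy omit the key entirely.
type Listener struct {
	Name          string         `json:"name"`
	Port          int            `json:"port"`
	XForwardedFor *XForwardedFor `json:"xForwardedFor,omitempty"`
}

// render marshals a listener to compact JSON.
func render(l Listener) (string, error) {
	b, err := json.Marshal(l)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := render(Listener{
		Name:          "http",
		Port:          80,
		XForwardedFor: &XForwardedFor{NumTrustedHops: 1},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

A future `customHeader` mechanism would then be a second optional field next to `XForwardedFor` on the same struct.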

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@solsson solsson merged commit 6003a0b into main May 7, 2026
11 checks passed