Unconditional Helm upgrade in Observe() causes infinite reconciliation loop #184

@lukeffo

Describe the bug

In ./internal/composition/composition.go, Observe() unconditionally executes a Helm upgrade on every reconciliation loop iteration, even when the composition is already up-to-date. This creates a new identical Helm release revision on every reconcile cycle, causing an infinite reconciliation loop and unnecessary load on the Kubernetes API server:

NAMESPACE       NAME                                              CPU(cores)   MEMORY(bytes)
krateo-system   wpblueprints-v2-0-0-controller-647f97c99d-7t6rl   72m          4976Mi

NAME                 NAMESPACE       REVISION   UPDATED                                   STATUS     CHART                APP VERSION
wp-area51-vcwx5x4b   krateo-system   16017      2026-05-28 20:49:17.860510731 +0000 UTC   deployed   wp-blueprint-2.0.0

To Reproduce

  1. Deploy any composition (e.g. WPBlueprint) and wait for it to reach Ready: True
  2. Observe the controller logs: "External resource is up to date" is logged every few seconds
  3. Check the Helm release history:
helm -n <namespace> history <release-name>
  4. Diff two consecutive revisions:
helm -n <namespace> get manifest <release-name> --revision=<N> > rev_a.yaml
helm -n <namespace> get manifest <release-name> --revision=<N+1> > rev_b.yaml
diff rev_a.yaml rev_b.yaml
  5. The diff is empty: the revisions are identical

Expected behavior

Once a composition reaches steady state (digest == previousDigest, chart version unchanged), Observe() should skip the Helm upgrade and avoid writing to the Kubernetes status, preventing unnecessary watch events and re-queues.
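
The steady-state condition described above could be captured in a small predicate. This is only a sketch: the `compositionStatus` type, its field names, and `needsUpgrade` are hypothetical and not taken from the controller's code, and it assumes a digest of the desired manifest can be computed before any upgrade is attempted.

```go
package main

// compositionStatus holds the subset of status fields this sketch needs.
// The type and its field names are hypothetical stand-ins.
type compositionStatus struct {
	Digest       string // digest recorded after the last successful apply
	ChartVersion string // chart version recorded after the last successful apply
}

// needsUpgrade reports whether a Helm upgrade is warranted.
// desiredDigest is assumed to be computed from the rendered manifest
// before any upgrade is attempted.
func needsUpgrade(st compositionStatus, desiredDigest, desiredChartVersion string) bool {
	if st.Digest == "" {
		return true // nothing recorded yet, so the release must be applied
	}
	if st.ChartVersion != desiredChartVersion {
		return true // chart version changed
	}
	return st.Digest != desiredDigest // rendered content changed
}
```

With a check like this, Observe() could return early whenever `needsUpgrade` is false, leaving both the Helm release and the status untouched.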

Root cause

In ./internal/composition/composition.go, Observe() calls hc.Upgrade() (or helmchart.Update()) unconditionally before computing and comparing the digest. The digest check happens after the upgrade, when a new Helm revision has already been created. Additionally, tools.UpdateStatus() is called unconditionally at the end of Observe() even when nothing has changed, which triggers a new watch event → re-queue → new Observe() → infinite loop.
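
One way to reorder the steps is to compute and compare the digest first, and only then upgrade and write status. The sketch below is not the controller's code: `observeDeps`, `digestOf`, and `observe` are hypothetical names, the Helm client and status writer are replaced by plain function fields, and the truncated SHA-256 digest is an assumption rather than the algorithm the project actually uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// observeDeps bundles the side effects of a reconcile pass behind function
// fields (hypothetical seams for the Helm client and the status writer).
type observeDeps struct {
	Render       func() (string, error)    // renders the manifest without creating a revision
	Upgrade      func() error              // creates a new Helm release revision
	UpdateStatus func(digest string) error // persists the new digest in status
}

// digestOf returns a short hex digest of the manifest.
// Assumption: the real controller may compute its digest differently.
func digestOf(manifest string) string {
	sum := sha256.Sum256([]byte(manifest))
	return hex.EncodeToString(sum[:8])
}

// observe compares digests before doing any work. It returns true when the
// resource is already up to date, in which case neither Upgrade nor
// UpdateStatus is called.
func observe(deps observeDeps, currentDigest string) (bool, error) {
	manifest, err := deps.Render()
	if err != nil {
		return false, err
	}
	desired := digestOf(manifest)
	if desired == currentDigest {
		return true, nil // steady state: no new revision, no status write
	}
	if err := deps.Upgrade(); err != nil {
		return false, err
	}
	if err := deps.UpdateStatus(desired); err != nil {
		return false, err
	}
	return false, nil
}
```

Because the early return skips the status write as well as the upgrade, a steady-state pass produces no watch event of its own.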

The issue appears to affect 0.20.2 as well as the latest version.

Observable symptoms

Controller logs show continuous reconciliation at steady state:

[20:00:50.903] INFO: External resource is up to date {
  "apiVersion": "composition.krateo.io/v2-0-0",
  "event": "observe",
  "kind": "WPBlueprint",
  "name": "wp-area51",
  "namespace": "krateo-system",
  "traceId": "93kiXlJvR"
}
[20:00:54.764] INFO: External resource is up to date {
  "apiVersion": "composition.krateo.io/v2-0-0",
  "event": "observe",
  "kind": "WPBlueprint",
  "name": "wp-area51",
  "namespace": "krateo-system",
  "traceId": "nPnmulJDg"
}

The Synced condition lastTransitionTime updates on every reconcile while the Ready condition timestamp stays fixed:

status:
  conditions:
  - lastTransitionTime: "2026-05-28T02:56:33Z"  # never changes
    reason: Available
    type: Ready
  - lastTransitionTime: "2026-05-28T20:02:38Z"  # updates every few seconds
    reason: ReconcileSuccess
    type: Synced
  digest: a5510d332d129654
  previousDigest: a5510d332d129654  # always equal at steady state
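
The changing Synced timestamp suggests the condition is rewritten on every pass. A condition setter that leaves an unchanged condition alone would keep lastTransitionTime stable and tell the caller whether a status write is needed at all. This is a minimal sketch: the `condition` type, the `Status` field (not shown in the excerpt above), and `setCondition` are assumptions, not the project's actual types.

```go
package main

// condition mirrors the fields visible in the status excerpt, plus a Status
// field that is assumed here. Timestamps are kept as RFC 3339 strings.
type condition struct {
	Type               string
	Status             string
	Reason             string
	LastTransitionTime string
}

// setCondition inserts or replaces a condition by Type. When Status and
// Reason are unchanged it keeps the existing entry, including its
// LastTransitionTime, and reports false so the caller can skip the write.
func setCondition(conds []condition, next condition) ([]condition, bool) {
	for i, c := range conds {
		if c.Type != next.Type {
			continue
		}
		if c.Status == next.Status && c.Reason == next.Reason {
			return conds, false // no transition: keep the old timestamp
		}
		conds[i] = next
		return conds, true
	}
	return append(conds, next), true
}
```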

Additional context

  • The Helm revision count grows by ~3 every few seconds (observed about 24h after the controller was deployed)
  • status.digest and status.previousDigest are always equal at steady state, confirming no actual drift exists
  • The bug seems to be triggered by the UpdateStatus() call at the end of Observe(): writing to status generates a new Kubernetes watch event, which re-queues the resource, which triggers a new Observe(), closing the loop
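
As a complementary guard against the status-write loop in the last point, update events could be filtered on metadata.generation. The sketch below assumes the CRD has the status subresource enabled, in which case status-only writes do not increment generation. The `objectMeta` type and `shouldEnqueueUpdate` are hypothetical names, and the controller's actual event-handling code may be structured differently.

```go
package main

// objectMeta carries only the field this sketch inspects.
type objectMeta struct {
	Generation int64
}

// shouldEnqueueUpdate reports whether an update event reflects a spec change.
// Assumption: with the status subresource enabled, status-only writes leave
// metadata.generation unchanged, so such events are dropped here.
func shouldEnqueueUpdate(oldObj, newObj objectMeta) bool {
	return oldObj.Generation != newObj.Generation
}
```

A filter like this only breaks the event loop. It does not remove the redundant Helm upgrade, which would still run on any periodic resync.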
