fix(gitops-update): refuse semver downgrade unless explicitly allowed #396
Draft
alexgarzao wants to merge 1 commit into
The release pipeline currently overwrites image tags by simple inequality
("$CURRENT_TAG" != "$TAG"). When a production release fires (IS_PRODUCTION
loop = "dev stg prd sandbox"), it writes the production tag into every
env — including dev paths that were already on a higher pre-release.
Recent incident: on 2026-05-31 a release of flowker v1.1.1 downgraded
4 dev paths (firmino, anacleto, benedita, clotilde) from 1.2.0-beta.12
back to 1.1.1. The audit_db schema in dev had been migrated to v2 by
the beta deploys; the older image only shipped migration 001, so the
app panicked with "source directory missing or empty" on startup. Two
dev environments stayed in CrashLoopBackOff for 2+ days. Fix shipped
out-of-band via LerianStudio/midaz-firmino-gitops#814 by manually
restoring the tag to 1.2.0-beta.12.
This change:
- Adds new boolean input `allow_downgrade` (default false). Callers who
legitimately need to roll back (e.g. emergency revert) opt in.
- Adds a semver-correct `semver_gt` bash function (no external deps).
Pure bash so the step keeps zero install cost; sort -V was rejected
because GNU coreutils does not implement prerelease precedence per
semver.org#spec-item-11 (it sorts "1.2.0" before "1.2.0-beta.X").
- Wraps the three update sites with the guard:
* helmfile values.yaml — image tag mappings
* helmfile values.yaml — configmap key mappings
* kustomization.yaml — `kustomize edit set image`
- Treats empty/non-semver current values as "skip the check, allow
write" with a warning, so first installs and exotic tag schemes
(e.g. branch SHAs) are not blocked.
Behavior:
- 1.1.1 over 1.2.0-beta.12 → REFUSED with ::warning::, exit clean,
no commit. Caller sees the warning and decides next step.
- 1.2.0 over 1.2.0-beta.12 → ALLOWED (release > prerelease, semver).
- 1.2.0-beta.13 over 1.2.0-beta.12 → ALLOWED.
- Equal values → no-op as before.
- allow_downgrade: true → previous behavior preserved verbatim.
Tested locally against 12 precedence cases including the failure that
triggered this fix; all pass.
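
The guard described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: only the names `semver_gt` and `allow_downgrade` come from the description, while `is_semver`, `guard_decision`, `SEMVER_RE`, and the warning text are hypothetical. The PR does not say how a non-semver *new* tag is handled; the sketch treats it like a non-semver current value (skip the check, allow the write).

```bash
#!/usr/bin/env bash
# Illustrative sketch only; not the code from this PR.
# is_semver, guard_decision, SEMVER_RE and the warning text are hypothetical.

# MAJOR.MINOR.PATCH with optional -prerelease and +build (build is ignored).
SEMVER_RE='^([0-9]+)\.([0-9]+)\.([0-9]+)(-([0-9A-Za-z.-]+))?(\+[0-9A-Za-z.-]+)?$'

is_semver() { [[ "$1" =~ $SEMVER_RE ]]; }

# semver_gt A B: exit 0 if A has strictly higher precedence than B,
# 1 if it does not, 2 if either argument is not semver.
semver_gt() {
  local LC_ALL=C i x y n
  local -a a b a_ids b_ids
  [[ "$1" =~ $SEMVER_RE ]] || return 2
  a=("${BASH_REMATCH[@]:1:3}"); local a_pre="${BASH_REMATCH[5]}"
  [[ "$2" =~ $SEMVER_RE ]] || return 2
  b=("${BASH_REMATCH[@]:1:3}"); local b_pre="${BASH_REMATCH[5]}"

  # Numeric comparison of major, minor, patch (10# avoids octal parsing).
  for i in 0 1 2; do
    if (( 10#${a[i]} > 10#${b[i]} )); then return 0; fi
    if (( 10#${a[i]} < 10#${b[i]} )); then return 1; fi
  done

  # Same core: a release outranks any prerelease; equal is not "greater".
  if [[ -z "$a_pre" && -n "$b_pre" ]]; then return 0; fi
  if [[ -z "$a_pre" || -z "$b_pre" ]]; then return 1; fi

  # Both are prereleases: compare dot-separated identifiers left to right.
  IFS=. read -r -a a_ids <<< "$a_pre"
  IFS=. read -r -a b_ids <<< "$b_pre"
  n=$(( ${#a_ids[@]} < ${#b_ids[@]} ? ${#a_ids[@]} : ${#b_ids[@]} ))
  for (( i = 0; i < n; i++ )); do
    x="${a_ids[i]}"; y="${b_ids[i]}"
    if [[ "$x" =~ ^[0-9]+$ && "$y" =~ ^[0-9]+$ ]]; then
      if (( 10#$x > 10#$y )); then return 0; fi
      if (( 10#$x < 10#$y )); then return 1; fi
    elif [[ "$x" =~ ^[0-9]+$ ]]; then return 1   # numeric ranks below alphanumeric
    elif [[ "$y" =~ ^[0-9]+$ ]]; then return 0
    elif [[ "$x" > "$y" ]]; then return 0         # ASCII order under LC_ALL=C
    elif [[ "$x" < "$y" ]]; then return 1
    fi
  done
  # All shared identifiers equal: the longer prerelease ranks higher.
  (( ${#a_ids[@]} > ${#b_ids[@]} ))
}

# guard_decision CURRENT NEW ALLOW_DOWNGRADE
# Prints noop, write, or refuse on stdout; warnings go to stderr so the
# decision stays machine-readable.
guard_decision() {
  local current="$1" new="$2" allow_downgrade="${3:-false}"
  if [[ "$current" == "$new" ]]; then echo noop; return 0; fi
  if [[ "$allow_downgrade" == "true" ]]; then echo write; return 0; fi
  if ! is_semver "$current" || ! is_semver "$new"; then
    echo "::warning::cannot compare '$current' and '$new' as semver; skipping downgrade check" >&2
    echo write; return 0
  fi
  if semver_gt "$current" "$new"; then
    echo "::warning::refusing downgrade $current -> $new (allow_downgrade is false)" >&2
    echo refuse; return 0
  fi
  echo write
}

guard_decision 1.2.0-beta.12 1.1.1 false          # prints "refuse" (warning on stderr)
guard_decision 1.2.0-beta.12 1.2.0 false          # prints "write"
guard_decision 1.2.0-beta.12 1.2.0-beta.13 false  # prints "write"
guard_decision 1.2.0-beta.12 1.1.1 true           # prints "write"
guard_decision "" 1.1.1 false                     # prints "write" (warning on stderr)
```

Returning a decision string rather than writing the file keeps the comparison testable on its own; each of the three update sites would then act on the result.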
Follow-up (not in this PR): the IS_PRODUCTION env loop "dev stg prd
sandbox" is overly broad — production releases should arguably never
touch dev. That is a wider conversation; this PR is the minimal safety
net that catches the failure without changing the semantics callers
already rely on.
Refs: LerianStudio/midaz-firmino-gitops#814
🛡️ CodeQL Analysis Results
Found 2 issue(s): 2 Medium
Summary
`gitops-update.yml` currently writes image tags into `values.yaml` whenever `CURRENT_TAG != TAG`, regardless of semver precedence. When a production tag fires, the env loop is `dev stg prd sandbox`, so production releases overwrite dev paths indiscriminately. This caused a multi-day incident (details below).
This PR adds a semver-correct precedence guard with an explicit override.
- Before: `1.1.1` overwrites `1.2.0-beta.12` silently
- After: `1.1.1` over `1.2.0-beta.12` is refused with `::warning::`; commit is skipped
- Override: `allow_downgrade: bool` (default `false`); set `true` for intentional rollbacks

Root incident (the bug this prevents)
On 2026-05-31 00:49 UTC, the auto-bump commit LerianStudio/midaz-firmino-gitops@2e46be40 (`ci(flowker): update image tags (production)`) downgraded 4 dev paths (firmino, anacleto, benedita, clotilde) from `1.2.0-beta.12` to `1.1.1`. The audit_db schema in dev had been migrated to v2 by the beta deploys; image `:1.1.1` only ships migration 001. Result: 3 of 4 dev pods stuck in CrashLoopBackOff for 2+ days (`firmino-flowker-dev` reached 270 restarts before the manual fix). Out-of-band fix: LerianStudio/midaz-firmino-gitops#814.

Changes
- New input `allow_downgrade` (boolean, default `false`).
- Pure-bash `semver_gt` function: no install step, no external dependency. `sort -V` was tried first and rejected: GNU coreutils does not implement semver §11 precedence for prereleases (it sorts `1.2.0 < 1.2.0-beta.12`, which is backwards).
- The guard wraps the three update sites:
  - helmfile `values.yaml`: image tag mappings
  - helmfile `values.yaml`: configmap key mappings
  - kustomization.yaml: `kustomize edit set image`
- Empty or non-semver current values skip the check with a `::warning::` (first installs and exotic schemes are not blocked).

Local test matrix (12 cases, all passing)
Risk / backward compatibility
- … `allow_downgrade: true` at the call site.
- … `argocd_sync`, `commit`, or `slack-notify` jobs.

Test plan
- … (`self-pr-validation.yml`?) with `1.1.1` over a dev path on `1.2.0-beta.X`: confirm the workflow log shows the refusal warning and the commit step is a no-op.
- … `allow_downgrade: true`: confirm the downgrade is applied (backward-compatible escape hatch).
- … (`1.2.0-beta.13` over `1.2.0-beta.12`): confirm it still works (no regression).

Follow-up not in this PR
The `IS_PRODUCTION` env loop is `dev stg prd sandbox`. Even with the semver guard, this still writes to `dev` (it just no-ops on downgrades). A stricter policy might be: production releases never touch dev. That is a behavioral change that deserves its own discussion with the DevOps team and is out of scope here.
cc @LerianStudio/g_github_devops