waitForDS() can report success prematurely during statefulset rolling updates #718

Description

@m4um4u1

We had an issue with the wait command when waiting for DS replicas:

waitForDS() {
  local ds=$1

  waitResourceExists "sts/${ds}"
  local rep=$($K_CMD $NAMESPACE_OPT get sts $ds -o=jsonpath='{.spec.replicas}')
  ((rep--))
  message "rep=${rep}" "debug"
  for i in $(seq 0 $rep) ; do
    waitForResource ready "pod/${ds}-${i}"
  done
}

This function checks individual pods for readiness, but does not check whether they have actually been updated. Additionally, it iterates in the wrong order (low to high), while statefulsets by default restart the pods from the highest number to the lowest.

This means the command can report success prematurely: the lower numbered pod hasn't been restarted yet (so it's still "ready" from the previous deployment), while the higher numbered pod has already been restarted and the wait command correctly waited for it.

Example:
In a cluster with two ds replicas:

  1. The rollout terminates ds-*-1 first (highest number).
  2. The wait command checks ds-*-0 which is still ready from the previous deployment, so it passes immediately.
  3. The wait command checks ds-*-1 and waits for the new pod to become ready.
  4. The wait command reports success.
  5. The rollout now terminates ds-*-0 to update it, so there is a brief window in which other components have not yet noticed that the pod is down.
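
One way to make the stale pod in this sequence visible is to compare each pod's `controller-revision-hash` label with the statefulset's `.status.updateRevision`. The following is only a rough sketch: `staleDSPods` is a hypothetical helper that is not part of the existing script, and it assumes `K_CMD` and `NAMESPACE_OPT` are set the same way as in `waitForDS()`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the existing script.
# Assumes K_CMD (defaulting to kubectl) and NAMESPACE_OPT as used by waitForDS().
K_CMD=${K_CMD:-kubectl}

# Prints the name of every pod of statefulset $1 that is still on an old revision,
# walking from the highest ordinal to the lowest (the default rolling update order).
staleDSPods() {
  local ds=$1
  local target rep i hash
  # Revision the statefulset is currently rolling out.
  target=$($K_CMD $NAMESPACE_OPT get sts "$ds" -o=jsonpath='{.status.updateRevision}')
  rep=$($K_CMD $NAMESPACE_OPT get sts "$ds" -o=jsonpath='{.spec.replicas}')
  i=$((rep - 1))
  while [ "$i" -ge 0 ]; do
    # Revision the pod was created from.
    hash=$($K_CMD $NAMESPACE_OPT get pod "${ds}-${i}" \
      -o=jsonpath='{.metadata.labels.controller-revision-hash}')
    if [ "$hash" != "$target" ]; then
      echo "${ds}-${i}"
    fi
    i=$((i - 1))
  done
}
```

In the two-replica scenario above, calling `staleDSPods` with the statefulset name after step 3 would be expected to print the `-0` pod, which is exactly the pod that the readiness-only check treats as done.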

In our case, this caused flaky amster imports in our deploy pipeline: amster was started right after the wait command succeeded, just as the old, still-ready ds pod was being terminated.

We were able to work around this issue with the following:

waitForDS() {
  local ds=$1
  waitForRollout "statefulset/${ds}"
}

# Block until the rollout of the given resource has finished or TIMEOUT expires.
waitForRollout() {
  local resource=$1
  echo "Waiting for ${resource} rollout to complete."
  if waitResourceExists "$resource" ; then
    kube rollout status "${resource}" --timeout="${TIMEOUT}s"
  else
    echo "ERROR: $resource not found. Skipping."
  fi
}

This waits for the full rollout to complete (all pods updated to the new version and ready), regardless of order.
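
To illustrate what "all pods updated and ready" means in terms of statefulset status fields, here is an approximate sketch. It is not a replacement for `rollout status` and does not claim to mirror its exact logic; `stsField` and `stsRolloutComplete` are hypothetical helpers, and the sketch assumes the default RollingUpdate strategy without a partition, plus `K_CMD` and `NAMESPACE_OPT` as in the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the existing script.
K_CMD=${K_CMD:-kubectl}

# Prints one field of statefulset $1 (jsonpath expression $2), or $3 if the field is empty.
stsField() {
  local value
  value=$($K_CMD $NAMESPACE_OPT get sts "$1" -o=jsonpath="$2")
  printf '%s\n' "${value:-$3}"
}

# Returns 0 only if the status fields describe a finished rollout:
# the controller has seen the latest spec, every replica is ready and updated,
# and the current revision has caught up with the update revision.
stsRolloutComplete() {
  local ds=$1
  local gen observed desired ready updated current target
  gen=$(stsField "$ds" '{.metadata.generation}' 0)
  observed=$(stsField "$ds" '{.status.observedGeneration}' -1)
  desired=$(stsField "$ds" '{.spec.replicas}' 1)
  ready=$(stsField "$ds" '{.status.readyReplicas}' 0)
  updated=$(stsField "$ds" '{.status.updatedReplicas}' 0)
  current=$(stsField "$ds" '{.status.currentRevision}' none)
  target=$(stsField "$ds" '{.status.updateRevision}' unknown)

  [ "$observed" -ge "$gen" ] &&
    [ "$ready" -eq "$desired" ] &&
    [ "$updated" -eq "$desired" ] &&
    [ "$current" = "$target" ]
}
```

In the mid-rollout state described in the example (both pods ready, only the higher-numbered one updated), this check would fail on the updated count and on the revision comparison, whereas a per-pod readiness check passes.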

The question, of course, is whether it should wait for the full rollout to complete (safer) or only for at least one pod to be ready (faster, and sufficient for basic platform availability).
For am and idm, the script waits for the deployments, so why not wait for the statefulset for ds?
What was the intention here?
