waitForDS() can report success prematurely during statefulset rolling updates

We had an issue with the wait command when waiting for DS replicas:

```
waitForDS() {
  local ds=$1

  waitResourceExists "sts/${ds}"
  local rep=$($K_CMD $NAMESPACE_OPT get sts $ds -o=jsonpath='{.spec.replicas}')
  ((rep--))
  message "rep=${rep}" "debug"
  for i in $(seq 0 $rep) ; do
    waitForResource ready "pod/${ds}-${i}"
  done
}
```

This function checks individual pods for readiness, but does not check whether they have actually been updated. Additionally, it iterates in the wrong order (low to high), while statefulsets by default restart the pods from the highest number to the lowest.
  
This means the command can report success prematurely: the lower numbered pod hasn't been restarted yet (so it's still "ready" from the previous deployment), while the higher numbered pod has already been restarted and the wait command correctly waited for it.
  
Example:
In a cluster with two ds replicas:
1. The rollout terminates ds-*-1 first (highest number).
2. The wait command checks ds-*-0 which is still ready from the previous deployment, so it passes immediately.
3. The wait command checks ds-*-1 and waits for the new pod to become ready.
4. The wait command reports success.
5. The rollout now terminates ds-*-0 to update it, leaving a brief window in which other components have not yet noticed that the pod is down.
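
For illustration, here is a minimal sketch of a check that would not pass at step 4. It assumes plain `kubectl` against the current namespace (the script itself goes through `$K_CMD $NAMESPACE_OPT` / `kube`), and the helper name `stsFullyUpdated` is hypothetical. It compares the StatefulSet's `currentRevision` with its `updateRevision` instead of looking only at pod readiness:

```shell
# Hypothetical helper, not part of the existing script.
# Returns 0 only if the StatefulSet controller has observed the latest spec,
# all replicas are ready, and no rolling update is still in progress.
stsFullyUpdated() {
  local sts="$1"
  local status gen observed want ready current update
  # Assumption: plain kubectl, current namespace. Fields are joined with '|'
  # so that an empty field (e.g. readyReplicas missing) does not shift the rest.
  status=$(kubectl get sts "$sts" -o=jsonpath='{.metadata.generation}|{.status.observedGeneration}|{.spec.replicas}|{.status.readyReplicas}|{.status.currentRevision}|{.status.updateRevision}') || return 1
  IFS='|' read -r gen observed want ready current update <<EOF
$status
EOF
  # The controller must have seen the current spec.
  [ -n "$gen" ] && [ "$gen" = "$observed" ] || return 1
  # Every replica must be ready.
  [ "${ready:-0}" = "$want" ] || return 1
  # While a rolling update is in progress these two revisions differ.
  [ -n "$update" ] && [ "$current" = "$update" ]
}
```

In the two-replica example, both pods are ready after step 3, but `currentRevision` and `updateRevision` still differ because ds-*-0 has not been replaced, so this check would fail until step 5 has finished.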
  
In our case, this caused flaky amster imports in our deploy pipeline: amster was started right after the wait command succeeded, just as the old, still-ready ds pod was being terminated.

We were able to get around this issue with the following:

```
waitForDS() {
  local ds=$1
  waitForRollout "statefulset/${ds}"
}
  
waitForRollout() {
  local resource=$1
  echo "Waiting for ${resource} rollout to complete."
  if waitResourceExists "$resource" ; then
    kube rollout status "${resource}" --timeout="${TIMEOUT}s"
  else
    echo "ERROR: $resource not found. Skipping."
  fi
}
```
This waits for the full rollout to complete (all pods updated to the new version and ready), regardless of order.
  
The question, of course, is whether it should wait for the full rollout to complete (safer) or only for at least one pod to be ready (faster, and sufficient for basic platform availability).
For am and idm it waits for the deployments, so why not for the ds statefulset?
What was the intention here?
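
To make the "faster" option concrete, here is a rough sketch of what "at least one pod ready" could look like if it also has to mean "at least one *updated* pod ready". It assumes plain `kubectl` against the current namespace and relies on the `controller-revision-hash` label that StatefulSet pods carry. The helper name `anyUpdatedPodReady` is hypothetical:

```shell
# Hypothetical helper, not part of the existing script.
# Returns 0 if at least one pod at the StatefulSet's update revision is Ready.
anyUpdatedPodReady() {
  local sts="$1"
  local rev ready
  # Assumption: plain kubectl, current namespace.
  rev=$(kubectl get sts "$sts" -o=jsonpath='{.status.updateRevision}') || return 1
  [ -n "$rev" ] || return 1
  # Print one Ready status ("True"/"False") per pod at the update revision.
  ready=$(kubectl get pods -l "controller-revision-hash=${rev}" \
    -o=jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}') || return 1
  # Succeed if any line is exactly "True".
  printf '%s\n' "$ready" | grep -qx 'True'
}
```

Note that this variant can still return while lower-numbered pods are waiting to be restarted, which is the situation from the example above, so it would probably not have fixed the amster flakiness on its own.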

waitForDS() can report success prematurely during statefulset rolling updates #718
