Fix stale kind cluster setup for certs and valkey #146
Nina Polshakova (npolshakova) wants to merge 1 commit into
Signed-off-by: npolshakova <nina.polshakova@solo.io>
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Benjamin Elder (@BenTheElder), Dmitry Berkovich (@dberkov): I saw you had some recent changes in the setup scripts. Any thoughts about adding this for those of us with long-lived kind clusters? 😅
Benjamin Elder (BenTheElder) left a comment
Heh, I think we're all iterating so much that we do clean installs every time.
I've also been wondering if we should drop the number of replicas etc., at least on kind, to make startup faster and use fewer resources... but there's a tradeoff, since that's our main source of continuous test coverage right now (and we expect "real" clusters to need to deal with shards, which affects ate-apiserver).
echo " --deploy-ate-system Deploy core system (CRDs, atelet, apiserver)"
echo " --delete-ate-system Delete core system"
echo " --delete-all Delete core system and all registered demos"
echo " --refresh-local-certs Regenerate local cert/JWT prerequisites and restart cert consumers"
We should probably figure out how to fix this in general so it doesn't require manual intervention, kind aside. cc Grant McCloskey (@MushuEE), Julian Gutierrez Oschmann (@juli4n) (anyone is welcome to look at this, I just know these two are looking at Valkey)
x-ref #225
`./hack/install-ate-kind.sh --delete-all`

For local `kind` clusters that have been running for several days, generated
There is a chance the issue exists for non-kind clusters too. Yesterday I tried to redeploy ate-api and it failed, complaining about an expired token at startup. Deleting everything and installing from scratch solved the issue.
Fixes #<issue_number_goes_here>
I had a kind cluster up for a couple of days and ran into these issues:

- `./hack/install-ate-kind.sh --deploy-ate-system`: the setup script won't refresh the certs. This adds a new `--refresh-local-certs` flag that recreates the local CA roots used by the podcertificate controller, the local session identity JWT/CA pools, and the CA bundle the API server uses to trust Valkey TLS, and restarts the pods that mount generated certs.
- `ate-api-server-deployment`: it kept timing out because the old IPs were being used. The new `--reset-local-valkey` flag deletes `job/valkey-cluster-init` and the Valkey PVCs labeled `app=valkey-cluster` so the Valkey Cluster can be initialized from scratch.
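For reviewers who want to see what the Valkey reset amounts to, here is a minimal sketch of those two cleanup steps as plain `kubectl` calls. It is not the code in `hack/install-ate-kind.sh`: the default namespace `ate-system`, the `NAMESPACE`/`DRY_RUN` variables, and the `run`/`reset_local_valkey` helpers are hypothetical. Because the operation is destructive, the sketch only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the --reset-local-valkey cleanup described in this PR.
# Assumption: Valkey runs in $NAMESPACE ("ate-system" is only a placeholder).
# DRY_RUN defaults to 1 (print commands); set DRY_RUN=0 to actually run them.
set -euo pipefail

NAMESPACE="${NAMESPACE:-ate-system}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_local_valkey() {
  local ns="$1"
  # Delete the one-shot init job so cluster initialization can run again.
  run kubectl -n "$ns" delete job/valkey-cluster-init --ignore-not-found
  # Delete the Valkey PVCs so stale cluster state (e.g. old node IPs) is not reused.
  run kubectl -n "$ns" delete pvc -l app=valkey-cluster --ignore-not-found
}

reset_local_valkey "$NAMESPACE"
```

One caveat with deleting PVCs this way: a PVC that is still mounted by a running pod stays in `Terminating` until that pod is gone (Kubernetes' storage-object-in-use protection), so the Valkey pods generally have to be deleted or recreated as well before a fresh init can proceed.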