Add CRR cascaded tests#2447
Conversation
Hello sylvainsenechal,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Incorrect fix versionThe
Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:
Please check the |
| # yq eval 'sortKeys(.)' -i deps.yaml | ||
| backbeat: | ||
| sourceRegistry: ghcr.io/scality | ||
| sourceRegistry: ghcr.io/scality/playground/sylvainsenechal |
There was a problem hiding this comment.
backbeat and cloudserver point to a personal playground registry (ghcr.io/scality/playground/sylvainsenechal) with dev tags (9.4.101, 9.3.101). Per review criteria, Scality-internal images must resolve to tags published on ghcr.io/scality. These need to be reverted before merge.
ede39d5 to
56052d7
Compare
56052d7 to
ee17d07
Compare
32f34cc to
664e381
Compare
| CRR_DESTINATION_LOCATION_NAME=${CRR_DESTINATION_LOCATION_NAME} \ | ||
| CRR_SOURCE_ACCOUNT_NAME=${CRR_SOURCE_ACCOUNT_NAME} \ | ||
| CRR_DESTINATION_ACCOUNT_NAME=${CRR_DESTINATION_ACCOUNT_NAME} \ | ||
| CRR_CASCADE_LOCATION_A_NAME=${CRR_CASCADE_LOCATION_A_NAME} \ |
There was a problem hiding this comment.
I know, in the past, we talked about avoiding creating too many locations but 🤔
I don't know, I think it's probably fine ?
- we still don't have that many locations
- It's probably not a bad idea to create many locations in ci, to see if it actually creates issues or not
664e381 to
c3043bf
Compare
| while (Date.now() < deadline) { | ||
| Identity.useIdentity(IdentityEnum.ACCOUNT, location); | ||
| try { | ||
| await this.createS3Client().send(new HeadObjectCommand({ Bucket: bucket, Key: objectName })); |
There was a problem hiding this comment.
Kept it light and very functional for this test : only check that the object is here, not checking further thing like specific object metadata/status. Could add, not sure it's super useful for functional tests, although I know sometimes we did it in the past
There was a problem hiding this comment.
I think we should test replication-status ; but not use if this can be check here (may be race conditions -esp. when using cascade- so the status may have intermediate value...)
There was a problem hiding this comment.
yeah there are 2 things to check :
Metadata: { 'crr-location-c-replication-status': 'PENDING' },
ReplicationStatus: 'REPLICA',
ReplicationStatus will always be REPLICA, because its based on the new isReplica boolean
But in metadata, depending how fast we check, it will be either PENDING or REPLICA, as a cascade transition will turn a replica into an object thats pending a replication.
I'm adding some check here, and i will add a step at the end to check all the status
d7560c3 to
c8df3be
Compare
c8df3be to
c70930c
Compare
| When an object "cascade-loop-obj" is put in location "crr-location-a" | ||
| Then the object should replicate to location "crr-location-b" within 300 seconds | ||
| And the object should replicate to location "crr-location-c" within 300 seconds | ||
| And the object at location "crr-location-a" should never have replication status PENDING within 30 seconds |
There was a problem hiding this comment.
should also check the "final" replication status on each location : SUCCESS / REPLICA / REPLICA ?
| When an object "cascade-loop-obj" is put in location "crr-location-a" | ||
| Then the object should replicate to location "crr-location-b" within 300 seconds | ||
| And the object should replicate to location "crr-location-c" within 300 seconds | ||
| And the object at location "crr-location-a" should never have replication status PENDING within 30 seconds |
There was a problem hiding this comment.
should we have more complex scenario, where we write on multiple points in the loop?
may not be able to control how he operations race to ensure a specific scenario, but worth adding to ensure the system is "eventually" consistent and stable?
There was a problem hiding this comment.
I came up with a more complex scenario which I think is decent :
- Loop config
- Concurrent writes on all location, writing a specific random metadata marker
=> Assert that they all converge on the same metadataMarker
Tbh, still not the perfect test but I think worth doing
| And replication is configured from location "crr-location-a" to "crr-location-b" | ||
| When an object "cascade-obj" is put in location "crr-location-a" | ||
| Then the object should replicate to location "crr-location-b" within 300 seconds | ||
| And the object should replicate to location "crr-location-c" within 300 seconds |
There was a problem hiding this comment.
should also check the "final" replication status on each location : SUCCESS / REPLICA / REPLICA ?
There was a problem hiding this comment.
Yes I'm gonna add a step at the end to check all the status
| bucketMatch: false | ||
| repoId: [] | ||
| - name: "${CRR_LOCATION_A_NAME}" | ||
| locationType: "location-scality-crr-v1" |
There was a problem hiding this comment.
nit: do we really need multiple locations to test the feature? could we not test with a single location and multiple buckets, yet get a similar result & good confidence?
There was a problem hiding this comment.
I think I discussed this with Maël too and I was thinking the same thing, technically it should be possible since the destination bucket is not bounded to the location, but in the end I think I prefer working with multiple locations as it forces us to build more realistic test scenarios
| while (Date.now() < deadline) { | ||
| Identity.useIdentity(IdentityEnum.ACCOUNT, location); | ||
| try { | ||
| await this.createS3Client().send(new HeadObjectCommand({ Bucket: bucket, Key: objectName })); |
There was a problem hiding this comment.
I think we should test replication-status ; but not use if this can be check here (may be race conditions -esp. when using cascade- so the status may have intermediate value...)
|
|
||
| Then( | ||
| 'the object at location {string} should never have replication status PENDING within {int} seconds', | ||
| { timeout: 120_000 }, |
There was a problem hiding this comment.
kind of redundant, deadline/timeout is implemented in the loop itself ?
There was a problem hiding this comment.
yeah but prefer to keep it, it could avoid ending up with 5h ci in case of a bug
0f6db0b to
b3de8ff
Compare
There was a problem hiding this comment.
Realising here I have 3 scenarios, and each new scenario pretty much covers the case of the previous scenario so in theory we could just run the last one to test the feature, I think still good to have a few scenarios
b3de8ff to
900f7de
Compare
Issue: ZENKO-5263
Crr cascaded tests, didn't add very complex scenarios, let me know if you can come up with other test ideas but really not sure. I think there is only one happy path to test, and for the potential problematic situation, there is loop which I tested, and the rest not really sure we can test easily (like concurrency on write and stale micro version id)
Wont work in ci, as deps.yaml isn't updated with cloudserver/backbeat.
BUT : "It works on my
computercodespace"