DAOS-17519 test: Automate dlck testing (basic/fault_injection) #17307
rpadma2 wants to merge 11 commits into
Test-tag: DlckBasicFaultTest DlckBasicTest Signed-off-by: rpadma2 <ravindran.padmanabhan@hpe.com>
Ticket title is 'Automation - Basic dlck test: scan the DAOS system by running the dlck tool.'
Test-tag: DlckBasicTest DlckBasicFaultTest Skip-func-hw-test-medium: false Skip-func-hw-test-medium-md-on-ssd: false Skip-unit-test: true Skip-fault-injection-test: true Signed-off-by: rpadma2 <ravindran.padmanabhan@hpe.com>
| 'id': '131584', | ||
| 'probability_x': '100', | ||
| 'probability_y': '100', | ||
| 'interval': '2'}, |
IMHO, without a comment it is hard to understand why the interval has this value.
| 'interval': '2'}, | |
| 'interval': '2'}, # skip sys_db |
| 'id': '131584', | ||
| 'probability_x': '100', | ||
| 'probability_y': '100', | ||
| 'interval': '2'}, |
Why did you decide to omit the max_faults parameter, which limits the number of faults?
| 'interval': '2'}, | |
| 'interval': '2', | |
| 'max_faults': '1'}, |
| 'id': '131586', | ||
| 'probability_x': '100', | ||
| 'probability_y': '100', | ||
| 'interval': '2'}, |
The same here. Please apply to all instances below.
| 'interval': '2'}, | |
| 'interval': '2'}, # skip sys_db |
| 'probability_x': '100', | ||
| 'probability_y': '100', |
As far as I know these are defaults so adding them to all of the records just bloats the code with no added value.
| 'probability_x': '100', | |
| 'probability_y': '100', |
| 'DAOS_FAULT_BTREE_OPEN_INV_CLASS': { | ||
| 'id': '131590', | ||
| 'probability_x': '100', | ||
| 'probability_y': '100', | ||
| 'interval': '28', | ||
| 'max_faults': '1'}, |
This fault can be fine-tuned to hit either the containers tree (interval = 28) or the gc tree (interval = 29). Hence it seems we need two records with distinct keys. I am sorry it was not obvious from the fault_injection_dlck.yaml file.
And IMHO we really need comments here explaining where 28 and 29 come from. Otherwise nobody will be able to reverse-engineer their meaning, and since these values are fine-tuned, it could be crucial to adjust them in the future.
| 'DAOS_FAULT_BTREE_OPEN_INV_CLASS': { | |
| 'id': '131590', | |
| 'probability_x': '100', | |
| 'probability_y': '100', | |
| 'interval': '28', | |
| 'max_faults': '1'}, | |
| 'DAOS_FAULT_BTREE_OPEN_INV_CLASS_28': { | |
| 'id': '131590', | |
| 'probability_x': '100', | |
| 'probability_y': '100', | |
| 'interval': '28', # containers tree fine-tuned | |
| 'max_faults': '1'}, | |
| 'DAOS_FAULT_BTREE_OPEN_INV_CLASS_29': { | |
| 'id': '131590', | |
| 'probability_x': '100', | |
| 'probability_y': '100', | |
| 'interval': '29', # gc tree fine-tuned | |
| 'max_faults': '1'}, |
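One way to make the origin of 28 and 29 harder to lose is to name the values instead of relying only on inline comments. This is a sketch only: the constant and function names are hypothetical, and the mapping of 28 to the containers tree and 29 to the gc tree is taken from the comment above.

```python
# Fault id shared by both records (value from the diff above).
BTREE_OPEN_INV_CLASS_ID = '131590'

# Fine-tuned intervals: which btree open the fault is expected to hit.
INTERVAL_CONTAINERS_TREE = '28'  # containers tree
INTERVAL_GC_TREE = '29'          # gc tree


def btree_open_inv_class(interval):
    """Build one DAOS_FAULT_BTREE_OPEN_INV_CLASS record for an interval."""
    return {
        'id': BTREE_OPEN_INV_CLASS_ID,
        'interval': interval,
        'max_faults': '1',
    }


# Two records with distinct keys, one per targeted tree.
btree_faults = {
    'DAOS_FAULT_BTREE_OPEN_INV_CLASS_28':
        btree_open_inv_class(INTERVAL_CONTAINERS_TREE),
    'DAOS_FAULT_BTREE_OPEN_INV_CLASS_29':
        btree_open_inv_class(INTERVAL_GC_TREE),
}
```

If the intervals need re-tuning later, only the two constants change and their comments stay next to the values.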
| if self.server_managers[0].manager.job.using_control_metadata: | ||
| dlck_cmd = DlckCommand(host, self.bin, pool_uuids[0], nvme_conf=nvme_conf, | ||
| storage_mount=scm_mount, env_str=env_str) | ||
| else: | ||
| dlck_cmd = DlckCommand(host, self.bin, pool_uuids[0], storage_mount=scm_mount, | ||
| env_str=env_str) |
The same as above. There is no need to have an if just to provide nvme_conf=None for some cases. Please reduce.
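A rough sketch of the reduction, assuming that passing nvme_conf=None to DlckCommand behaves the same as omitting it (which is what the comment above implies). `dlck_kwargs` is a hypothetical helper used here so the example runs without the DAOS test framework.

```python
def dlck_kwargs(using_control_metadata, nvme_conf, scm_mount, env_str):
    """Build the keyword arguments for the dlck command in one place.

    nvme_conf is only meaningful when control metadata is in use;
    otherwise None is passed, assumed equivalent to leaving it out.
    """
    return {
        'nvme_conf': nvme_conf if using_control_metadata else None,
        'storage_mount': scm_mount,
        'env_str': env_str,
    }


# In the test this would replace the if/else with a single call, e.g.:
#   dlck_cmd = DlckCommand(host, self.bin, pool_uuids[0], **kwargs)
kwargs = dlck_kwargs(False, '/example/nvme.conf', '/mnt/example', 'X=1')
```

The paths and the environment string in the usage line are placeholders.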
| result = dlck_cmd.run() | ||
| if not result.passed: | ||
| errors.append(f"dlck failed on {result.failed_hosts}") | ||
| self.log.info("dlck basic test output:\n%s", result) |
As with dumping the contents of the fault injection file, you are processing and printing the command result twice in this code. Please write a helper function.
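Such a helper could look roughly like this. It is a sketch under assumptions: `run_and_record` and `FakeCommand` are hypothetical names, and the only things relied on from the diff above are that the command has a `run()` method and the result exposes `passed` and `failed_hosts`.

```python
import logging
from types import SimpleNamespace


def run_and_record(cmd, errors, log, label):
    """Run cmd, record a failure in errors, and log the result once."""
    result = cmd.run()
    if not result.passed:
        errors.append(f"dlck failed on {result.failed_hosts}")
    log.info("%s output:\n%s", label, result)
    return result


class FakeCommand:
    """Stand-in for the real dlck command, for illustration only."""

    def __init__(self, passed, failed_hosts=""):
        self._result = SimpleNamespace(passed=passed, failed_hosts=failed_hosts)

    def run(self):
        return self._result


errors = []
log = logging.getLogger("dlck_example")
run_and_record(FakeCommand(True), errors, log, "dlck basic test")
run_and_record(FakeCommand(False, "host-1"), errors, log, "dlck basic test")
```

Both call sites in the test would then shrink to a single line each.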
| self.log.info("dlck basic test output:\n%s", result) | ||
| dmg.system_start() | ||
| if not errors: | ||
| self.fail("No Errors detected:\n{}".format("\n".join(errors))) |
It is a very elaborate way of printing an empty list. 😉
| self.fail("No Errors detected:\n{}".format("\n".join(errors))) | |
| self.fail("No Errors detected.") |
| 1: | ||
| targets: 4 | ||
| storage: auto |
The same here. A single engine should be enough.
Oh, I see... I can reduce it to single-engine testing.
| result = dlck_cmd.run() | ||
| if not result.passed: | ||
| errors.append(f"dlck failed on {result.failed_hosts}") | ||
| self.log.info("dlck basic test output:\n%s", result) |
| self.log.info("dlck basic test output:\n%s", result) | |
| self.log.info(f"dlck basic test output:\n{result}") |
Skip-func-hw-test-medium: false Skip-func-hw-test-medium-md-on-ssd: false Skip-unit-test: true Skip-fault-injection-test: true Skip-nlt: true Test-tag: DlckBasicFaultTest DlckBasicTest
Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17307/3/execution/node/932/log
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17307/3/testReport/
Skip-func-hw-test-medium: false Skip-func-hw-test-medium-md-on-ssd: false Skip-unit-test: true Skip-fault-injection-test: true Skip-nlt: true Test-tag: DlckBasicFaultTest DlckBasicTest Signed-off-by: rpadma2 <ravindran.padmanabhan@hpe.com>
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17307/4/execution/node/966/log
Skip-func-hw-test-medium: false Skip-func-hw-test-medium-md-on-ssd: false Skip-unit-test: true Skip-fault-injection-test: true Skip-nlt: true Test-tag: DlckBasicFaultTest DlckBasicTest Signed-off-by: rpadma2 <ravindran.padmanabhan@hpe.com>
Skip-func-hw-test-medium: false Skip-func-hw-test-medium-md-on-ssd: false Skip-unit-test: true Skip-fault-injection-test: true Skip-nlt: true Test-tag: DlckBasicFaultTest DlckBasicTest Signed-off-by: rpadma2 <ravindran.padmanabhan@hpe.com>
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17307/5/execution/node/988/log
Skip-func-hw-test-medium: false Skip-func-hw-test-medium-md-on-ssd: false Skip-unit-test: true Skip-fault-injection-test: true Skip-nlt: true Test-tag: DlckBasicFaultTest DlckBasicTest Signed-off-by: rpadma2 <ravindran.padmanabhan@hpe.com>
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17307/6/execution/node/867/log
Skip-func-hw-test-medium: false Skip-func-hw-test-medium-md-on-ssd: false Skip-unit-test: true Skip-fault-injection-test: true Skip-nlt: true Test-tag: DlckBasicFaultTest DlckBasicTest Signed-off-by: rpadma2 <ravindran.padmanabhan@hpe.com>
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17307/7/execution/node/866/log
Skip-func-hw-test-medium: false
Skip-func-hw-test-medium-md-on-ssd: false
Skip-unit-test: true
Skip-fault-injection-test: true
Skip-nlt: true
Test-tag: DlckBasicFaultTest DlckBasicTest
Steps for the author:
After all prior steps are complete: