DAOS-18618 pool: fix pool destroy hang during stop (#18284) #18325

Merged
NiuYawei merged 1 commit into release/2.8 from shilongw/DAOS-18618-pool-stop-2.8 on May 25, 2026

@wangshilong
Contributor

Pool destroy can hang with an active scrubber because two scrubber cleanup bugs combine during teardown:

  1. cont_iter_is_loaded_cb() can exit without calling sc_cont_teardown(), leaving sc_scrubbing set and blocking destroy on sc_scrub_cond.
  2. sc_ensure_containers_are_loaded() can loop forever on DER_CONT_NONEXIST after the container has already been removed.

Fix the teardown path so scrubber state is always released, skip removed containers, and stop scrubber work promptly when the pool is stopping.
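The two failure modes and the shape of the fix can be illustrated with a small, self-contained sketch. This is not the patch itself: every `sketch_*` name below is hypothetical, the structs only model the `sc_scrubbing` flag and the "pool is stopping" state described above, and the error value is a stand-in rather than the real `DER_CONT_NONEXIST` definition. It assumes the fix amounts to a single exit path that always releases scrubber state, plus a loader that skips removed containers instead of retrying them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real DAOS definitions. */
enum sketch_rc {
	SKETCH_OK            = 0,
	SKETCH_CONT_NONEXIST = -1 /* models DER_CONT_NONEXIST */
};

struct sketch_cont {
	bool removed; /* container was destroyed before the scrubber got to it */
	bool loaded;
};

struct sketch_scrub_ctx {
	bool scrubbing;      /* models sc_scrubbing */
	bool pool_stopping;  /* models "the pool is stopping" */
	int  teardown_calls; /* counts how often state was released */
};

static int
sketch_cont_load(struct sketch_cont *cont)
{
	if (cont->removed)
		return SKETCH_CONT_NONEXIST;
	cont->loaded = true;
	return SKETCH_OK;
}

/*
 * Visit each container once. A removed container is skipped rather than
 * retried, so the loop cannot spin on SKETCH_CONT_NONEXIST, and the loop
 * bails out as soon as the pool is stopping. Returns the number loaded.
 */
static int
sketch_ensure_loaded(struct sketch_scrub_ctx *ctx, struct sketch_cont *conts, size_t nr)
{
	int loaded = 0;

	assert(ctx != NULL && conts != NULL);
	for (size_t i = 0; i < nr; i++) {
		if (ctx->pool_stopping)
			break;
		if (sketch_cont_load(&conts[i]) == SKETCH_CONT_NONEXIST)
			continue;
		loaded++;
	}
	return loaded;
}

/*
 * Per-container callback with a single exit path: whatever happens after
 * the scrubbing flag is set, control reaches "out" and the flag is cleared,
 * so a waiter on the flag is never left blocked.
 */
static int
sketch_cont_cb(struct sketch_scrub_ctx *ctx, struct sketch_cont *cont)
{
	int rc = SKETCH_OK;

	assert(ctx != NULL && cont != NULL);
	ctx->scrubbing = true;

	if (ctx->pool_stopping)
		goto out;

	rc = sketch_cont_load(cont);
	if (rc != SKETCH_OK)
		goto out;

	/* Real scrubbing work would go here. */
out:
	ctx->scrubbing = false; /* models the teardown releasing scrubber state */
	ctx->teardown_calls++;
	return rc;
}
```

In this model, an early `return` between setting and clearing `scrubbing` would reproduce the first bug, and replacing `continue` with a retry of the same index would reproduce the second.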

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Wang Shilong <shilong.wang@hpe.com>
@wangshilong marked this pull request as ready for review May 22, 2026 08:39
@wangshilong requested review from a team as code owners May 22, 2026 08:39
@github-actions

Ticket title is 'daos_test/suite.py:DaosCoreTest.test_daos_rebuild_simple - test timeout w/ DER_CSUM err'
Status is 'Reopened'
Labels: '2.8.0tb5,ci_2.6_daily,ci_2.8_daily,ci_master_daily,pr_test,scrubbed_2.8,test_2.8'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-18618

@github-actions (bot) added the "priority" label (Ticket has high priority; automatically managed) May 22, 2026
@wangshilong added the "clean-cherry-pick" label (Cherry-pick from another branch that did not require additional edits) and removed the "priority" label May 22, 2026
@wangshilong requested review from NiuYawei and liw May 22, 2026 08:39
@wangshilong requested a review from a team May 25, 2026 01:55
@NiuYawei merged commit a99acbd into release/2.8 May 25, 2026
42 checks passed
@NiuYawei deleted the shilongw/DAOS-18618-pool-stop-2.8 branch May 25, 2026 01:56