DAOS-18235 test: pool/verify_space.py - Remove system_ram_reserved #18298

Merged
phender merged 3 commits into master from makito/DAOS-18235
May 28, 2026

@shimizukko
Contributor

@shimizukko shimizukko commented May 20, 2026

The test intermittently fails during storage format with an "Available memory (RAM) insufficient" error. The cause is that too little memory is left for non-DAOS system processes, because the server config (test yaml) sets "system_ram_reserved: 1" (the unit is GiB). Increasing the value gives the system more room. The error message reported "want 159 GiB RAM but only have 158 GiB", so increasing the value by 1 should resolve the failure. Instead, this change removes the field so that the default of 64GiB is used, because that is the common real-world use case.
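The arithmetic behind "increasing it by 1 should resolve" can be sketched as below. This is a simplified model, not the DAOS control-plane calculation: it assumes that every extra GiB given to system_ram_reserved lowers the RAM request made at format time by exactly one GiB. The function names are hypothetical and only the 159/158 GiB figures come from the error message.

```python
def ram_shortfall_gib(want_gib: int, have_gib: int) -> int:
    """Return how many GiB are missing for the request (0 if it fits)."""
    return max(0, want_gib - have_gib)


def want_with_reservation(want_gib: int, old_reserved_gib: int,
                          new_reserved_gib: int) -> int:
    """Estimate the RAM request after changing the system reservation.

    Assumption (not taken from DAOS source): every extra GiB reserved for
    the system lowers the RAM request by exactly one GiB.
    """
    return want_gib - (new_reserved_gib - old_reserved_gib)


# Figures quoted in the error message: want 159 GiB, have 158 GiB.
print(ram_shortfall_gib(159, 158))  # 1

# Raising the reservation from 1 to 2 GiB, under the assumption above.
print(ram_shortfall_gib(want_with_reservation(159, 1, 2), 158))  # 0
```

Under the same assumption, any reservation larger than 1 GiB, including the default, would also leave no shortfall.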

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_verify_pool_space
Test-repeat: 5
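
For readers unfamiliar with the pragma lines above, here is a minimal sketch of reading such "Key: value" lines out of a commit message. It is a generic heuristic written for illustration, not the parser used by the DAOS CI, and `parse_pragmas` is a hypothetical name.

```python
def parse_pragmas(message: str) -> dict:
    """Collect 'Key: value' lines whose key contains no spaces."""
    pragmas = {}
    for line in message.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Skip prose lines: no colon, empty parts, or a multi-word key.
        if sep and key and value and " " not in key:
            pragmas[key] = value
    return pragmas


message = """Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_verify_pool_space
Test-repeat: 5
"""
pragmas = parse_pragmas(message)
print(pragmas["Test-tag"], pragmas["Test-repeat"])
```

Values are kept as strings, so a caller that needs the repeat count as a number would convert `pragmas["Test-repeat"]` itself.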

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

The test intermittently fails during storage format
due to "Available memory (RAM) insufficient" error.
This is because too little memory is allocated to non-DAOS
system processes. The system usage reservation was set
as "system_ram_reserved: 1" in server config (test yaml),
so remove it.

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_verify_pool_space
Test-repeat: 5
Signed-off-by: Makito Kano <makito.kano@hpe.com>
@github-actions

github-actions Bot commented May 20, 2026

Ticket title is 'pool/verify_space.py:VerifyPoolSpace.test_verify_pool_space - Available memory (RAM) insufficient for configured 177 GiB ram-disk size'
Status is 'In Review'
Labels: '2.8.0tb1,ci_2.8_weekly,ci_master_weekly,weekly_test'
https://daosio.atlassian.net/browse/DAOS-18235

@shimizukko shimizukko marked this pull request as ready for review May 20, 2026 07:35
@shimizukko shimizukko requested review from a team as code owners May 20, 2026 07:35
daltonbohning previously approved these changes May 20, 2026
phender previously approved these changes May 20, 2026
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_verify_pool_space
Test-repeat: 5
Signed-off-by: Makito Kano <makito.kano@hpe.com>
@shimizukko shimizukko dismissed stale reviews from phender and daltonbohning via 7bede57 May 20, 2026 22:52
@shimizukko shimizukko changed the title from "DAOS-18235 test: pool/verify_space.py - Remove system_ram_reserved: 1" to "DAOS-18235 test: pool/verify_space.py - Increase system_ram_reserved to 5" May 21, 2026
@shimizukko
Contributor Author

shimizukko commented May 21, 2026

Removing the system_ram_reserved field would reserve the default 64GiB for the system, but we only need 1 GiB more, so use 5 instead.

Comment thread src/tests/ftest/pool/verify_space.yaml Outdated
engines:
  0:
    storage: auto
system_ram_reserved: 5
Contributor

Won't increasing the reserved memory make the "Available memory (RAM) insufficient" error worse? We should also target testing on the hdr-[142-145] cluster, where this failure has been seen a few times.

Contributor

@phender phender May 21, 2026

I brought this up in the MD on SSD WG and the recommendation is to not specify a system_ram_reserved value for HW tests. Since this will result in the default value being used, it should also be a better representation of a real-world test.

We have passing results in https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-18298/1/artifact/Functional%20Hardware%20Medium%20MD%20on%20SSD/pool/verify_space.py with system_ram_reserved removed that ran on the hdr-[142-145] cluster.

Contributor Author

I talked to @tanabarr, and it sounds like the "Available memory (RAM) insufficient" error comes from the system (i.e., non-DAOS) side, so increasing system_ram_reserved gives the system more RAM and resolves it. I removed the field. Thanks.

Contributor

@daltonbohning daltonbohning left a comment

I agree with Phil that we should use the default if that works.

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_verify_pool_space
Test-repeat: 5
Signed-off-by: Makito Kano <makito.kano@hpe.com>
@shimizukko shimizukko changed the title from "DAOS-18235 test: pool/verify_space.py - Increase system_ram_reserved to 5" to "DAOS-18235 test: pool/verify_space.py - Remove system_ram_reserved" May 23, 2026
@shimizukko shimizukko added the waiting-for-merge-approval Waiting for merge approval label May 26, 2026
@phender phender removed the waiting-for-merge-approval Waiting for merge approval label May 28, 2026
@phender phender requested a review from a team May 28, 2026 13:49
@phender phender added the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label May 28, 2026
@phender phender merged commit 934100b into master May 28, 2026
35 checks passed
@phender phender deleted the makito/DAOS-18235 branch May 28, 2026 13:50
Labels

forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed.

3 participants