e2e/qa: remove capacity pre-filtering, add threshold-based failure #3697
Merged
Force-pushed from 8ad7d12 to 7ac5830
Mirror the onchain semantic in qa.Test.ValidDevices: a per-type max of zero (max_unicast_users, max_multicast_publishers, max_multicast_subscribers) means the cap is not enforced. The create_core user processor only fails with MaxUnicastUsersExceeded when max > 0 && count >= max, so the QA filter should skip the per-type bucket entirely when max is zero and fall through to the aggregate users check. Fixes the regression where qa.alldevices on mainnet-beta dropped from testing 85 devices to testing 3 — 92 of 96 activated mainnet-beta devices have max_unicast_users == 0 and were silently excluded. malbeclabs/infra#1294
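The per-type rule in this commit message can be written as a single predicate. The following is a minimal sketch, not the actual processor or QA code: the function name `capExceeded` and the `uint16` field width are assumptions made for illustration.

```go
package main

import "fmt"

// capExceeded mirrors the semantic described above: a max of zero means the
// cap is not enforced; otherwise the cap is hit once count reaches max.
// The name and the uint16 width are hypothetical.
func capExceeded(max, count uint16) bool {
	return max > 0 && count >= max
}

func main() {
	fmt.Println(capExceeded(0, 10)) // max == 0: cap not enforced
	fmt.Println(capExceeded(4, 4))  // count has reached max
	fmt.Println(capExceeded(4, 3))  // still below max
}
```

A filter that treats `max == 0` as "no capacity" instead of "no cap" would exclude exactly the devices this commit message says were being dropped.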
The QA user pubkey is now on the onchain qa_allowlist, so the smart contract bypasses all capacity limits for QA connections. Remove the client-side capacity pre-filtering (DeviceUserType, capacityFor, minCapacity, skipCapacityCheck) from ValidDevices — it was a heuristic that drifted from the onchain semantic and is no longer needed. ValidDevices now only filters out devices with "test" in their code. Also add a hint to ConnectUserUnicast error messages when a capacity error is detected, directing the operator to verify the qa-allowlist. malbeclabs/infra#1294
Replace per-device t.Errorf/assert.NoError with t.Logf so individual device failures are logged but do not fail the test. Instead, evaluate overall and per-host failure rates after all batches complete, and only fail the test if either rate exceeds --failure-threshold (default 20%).
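The threshold evaluation could look roughly like the sketch below. It assumes each device result carries a host label and treats "exceeds" as strictly greater than the threshold; the `result` and `tally` types and the `evaluate` function are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical record of one device test outcome.
type result struct {
	Host   string
	Failed bool
}

// tally counts failures against total attempts.
type tally struct{ failed, total int }

// rate returns failed/total, or 0 when nothing was attempted.
func (t tally) rate() float64 {
	if t.total == 0 {
		return 0
	}
	return float64(t.failed) / float64(t.total)
}

// evaluate returns one message per rate (overall or per host) that is
// strictly greater than threshold. An empty slice means the run passes.
func evaluate(results []result, threshold float64) []string {
	var overall tally
	perHost := map[string]tally{}
	for _, r := range results {
		h := perHost[r.Host]
		overall.total++
		h.total++
		if r.Failed {
			overall.failed++
			h.failed++
		}
		perHost[r.Host] = h
	}

	var violations []string
	if overall.rate() > threshold {
		violations = append(violations,
			fmt.Sprintf("overall: %d/%d failed", overall.failed, overall.total))
	}
	// Sort host names so the output order is deterministic.
	hosts := make([]string, 0, len(perHost))
	for h := range perHost {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts)
	for _, h := range hosts {
		if t := perHost[h]; t.rate() > threshold {
			violations = append(violations,
				fmt.Sprintf("host %s: %d/%d failed", h, t.failed, t.total))
		}
	}
	return violations
}

func main() {
	results := []result{
		{Host: "h1", Failed: true}, {Host: "h1", Failed: true}, {Host: "h1", Failed: false},
		{Host: "h2", Failed: false}, {Host: "h2", Failed: false}, {Host: "h2", Failed: false},
	}
	for _, v := range evaluate(results, 0.20) {
		fmt.Println("threshold exceeded,", v)
	}
}
```

In a Go test, the per-device failures would go through `t.Logf` as they occur, and a single `t.Errorf` after all batches would report any returned violations.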
Force-pushed from 47e6723 to e10e14b
martinsander00 approved these changes May 22, 2026
Summary of Changes
- Remove capacity pre-filtering from qa.Test.ValidDevices: the QA user pubkey is on the onchain qa_allowlist, so the smart contract bypasses capacity limits for QA connections and the test-side heuristic is unnecessary
- Remove DeviceUserType, capacityFor, the minCapacity/skipCapacityCheck parameters, and the --skip-capacity-check CLI flag
- Detect capacity errors in ConnectUserUnicast and hint to verify the qa-allowlist with doublezero global-config qa-allowlist list
- Replace t.Errorf/assert.NoError with t.Logf so individual device failures are logged but do not fail the test; evaluate overall and per-host failure rates after all batches and only fail if either exceeds --failure-threshold (default 20%)

Testing Verification
- Run the qa.alldevices workflow with the doublezero_branch input pointing to this branch
- Confirm that device failures produce DEVICE FAILURE: log lines without marking the test as failed