
e2e/qa: remove capacity pre-filtering, add threshold-based failure #3697

Merged
nikw9944 merged 6 commits into main from nikw9944/infra-1294 on May 22, 2026

@nikw9944 (Contributor) commented on May 13, 2026

Summary of Changes

  • Remove client-side capacity pre-filtering from qa.Test.ValidDevices. The QA user pubkey is on the onchain qa_allowlist, so the smart contract bypasses capacity limits for QA connections, which makes the test-side heuristic unnecessary
  • Delete DeviceUserType, capacityFor, the minCapacity and skipCapacityCheck parameters, and the --skip-capacity-check CLI flag
  • Detect capacity errors in ConnectUserUnicast and add a hint directing the operator to verify the qa-allowlist with doublezero global-config qa-allowlist list
  • Replace per-device t.Errorf/assert.NoError calls with t.Logf so individual device failures are logged without failing the test. After all batches complete, compute the overall and per-host failure rates and fail the test only if either exceeds --failure-threshold (default 20%)
  • Fixes malbeclabs/infra#1294

Testing Verification

  • Live-verified against mainnet-beta via the qa.alldevices workflow, with the doublezero_branch input pointing to this branch
  • Individual device failures appear as DEVICE FAILURE: log lines without marking the test as failed
  • Overall and per-host failure rates are logged after all batches complete; test fails only when a rate exceeds the threshold

@nikw9944 changed the title from "e2e/qa: skip per-type capacity check when onchain max is zero" to "e2e/qa: remove client-side capacity pre-filtering from ValidDevices" on May 21, 2026
@nikw9944 force-pushed the nikw9944/infra-1294 branch from 8ad7d12 to 7ac5830 on May 21, 2026 at 23:06
@nikw9944 changed the title from "e2e/qa: remove client-side capacity pre-filtering from ValidDevices" to "e2e/qa: remove capacity pre-filtering, add threshold-based failure" on May 22, 2026
@nikw9944 added 6 commits on May 22, 2026 at 18:58
Mirror the onchain semantic in qa.Test.ValidDevices: a per-type max of zero (max_unicast_users, max_multicast_publishers, max_multicast_subscribers) means the cap is not enforced. The create_core user processor only fails with MaxUnicastUsersExceeded when max > 0 && count >= max, so the QA filter should skip the per-type bucket entirely when max is zero and fall through to the aggregate users check.

Fixes the regression where qa.alldevices on mainnet-beta dropped from testing 85 devices to testing 3 — 92 of 96 activated mainnet-beta devices have max_unicast_users == 0 and were silently excluded.

malbeclabs/infra#1294
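The zero-means-unenforced rule quoted in this commit message can be written as a one-line predicate. The sketch below contrasts it with a naive filter that treats the limit as a hard cap even when it is zero; naiveHasRoom is a HYPOTHETICAL reconstruction used only to show how devices with max_unicast_users == 0 would all be excluded, and the uint64 field width is an assumption.

```go
package main

import "fmt"

// capExceeded mirrors the onchain rule from the commit message: a connection
// is rejected only when max > 0 && count >= max, so a max of zero is "not enforced".
// uint64 is an assumption; the onchain field width may differ.
func capExceeded(limit, count uint64) bool {
	return limit > 0 && count >= limit
}

// naiveHasRoom is a HYPOTHETICAL client-side filter that ignores the zero
// special case; for limit == 0 it always returns false, excluding the device.
func naiveHasRoom(limit, count uint64) bool {
	return count < limit
}

func main() {
	cases := []struct{ limit, count uint64 }{{0, 0}, {0, 500}, {10, 9}, {10, 10}}
	for _, c := range cases {
		fmt.Printf("limit=%d count=%d onchainAllows=%v naiveAllows=%v\n",
			c.limit, c.count, !capExceeded(c.limit, c.count), naiveHasRoom(c.limit, c.count))
	}
}
```

The two predicates only disagree when the limit is zero, which is exactly the case that affected 92 of the 96 activated mainnet-beta devices.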
The QA user pubkey is now on the onchain qa_allowlist, so the smart contract bypasses all capacity limits for QA connections. Remove the client-side capacity pre-filtering (DeviceUserType, capacityFor, minCapacity, skipCapacityCheck) from ValidDevices — it was a heuristic that drifted from the onchain semantic and is no longer needed.

ValidDevices now only filters out devices with "test" in their code.

Also add a hint to ConnectUserUnicast error messages when a capacity error is detected, directing the operator to verify the qa-allowlist.

malbeclabs/infra#1294
Replace per-device t.Errorf/assert.NoError with t.Logf so individual device failures are logged but do not fail the test. Instead, evaluate overall and per-host failure rates after all batches complete, and only fail the test if either rate exceeds --failure-threshold (default 20%).
@nikw9944 force-pushed the nikw9944/infra-1294 branch from 47e6723 to e10e14b on May 22, 2026 at 18:59
@nikw9944 marked this pull request as ready for review on May 22, 2026 at 19:03
@nikw9944 merged commit 7c9b115 into main on May 22, 2026
37 of 38 checks passed
@nikw9944 deleted the nikw9944/infra-1294 branch on May 22, 2026 at 19:26

Labels

None yet

Projects

None yet


2 participants