Skip to content

Add integration and unit tests for monitoring JSON API endpoints#373

Open
gimballock wants to merge 3 commits intostratum-mining:mainfrom
fossatmara:feat/329-api-integration-tests
Open

Add integration and unit tests for monitoring JSON API endpoints#373
gimballock wants to merge 3 commits intostratum-mining:mainfrom
fossatmara:feat/329-api-integration-tests

Conversation

@gimballock
Copy link
Copy Markdown

Closes #329

Exercise every JSON REST API endpoint against live SV2 topologies and strengthen unit-test coverage for edge cases.

Integration tests (monitoring_integration.rs):

  • 15 new tests covering /, /api/v1/global, /api/v1/server, /api/v1/server/channels, /api/v1/clients, /api/v1/clients/{id}, /api/v1/clients/{id}/channels, /api/v1/sv1/clients, and /api/v1/sv1/clients/{id} for both Pool and tProxy topologies
  • Topology setup helpers (PoolWithSv2Miner, TproxyWithSv1Miner) eliminate duplicated boilerplate across tests
  • 404 endpoints assert both HTTP status code and JSON error body via new assert_api_not_found helper

Assertion helpers (prometheus_metrics_assertions.rs):

  • fetch_api_json: parse JSON API response
  • fetch_api_with_status: return (status_code, json) without panicking on non-2xx, enabling 404 testing
  • poll_until_api_field_gte: poll JSON endpoint until a field reaches a threshold (analogous to poll_until_metric_gte for Prometheus)
  • assert_api_root, assert_api_global: reusable structure checks
  • assert_api_not_found: verify HTTP 404 + error field
  • POLL_TIMEOUT constant to reduce repetition

HTTP helper (utils.rs):

  • make_get_request_with_status: returns (status, body) without panicking on 4xx responses, unlike make_get_request

Unit tests (http_server.rs):

  • 11 new tests for pagination boundaries (limit=0, offset beyond total, limit exceeding MAX_LIMIT), missing data sources returning 404, invalid/non-existent client IDs, SV1 data in global endpoint, and Prometheus metrics with no sources

Dependencies:

  • Add serde_json to integration-tests for JSON parsing

@gimballock gimballock force-pushed the feat/329-api-integration-tests branch 7 times, most recently from 85a3f7e to f7f95d2 Compare March 30, 2026 14:21
@gimballock gimballock force-pushed the feat/329-api-integration-tests branch from f7f95d2 to f3c2e96 Compare April 4, 2026 02:22
Copy link
Copy Markdown
Member

@Shourya742 Shourya742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't monitoring module have abstractions for most of these?

Comment thread integration-tests/lib/prometheus_metrics_assertions.rs
Comment thread integration-tests/lib/prometheus_metrics_assertions.rs Outdated
Comment thread integration-tests/lib/prometheus_metrics_assertions.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs
@gimballock gimballock force-pushed the feat/329-api-integration-tests branch from f3c2e96 to 1f12145 Compare April 5, 2026 14:01
@gimballock
Copy link
Copy Markdown
Author

I think the test failure here is the target of PR #338, once we rebase off of that this failure should go away i think.

Copy link
Copy Markdown
Member

@Shourya742 Shourya742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still dont see any changes requested. Now, we have some random files in the PR

Comment thread logs/jdc.pid Outdated
Comment thread logs/jds.pid Outdated
Comment thread logs/pool.pid Outdated
Comment thread logs/tproxy.pid Outdated
Comment thread dashboard-mockups.md Outdated
Comment thread run-local.sh Outdated
Comment thread pool_scale_testing_plan.md Outdated
Comment thread vardiff-timing-data.md Outdated
@gimballock gimballock force-pushed the feat/329-api-integration-tests branch 5 times, most recently from d7dfb76 to 65dd528 Compare April 7, 2026 01:22
@gimballock
Copy link
Copy Markdown
Author

I still dont see any changes requested. Now, we have some random files in the PR

My bad, all those extra files are removed now.

@gimballock gimballock force-pushed the feat/329-api-integration-tests branch 6 times, most recently from 1f5ce90 to 6ecedf7 Compare April 13, 2026 18:23
@gimballock
Copy link
Copy Markdown
Author

gimballock commented Apr 13, 2026

Rebased onto main and addressed all review feedback. Summary of changes in the current commit:

All review feedback addressed:

  • Typed response structsGlobalResponse, ServerResponse, ServerChannelsResponse, PaginatedResponse<T>, ClientMetadata, ClientChannelsResponse, Sv1Client defined in prometheus_metrics_assertions.rs; all integration tests use fetch_api_typed instead of stringly-typed serde_json::Value field access
  • Route constantsmod routes block in http_server.rs test module defines all route strings; used throughout all unit tests
  • Shutdown calls — all integration tests call shutdown_all! or .shutdown().await at the end
  • No topology abstractions — each test sets up its own topology explicitly
  • No assert helper wrappers — assertions inlined directly in each test
  • No stray files — only the 6 files that belong to this PR are changed
  • Rebased — single clean commit on top of current main (includes the shares_acknowledged rename from update/fix ServerStandardChannelInfo share accounting #425 and the shutdown_all! macro from Remove broadcast channel #317)

Additional cleanup done during rebase:

  • Extracted a single generic poll_until<T, F> internal helper to eliminate duplicated poll loop boilerplate across the 5 typed polling functions
  • assert_api_health now uses typed deserialization (fetch_api_typed) instead of string matching

@Shourya742 if you could take a look when you have a chance

Copy link
Copy Markdown
Collaborator

@xyephy xyephy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK
LGTM.

@xyephy
Copy link
Copy Markdown
Collaborator

xyephy commented Apr 14, 2026

Note:when running the monitoring integration tests sequentially (--test-threads=1), 9 tProxy tests fail with Connection refused on the monitoring endpoint. The first tProxy test passes but subsequent ones can't reach the monitoring server — looks like shutdown_all! returns before the port is fully released. Verified the same 10/9 pattern on main (3773dd5), so this is pre-existing and unrelated to this PR.

@gimballock
Copy link
Copy Markdown
Author

Thanks for investigating. I believe this is pre-existing and unrelated to this PR — opened #430 to track it.

Looking at the code, TPROXY_MODE and VARDIFF_ENABLED are process-wide static OnceLock<_> statics in the translator. With test-threads = 1 all tests share one process, so a second TranslatorSv2::start() call would panic inside tokio::spawn (silently swallowed), meaning the translator never starts and the test sees "Connection refused". That would explain the 10/9 pattern, and the reviewer confirmed the same pattern on 3773dd5 before any of our changes.

This PR's tests use isolated ports via get_available_address() and private Registry::new(), so I don't think we're introducing any new conflicts — but happy to dig further if you see something I'm missing.

@GitGab19
Copy link
Copy Markdown
Member

I would say we need to proceed with #338 before evaluating this PR, since #338 is touching things in the ITF as well.

Eric Price added 3 commits April 29, 2026 14:27
- Add typed response structs for JSON API parsing (minimal fields only)
- Add typed polling functions to replace stringly-typed JSON access
- Add prometheus_metrics_assertions module with polling helpers
- Add #[track_caller] to assert helpers for better debugging
- Add shutdown calls to all integration tests
- Use POLL_TIMEOUT constant instead of magic numbers
- Add unit tests for http_server.rs endpoints
…ename lib types

Quality refactors of the new monitoring API integration tests, scoped
strictly to the test infrastructure (no production API changes):

* **Consolidate API tests.** The 15 narrow per-endpoint API tests
  collapse to 4 broader topology-shaped tests
  (`pool_api_endpoints_static`, `pool_api_endpoints_with_miner`,
  `tproxy_api_endpoints_static`, `tproxy_api_endpoints_with_miner`),
  each spinning up its topology once and exercising every relevant
  endpoint. Roughly halves CI time for these tests with no loss of
  coverage.

* **Cross-validate JSON API and Prometheus.** The two with-miner tests
  now also assert on the matching Prometheus counters using the
  `Metric::with_labels` selector, catching drift between the two
  surfaces.

* **Rename lib typed structs with an `Api*` prefix** (`ApiGlobalResponse`,
  `ApiServerSummary`, `ApiClientMetadata`, etc.) to remove the silent
  collision with the production response types of the same names defined
  inside `stratum_apps::monitoring::http_server`.

* **Unify polling timeout.** Drop `POLL_TIMEOUT` from the lib; all
  polling callers use the test-file-local `METRIC_POLL_TIMEOUT` (5 s).

* **Drop dead comment** in the SV1-client lookup test.

A separate follow-up will consider exposing the production response
types (currently private to `http_server`) as a public API for
downstream consumers; this PR keeps the test-only mirror types.
Coverage expansion on top of the consolidation refactor:

* **JDC API endpoints test.** Adds `jdc_api_endpoints_with_miner`,
  parallel to the existing Pool and tProxy with-miner tests. JDC sits
  between pool and tProxy: it is an SV2 client to the pool (so
  `server` is populated on /api/v1/global) and an SV2 server to the
  tProxy (so `sv2_clients` is populated). JDC has no SV1 surface, so
  /api/v1/sv1/clients must 404. Exercises root, /api/v1/global,
  /api/v1/server, /api/v1/server/channels, /api/v1/clients,
  /api/v1/clients/{id}, /api/v1/clients/{id}/channels and the SV1 404.

* **shares_rejected field-shape regression guard.** Adds inline JSON
  asserts to the tProxy and JDC with-miner tests verifying that the
  upstream-channel detail under /api/v1/server/channels exposes both
  `shares_rejected` (number) and `shares_rejected_by_reason` (object).
  This guards against rename or type regressions of the kind PR stratum-mining#468
  surfaced. The guard intentionally lives only on roles whose channels
  use the server-side `ServerExtendedChannelInfo` struct (tProxy and
  JDC); the pool's downstream-client channels use `ExtendedChannelInfo`/
  `StandardChannelInfo`, which by design do not carry these fields.
@gimballock gimballock force-pushed the feat/329-api-integration-tests branch from cf90755 to d71e1a3 Compare April 29, 2026 19:31
@gimballock
Copy link
Copy Markdown
Author

I rebased and tried to account for all the changes since last update now that #338 is merged

Copy link
Copy Markdown
Collaborator

@xyephy xyephy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK d71e1a3

Re-tested after #338 rebase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expand monitoring API test coverage (unit + integration)

4 participants