Skip to content

Add integration and unit tests for monitoring JSON API endpoints#373

Open
gimballock wants to merge 1 commit into
stratum-mining:mainfrom
fossatmara:feat/329-api-integration-tests
Open

Add integration and unit tests for monitoring JSON API endpoints#373
gimballock wants to merge 1 commit into
stratum-mining:mainfrom
fossatmara:feat/329-api-integration-tests

Conversation

@gimballock
Copy link
Copy Markdown

Closes #329

Exercise every JSON REST API endpoint against live SV2 topologies and strengthen unit-test coverage for edge cases.

Integration tests (monitoring_integration.rs):

  • 15 new tests covering /, /api/v1/global, /api/v1/server, /api/v1/server/channels, /api/v1/clients, /api/v1/clients/{id}, /api/v1/clients/{id}/channels, /api/v1/sv1/clients, and /api/v1/sv1/clients/{id} for both Pool and tProxy topologies
  • Topology setup helpers (PoolWithSv2Miner, TproxyWithSv1Miner) eliminate duplicated boilerplate across tests
  • 404 endpoints assert both HTTP status code and JSON error body via new assert_api_not_found helper

Assertion helpers (prometheus_metrics_assertions.rs):

  • fetch_api_json: parse JSON API response
  • fetch_api_with_status: return (status_code, json) without panicking on non-2xx, enabling 404 testing
  • poll_until_api_field_gte: poll JSON endpoint until a field reaches a threshold (analogous to poll_until_metric_gte for Prometheus)
  • assert_api_root, assert_api_global: reusable structure checks
  • assert_api_not_found: verify HTTP 404 + error field
  • POLL_TIMEOUT constant to reduce repetition

HTTP helper (utils.rs):

  • make_get_request_with_status: returns (status, body) without panicking on 4xx responses, unlike make_get_request

Unit tests (http_server.rs):

  • 11 new tests for pagination boundaries (limit=0, offset beyond total, limit exceeding MAX_LIMIT), missing data sources returning 404, invalid/non-existent client IDs, SV1 data in global endpoint, and Prometheus metrics with no sources

Dependencies:

  • Add serde_json to integration-tests for JSON parsing

@gimballock gimballock force-pushed the feat/329-api-integration-tests branch 7 times, most recently from 85a3f7e to f7f95d2 Compare March 30, 2026 14:21
@gimballock gimballock force-pushed the feat/329-api-integration-tests branch from f7f95d2 to f3c2e96 Compare April 4, 2026 02:22
Copy link
Copy Markdown
Member

@Shourya742 Shourya742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't monitoring module have abstractions for most of these?

Comment thread integration-tests/lib/prometheus_metrics_assertions.rs
Comment thread integration-tests/lib/prometheus_metrics_assertions.rs Outdated
Comment thread integration-tests/lib/prometheus_metrics_assertions.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs
@gimballock gimballock force-pushed the feat/329-api-integration-tests branch from f3c2e96 to 1f12145 Compare April 5, 2026 14:01
@gimballock
Copy link
Copy Markdown
Author

I think the test failure here is the target of PR #338, once we rebase off of that this failure should go away i think.

Copy link
Copy Markdown
Member

@Shourya742 Shourya742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still dont see any changes requested. Now, we have some random files in the PR

Comment thread logs/jdc.pid Outdated
Comment thread logs/jds.pid Outdated
Comment thread logs/pool.pid Outdated
Comment thread logs/tproxy.pid Outdated
Comment thread dashboard-mockups.md Outdated
Comment thread run-local.sh Outdated
Comment thread pool_scale_testing_plan.md Outdated
Comment thread vardiff-timing-data.md Outdated
@gimballock gimballock force-pushed the feat/329-api-integration-tests branch 5 times, most recently from d7dfb76 to 65dd528 Compare April 7, 2026 01:22
@gimballock
Copy link
Copy Markdown
Author

I still dont see any changes requested. Now, we have some random files in the PR

My bad, all those extra files are removed now.

@gimballock gimballock force-pushed the feat/329-api-integration-tests branch 6 times, most recently from 1f5ce90 to 6ecedf7 Compare April 13, 2026 18:23
@gimballock
Copy link
Copy Markdown
Author

gimballock commented Apr 13, 2026

Rebased onto main and addressed all review feedback. Summary of changes in the current commit:

All review feedback addressed:

  • Typed response structsGlobalResponse, ServerResponse, ServerChannelsResponse, PaginatedResponse<T>, ClientMetadata, ClientChannelsResponse, Sv1Client defined in prometheus_metrics_assertions.rs; all integration tests use fetch_api_typed instead of stringly-typed serde_json::Value field access
  • Route constantsmod routes block in http_server.rs test module defines all route strings; used throughout all unit tests
  • Shutdown calls — all integration tests call shutdown_all! or .shutdown().await at the end
  • No topology abstractions — each test sets up its own topology explicitly
  • No assert helper wrappers — assertions inlined directly in each test
  • No stray files — only the 6 files that belong to this PR are changed
  • Rebased — single clean commit on top of current main (includes the shares_acknowledged rename from update/fix ServerStandardChannelInfo share accounting #425 and the shutdown_all! macro from Remove broadcast channel #317)

Additional cleanup done during rebase:

  • Extracted a single generic poll_until<T, F> internal helper to eliminate duplicated poll loop boilerplate across the 5 typed polling functions
  • assert_api_health now uses typed deserialization (fetch_api_typed) instead of string matching

@Shourya742 if you could take a look when you have a chance

Copy link
Copy Markdown
Collaborator

@xyephy xyephy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK
LGTM.

@xyephy
Copy link
Copy Markdown
Collaborator

xyephy commented Apr 14, 2026

Note:when running the monitoring integration tests sequentially (--test-threads=1), 9 tProxy tests fail with Connection refused on the monitoring endpoint. The first tProxy test passes but subsequent ones can't reach the monitoring server — looks like shutdown_all! returns before the port is fully released. Verified the same 10/9 pattern on main (3773dd5), so this is pre-existing and unrelated to this PR.

@gimballock
Copy link
Copy Markdown
Author

Thanks for investigating. I believe this is pre-existing and unrelated to this PR — opened #430 to track it.

Looking at the code, TPROXY_MODE and VARDIFF_ENABLED are process-wide static OnceLock<_> statics in the translator. With test-threads = 1 all tests share one process, so a second TranslatorSv2::start() call would panic inside tokio::spawn (silently swallowed), meaning the translator never starts and the test sees "Connection refused". That would explain the 10/9 pattern, and the reviewer confirmed the same pattern on 3773dd5 before any of our changes.

This PR's tests use isolated ports via get_available_address() and private Registry::new(), so I don't think we're introducing any new conflicts — but happy to dig further if you see something I'm missing.

@GitGab19
Copy link
Copy Markdown
Member

I would say we need to proceed with #338 before evaluating this PR, since #338 is touching things in the ITF as well.

@gimballock gimballock force-pushed the feat/329-api-integration-tests branch from cf90755 to d71e1a3 Compare April 29, 2026 19:31
@gimballock
Copy link
Copy Markdown
Author

I rebased and tried to account for all the changes since last update now that #338 is merged

Copy link
Copy Markdown
Collaborator

@xyephy xyephy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK d71e1a3

Re-tested after #338 rebase

@gimballock gimballock force-pushed the feat/329-api-integration-tests branch from d71e1a3 to f44cd52 Compare May 6, 2026 14:05
@gimballock
Copy link
Copy Markdown
Author

clean rebase w/ latest

Comment thread integration-tests/lib/prometheus_metrics_assertions.rs Outdated
Comment thread integration-tests/lib/prometheus_metrics_assertions.rs
gimballock pushed a commit to fossatmara/sv2-apps that referenced this pull request May 10, 2026
… API test, unpack race fix

Originally the third commit in this PR's series, now amended to fold
in review-feedback fixes from stratum-mining#373.

Coverage expansion (from the original commit):

* **JDC API endpoints test.** Adds `jdc_api_endpoints_with_miner`,
  parallel to the existing Pool and tProxy with-miner tests. JDC sits
  between pool and tProxy: it is an SV2 client to the pool (so
  `server` is populated on /api/v1/global) and an SV2 server to the
  tProxy (so `sv2_clients` is populated). JDC has no SV1 surface, so
  /api/v1/sv1/clients must 404. Exercises root, /api/v1/global,
  /api/v1/server, /api/v1/server/channels, /api/v1/clients,
  /api/v1/clients/{id}, /api/v1/clients/{id}/channels and the SV1 404.

* **shares_rejected field-shape regression guard.** Adds inline JSON
  asserts to the tProxy and JDC with-miner tests verifying that the
  upstream-channel detail under /api/v1/server/channels exposes both
  `shares_rejected` (number) and `shares_rejected_by_reason` (object).
  This guards against rename or type regressions of the kind PR stratum-mining#468
  surfaced. The guard intentionally lives only on roles whose channels
  use the server-side `ServerExtendedChannelInfo` struct (tProxy and
  JDC); the pool's downstream-client channels use `ExtendedChannelInfo`/
  `StandardChannelInfo`, which by design do not carry these fields.

Address review feedback — reuse production types instead of mirrors:

Production-side changes (stratum-apps):

* Make the JSON response wrapper structs in `monitoring::http_server`
  (`HealthResponse`, `ErrorResponse`, `ServerResponse`,
  `ServerChannelsResponse`, `Sv2ClientsResponse`, `Sv2ClientResponse`,
  `Sv2ClientChannelsResponse`, `Sv1ClientsResponse`) `pub` and derive
  `Deserialize` + `Debug` so they can serve both as the wire-format
  layer in handlers and as the parse target in tests and downstream
  consumers. Re-export from `monitoring::mod`.

Test-side changes:

* `integration-tests/lib/prometheus_metrics_assertions.rs`: drop the
  `Api*` mirror types entirely. The typed `poll_until_*_gte` helpers
  now return the production types directly (`GlobalInfo`,
  `Sv2ClientsResponse`, `Sv1ClientsResponse`, `ServerResponse`).
  `assert_api_health` parses into `HealthResponse`. Delete unused
  `fetch_api_json` (callers use `fetch_api_typed::<T>` instead).

* `integration-tests/tests/monitoring_integration.rs`: switch
  `fetch_api_typed` call sites to the production types
  (`Sv2ClientResponse`, `Sv2ClientChannelsResponse`, `Sv1ClientInfo`).
  The three root-endpoint tests (`/` is hand-rolled JSON in
  `handle_root` with no production struct) read `serde_json::Value`
  directly.

* `stratum-apps/src/monitoring/http_server.rs`: drop the `*Body`
  shadow types from the test module and parse responses directly into
  the production response types. Clean up three stale
  `super::super::*` paths to bare names (artifacts from before the
  consolidation refactor) and align the `routes::*` URL helpers on
  `usize` (matching the production `client_id` type).

Also restore the `///` doc comment on `assert_metric_gte` that was
accidentally dropped while adding `#[track_caller]`.

Unrelated fix kept in this commit because the new tests surfaced it:

* `integration-tests/lib/utils.rs::tarball::unpack` was writing every
  tarball to a fixed `<destination>/temp.tar.gz`. When several tests
  start a pool/tProxy in parallel, both extractions land in the same
  shared `template-provider/` cache dir and truncate each other's
  tarball mid-extraction, manifesting as `tar: ...: truncated gzip
  input`. Switch to a process-unique path under `std::env::temp_dir()`
  with an `AtomicU64` nonce, and clean up the temp file even when
  extraction fails. The remaining redundant extraction (two threads
  both pass the "is bitcoin already extracted?" guard and both unpack)
  is benign — same tarball, same destination, idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gimballock gimballock force-pushed the feat/329-api-integration-tests branch from f44cd52 to f715d3f Compare May 10, 2026 01:29
Comment thread integration-tests/lib/utils.rs Outdated
Comment thread integration-tests/lib/utils.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread integration-tests/tests/monitoring_integration.rs Outdated
Comment thread stratum-apps/src/monitoring/http_server.rs Outdated
Comment thread stratum-apps/src/monitoring/http_server.rs
gimballock pushed a commit to fossatmara/sv2-apps that referenced this pull request May 12, 2026
… API test, unpack race fix

Originally the third commit in this PR's series, now amended to fold
in review-feedback fixes from stratum-mining#373.

Coverage expansion (from the original commit):

* **JDC API endpoints test.** Adds `jdc_api_endpoints_with_miner`,
  parallel to the existing Pool and tProxy with-miner tests. JDC sits
  between pool and tProxy: it is an SV2 client to the pool (so
  `server` is populated on /api/v1/global) and an SV2 server to the
  tProxy (so `sv2_clients` is populated). JDC has no SV1 surface, so
  /api/v1/sv1/clients must 404. Exercises root, /api/v1/global,
  /api/v1/server, /api/v1/server/channels, /api/v1/clients,
  /api/v1/clients/{id}, /api/v1/clients/{id}/channels and the SV1 404.

* **shares_rejected field-shape regression guard.** Adds inline JSON
  asserts to the tProxy and JDC with-miner tests verifying that the
  upstream-channel detail under /api/v1/server/channels exposes both
  `shares_rejected` (number) and `shares_rejected_by_reason` (object).
  This guards against rename or type regressions of the kind PR stratum-mining#468
  surfaced. The guard intentionally lives only on roles whose channels
  use the server-side `ServerExtendedChannelInfo` struct (tProxy and
  JDC); the pool's downstream-client channels use `ExtendedChannelInfo`/
  `StandardChannelInfo`, which by design do not carry these fields.

Address review feedback — reuse production types instead of mirrors:

Production-side changes (stratum-apps):

* Make the JSON response wrapper structs in `monitoring::http_server`
  (`HealthResponse`, `ErrorResponse`, `ServerResponse`,
  `ServerChannelsResponse`, `Sv2ClientsResponse`, `Sv2ClientResponse`,
  `Sv2ClientChannelsResponse`, `Sv1ClientsResponse`) `pub` and derive
  `Deserialize` + `Debug` so they can serve both as the wire-format
  layer in handlers and as the parse target in tests and downstream
  consumers. Re-export from `monitoring::mod`.

Test-side changes:

* `integration-tests/lib/prometheus_metrics_assertions.rs`: drop the
  `Api*` mirror types entirely. The typed `poll_until_*_gte` helpers
  now return the production types directly (`GlobalInfo`,
  `Sv2ClientsResponse`, `Sv1ClientsResponse`, `ServerResponse`).
  `assert_api_health` parses into `HealthResponse`. Delete unused
  `fetch_api_json` (callers use `fetch_api_typed::<T>` instead).

* `integration-tests/tests/monitoring_integration.rs`: switch
  `fetch_api_typed` call sites to the production types
  (`Sv2ClientResponse`, `Sv2ClientChannelsResponse`, `Sv1ClientInfo`).
  The three root-endpoint tests (`/` is hand-rolled JSON in
  `handle_root` with no production struct) read `serde_json::Value`
  directly.

* `stratum-apps/src/monitoring/http_server.rs`: drop the `*Body`
  shadow types from the test module and parse responses directly into
  the production response types. Clean up three stale
  `super::super::*` paths to bare names (artifacts from before the
  consolidation refactor) and align the `routes::*` URL helpers on
  `usize` (matching the production `client_id` type).

Also restore the `///` doc comment on `assert_metric_gte` that was
accidentally dropped while adding `#[track_caller]`.

Unrelated fix kept in this commit because the new tests surfaced it:

* `integration-tests/lib/utils.rs::tarball::unpack` was writing every
  tarball to a fixed `<destination>/temp.tar.gz`. When several tests
  start a pool/tProxy in parallel, both extractions land in the same
  shared `template-provider/` cache dir and truncate each other's
  tarball mid-extraction, manifesting as `tar: ...: truncated gzip
  input`. Switch to a process-unique path under `std::env::temp_dir()`
  with an `AtomicU64` nonce, and clean up the temp file even when
  extraction fails. The remaining redundant extraction (two threads
  both pass the "is bitcoin already extracted?" guard and both unpack)
  is benign — same tarball, same destination, idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gimballock gimballock force-pushed the feat/329-api-integration-tests branch from f715d3f to 2b6d0be Compare May 12, 2026 14:02
@GitGab19
Copy link
Copy Markdown
Member

@gimballock can you please reword/squash commits here?

In both titles and descriptions we have some stale changes IMO.

Adds integration and unit-test coverage for the monitoring HTTP JSON
API and Prometheus surfaces of pool, tProxy, and JDC. Also exposes the
existing response types in `stratum_apps::monitoring` as the canonical
parse targets so tests don't reimplement them.

Integration tests (`integration-tests/tests/monitoring_integration.rs`):

* Topology-shaped tests rather than per-endpoint. Each test spins up
  its topology once and exercises every relevant endpoint:
  - `pool_api_endpoints_static` / `pool_api_endpoints_with_miner`
  - `tproxy_api_endpoints_static` / `tproxy_api_endpoints_with_miner`
  - `jdc_api_endpoints_with_miner`

  The JDC test in particular is interesting because JDC sits between
  pool and tProxy: it is an SV2 client to the pool (so `server` is
  populated on /api/v1/global) and an SV2 server to the tProxy (so
  `sv2_clients` is populated). JDC has no SV1 surface, so
  /api/v1/sv1/clients must 404.

* Cross-validate JSON API and Prometheus. The two with-miner tests
  assert matching Prometheus counters via the `Metric::with_labels`
  selector, catching drift between the two surfaces.

* `shares_rejected` field-shape regression guard. Inline JSON asserts
  on the tProxy and JDC tests verify that the upstream-channel detail
  under /api/v1/server/channels exposes both `shares_rejected`
  (number) and `shares_rejected_by_reason` (object). This guards
  against rename / type regressions of the kind PR stratum-mining#468 surfaced.
  The guard intentionally lives only on roles whose channels use
  `ServerExtendedChannelInfo` (tProxy and JDC); the pool's
  downstream-client channels use `ExtendedChannelInfo` /
  `StandardChannelInfo`, which by design do not carry these fields.

Unit tests (`stratum-apps/src/monitoring/http_server.rs`):

* Per-endpoint unit tests against `build_test_app` with mock data,
  exercising 200/404 paths, pagination edges, and the
  stale-channel-label cleanup behaviour.

Test infrastructure (`integration-tests/lib/prometheus_metrics_assertions.rs`):

* `fetch_api`, `fetch_api_typed::<T>`, `fetch_api_with_status` plus
  typed `poll_until_*_gte` helpers for `/api/v1/global`,
  `/api/v1/clients`, `/api/v1/server`, and `/api/v1/sv1/clients`.
* `Metric::with_labels` selector for matching Prometheus exposition
  lines with order- and subset-tolerant label matching.
* `assert_metric_*` / `assert_metric_not_present` / `assert_uptime` /
  `assert_api_health`, all `#[track_caller]` so panic locations point
  at the call site.

Production-side changes (`stratum-apps`):

* Make the JSON response wrapper structs in `monitoring::http_server`
  (`HealthResponse`, `ErrorResponse`, `ServerResponse`,
  `ServerChannelsResponse`, `Sv2ClientsResponse`, `Sv2ClientResponse`,
  `Sv2ClientChannelsResponse`, `Sv1ClientsResponse`) `pub` and derive
  `Deserialize` + `Debug` so they serve as both the wire-format layer
  in handlers and the parse target in tests and downstream consumers.
  Re-export from `monitoring::mod`. Tests deserialize directly into
  these production types rather than redeclaring shadow copies. The
  one exception is the `/` root endpoint, which is hand-rolled JSON
  in `handle_root` with no production struct; tests parse it as
  `serde_json::Value`.
@gimballock gimballock force-pushed the feat/329-api-integration-tests branch from 2b6d0be to 94edcb3 Compare May 12, 2026 14:55
@GitGab19
Copy link
Copy Markdown
Member

@gimballock can you rebase this PR?

Once you do that, I'll merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expand monitoring API test coverage (unit + integration)

4 participants