Skip to content

fix: enforce Prometheus metric label cardinality limits (#967)#1017

Open
euniceamoni wants to merge 1 commit into
solutions-plug:mainfrom
euniceamoni:fix/967-prometheus-cardinality
Open

fix: enforce Prometheus metric label cardinality limits (#967)#1017
euniceamoni wants to merge 1 commit into
solutions-plug:mainfrom
euniceamoni:fix/967-prometheus-cardinality

Conversation

@euniceamoni

@euniceamoni euniceamoni commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Closes #967


What this PR does

  • Fixes a compile bug where observe_request (3-arg: route, status_code, duration) was called with only 2 arguments in handlers.rs
  • Fixes a compile bug where record_pool_metrics was called on the Metrics struct but never defined
  • Adds normalize_label/normalize_label_values helpers that sanitise every label value before it reaches the prometheus registry — this applies a hard cap of 48 characters and appends _hotlbl for overflow values
  • All public observation methods on Metrics now call these normalisers
  • Adds record_pool_metrics(max, idle) to Metrics so pool gauges are emitted from a single call site with a bounded label value
  • Adds unit tests covering normalization, full metric construction and cardinality assertion
  • Adds services/api/METRICS.md documenting the complete cardinality policy and a hard-ban list

Files changed

  • `services/api/src/metrics.rs` — helpers, normalisation, record_pool_metrics, unit tests
  • `services/api/src/handlers.rs` — fix observe_request call sites to pass status code + duration as f64
  • `services/api/METRICS.md` — new: cardinality policy and label inventory

Risk

This changes the `route` label values for `http_request_duration_seconds` to be normalised. All currently emitted values are present in the allow-list in METRICS.md as known-bounded; queries that group by `route` must account for the lowercase-plus-underscore convention (they already were, since Prometheus labels are case-sensitive).

…ug#967)

- Fix observe_request signature: status_code is u16, duration is f64
  (call sites were passing Duration which would not compile)
- Fix observe_request callers in statistics, featured_markets, content
  to pass explicit status codes
- Fix missing record_pool_metrics method on Metrics struct
  (was called from db.rs and handlers.rs but never defined)
- Add normalize_label / normalize_label_values helpers to sanitise
  every label value before it reaches the prometheus registry
- Add _hotlbl suffix for values exceeding 48 chars so queries can
  bucket them and ops can track potential cardinality violations
- Add unit tests for normalization, construction, record_pool_metrics
  and label cardinality assertions
- Add services/api/METRICS.md documenting the full cardinality policy
  and the hard-ban list (IPs, user IDs, market IDs, tx hashes, etc.)
@drips-wave

drips-wave Bot commented Jun 27, 2026

Copy link
Copy Markdown

@euniceamoni Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

Learn more about application limits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prometheus metric labels have no cardinality cap — high-cardinality labels can OOM Prometheus

1 participant