fix: enforce Prometheus metric label cardinality limits (#967)#1017
Open
euniceamoni wants to merge 1 commit into
Open
fix: enforce Prometheus metric label cardinality limits (#967)#1017euniceamoni wants to merge 1 commit into
euniceamoni wants to merge 1 commit into
Conversation
…ug#967) - Fix observe_request signature: status_code is u16, duration is f64 (call sites were passing Duration which would not compile) - Fix observe_request callers in statistics, featured_markets, content to pass explicit status codes - Fix missing record_pool_metrics method on Metrics struct (was called from db.rs and handlers.rs but never defined) - Add normalize_label / normalize_label_values helpers to sanitise every label value before it reaches the prometheus registry - Add _hotlbl suffix for values exceeding 48 chars so queries can bucket them and ops can track potential cardinality violations - Add unit tests for normalization, construction, record_pool_metrics and label cardinality assertions - Add services/api/METRICS.md documenting the full cardinality policy and the hard-ban list (IPs, user IDs, market IDs, tx hashes, etc.)
|
@euniceamoni Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits. You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #967
What this PR does
observe_request(3-arg: route, status_code, duration) was called with only 2 arguments inhandlers.rsrecord_pool_metricswas called on theMetricsstruct but never definednormalize_label/normalize_label_valueshelpers that sanitise every label value before it reaches the prometheus registry — this applies a hard cap of 48 characters and appends_hotlblfor overflow valuesMetricsnow call these normalisersrecord_pool_metrics(max, idle)toMetricsso pool gauges are emitted from a single call site with a bounded label valueservices/api/METRICS.mddocumenting the complete cardinality policy and a hard-ban listFiles changed
Risk
This changes the `route` label values for `http_request_duration_seconds` to be normalised. All currently emitted values are present in the allow-list in METRICS.md as known-bounded; queries that group by `route` must account for the lowercase-plus-underscore convention (they already were, since Prometheus labels are case-sensitive).