feat(observability): add db query duration histogram and pool exhaustion monitoring #2

Open
winniebyte wants to merge 1 commit into main from feature/964-965-db-query-histogram-pool-exhaustion-alert

@winniebyte

Owner

Summary

Implements two monitoring features:

Issue solutions-plug#964 - Database Query Duration Histogram

  • Added db_query_duration_seconds Prometheus histogram with buckets [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5]
  • Instrumented all key query paths in db.rs via with_timeout() to record query durations
  • Added P50/P95/P99 query latency Grafana panel to SLO dashboard
  • Added Prometheus alert DBSlowQueryP95 firing when P95 exceeds 500ms for 5 minutes
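
To make the bucket layout and the 500 ms P95 threshold concrete, here is a minimal std-only sketch of a cumulative-bucket histogram. It is not the code in this PR: `DurationHistogram`, `time`, and `quantile_upper_bound` are hypothetical names, and the real change presumably registers the metric through a Prometheus client library. Only the bucket bounds come from the description above. Note that `quantile_upper_bound` returns the upper bound of the bucket containing the quantile, which is coarser than the linear interpolation Prometheus performs in `histogram_quantile`.

```rust
use std::time::Instant;

/// Bucket upper bounds in seconds, as listed in the PR description.
const BUCKETS: [f64; 8] = [0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0];

/// Hypothetical std-only stand-in for a Prometheus histogram.
#[derive(Debug, Default)]
struct DurationHistogram {
    /// Cumulative counts: counts[i] is the number of observations <= BUCKETS[i].
    counts: [u64; 8],
    /// Total number of observations (the implicit +Inf bucket).
    total: u64,
    /// Sum of all observed durations in seconds.
    sum: f64,
}

impl DurationHistogram {
    /// Records one duration, incrementing every bucket whose bound covers it.
    fn observe(&mut self, seconds: f64) {
        for (i, bound) in BUCKETS.iter().enumerate() {
            if seconds <= *bound {
                self.counts[i] += 1;
            }
        }
        self.total += 1;
        self.sum += seconds;
    }

    /// Runs `query`, records how long it took, and returns its result.
    fn time<T>(&mut self, query: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = query();
        self.observe(start.elapsed().as_secs_f64());
        out
    }

    /// Upper bound of the first bucket that covers fraction `q` of all
    /// observations. Returns None when the histogram is empty or the
    /// quantile falls beyond the largest finite bucket.
    fn quantile_upper_bound(&self, q: f64) -> Option<f64> {
        if self.total == 0 {
            return None;
        }
        let rank = q * self.total as f64;
        for (bound, count) in BUCKETS.iter().zip(self.counts.iter()) {
            if *count as f64 >= rank {
                return Some(*bound);
            }
        }
        None
    }
}
```

Under these assumptions, a caller would wrap each query in `hist.time(|| run_query())` and compare `hist.quantile_upper_bound(0.95)` against `0.5` to approximate the condition that `DBSlowQueryP95` checks.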

Issue solutions-plug#965 - Database Connection Pool Exhaustion Alert

  • Added db_pool_exhaustion_total counter in metrics.rs, incremented on pool timeout errors
  • Added Prometheus alert DBPoolExhaustion firing when rate exceeds 1/minute
  • Added Grafana panel showing pool utilisation as a percentage
  • Updated alerts.yaml with pool exhaustion alert rule
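
The counter logic can be sketched roughly as follows, using only the standard library. This is an illustration under stated assumptions rather than the PR's implementation: `DbError`, `record_if_pool_exhausted`, and `per_minute_rate` are hypothetical names, and the actual error type depends on the database driver in use. The rate helper only mirrors the arithmetic behind a "more than 1 per minute" threshold; in production that calculation would be done by the Prometheus alert rule, not in application code.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical error type; the PR only says "pool timeout errors".
#[derive(Debug)]
enum DbError {
    /// No connection became available before the acquire timeout.
    PoolTimeout,
    /// Any other database failure.
    Other(String),
}

/// Std-only stand-in for the db_pool_exhaustion_total counter.
static DB_POOL_EXHAUSTION_TOTAL: AtomicU64 = AtomicU64::new(0);

/// Passes the result through unchanged, incrementing the counter only
/// when the error is a pool timeout.
fn record_if_pool_exhausted<T>(result: Result<T, DbError>) -> Result<T, DbError> {
    if let Err(DbError::PoolTimeout) = &result {
        DB_POOL_EXHAUSTION_TOTAL.fetch_add(1, Ordering::Relaxed);
    }
    result
}

/// Events per minute between two samples of a monotonically increasing counter.
fn per_minute_rate(earlier: u64, later: u64, elapsed_secs: f64) -> f64 {
    later.saturating_sub(earlier) as f64 / elapsed_secs * 60.0
}
```

Keeping the check in a pass-through helper means call sites keep their original error handling, and errors unrelated to the pool never inflate the counter.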

Testing

  • Added unit tests for observe_db_query_duration and observe_db_pool_exhaustion metrics
  • Existing db error tests remain passing

Closes solutions-plug#964
Closes solutions-plug#965

Development

Successfully merging this pull request may close these issues.

  • No Prometheus alert for database connection pool exhaustion
  • No histogram for database query duration — slow queries are invisible
