Skip to content

Add server-side aggregation endpoints for dashboard charts #20

@Boanerges1996

Description

@Boanerges1996

Problem

The PeerMetrics dashboard (/apps/<id>) downloads all raw data into the browser and aggregates it with JavaScript. On production:

  • GET /v1/conferences?appId=...&created_at_gte=... returns 42,177 conferences in one response
  • Response size: 21.7 MB of JSON
  • Response time: ~13 seconds
  • Gunicorn worker timeout is 30s — one more doubling of data = every dashboard load returns 504

Even after the PRs we merged (pagination #19, expand_fields removal, annotations #22), the graphs tab still pulls the full dataset because charts aggregate client-side.

Context — Scale

Sessions Health has ~3 apps total (dev / staging / prod). This changes the optimal strategy significantly:

  • Cache hit rate will be near 100% once warm (all users hit the same small key set)
  • Pre-warming every app is trivially cheap (~180 KB Redis, ~1 qps on DB)
  • Cold-start latency can be fully eliminated

Goal

Three compounding layers that together make the dashboard load in ~50ms regardless of data volume, with zero 504s.

┌────────────────────────────────────────────────────────────────┐
│  LAYER 3: Vue Query (browser-side cache + dedup + SWR)         │
│  → Repeat navigations hit in-memory cache instantly            │
│  → Multiple charts requesting same data = one HTTP call        │
└──────────────────────┬─────────────────────────────────────────┘
                       ▼
┌────────────────────────────────────────────────────────────────┐
│  LAYER 2: Redis cache (server-side, 30-60s TTL)                │
│  → With ~3 apps, a pre-warming job keeps cache always hot      │
│  → 2 KB responses cache cheaply (vs 21 MB today)               │
└──────────────────────┬─────────────────────────────────────────┘
                       ▼
┌────────────────────────────────────────────────────────────────┐
│  LAYER 1: Server-side aggregation (PostgreSQL GROUP BY)        │
│  → 21 MB → 2 KB responses                                      │
│  → Query runs in <100ms with proper indexes                    │
│  → This is the foundational fix; nothing above works without it │
└────────────────────────────────────────────────────────────────┘

Expected results after all three layers ship (with pre-warming):

Scenario Response time Response size
Any dashboard load, any user ~50ms 2 KB
Same user revisiting (Vue Query cache) <10ms (no network) 0 bytes

With only ~3 apps total, pre-warming keeps all Redis caches always hot. No user ever pays the cold-start Postgres cost.


Layer 1 — Server-Side Aggregation (foundational)

Charts that need aggregation (use the big conferences array — priority)

Chart Current behavior Summary endpoint needed
Conferences (stacked bar) .reduce() groups by day, counts success/warning/error/ongoing GET /v1/conferences/summary?group_by=day
Most common issues Groups issues by code, counts per conference GET /v1/issues/summary?group_by=code
Conference duration Buckets conferences into duration ranges GET /v1/conferences/duration-summary
Call setup time Computes from connections, buckets in JS GET /v1/connections/setup-time-summary
Number of participants Counts participants per conference GET /v1/conferences/participant-count-summary

Charts using sessions/connections (smaller, lower priority)

Chart Data Current size Priority
Browsers sessions ~3 MB Medium
OS sessions ~3 MB Medium
Map sessions (geo_ip) ~3 MB Medium
GUM errors issues <1 KB Low
Relayed connections connections <1 KB Low

Click-to-Detail Pattern

When a user clicks a bar/segment, a modal shows matching conferences. Today this filters the in-memory array. With aggregation, we fetch just that slice via the paginated list endpoint:

async onChartClick(e) {
  const dayStart = new Date(e.xValue);
  const dayEnd = new Date(dayStart);
  dayEnd.setDate(dayEnd.getDate() + 1);

  const resp = await peermetrics.get('/v1/conferences', {
    appId: peermetrics.app.id,
    created_at_gte: dayStart.toISOString(),
    created_at_lte: dayEnd.toISOString(),
    limit: 50,
  });
  this.modalConferences = resp.results;
}

API filters needed for modals

Chart Filter Status
Conferences by day created_at_gte, created_at_lte ✓ supported
Most common issues issue_code=xxx add
Conference duration duration_gte, duration_lt add
Call setup time setup_time_gte, setup_time_lt complex (setup time is per-connection)

Example aggregation query

SELECT DATE(created_at) AS date,
  COUNT(*) FILTER (WHERE ongoing) AS ongoing,
  COUNT(*) FILTER (WHERE has_errors AND NOT ongoing) AS error,
  COUNT(*) FILTER (WHERE has_warnings AND NOT has_errors AND NOT ongoing) AS warning,
  COUNT(*) FILTER (WHERE NOT has_errors AND NOT has_warnings AND NOT ongoing) AS success
FROM conference
WHERE app_id = ? AND created_at >= ?
GROUP BY DATE(created_at);

Layer 2 — Redis Caching (server-side)

Redis is already deployed (rate limiter uses it). Add a thin cache wrapper around summary endpoints.

Memory footprint

  • 3 apps × ~10 summary endpoints × 3 date ranges = ~180 KB total
  • Completely negligible

TTL strategy

Endpoint TTL
Conference daily summary 60s
Duration/participant distributions 120s
Session browser/OS aggregates 300s
Click-to-detail paginated lists no cache

Pattern

def cached_summary(cache_key, ttl, compute_fn):
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)
    result = compute_fn()
    redis.setex(cache_key, ttl, json.dumps(result))
    return result

Invalidation

TTL-based. No active invalidation. Add ?no_cache=true param for admin tools that need fresh data.

Cache stampede mitigation

Add randomized TTL jitter (60s ± 10s) so cache keys don't all expire simultaneously.


Layer 3 — Vue Query (browser-side)

Add vue-query@1.x (Vue 2 compatible) to the dashboard.

Benefits

  • Instant cache hits on repeat navigation — dashboard → conference detail → back: renders in <10ms from memory
  • Automatic deduplication — 3 charts requesting the same summary = 1 HTTP call
  • Stale-while-revalidate — show cached data instantly, refetch in background
  • Automatic retries on network errors

Pattern

const { data, isLoading } = useQuery({
  queryKey: ['conferences-summary', appId, dateRange],
  queryFn: () => peermetrics.get('/v1/conferences/summary', { appId, created_at_gte: dateRange }),
  staleTime: 60_000,
  cacheTime: 300_000,
});

Implementation Phases

Phase 0 — Prerequisites

Phase 1 — Conferences chart (MVP)

  1. API: GET /v1/conferences/summary?appId=...&created_at_gte=...
  2. API: Add created_at_lte support to paginated /v1/conferences
  3. Web: conferencesChart.vue uses summary + paginated click handler
  4. Web: Remove big /v1/conferences call from app.vue graphs tab

Success metric: Conferences chart renders in <200ms.

Phase 2 — Remaining conference/issue/connection charts

  1. API: /conferences/duration-summary, /conferences/participant-count-summary
  2. API: /issues/summary?group_by=code
  3. API: /connections/setup-time-summary, /connections/summary?group_by=type
  4. API: Add issue_code, duration_gte/lt filters on /conferences
  5. Web: Migrate conferenceDurationChart, noParticipantsChart, mostCommongIssuesChart, callSetupTimeChart, connectionTypeChart

Phase 3 — Session charts

  1. API: /sessions/summary?group_by=browser|os|country
  2. API: Geo data endpoint for mapChart
  3. Web: Migrate browsersChart, osChart, mapChart
  4. Web: Remove the full /v1/sessions call (currently 3 MB)

Phase 4 — Redis caching

  1. Add cached_summary() helper
  2. Wrap each summary endpoint with appropriate TTL
  3. Cache keys include app_id, date range, filters
  4. Monitor cache hit rate in CloudWatch

Phase 5 — Vue Query

  1. Install vue-query@1.x (Vue 2 compat)
  2. Wrap dashboard with query client provider
  3. Refactor app.vue data fetching to use useQuery
  4. Migrate chart components to consume queries instead of props
  5. Remove ad-hoc state (this.data.conferences, this.paginatedConferences)

Phase 6 (recommended) — Cache pre-warming

Sessions Health has ~3 apps total. Pre-warming every app is cheap and worthwhile.

Implementation:

# management/commands/warm_dashboard_cache.py
def handle(self):
    for app in App.objects.filter(is_active=True):
        for endpoint in SUMMARY_ENDPOINTS:
            key, ttl = cache_key_for(app, endpoint)
            result = endpoint.compute(app)
            redis.setex(key, ttl, json.dumps(result))
  • Run every 30 seconds (TTL is 60s, so cache always fresh)
  • Total DB load: ~30 queries per 30s cycle = 1 qps
  • Effect: no user ever hits a cold cache, not even first load of the day

Edge Cases & Decisions

Dashboard filters (browser/OS/country/app version)

Filters run client-side today. With aggregation, they must be passed as query params to summary endpoints. Decision: include filter params from Phase 1 onward. Cache keys include filters so filtered/unfiltered queries are cached separately.

Redis key schema

conf_summary:{app_id}:{date_range_hash}:{filter_hash}
issues_summary:{app_id}:{date_range_hash}:{filter_hash}

Vue Query + Vue 2 compatibility

Use vue-query@1.x (Vue 2 compat). @tanstack/vue-query@5.x is Vue 3 only.

Timezone

Aggregate in UTC for Phase 1. Frontend shows in browser timezone. Add timezone_offset param later if users complain.

Ongoing + mutually exclusive status

Conferences chart shows success | warning | error | ongoing as mutually exclusive. SQL uses FILTER with explicit AND NOT clauses (see query example above).

Rate limiting

Summary endpoints hit GenericView (2000/min per IP). Vue Query's dedup helps — multiple charts asking for the same data fire one request, not several.

Backward compatibility

Keep the old /v1/conferences list working. Summary endpoints are additive. Migrate chart-by-chart.

Cache stampede

Randomized TTL jitter (60s ± 10s) so keys don't expire together.

Pre-warming scope

With only ~3 apps, pre-warm every app every 30 seconds. No complex "which apps to warm" logic needed.


What to Measure

Metric Before Target
Dashboard load time (warm Redis) 13s ~50ms
Dashboard load time (Vue Query hit) N/A <10ms
Largest API response size 21 MB <50 KB
504 error rate growing 0
Postgres dashboard query time 3-5s <100ms
Redis cache hit rate N/A ~100% with pre-warming

Dependencies


Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions