Add worker health monitoring with automatic restart on stall #855

### Overview
`src/workers/health/worker-health-check.service.ts` exists, but nothing indicates that it triggers automatic recovery when a worker stalls, that is, stops processing jobs without crashing. A silent stall can halt notification delivery or subscription renewals indefinitely.

### Specifications

**Features:**
- Monitor each worker's last-processed-job timestamp.
- Trigger a graceful worker restart when the timestamp exceeds a configurable stall threshold.

**Tasks:**
- In each worker, update a Redis key `worker:heartbeat:{workerId}` on every successful job.
- Create a `WorkerStalledDetector` scheduled task that checks heartbeats every 60 seconds.
- If a worker's heartbeat is older than `WORKER_STALL_THRESHOLD_SECONDS` (default 300), emit a `worker.stalled` event and initiate a graceful restart of that worker.
- Add a Prometheus counter `worker_restarts_total{worker_name}`.
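
The tasks above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `HeartbeatStore`, `InMemoryHeartbeatStore`, `recordHeartbeat`, `StallDetectorDeps`, and `checkForStalls` are hypothetical names, the in-memory store stands in for Redis, and the restart and counter hooks are injected callbacks rather than real process-control or Prometheus calls. Only the key pattern, the event name, and the threshold semantics come from this issue.

```typescript
/** Hypothetical key-value abstraction standing in for a Redis client. */
interface HeartbeatStore {
  set(key: string, value: string): Promise<void>;
  get(key: string): Promise<string | null>;
}

/** In-memory stand-in so the sketch runs without a Redis server. */
class InMemoryHeartbeatStore implements HeartbeatStore {
  private readonly data = new Map<string, string>();
  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }
}

// Key pattern taken from the task list: worker:heartbeat:{workerId}
const heartbeatKey = (workerId: string): string => `worker:heartbeat:${workerId}`;

/** Called by a worker after every successfully processed job. */
async function recordHeartbeat(
  store: HeartbeatStore,
  workerId: string,
  nowMs: number = Date.now(),
): Promise<void> {
  await store.set(heartbeatKey(workerId), String(nowMs));
}

/** Hypothetical dependencies of the detector; all hooks are injected. */
interface StallDetectorDeps {
  store: HeartbeatStore;
  workerIds: string[];
  thresholdSeconds: number; // WORKER_STALL_THRESHOLD_SECONDS, default 300
  emit: (event: "worker.stalled", workerId: string) => void;
  restart: (workerId: string) => Promise<void>;
  // Would increment worker_restarts_total{worker_name}; the worker id is
  // used as the worker name here, which is an assumption.
  incrementRestartCounter: (workerName: string) => void;
}

/** One detector pass; intended to be scheduled every 60 seconds. */
async function checkForStalls(
  deps: StallDetectorDeps,
  nowMs: number = Date.now(),
): Promise<string[]> {
  const stalled: string[] = [];
  for (const workerId of deps.workerIds) {
    const raw = await deps.store.get(heartbeatKey(workerId));
    // Assumption: a worker with no heartbeat yet is not treated as stalled.
    if (raw === null) continue;
    const ageSeconds = (nowMs - Number(raw)) / 1000;
    if (ageSeconds > deps.thresholdSeconds) {
      deps.emit("worker.stalled", workerId);
      await deps.restart(workerId);
      deps.incrementRestartCounter(workerId);
      stalled.push(workerId);
    }
  }
  return stalled;
}
```

Passing `nowMs` explicitly keeps the detector deterministic under test: a caller can write a heartbeat at one timestamp and run `checkForStalls` at a later one without waiting in real time.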

**Impacted Files:**
- `src/workers/health/worker-health-check.service.ts`
- All processor files.

### Acceptance Criteria
- A stalled worker is restarted within 2x the stall threshold (600 seconds at the default of 300).
- The Prometheus counter increments on each automatic restart.
- A test simulates a stall by freezing the heartbeat key.
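
To see why the 2x bound is reachable with a 60-second check interval, the timing can be simulated with a frozen heartbeat and a fake clock. The helper below is a hypothetical sketch (`firstDetectionSeconds` is not an existing function in the codebase), and it models detection time only, so the duration of the graceful restart itself still has to fit in the remaining budget.

```typescript
/**
 * Returns the time (in seconds) of the first detector pass that sees the
 * frozen heartbeat as older than the threshold, or null if none does
 * within the simulated horizon. Passes run at interval, 2*interval, ...
 */
function firstDetectionSeconds(
  frozenAtSeconds: number,
  thresholdSeconds: number,
  checkIntervalSeconds: number,
  horizonSeconds: number,
): number | null {
  for (
    let t = checkIntervalSeconds;
    t <= horizonSeconds;
    t += checkIntervalSeconds
  ) {
    // Same comparison as the spec: heartbeat age strictly above threshold.
    if (t - frozenAtSeconds > thresholdSeconds) return t;
  }
  return null;
}

// Heartbeat frozen at t = 10 s, threshold 300 s, detector pass every 60 s.
const detectedAt = firstDetectionSeconds(10, 300, 60, 3600);
// detectedAt is 360: the pass at 300 s sees an age of 290 s, and the pass
// at 360 s sees an age of 350 s, which exceeds the threshold.
```

In general the detection delay after the last heartbeat is at most the threshold plus one check interval (360 seconds with the defaults), which leaves 240 seconds of the 600-second budget for the restart to complete.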
