axiom: surface events dropped when IngestChannel exhausts retry budget #428

@lukasmalkmus

Description

Since #425, IngestChannel tolerates up to 3 consecutive flush failures before returning the error to the adapter's outer loop (see maxConsecutiveErrors in axiom/datasets.go). When that retry budget is exhausted, the adapter (adapters/logrus, adapters/slog) restarts IngestChannel with a fresh, empty batch, and any events still held in the previous call's local batch slice are silently dropped.
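To make the failure mode concrete, here is a heavily simplified model of the loop. It is not the SDK's actual code: `runIngest`, its signature, the size-based flush trigger, and the `>=` comparison are assumptions made for illustration. Only the name `maxConsecutiveErrors` and the value 3 come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// maxConsecutiveErrors reuses the constant name from axiom/datasets.go;
// the value 3 is taken from the issue text.
const maxConsecutiveErrors = 3

// errDown is a hypothetical flush error used by the demo below.
var errDown = errors.New("ingest endpoint unavailable")

// runIngest is a hypothetical stand-in for IngestChannel. It buffers events,
// attempts a flush once the batch holds at least batchSize events, and gives
// up after maxConsecutiveErrors failed flushes in a row. It returns how many
// events were still buffered when it returned. Timers and the final flush on
// channel close are omitted for brevity.
func runIngest(events <-chan string, batchSize int, flush func([]string) error) (pending int, err error) {
	var batch []string
	failures := 0
	for ev := range events {
		batch = append(batch, ev)
		if len(batch) < batchSize {
			continue
		}
		if ferr := flush(batch); ferr != nil {
			failures++
			if failures >= maxConsecutiveErrors {
				// The batch is local to this call. Once we return, the
				// caller starts over with an empty batch and nothing
				// records how many events were lost.
				return len(batch), ferr
			}
			continue // keep the batch and retry on the next event
		}
		batch, failures = batch[:0], 0
	}
	return len(batch), nil
}

func main() {
	events := make(chan string, 5)
	for i := 0; i < 5; i++ {
		events <- fmt.Sprintf("event-%d", i)
	}
	close(events)

	alwaysFail := func([]string) error { return errDown }
	pending, err := runIngest(events, 2, alwaysFail)
	fmt.Printf("gave up with %d buffered events: %v\n", pending, err)
}
```

In this model the only trace of the lost events is the returned count, which the real adapter loop never sees today.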

There is currently no way to measure this drop rate from outside the SDK:

  • No metric counter is incremented.
  • No structured log line is emitted on the drop path.
  • No span event is recorded.

This makes it impossible for users to tell whether an ingestion gap in their dashboards comes from SDK-level drops, an upstream outage, or misconfiguration.

Proposal

On the maxConsecutiveErrors branch in IngestChannel, before returning the error:

  1. Emit a log line that includes len(batch) and a short reason. The adapters already run a stderr logger; reusing the same pattern keeps things consistent.
  2. Record a span event on the tracing span with the drop count.
  3. Optionally expose a package-level counter (e.g. via an opt-in metric.Meter hook) so users can plumb it into their observability stack.

The minimum viable version is the log line; the metric is a nice-to-have.
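A rough sketch of items 1 and 3, using only the standard library. Everything named here (`dropReporter`, `DropHook`, `newDropReporter`, the `axiom: ` log prefix, the message wording) is hypothetical and not part of the current SDK. The hook is a plain callback rather than a `metric.Meter` so the example stays dependency-free; a user could increment a counter or record a span event from inside it.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"log"
	"os"
	"sync/atomic"
)

// maxConsecutiveErrors reuses the constant name from axiom/datasets.go.
const maxConsecutiveErrors = 3

// DropHook is a hypothetical opt-in callback invoked once per dropped batch.
type DropHook func(dropped int, reason error)

// dropReporter is a hypothetical helper that the maxConsecutiveErrors branch
// could call just before returning the error.
type dropReporter struct {
	logger *log.Logger
	hook   DropHook     // optional; nil disables the callback
	total  atomic.Int64 // running count of dropped events
}

func newDropReporter(w io.Writer, hook DropHook) *dropReporter {
	// The prefix is an assumption; the adapters may format differently.
	return &dropReporter{logger: log.New(w, "axiom: ", 0), hook: hook}
}

// report logs the drop, updates the counter, and calls the hook if set.
// Empty batches are ignored because nothing is lost in that case.
func (r *dropReporter) report(batchLen int, reason error) {
	if batchLen == 0 {
		return
	}
	r.total.Add(int64(batchLen))
	r.logger.Printf("dropping %d buffered events after %d consecutive flush failures: %v",
		batchLen, maxConsecutiveErrors, reason)
	if r.hook != nil {
		r.hook(batchLen, reason)
	}
}

// demo runs the reporter against an in-memory buffer and returns the logged
// text, the running total, and the count observed by the hook.
func demo() (logged string, total int64, hookSaw int) {
	var buf bytes.Buffer
	r := newDropReporter(&buf, func(n int, _ error) { hookSaw += n })
	r.report(7, errors.New("ingest endpoint unavailable"))
	r.report(0, errors.New("ignored: nothing buffered"))
	return buf.String(), r.total.Load(), hookSaw
}

func main() {
	logged, total, hookSaw := demo()
	fmt.Print(logged)
	fmt.Println("total dropped:", total, "hook saw:", hookSaw)

	// In an adapter, the reporter would write to stderr instead.
	newDropReporter(os.Stderr, nil).report(2, errors.New("example"))
}
```

Keeping the hook nil by default means existing users see only the new log line, which matches the minimum viable version.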

Related: #425.

Metadata

Labels

enhancement (New feature or request)
