4 changes: 4 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions crates/datadog-agent-trace-sampler/Cargo.toml
@@ -0,0 +1,12 @@
# Copyright 2025-Present Datadog, Inc. https://www.datadoghq.com/
# SPDX-License-Identifier: Apache-2.0

[package]
name = "datadog-agent-trace-sampler"
version = "0.1.0"
edition.workspace = true
license.workspace = true
homepage.workspace = true
repository.workspace = true

[dependencies]
63 changes: 63 additions & 0 deletions crates/datadog-agent-trace-sampler/README.md
@@ -0,0 +1,63 @@
# Datadog Agent Trace Sampler

Agent-side trace sampling shared across the serverless agents (bottlecap and the
Serverless Compatibility Layer).

This crate is a dependency-free 1:1 port of the Go trace agent's **error sampler**
(`ScoreSampler` targeting `ErrorTPS`, from `pkg/trace/sampler/` in
`DataDog/datadog-agent`). The error sampler is a *rescue* sampler: after an agent
decides to drop a trace, the trace gets a second look, and if it contains an
error it is kept, up to a budget of `target_tps` error traces per second
distributed fairly across distinct trace signatures. This preserves error
visibility, within that budget, even under aggressive sampling.

## Why dependency-free

The public API takes plain views of primitive fields in (`SpanView` /
`TraceView`) and returns a `SampleDecision` out; it never exposes a protobuf
`Span` type. This lets
consumers that pin different `libdatadog` revisions share the crate without
compiling incompatible `pb::Span` types into their build graphs.
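
As a minimal sketch of how a consumer might bridge its own span type to this API: `MySpan` and `to_span_view` below are hypothetical names, not part of the crate, and `SpanView` is copied locally only so the block compiles on its own. The sketch assumes the consumer's span stores `error` as an integer and tags in a `meta` map, as a protobuf span typically does.

```rust
use std::collections::HashMap;

// Local copy of the crate's `SpanView` so this sketch is self-contained;
// real code would import it from `datadog_agent_trace_sampler`.
#[derive(Debug, Clone, Copy)]
pub struct SpanView<'a> {
    pub service: &'a str,
    pub name: &'a str,
    pub resource: &'a str,
    pub error: bool,
    pub http_status_code: Option<&'a str>,
    pub error_type: Option<&'a str>,
}

// Hypothetical consumer-side span (stands in for a pinned protobuf span type).
pub struct MySpan {
    pub service: String,
    pub name: String,
    pub resource: String,
    pub error: i32,
    pub meta: HashMap<String, String>,
}

// Borrow only the fields the sampler reads; no strings are cloned.
pub fn to_span_view(span: &MySpan) -> SpanView<'_> {
    SpanView {
        service: &span.service,
        name: &span.name,
        resource: &span.resource,
        // Assumption: any non-zero value marks the span as errored.
        error: span.error != 0,
        http_status_code: span.meta.get("http.status_code").map(String::as_str),
        error_type: span.meta.get("error.type").map(String::as_str),
    }
}

fn main() {
    let mut meta = HashMap::new();
    meta.insert("http.status_code".to_string(), "500".to_string());
    let span = MySpan {
        service: "web".to_string(),
        name: "web.request".to_string(),
        resource: "GET /".to_string(),
        error: 1,
        meta,
    };
    let view = to_span_view(&span);
    assert!(view.error);
    assert_eq!(view.http_status_code, Some("500"));
    assert_eq!(view.error_type, None);
}
```

Because the view only borrows from the consumer's span, the conversion is cheap and the sampler crate never needs to know which span type sits behind it.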

## Usage

```rust
use datadog_agent_trace_sampler::{
    ErrorSamplerConfig, ErrorsSampler, SampleDecision, SpanView, TraceView,
};

let mut sampler = ErrorsSampler::new(ErrorSamplerConfig::default());

let spans = [SpanView {
    service: "web",
    name: "web.request",
    resource: "GET /",
    error: true,
    http_status_code: Some("500"),
    error_type: None,
}];
let trace = TraceView {
    env: "prod",
    trace_id: 0xdead_beef,
    root_index: 0,
    root_global_sample_rate: 1.0,
    spans: &spans,
};

// `now_unix_secs` drives the rolling window and is passed in (not read from a
// clock) so the crate stays dependency-free and deterministically testable.
match sampler.sample(1_700_000_000, &trace) {
    SampleDecision::Keep { errors_sr } => {
        // caller stamps `_dd.errors_sr = errors_sr` on the root span
    }
    SampleDecision::Drop => {
        // the pending agent-side drop proceeds
    }
}
```

`ErrorsSampler::sample` takes `&mut self` because the rolling buffer and rate map
mutate on every call. Consumers that share one sampler across threads can wrap
it in `Arc<Mutex<ErrorsSampler>>`.
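
The sharing pattern can be sketched as follows. `StubSampler` is a hypothetical stand-in for `ErrorsSampler` (only the `&mut self` shape of `sample` is mirrored, and the `u64` timestamp type is an assumption), and `run_workers` is an illustrative helper, not part of the crate.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for `ErrorsSampler`: it only counts calls, but its
// `sample` takes `&mut self` just like the real sampler.
struct StubSampler {
    calls: u64,
}

impl StubSampler {
    fn sample(&mut self, _now_unix_secs: u64) -> bool {
        self.calls += 1;
        true
    }
}

// Spawn `workers` threads that all share one sampler behind `Arc<Mutex<_>>`,
// then return how many `sample` calls the sampler observed.
fn run_workers(workers: usize, calls_per_worker: u64) -> u64 {
    let sampler = Arc::new(Mutex::new(StubSampler { calls: 0 }));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let sampler = Arc::clone(&sampler);
            thread::spawn(move || {
                for _ in 0..calls_per_worker {
                    // The guard is dropped at the end of each iteration, so
                    // the lock is held for one decision at a time.
                    if let Ok(mut guard) = sampler.lock() {
                        guard.sample(1_700_000_000);
                    }
                }
            })
        })
        .collect();

    for handle in handles {
        let _ = handle.join();
    }

    let total = sampler.lock().map(|guard| guard.calls).unwrap_or(0);
    total
}

fn main() {
    assert_eq!(run_workers(4, 100), 400);
}
```

Locking per call keeps the critical section short. Whether a single shared sampler or one sampler per worker fits better depends on whether the `target_tps` budget is meant to apply process-wide.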

Setting `target_tps` to `0.0` disables the sampler: every candidate returns
`SampleDecision::Drop` (i.e. nothing is rescued).
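
A minimal sketch of building such a disabled configuration with struct update syntax. `ErrorSamplerConfig` and its defaults are copied locally from `src/lib.rs` so the block compiles on its own, and `disabled_config` is a hypothetical helper name.

```rust
// Local copy of the crate's `ErrorSamplerConfig` and its `Default` impl so the
// sketch is self-contained; real code would import the type instead.
#[derive(Debug, Clone, Copy)]
pub struct ErrorSamplerConfig {
    pub target_tps: f64,
    pub extra_sample_rate: f64,
}

impl Default for ErrorSamplerConfig {
    fn default() -> Self {
        ErrorSamplerConfig {
            target_tps: 10.0,
            extra_sample_rate: 1.0,
        }
    }
}

// Hypothetical helper: override only the budget and keep every other default.
pub fn disabled_config() -> ErrorSamplerConfig {
    ErrorSamplerConfig {
        target_tps: 0.0,
        ..ErrorSamplerConfig::default()
    }
}

fn main() {
    let cfg = disabled_config();
    assert_eq!(cfg.target_tps, 0.0);
    assert_eq!(cfg.extra_sample_rate, 1.0);
}
```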
116 changes: 116 additions & 0 deletions crates/datadog-agent-trace-sampler/src/lib.rs
@@ -0,0 +1,116 @@
// Copyright 2025-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0

#![cfg_attr(not(test), deny(clippy::panic))]
#![cfg_attr(not(test), deny(clippy::unwrap_used))]
#![cfg_attr(not(test), deny(clippy::expect_used))]
#![cfg_attr(not(test), deny(clippy::todo))]
#![cfg_attr(not(test), deny(clippy::unimplemented))]

//! Agent-side trace sampling shared across serverless agents (bottlecap and the
//! Serverless Compatibility Layer).
//!
//! This crate is a dependency-free 1:1 port of the Go trace agent's error
//! sampler (`ScoreSampler` targeting `ErrorTPS`). The error sampler is a
//! *rescue* sampler: after an agent decides to drop a trace, the trace gets a
//! second look, and if it contains an error it is kept, up to a budget of
//! `target_tps` error traces per second distributed fairly across distinct trace
//! signatures. This preserves error visibility, within that budget, even under
//! aggressive sampling.
//!
//! The public API takes primitives in and returns a decision out (no protobuf
//! `Span` type), so consumers pinning different `libdatadog` revisions can share
//! it without compiling incompatible span types into their build graphs.
//!
//! # Example
//!
//! ```
//! use datadog_agent_trace_sampler::{
//!     ErrorSamplerConfig, ErrorsSampler, SampleDecision, SpanView, TraceView,
//! };
//!
//! let mut sampler = ErrorsSampler::new(ErrorSamplerConfig::default());
//! let spans = [SpanView {
//!     service: "web",
//!     name: "web.request",
//!     resource: "GET /",
//!     error: true,
//!     http_status_code: Some("500"),
//!     error_type: None,
//! }];
//! let trace = TraceView {
//!     env: "prod",
//!     trace_id: 0xdead_beef,
//!     root_index: 0,
//!     root_global_sample_rate: 1.0,
//!     spans: &spans,
//! };
//! match sampler.sample(/* now_unix_secs */ 1_700_000_000, &trace) {
//!     SampleDecision::Keep { errors_sr } => {
//!         // caller stamps `_dd.errors_sr = errors_sr` on the root span
//!         let _ = errors_sr;
//!     }
//!     SampleDecision::Drop => { /* the pending drop proceeds */ }
//! }
//! ```

mod score_sampler;
mod signature;

pub use score_sampler::ErrorsSampler;
pub use signature::Signature;

/// A read-only view of a single span, holding only the fields the sampler needs.
///
/// `http_status_code` and `error_type` come from the span's `meta` map keys
/// `http.status_code` and `error.type` respectively.
#[derive(Debug, Clone, Copy)]
pub struct SpanView<'a> {
    pub service: &'a str,
    pub name: &'a str,
    pub resource: &'a str,
    pub error: bool,
    pub http_status_code: Option<&'a str>,
    pub error_type: Option<&'a str>,
}

/// A read-only view of a trace chunk to be sampled.
#[derive(Debug, Clone, Copy)]
pub struct TraceView<'a> {
    pub env: &'a str,
    pub trace_id: u64,
    /// Index of the root span within `spans`.
    pub root_index: usize,
    /// The root span's global sample rate (`metrics["_sample_rate"]`), default 1.0.
    pub root_global_sample_rate: f64,
    pub spans: &'a [SpanView<'a>],
}

/// Configuration for the error sampler.
#[derive(Debug, Clone, Copy)]
pub struct ErrorSamplerConfig {
    /// Target error traces per second (`ErrorTPS`). `0.0` disables the sampler
    /// (every candidate is dropped, i.e. never rescued).
    pub target_tps: f64,
    /// Extra raw sampling rate applied on top of the computed rate.
    pub extra_sample_rate: f64,
}

impl Default for ErrorSamplerConfig {
    /// Matches the Go agent defaults: `ErrorTPS = 10`, `ExtraSampleRate = 1.0`.
    fn default() -> Self {
        ErrorSamplerConfig {
            target_tps: 10.0,
            extra_sample_rate: 1.0,
        }
    }
}

/// The outcome of sampling a trace.
#[derive(Debug, PartialEq)]
pub enum SampleDecision {
    /// Keep (rescue) the trace. The caller should stamp `_dd.errors_sr` on the
    /// root span with `errors_sr`.
    Keep { errors_sr: f64 },
    /// Drop the trace (do not rescue it); the pending agent-side drop proceeds.
    Drop,
}