Skip to content

feat(heap-profiling): add initial heap profiling primitives [PROF-15190]#2166

Open
scottgerring wants to merge 12 commits into
mainfrom
sgg/heap-prof-poc
Open

feat(heap-profiling): add initial heap profiling primitives [PROF-15190]#2166
scottgerring wants to merge 12 commits into
mainfrom
sgg/heap-prof-poc

Conversation

@scottgerring

@scottgerring scottgerring commented Jun 26, 2026

Copy link
Copy Markdown
Member

What does this PR do?

Adds initial support for eBPF heap profiling to libdatadog for the ebpf heap profiling project. These changes provide the ability for libdatadog users to introduce a USDT into consuming applications allocation and deallocation paths which is fired behind a poisson distributed sampling path every ~0.5 MiB of allocation. This will give us something to hook in user applications from the eBPF side to sample allocations for allocation tracking and sample allocations+frees for live heap.

I expect we will make more changes to this as we go along, but as this is a standalone feature and is largely based on prior work in ddprof it would be nice to get things in now.

A brief description of the change being made with this pull request.

Interesting Details

  • I have vendored usdt.h; it is not consistently available on all our build environments (read: old alpine) and this seems to be the common pattern. The API is stable and I see no concerns with this and have updated the license stuff appropriately
  • I have gated the binding generation behind an env var and committed the bindings. This is because the generation imposes a fair bit of extra stuff (llvm and supporting bits) on every build job and developer. I've added a CI job to validate that the bindings generated reflect what's checked in to avoid the footgun.

Motivation

An enthusiasm for heap profiling

Additional Notes

Anything else we should know when reviewing?

How to test the change?

Describe here in detail how the change can be validated.

@datadog-prod-us1-4

datadog-prod-us1-4 Bot commented Jun 26, 2026

Copy link
Copy Markdown

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: adc2f14 | Docs | Datadog PR Page | Give us feedback!

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

📚 Documentation Check Results

⚠️ 627 documentation warning(s) found

📦 libdd-heap-allocator - 44 warning(s)

📦 libdd-heap-gotter-ffi - 322 warning(s)

📦 libdd-heap-gotter - 44 warning(s)

📦 libdd-heap-sampler - 44 warning(s)

📦 tools - 173 warning(s)


Updated: 2026-07-03 13:07:48 UTC | Commit: e2cedb0 | missing-docs job results

@github-actions

Copy link
Copy Markdown
Contributor

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

  • Base Branch: origin/main
  • PR Branch: origin/sgg/heap-prof-poc

Summary by Rule

Rule Base Branch PR Branch Change

Annotation Counts by File

File Base Branch PR Branch Change

Annotation Stats by Crate

Crate Base Branch PR Branch Change
clippy-annotation-reporter 5 5 No change (0%)
datadog-ffe-ffi 1 1 No change (0%)
datadog-ipc 22 22 No change (0%)
datadog-live-debugger 4 4 No change (0%)
datadog-live-debugger-ffi 10 10 No change (0%)
datadog-profiling-replayer 4 4 No change (0%)
datadog-sidecar 45 45 No change (0%)
libdd-common 13 13 No change (0%)
libdd-common-ffi 12 12 No change (0%)
libdd-data-pipeline 6 6 No change (0%)
libdd-ddsketch 2 2 No change (0%)
libdd-dogstatsd-client 1 1 No change (0%)
libdd-profiling 13 13 No change (0%)
libdd-remote-config 3 3 No change (0%)
libdd-telemetry 20 20 No change (0%)
libdd-tinybytes 4 4 No change (0%)
libdd-trace-normalization 2 2 No change (0%)
libdd-trace-obfuscation 3 3 No change (0%)
libdd-trace-stats 1 1 No change (0%)
libdd-trace-utils 11 11 No change (0%)
Total 182 182 No change (0%)

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

🔒 Cargo Deny Results

⚠️ 4 issue(s) found, showing only errors (advisories, bans, sources)

📦 libdd-heap-allocator - ✅ No issues

📦 libdd-heap-gotter-ffi - 1 error(s)

Show output
error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:121:1
    │
121 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
    │
    ├ ID: RUSTSEC-2026-0097
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
    ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
      
      - The `log` and `thread_rng` features are enabled
      - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
      - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
      - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
      - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
      
      `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
    ├ Announcement: https://github.com/rust-random/rand/pull/1763
    ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
    ├ rand v0.8.5
      └── (dev) libdd-common v5.0.0
          └── libdd-common-ffi v37.0.0
              └── libdd-heap-gotter-ffi v37.0.0

advisories FAILED, bans ok, sources ok

📦 libdd-heap-gotter - ✅ No issues

📦 libdd-heap-sampler - ✅ No issues

📦 tools - 3 error(s)

Show output
error[vulnerability]: Quadratic run time when checking a start tag for duplicate attribute names
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:73:1
   │
73 │ quick-xml 0.37.5 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0194
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0194
   ├ `BytesStart::attributes()` returns an `Attributes` iterator which, by default
     (`with_checks(true)`), rejects a start tag that repeats an attribute name. For
     each attribute yielded, the iterator compared the new name against every name
     seen so far in the same tag using a linear scan, so a start tag with `N`
     distinct attribute names cost `O(N²)` byte comparisons. There was no bound on
     `N` other than the size of the buffered start tag.
     
     ## Impact
     
     Any code that parses untrusted XML and iterates a start tag's attributes with
     the default duplicate check enabled can be made to spend CPU time quadratic in
     the number of attributes on a single tag. Because the check is pure computation
     with no `.await`/I/O, an I/O-based timeout on the consumer (for example a read
     or request timeout) cannot interrupt it while it runs.
     
     Measured cost of a single start tag, release build:
     
     | Attributes on one tag | Time |
     |---|---|
     | 80,000  | ~6 s   |
     | 800,000 | ~10 min |
     
     The cost grows with the square of the attribute count, so a start tag of a few
     tens of megabytes can stall a parsing thread for hours. No memory is exhausted
     and the parser does not crash; the effect is CPU exhaustion on the thread doing
     the parsing: a single crafted start tag can pin a CPU core for minutes to hours,
     denying service to that worker. A deployment that places a wall-clock bound on
     parsing, or confines it to a non-critical thread, may consider the availability
     impact lower.
     
     ## Affected code paths
     
     * `BytesStart::attributes()` / `Attributes` iterated with checks enabled (the
       default), and `BytesStart::try_get_attribute`.
     * `NsReader`, which resolves namespaces by iterating a tag's attributes and so
       reaches the same check internally.
     
     Consumers that iterate attributes with `.attributes().with_checks(false)` and do
     not use `NsReader` are not affected.
     
     This was reported as reachable by a remote, unauthenticated attacker in a
     real-world RPKI relying party (NLnet Labs Routinator) via a crafted RRDP
     `snapshot.xml`.
     
     ## Remediation
     
     Upgrade to `quick-xml >= 0.41.0`, where the duplicate check keeps the linear
     scan for start tags with a small number of attributes and switches to an `O(1)`
     hash pre-filter above a threshold, making the whole tag `O(N)`. The reported
     `AttrError::Duplicated` positions are unchanged.
     
     If upgrading is not possible and duplicate-name detection is not required,
     disable it with `.attributes().with_checks(false)` (this does not help
     `NsReader` consumers, which have no equivalent opt-out before 0.41.0).
   ├ Announcement: https://github.com/tafia/quick-xml/issues/969
   ├ Solution: Upgrade to >=0.41.0 (try `cargo update -p quick-xml`)
   ├ quick-xml v0.37.5
     └── tools v37.0.0

error[vulnerability]: Unbounded namespace-declaration allocation in `NsReader` enables memory-exhaustion denial of service
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:73:1
   │
73 │ quick-xml 0.37.5 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0195
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0195
   ├ `NsReader` resolves namespaces by calling `NamespaceResolver::push` for every
     `Start`/`Empty` event *before* the event is returned to the caller. `push`
     iterated all `xmlns` / `xmlns:*` attributes on the start tag and, for each one,
     appended the prefix bytes to an internal buffer and pushed a `NamespaceBinding`
     (32 bytes on 64-bit) to an internal `Vec`, with no upper bound on the number of
     declarations.
     
     ## Impact
     
     A start tag with `N` namespace declarations drove roughly `3×` the tag's byte
     size in `NamespaceResolver` heap, allocated *inside* `quick-xml` before the
     `NsReader` consumer ever received the event and could inspect or reject it. A
     consumer that bounds its *input* size therefore still cannot bound this
     allocation: an `M`-byte start tag yields on the order of `3 × M` bytes of
     resolver heap the caller never sees.
     
     On untrusted XML this lets a remote, unauthenticated attacker force large heap
     allocations with a single start tag. With several `NsReader`s running
     concurrently on independent inputs (a common server pattern), the allocations
     stack and can exhaust process memory, causing the operating system to kill the
     process (OOM). This was confirmed against a real-world RPKI relying party (NLnet
     Labs Routinator), where concurrent RRDP validation workers parsing a crafted
     `snapshot.xml` exceeded the memory limit and the process was OOM-killed.
     
     ## Affected code paths
     
     Consumers using `NsReader` (which always calls `NamespaceResolver::push` before
     yielding `Start`/`Empty`), or calling `NamespaceResolver::push` directly. A plain
     `Reader` that does not perform namespace resolution is not affected.
     
     ## Remediation
     
     Upgrade to `quick-xml >= 0.41.0`. `NamespaceResolver::push` now rejects a start
     tag that declares more than `DEFAULT_MAX_DECLARATIONS_PER_ELEMENT` (256)
     namespace bindings, returning the new `NamespaceError::TooManyDeclarations`
     instead of allocating without limit. The limit is configurable via
     `NamespaceResolver::set_max_declarations_per_element` (use `usize::MAX` to
     restore the previous unbounded behavior), and `NsReader::resolver_mut()` is
     provided to reach it.
     
     There is no clean workaround for `NsReader` consumers before 0.41.0, as the
     allocation happens inside the reader with no configuration knob to cap it.
   ├ Announcement: https://github.com/tafia/quick-xml/issues/970
   ├ Solution: Upgrade to >=0.41.0 (try `cargo update -p quick-xml`)
   ├ quick-xml v0.37.5
     └── tools v37.0.0

error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:76:1
   │
76 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
   │
   ├ ID: RUSTSEC-2026-0097
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
   ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
     
     - The `log` and `thread_rng` features are enabled
     - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
     - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
     - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
     - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
     
     `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
   ├ Announcement: https://github.com/rust-random/rand/pull/1763
   ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
   ├ rand v0.8.5
     └── (dev) libdd-common v5.0.0
         └── tools v37.0.0

advisories FAILED, bans ok, sources ok

Updated: 2026-07-03 13:08:57 UTC | Commit: e2cedb0 | dependency-check job results

@scottgerring scottgerring force-pushed the sgg/heap-prof-poc branch 2 times, most recently from b80ec4a to 11cf066 Compare June 26, 2026 14:37
@scottgerring scottgerring changed the title [PROF-15190]: DRAFT - add initial heap profiling crates feat(heap-profiling): add initial heap profiling primitives [PROF-15190] Jun 29, 2026
@scottgerring scottgerring force-pushed the sgg/heap-prof-poc branch 19 times, most recently from 6fe0523 to d10a58f Compare July 1, 2026 07:01
@dd-octo-sts

dd-octo-sts Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Artifact Size Benchmark Report

aarch64-alpine-linux-musl
Artifact Baseline Commit Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so 7.88 MB 7.88 MB 0% (0 B) 👌
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a 85.64 MB 85.64 MB 0% (0 B) 👌
aarch64-unknown-linux-gnu
Artifact Baseline Commit Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a 96.81 MB 96.81 MB 0% (0 B) 👌
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so 10.60 MB 10.60 MB 0% (0 B) 👌
libdatadog-x64-windows
Artifact Baseline Commit Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll 25.41 MB 25.41 MB 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib 88.04 KB 88.04 KB 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb 184.50 MB 184.50 MB 0% (0 B) 👌
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib 945.48 MB 945.48 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll 8.29 MB 8.29 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib 88.04 KB 88.04 KB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb 24.52 MB 24.52 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib 48.85 MB 48.85 MB 0% (0 B) 👌
libdatadog-x86-windows
Artifact Baseline Commit Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll 22.02 MB 22.02 MB 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib 89.42 KB 89.42 KB 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb 188.51 MB 188.51 MB 0% (0 B) 👌
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib 932.67 MB 932.67 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll 6.41 MB 6.41 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib 89.42 KB 89.42 KB 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb 26.32 MB 26.32 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib 46.48 MB 46.48 MB 0% (0 B) 👌
x86_64-alpine-linux-musl
Artifact Baseline Commit Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a 76.38 MB 76.38 MB 0% (0 B) 👌
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so 8.76 MB 8.76 MB 0% (0 B) 👌
x86_64-unknown-linux-gnu
Artifact Baseline Commit Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a 91.86 MB 91.86 MB 0% (0 B) 👌
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so 10.66 MB 10.66 MB 0% (0 B) 👌

@scottgerring scottgerring marked this pull request as ready for review July 1, 2026 08:54
@scottgerring scottgerring requested review from a team as code owners July 1, 2026 08:54
@scottgerring scottgerring requested a review from r1viollet July 1, 2026 09:05

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be77a1cb93

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread libdd-heap-sampler/src/allocation_requested.c
Comment thread libdd-heap-sampler/src/allocation_requested.c Outdated
Comment thread libdd-heap-gotter/src/hooks.rs Outdated
Comment thread libdd-heap-gotter/src/hooks.rs
Comment thread libdd-heap-sampler/src/allocation_created.c Outdated
Comment thread libdd-heap-gotter/src/elf.rs Outdated
Comment thread libdd-heap-gotter/src/elf.rs Outdated
Comment thread libdd-heap-sampler/src/sample_flag.c Outdated
Comment thread libdd-heap-sampler/build.rs
Comment thread libdd-heap-gotter/src/lib.rs Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the first set of heap-profiling building blocks to libdatadog: a Linux-focused sampler that emits ddheap:* USDT probes, plus Rust and runtime-injection frontends (GlobalAlloc wrapper + GOT interposition) and CI to keep the checked-in bindgen artifacts in sync.

Changes:

  • Introduces libdd-heap-sampler (C implementation + checked-in bindgen outputs) with vendored usdt.h and a workflow to verify regenerated bindings match the committed artifacts.
  • Adds libdd-heap-allocator (Rust GlobalAlloc wrapper) and libdd-heap-gotter (ELF GOT-patching injector) with demos and tests.
  • Adds libdd-heap-gotter-ffi to expose install/update/restore over a small C ABI for use by other runtimes.

Reviewed changes

Copilot reviewed 42 out of 47 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tools/docker/Dockerfile.build Adds new heap crates to Docker build caching/stubs.
NOTICE Notes bundled usdt.h license attribution.
Cargo.toml Adds new heap crates to workspace members.
Cargo.lock Records new crate deps (incl. bindgen split versions).
.github/workflows/verify-heap-sampler-bindings.yml CI job to regenerate/verify checked-in bindgen artifacts.
.github/CODEOWNERS Assigns heap crates to profiling team.
libdd-heap-sampler/Cargo.toml New sampler crate metadata + build deps.
libdd-heap-sampler/build.rs Builds C sources on Linux; optional bindgen regen via env var.
libdd-heap-sampler/README.md High-level design and binding-regeneration instructions.
libdd-heap-sampler/vendor/usdt.h Vendored libbpf/usdt single-header implementation.
libdd-heap-sampler/vendor/README.md Documents vendoring source/license and refresh procedure.
libdd-heap-sampler/src/lib.rs Rust façade over generated bindings + basic layout/smoke tests.
libdd-heap-sampler/src/generated/bindings.rs Checked-in bindgen Rust declarations.
libdd-heap-sampler/src/generated/dd_heap_sampler_static_wrappers.c Checked-in bindgen static-inline wrappers.
libdd-heap-sampler/src/tl_state.c TLS state init/seed logic for per-thread sampling.
libdd-heap-sampler/src/sample_flag.c Arch-specific sampled-pointer marking (x86 header, arm64 TBI).
libdd-heap-sampler/src/probes.c Non-inline USDT probe emission functions.
libdd-heap-sampler/src/allocation_requested.c Poisson sampler slow-path + size bumping logic.
libdd-heap-sampler/src/allocation_created.c Applies sample flag + emits alloc probe + closes reentry guard.
libdd-heap-sampler/src/allocation_freed.c Emits free probe and returns raw pointer/size adjustments.
libdd-heap-sampler/src/allocation_realloc.c Realloc helpers for sampled-old teardown behavior.
libdd-heap-sampler/include/datadog/heap/tl_state.h Public TLS state struct + fast getters.
libdd-heap-sampler/include/datadog/heap/sample_flag.h Public flagging API + x86/arm64 fast check helpers.
libdd-heap-sampler/include/datadog/heap/probes.h Public probe API + Linux USDT include.
libdd-heap-sampler/include/datadog/heap/allocation_requested.h Public pre-alloc hook API + fast path inline.
libdd-heap-sampler/include/datadog/heap/allocation_created.h Public post-alloc hook API + fast path inline.
libdd-heap-sampler/include/datadog/heap/allocation_freed.h Public pre-free hook API + fast path inline.
libdd-heap-sampler/include/datadog/heap/allocation_realloc.h Public realloc helper API.
libdd-heap-allocator/Cargo.toml New Rust allocator wrapper crate.
libdd-heap-allocator/src/lib.rs Exposes SampledAllocator.
libdd-heap-allocator/src/allocator.rs Implements GlobalAlloc wrapper + tests.
libdd-heap-allocator/README.md Usage docs + example pointer.
libdd-heap-allocator/examples/usdt_demo.rs Demo binary emitting USDT alloc samples.
libdd-heap-gotter/Cargo.toml New GOT interposition crate.
libdd-heap-gotter/src/lib.rs Public API + Linux/no-op cross-platform surface.
libdd-heap-gotter/src/hooks.rs Allocator/dlopen/pthread_create hook implementations.
libdd-heap-gotter/src/elf.rs ELF/DYNAMIC parsing + relocation patching + mprotect guard.
libdd-heap-gotter/README.md GOTter overview documentation.
libdd-heap-gotter/examples/gotter_usdt_demo.rs Demo using GOT overrides to drive sampler via Rust allocations.
libdd-heap-gotter/tests/install.rs Linux integration smoke test for install/restore.
libdd-heap-gotter-ffi/Cargo.toml New C-ABI wrapper crate around gotter.
libdd-heap-gotter-ffi/build.rs Generates C header via cbindgen.
libdd-heap-gotter-ffi/cbindgen.toml C header generation configuration.
libdd-heap-gotter-ffi/src/lib.rs Exposes install/update/restore/is-installed over C ABI.
libdd-heap-gotter-ffi/README.md Documents C ABI and lifetime constraints.
libdd-heap-gotter-ffi/examples/cdylib_demo.rs Demo dynamic-loading and calling the C ABI.
libdd-heap-gotter-ffi/tests/install.rs FFI smoke test for install/restore roundtrip.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread libdd-heap-gotter/src/elf.rs Outdated
Comment thread libdd-heap-gotter/src/elf.rs Outdated
Comment thread libdd-heap-gotter/src/elf.rs
Comment thread libdd-heap-sampler/src/allocation_requested.c
Comment thread libdd-heap-gotter/README.md
Comment thread libdd-heap-sampler/README.md Outdated
Comment thread libdd-heap-sampler/README.md Outdated
Comment thread libdd-heap-sampler/README.md Outdated
Comment thread libdd-heap-gotter/src/hooks.rs Outdated
@jbachorik jbachorik added the sphinx:critical Sphinx: critical — human review required label Jul 2, 2026
@scottgerring scottgerring force-pushed the sgg/heap-prof-poc branch 2 times, most recently from 17c7bba to 38fba33 Compare July 2, 2026 14:47
Adds a Criterion benchmark comparing direct System allocation, the
SampledAllocator hot path, and direct sampler calls, plus a small
bench.sh helper for libdd-heap-sampler.

Kept on a separate change because it pulls criterion (and thus
bindgen via libdd-heap-sampler's build) into the bench compile set,
which the libdatadog benchmarking-platform image cannot build
without libclang-dev.

@yannham yannham left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review: libdd-heap-allocator, libdd-heap-gotter-ffi.

sleep(Duration::from_secs(1));
}
}
} // mod linux

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, we're not in C 😛

Suggested change
} // mod linux
}

Comment on lines +18 to +23
To wrap a custom allocator instead; note that this is kind of ill-advised; we want to see _all_ allocations for the process:

```rust
#[global_allocator]
static ALLOC: SampledAllocator<MyAllocator> = SampledAllocator::new(MyAllocator::new());
```

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm not sure to understand this mention. Why wrapping a custom allocator is ill-advised? Doesn't the wrapping reach the same blast radius anyway, which is the current project/compilation unit/whatever (This is not a rhetorical question, I don't remember how custom allocators are scoped) ?

Suggested change
To wrap a custom allocator instead; note that this is kind of ill-advised; we want to see _all_ allocations for the process:
```rust
#[global_allocator]
static ALLOC: SampledAllocator<MyAllocator> = SampledAllocator::new(MyAllocator::new());
```
To wrap a custom allocator instead:
```rust
#[global_allocator]
static ALLOC: SampledAllocator<MyAllocator> = SampledAllocator::new(MyAllocator::new());
```
Note that this is kind of ill-advised: we want to see _all_ allocations for the process.

#![cfg_attr(not(test), deny(clippy::unwrap_used))]
#![cfg_attr(not(test), deny(clippy::expect_used))]
#![cfg_attr(not(test), deny(clippy::todo))]
#![cfg_attr(not(test), deny(clippy::unimplemented))]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#![cfg_attr(not(test), deny(clippy::unimplemented))]
#![cfg_attr(not(test), deny(clippy::unimplemented))]
#![cfg_attr(not(test), deny(clippy::unreachable))]

/// could be installed; the rest of the API can still be called safely.
#[no_mangle]
#[must_use]
#[named]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you use the function name anymore? It doesn't look so.

Comment on lines +17 to +22
fn assert_ok(result: VoidResult, what: &str) {
match result {
VoidResult::Ok => {}
VoidResult::Err(err) => panic!("{what} failed: {err}"),
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think you can just do assert_eq!(foo, Ok(_)) to the same effect instead of defining this helper. It should print foo and Err(e) in debug mode.

The `cdylib_demo` example loads the generated shared library with `dlopen`, resolves the C ABI symbols with `dlsym`, installs the GOT hooks, and produces allocation pressure.

```bash
cargo build -p libdd-heap-gotter-ffi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but cargo run will build whatever it needs to automatically I believe?

Suggested change
cargo build -p libdd-heap-gotter-ffi

dd_allocation_created(raw.cast(), req).cast()
}

#[cfg(not(target_os = "linux"))]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure rustc does a good job but it's one of those cases where we're pretty sure we want this to always be inlined.

Suggested change
#[cfg(not(target_os = "linux"))]
#[cfg(not(target_os = "linux"))]
#[inline(always)]

self.inner.dealloc(freed.ptr.cast(), inner_layout);
}

#[cfg(not(target_os = "linux"))]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(not(target_os = "linux"))]
#[cfg(not(target_os = "linux"))]
#[inline(always)]

}

unsafe impl<A: GlobalAlloc> GlobalAlloc for SampledAllocator<A> {
#[cfg(target_os = "linux")]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure how it works for allocators, but in general in Rust you need to add #[inline] for any cross-crate (that is cross-compilation-unit) inlining to work (or the compiler just won't include required LLVM bytecode to do so to avoid having huge artifacts). This annotation still lets rustc decide (it can refuse to inline it), but without it there can't cross-crate inlining, so I believe it's safe to add it.

Suggested change
#[cfg(target_os = "linux")]
#[cfg(target_os = "linux")]
#[inline]

self.inner.alloc(layout)
}

#[cfg(target_os = "linux")]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(target_os = "linux")]
#[cfg(target_os = "linux")]
#[inline]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-build sphinx:critical Sphinx: critical — human review required

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants