Add reproducing test for scrape failure with >70k metrics #1003
Closed
jaqx0r wants to merge 629 commits into
…bazelbuild/rules_go-0.58.0 build(deps): bump github.com/bazelbuild/rules_go from 0.57.0 to 0.58.0
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 5.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v4...v5)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/bazelbuild/rules_go](https://github.com/bazelbuild/rules_go) from 0.58.0 to 0.58.1.
- [Release notes](https://github.com/bazelbuild/rules_go/releases)
- [Commits](bazel-contrib/rules_go@v0.58.0...v0.58.1)

---
updated-dependencies:
- dependency-name: github.com/bazelbuild/rules_go
  dependency-version: 0.58.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
…bazelbuild/rules_go-0.58.1 build(deps): bump github.com/bazelbuild/rules_go from 0.58.0 to 0.58.1
…/upload-artifact-5 build(deps): bump actions/upload-artifact from 4 to 5
Bumps [github.com/bazelbuild/rules_go](https://github.com/bazelbuild/rules_go) from 0.58.1 to 0.58.2.
- [Release notes](https://github.com/bazelbuild/rules_go/releases)
- [Commits](bazel-contrib/rules_go@v0.58.1...v0.58.2)

---
updated-dependencies:
- dependency-name: github.com/bazelbuild/rules_go
  dependency-version: 0.58.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
…bazelbuild/rules_go-0.58.2 build(deps): bump github.com/bazelbuild/rules_go from 0.58.1 to 0.58.2
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.67.1 to 0.67.2.
- [Release notes](https://github.com/prometheus/common/releases)
- [Changelog](https://github.com/prometheus/common/blob/main/CHANGELOG.md)
- [Commits](prometheus/common@v0.67.1...v0.67.2)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-version: 0.67.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
…prometheus/common-0.67.2 build(deps): bump github.com/prometheus/common from 0.67.1 to 0.67.2
Bumps [github.com/bazelbuild/rules_go](https://github.com/bazelbuild/rules_go) from 0.58.2 to 0.58.3.
- [Release notes](https://github.com/bazelbuild/rules_go/releases)
- [Commits](bazel-contrib/rules_go@v0.58.2...v0.58.3)

---
updated-dependencies:
- dependency-name: github.com/bazelbuild/rules_go
  dependency-version: 0.58.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
…bazelbuild/rules_go-0.58.3 build(deps): bump github.com/bazelbuild/rules_go from 0.58.2 to 0.58.3
Bumps [github.com/bazelbuild/rules_go](https://github.com/bazelbuild/rules_go) from 0.58.3 to 0.59.0.
- [Release notes](https://github.com/bazelbuild/rules_go/releases)
- [Commits](bazel-contrib/rules_go@v0.58.3...v0.59.0)

---
updated-dependencies:
- dependency-name: github.com/bazelbuild/rules_go
  dependency-version: 0.59.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
…bazelbuild/rules_go-0.59.0 build(deps): bump github.com/bazelbuild/rules_go from 0.58.3 to 0.59.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.37.0 to 0.38.0.
- [Commits](golang/sys@v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-version: 0.38.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
…x/sys-0.38.0 build(deps): bump golang.org/x/sys from 0.37.0 to 0.38.0
Bumps [golang.org/x/tools](https://github.com/golang/tools) from 0.38.0 to 0.39.0.
- [Release notes](https://github.com/golang/tools/releases)
- [Commits](golang/tools@v0.38.0...v0.39.0)

---
updated-dependencies:
- dependency-name: golang.org/x/tools
  dependency-version: 0.39.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
…x/tools-0.39.0 build(deps): bump golang.org/x/tools from 0.38.0 to 0.39.0
chore(deps): update dependency aspect_bazel_lib to v2.21.2
chore(deps): update distroless_base docker digest to 9e9b50d
chore(deps): update dependency gazelle to v0.47.0
…tions chore(deps): update github artifact actions (major)
chore(deps): update dependency rules_go to v0.59.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.67.2 to 0.67.3.
- [Release notes](https://github.com/prometheus/common/releases)
- [Changelog](https://github.com/prometheus/common/blob/main/CHANGELOG.md)
- [Commits](prometheus/common@v0.67.2...v0.67.3)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-version: 0.67.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
…prometheus/common-0.67.3 build(deps): bump github.com/prometheus/common from 0.67.2 to 0.67.3
chore(deps): update distroless_base docker digest to f2df870
fix(deps): update module golang.org/x/sys to v0.45.0
chore(deps): update docker/login-action action to v4.2.0
Pointers to the input `LogLine` and to the thread state were being held in memory between executions of the `ProcessLogLine` function. This change clears both pointers at the end of each program run so that the GC can reclaim what they referenced. This may be related to issue #390, but I'm not fully convinced yet.
`v.t` is the same pointer and gets reset each time.
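A minimal sketch of the idea, using hypothetical `LogLine`, `thread`, and `VM` types rather than mtail's real ones. Following the note that `v.t` is reused and reset on every call, this sketch resets the thread's contents instead of nil-ing the thread pointer, and it clears the pointer to the current line when `ProcessLogLine` returns, so the VM no longer keeps that line reachable between calls.

```go
package main

import "fmt"

// LogLine is a hypothetical stand-in for mtail's input log line type.
type LogLine struct {
	Filename string
	Line     string
}

// thread is a hypothetical stand-in for the VM's per-run execution state.
type thread struct {
	pc      int
	matches map[int][]string // capture groups, which may reference the input line
}

// VM holds state that outlives a single ProcessLogLine call.
type VM struct {
	input *LogLine // only meaningful while ProcessLogLine is running
	t     *thread  // reused across calls and reset each time
}

// execute stands in for running the compiled program against v.input.
func (v *VM) execute() {
	v.t.pc++
	v.t.matches[0] = []string{v.input.Line}
}

// ProcessLogLine runs the program for one line, then drops every reference
// to that line so it is no longer reachable from the VM once the call returns.
func (v *VM) ProcessLogLine(line *LogLine) {
	v.input = line
	v.t.pc = 0
	defer func() {
		v.input = nil      // release the pointer to the LogLine
		clear(v.t.matches) // release captures that point into the line
	}()
	v.execute()
}

func main() {
	v := &VM{t: &thread{matches: make(map[int][]string)}}
	v.ProcessLogLine(&LogLine{Filename: "app.log", Line: "hello"})
	fmt.Println(v.input == nil, len(v.t.matches))
}
```

The `clear` builtin on maps requires Go 1.21 or later.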
Clone a key when popping it from the stack instead of using it directly. At this moment the programme is still running and the LogLine is still live, but this is the last moment at which the capture group reference may share memory with the LogLine. From here on, datum keys must be stored in newly allocated memory; otherwise a key that is a substring of the log line pins that potentially large line in memory permanently. Issue: #390
fix: Repair two memory leaks in the VM.
build: Add initial AGENTS instructions.
test: Skip timezone test when the tz db is not available.
test: Handle IPv6 and fallback to IPv4 addresses in testutil.FreePort.
chore: Update MODULE.bazel.lock
The custom `SystemCallFilter` list is underspecified and causes `mtail` to crash before `main` runs. This change allows all the normal Go runtime system calls to execute. It also removes the duplicate LockPersonality option. Fixes: #175
fix: Use the default SystemCallFilter setting for systemd hardening.
Show how to simulate production load and profile `mtail` memory. Issue: #390
For every log line, the VM allocated and then released memory for its thread struct. This is costly in CPU because of the GC churn it causes. Instead, allocate from a pool, which reuses memory that has already been allocated. The per-log-line allocation of threads is now eliminated.

| Metric | Before (no pool) | After (sync.Pool) | Change |
|---|---|---|---|
| **Total cumulative alloc_space** | ~60 MB | **~16–19 MB** | **~3× reduction** |
| `VM.ProcessLogLine` **flat** | 17.0 MB (28%) | **0 MB** | eliminated |
| `VM.execute` **flat** | 7.5 MB (13.7%) | **0.5 MB** (3%) | ~15× reduction |
| Pool init (`New.func1`) | n/a | 2.5 MB | one-time cost |

Issue: #390
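A minimal `sync.Pool` sketch of this change, assuming a hypothetical `thread` type and `processLogLine` function rather than mtail's real VM. Each line borrows a thread from the pool and returns it after a `reset` that zeroes the state but keeps the allocated capacity.

```go
package main

import (
	"fmt"
	"sync"
)

// thread is a hypothetical stand-in for the VM's per-line execution state.
type thread struct {
	pc      int
	stack   []any
	matches map[int][]string
}

// reset clears the state but keeps the allocated capacity for the next line.
func (t *thread) reset() {
	t.pc = 0
	clear(t.stack) // zero the slots so they no longer reference old values
	t.stack = t.stack[:0]
	clear(t.matches)
}

// threadPool hands out reusable threads; New only runs when the pool is empty.
var threadPool = sync.Pool{
	New: func() any {
		return &thread{stack: make([]any, 0, 16), matches: make(map[int][]string)}
	},
}

// processLogLine borrows a thread for one line and returns it afterwards.
func processLogLine(line string) int {
	t := threadPool.Get().(*thread)
	defer func() {
		t.reset()
		threadPool.Put(t)
	}()
	t.stack = append(t.stack, line)
	t.matches[0] = []string{line}
	t.pc++
	return len(t.stack)
}

func main() {
	for _, l := range []string{"a", "b", "c"} {
		fmt.Println(processLogLine(l))
	}
}
```

Because a `sync.Pool` may drop its items during garbage collection, `New` must always be able to build a fresh thread; the pool reduces allocations in steady state rather than guaranteeing none.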
refactor: Use a memory pool to avoid reallocating thread memory.
This avoids about 2MB of result allocations per regexp match.
`regexp.FindStringSubmatch` returns `[]string`: a newly allocated slice of N+1 strings (one for the full match, one per capture group). The strings themselves are substrings that share the original log line's memory, so no string data is copied, but the `[]string` slice is an extra heap allocation on every match. On a typical dhcpd log line with 7 capture groups, that is an 8-element `[]string` per match.
`regexp.FindStringSubmatchIndex` returns `[]int`: a single flat slice of byte-offset pairs, with no `[]string` built. The original text is kept alive by a `matchResult.text` reference, and capture groups are read on demand via `text[indices[2*n]:indices[2*n+1]]`, which creates only a lightweight string header (no data copy).
An alternation such as `(group_a)|(group_b)` produces the index pair `{-1, -1}` for the capture group in the branch that did not match. With the old `FindStringSubmatch`, that group came back as `""` (a valid empty string). The index-based code would panic on a negative slice bound, so the new `captureGroup` method checks for `-1` and returns `""` explicitly.
Issue: #390
refactor: Store the indices of the capture groups instead of strings.
Two test functions and a benchmark exercise the exporter at scale:
- TestWritePrometheusManyLabelValues: single metric, many label values
- TestWritePrometheusManyMetrics: many separate metrics (exercises the goroutine-per-metric pattern in Exporter.Collect)
- BenchmarkWritePrometheus: latency measurements for both patterns

Also adds issue-903.md documenting the test plan and benchmark results.
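A standard-library-only sketch of what a goroutine-per-metric collection looks like, next to a sequential alternative. The `metric` type, `gather`, and the string output are hypothetical; this is neither mtail's `Exporter.Collect` nor the Prometheus client API, only the shape of the pattern the tests exercise.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// metric is a hypothetical stand-in for one exported metric.
type metric struct {
	name  string
	value float64
}

// collectPerMetric starts one goroutine per metric; each formats its metric
// and sends it on ch. With an unbuffered ch every goroutine blocks until the
// consumer reads, so up to len(metrics) goroutines can be parked at once.
func collectPerMetric(metrics []metric, ch chan<- string) {
	var wg sync.WaitGroup
	for _, m := range metrics {
		wg.Add(1)
		go func(m metric) {
			defer wg.Done()
			ch <- fmt.Sprintf("%s %g", m.name, m.value)
		}(m)
	}
	wg.Wait()
}

// collectSequential sends every metric from the calling goroutine.
func collectSequential(metrics []metric, ch chan<- string) {
	for _, m := range metrics {
		ch <- fmt.Sprintf("%s %g", m.name, m.value)
	}
}

// gather builds n metrics, runs collect, and counts what reaches the consumer.
func gather(n int, collect func([]metric, chan<- string)) int {
	metrics := make([]metric, n)
	for i := range metrics {
		metrics[i] = metric{name: fmt.Sprintf("metric_%d", i), value: float64(i)}
	}
	ch := make(chan string)
	go func() {
		collect(metrics, ch)
		close(ch)
	}()
	count := 0
	for range ch {
		count++
	}
	return count
}

func main() {
	for _, c := range []struct {
		name    string
		collect func([]metric, chan<- string)
	}{
		{"per-metric goroutines", collectPerMetric},
		{"sequential", collectSequential},
	} {
		start := time.Now()
		n := gather(70_000, c.collect)
		fmt.Printf("%s: %d metrics in %v\n", c.name, n, time.Since(start))
	}
}
```

With the per-metric variant, output order is nondeterministic and goroutine count grows linearly with metric count; timings depend entirely on the machine.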
196b8f4 to f83dc52
Added two table-driven tests and a benchmark to `internal/exporter/prometheus_test.go` that exercise the exporter at high metric cardinality, including the goroutine-per-metric pattern in `Exporter.Collect` (prometheus.go:46) that slows down at 70k+ scale.

Benchmarks on this machine show ~40–50ms for 10k items, extrapolating to ~270–330ms for 70k: fast enough that pure serialisation isn't the bottleneck. The failure likely stems from the `/metrics` handler `WriteTimeout` (5s), combined with the full `reg.Gather()` path (Go, process, and expvar collectors plus user metrics) and the time needed to stream ~5.6MB of response text.

Fixes #903