
Commit aedd827

Authored by LeeCampbell and claude
feat: add benchmark-driven workflow for performance issues (#158)
* feat: add benchmark-driven workflow for performance issues

  Establishes a benchmark-first methodology in testing standards and updates
  autonomous agent prompts to enforce Phase 1 (baseline) → Phase 2 (implement)
  → Phase 3 (validate) ordering for performance issues, ensuring baseline
  measurements exist before any code changes.

* fix: address PR #158 review feedback

  - Remove stale Program.cs from git add in execute-tasks.md
  - Clarify --job short is only for Phase 2; Phase 1 and 3 use defaults
  - Add trailing newlines to create-tasks.md and execute-tasks.md
  - Replace static benchmarks table with directory lookup directive

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent: 85c8788

7 files changed: 150 additions & 7 deletions

autonomous/agent-loop.sh

Lines changed: 12 additions & 2 deletions
@@ -133,6 +133,16 @@ case "$STATE" in
 
 $(cat /tmp/plan-backup/done/task.md)
 
+</details>"
+fi
+if [ -f /tmp/plan-backup/benchmarks/comparison.md ]; then
+PR_BODY="${PR_BODY}
+
+<details>
+<summary>Benchmark comparison</summary>
+
+$(cat /tmp/plan-backup/benchmarks/comparison.md)
+
 </details>"
 fi
 PR_BODY="${PR_BODY}

@@ -249,7 +259,7 @@ EOF
 # If plan state doesn't exist on the branch, re-initialise
 if [ ! -d ./plan ]; then
   echo "No plan state found on branch, starting fresh"
-  mkdir -p ./plan/planning ./plan/ready ./plan/done
+  mkdir -p ./plan/planning ./plan/ready ./plan/done ./plan/benchmarks
 
   ISSUE_BODY=$(gh issue view "$ISSUE_NUM" --repo "$UPSTREAM_REPO" \
     --json body,title --jq '"# Issue #'"$ISSUE_NUM"': " + .title + "\n\n" + .body')

@@ -264,7 +274,7 @@ EOF
 echo "Starting fresh: agent/${ISSUE_NUM}-${BRANCH_SLUG}"
 git fetch upstream
 git checkout -b "agent/${ISSUE_NUM}-${BRANCH_SLUG}" "upstream/$UPSTREAM_BASE_BRANCH"
-mkdir -p ./plan/planning ./plan/ready ./plan/done
+mkdir -p ./plan/planning ./plan/ready ./plan/done ./plan/benchmarks
 
 ISSUE_BODY=$(gh issue view "$ISSUE_NUM" --repo "$UPSTREAM_REPO" \
   --json body,title --jq '"# Issue #'"$ISSUE_NUM"': " + .title + "\n\n" + .body')

autonomous/prompts/create-tasks.md

Lines changed: 33 additions & 1 deletion
@@ -21,4 +21,36 @@ Include tasks for:
 Use `[ ]` for each task.
 Validate the task list is complete by cross-referencing every acceptance criterion in the brief — each criterion must be covered by at least one task.
 
-Any task that attempts to alter the ./.github folder will likely fail due to permissions restrictions. These changes should accompany the PR as an attached file with clear direction on the manual intervention required to complete the work.
+Any task that attempts to alter the ./.github folder will likely fail due to permissions restrictions. These changes should accompany the PR as an attached file with clear direction on the manual intervention required to complete the work.
+
+If the brief category is `performance`, follow the benchmark-driven development process
+in spec/tech-standards/testing-standards.md. The task ordering MUST be:
+
+Phase 1 — Benchmark scaffolding (before any implementation changes):
+- [ ] Create benchmark class(es) in HdrHistogram.Benchmarking/
+  - Micro-benchmarks for the specific operations being optimised
+  - End-to-end benchmarks that exercise the realistic user workflow
+  - Add [MemoryDiagnoser] to all benchmark classes
+  - Register new benchmarks in Program.cs BenchmarkSwitcher
+- [ ] Build and verify benchmarks compile:
+  `dotnet build HdrHistogram.Benchmarking/ -c Release`
+- [ ] Run baseline benchmarks on the UNMODIFIED code
+  - Run: `dotnet run -c Release --project HdrHistogram.Benchmarking/ -- --filter '*BenchmarkClass*' --exporters json`
+  - Save formatted results table to `plan/benchmarks/baseline.md`
+  - Include: Mean, StdDev, Allocated, Op/s for each benchmark method
+
+Phase 2 — Implementation (the actual code changes):
+- [ ] (implementation tasks as normal)
+- [ ] (unit tests as normal)
+
+Phase 3 — Benchmark validation (after implementation is complete):
+- [ ] Run post-change benchmarks with identical configuration
+  - Save results to `plan/benchmarks/post-change.md`
+- [ ] Generate comparison in `plan/benchmarks/comparison.md` containing:
+  - Side-by-side table: Benchmark | Baseline | Post-Change | Delta | Delta %
+  - Summary: which metrics improved, which regressed, which unchanged
+  - Verdict: does the data support the change?
+
+Create the directory `plan/benchmarks/` for storing results.
+
+If the brief category is `functional`, use the current ordering (implementation, tests, docs).

autonomous/prompts/execute-tasks.md

Lines changed: 22 additions & 1 deletion
@@ -24,4 +24,25 @@ After all tasks are marked `[x]`:
 2. If the review identifies issues, append new `[ ]` tasks to task.md describing each fix.
 3. If the review is clean, move brief.md and task.md to ./plan/done/
 
-Process as many tasks as you can in this iteration.
+Process as many tasks as you can in this iteration.
+
+Special handling for benchmark tasks (applies when brief category is `performance`):
+
+Baseline capture (Phase 1 benchmark tasks):
+- After creating benchmark classes, commit ONLY the benchmark files:
+  `git add HdrHistogram.Benchmarking/ && git commit -m "bench: add benchmarks for baseline capture"`
+- Run benchmarks in Release configuration. Use `--filter` to target only the relevant benchmarks.
+- BenchmarkDotNet outputs markdown tables to stdout and detailed results to `BenchmarkDotNet.Artifacts/`.
+  Copy the results table into `plan/benchmarks/baseline.md`.
+- Do NOT proceed to Phase 2 implementation tasks until baseline results are captured and saved.
+
+Post-change validation (Phase 3 benchmark tasks):
+- Run the exact same benchmark command used for the baseline.
+- Save results to `plan/benchmarks/post-change.md`.
+- Generate `plan/benchmarks/comparison.md` by reading both files and computing deltas.
+- If any benchmark shows a regression, flag it in comparison.md and add a new task to investigate.
+
+Benchmark execution notes:
+- Both Phase 1 (baseline) and Phase 3 (final comparison) MUST use default BenchmarkDotNet settings — do NOT use `--job short` for these runs.
+- Use `--job short` ONLY for ad-hoc iteration during Phase 2 development.
+- Benchmarks that fail to compile or run must be fixed before proceeding.

autonomous/prompts/pick-issue.md

Lines changed: 10 additions & 0 deletions
@@ -22,3 +22,13 @@ Using the exploration results, create ./plan/planning/brief.md containing:
 - Acceptance criteria derived from the issue
 - Test strategy: which tests to add or modify
 - Risks or open questions
+- Category: classify as either `functional` or `performance`
+  - `performance` if the issue mentions: allocation, memory, throughput, latency,
+    GC pressure, benchmark, hot path, serialisation performance, or similar
+  - `functional` for all other issues (bugs, features, refactors)
+- Benchmark strategy (required for `performance` issues, optional for `functional`):
+  Follow the benchmark-driven development process in spec/tech-standards/testing-standards.md.
+  Identify:
+  - Which existing benchmarks are relevant (check HdrHistogram.Benchmarking/)
+  - What new micro-benchmarks and end-to-end benchmarks are needed
+  - Which metrics matter: throughput, allocation, GC collections

autonomous/prompts/review-brief.md

Lines changed: 5 additions & 0 deletions
@@ -12,6 +12,11 @@ Review the brief for:
 - Feasibility: Do the proposed changes align with what the code actually looks like?
 - Test strategy: Are there specific test cases identified?
 - Acceptance criteria: Are they measurable and verifiable?
+- Category: Is the classification correct? Performance issues MUST be marked `performance`.
+- Benchmark strategy (for `performance` issues, per spec/tech-standards/testing-standards.md):
+  - Are both micro-benchmarks AND end-to-end benchmarks identified?
+  - Do the benchmarks measure what the issue actually claims to improve?
+  - Are the benchmarks testing the realistic hot path, not just the changed code in isolation?
 
 If changes are needed: create ./plan/planning/brief-review.md with specific,
 actionable suggestions.

spec/README.md

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ This file provides guidance on where to find standards and specifications for th
 | encoding, LEB128, DEFLATE, compression | [Histogram Encoding](./tech-standards/histogram-encoding.md) |
 | log format, V2, persistence | [Histogram Encoding](./tech-standards/histogram-encoding.md) |
 | xUnit, test, FluentAssertions | [Testing Standards](./tech-standards/testing-standards.md) |
+| benchmark, performance, allocation, BenchmarkDotNet | [Testing Standards](./tech-standards/testing-standards.md) |
 | naming convention, XML docs, style | [Coding Standards](./tech-standards/coding-standards.md) |
 | build, NuGet, AppVeyor, CI/CD | [Build System](./tech-standards/build-system.md) |
 | milestone, issue, PR, GitHub | [GitHub CLI Reference](./tech-standards/github.md) |

spec/tech-standards/testing-standards.md

Lines changed: 67 additions & 3 deletions
@@ -171,13 +171,77 @@ Resource types:
 
 ## Performance Testing
 
-Run tests in Release mode for accurate performance measurements:
+### Benchmarking Framework
+
+The `HdrHistogram.Benchmarking/` project uses BenchmarkDotNet for performance measurement.
+All benchmarks must run in Release mode for accurate results.
+
+| Component | Details |
+|-----------|---------|
+| Framework | BenchmarkDotNet |
+| Targets | net10.0, net9.0, net8.0 |
+| Entry point | `Program.cs` with `BenchmarkSwitcher` |
+| Diagnostics | `[MemoryDiagnoser]` on all benchmark classes |
+
+### Benchmark-Driven Development
+
+Performance issues follow a benchmark-first methodology, analogous to test-driven development.
+Benchmarks are the tests for non-functional requirements.
+
+**Phase 1 — Benchmark scaffolding (before any implementation changes):**
+
+1. Create benchmark classes covering both levels:
+   - **Micro-benchmarks** — isolate the specific operation being optimised (e.g. `PutLong`/`GetLong`)
+   - **End-to-end benchmarks** — exercise the realistic user workflow that the micro-operation feeds into (e.g. histogram encode/decode round-trip)
+2. Register benchmarks in `Program.cs` `BenchmarkSwitcher`
+3. Run benchmarks against the **unmodified code** to establish a baseline
+4. Record baseline results (Mean, StdDev, Allocated, Op/s)
+
+**Phase 2 — Implementation:**
+
+5. Apply the optimisation
+6. Add or update unit tests as normal
+
+**Phase 3 — Validation (after implementation is complete):**
+
+7. Run the same benchmarks with identical configuration
+8. Compare against baseline: generate a side-by-side table with deltas
+9. Document the results — including if the change shows no meaningful improvement
+
+Both levels of benchmark are required because:
+
+- Micro-benchmarks prove the isolated operation improved
+- End-to-end benchmarks prove the improvement matters in a realistic context
+- A micro-optimisation with no observable end-to-end impact may not justify the change
+
+### Benchmark Design Guidelines
+
+- Use `[MemoryDiagnoser]` on every benchmark class to surface allocation counts
+- Reset buffer positions or state in each benchmark method, not in setup (matches BDN iteration model)
+- Pre-allocate buffers in `[GlobalSetup]` so setup allocation is excluded from measurements
+- Use realistic data sizes and distributions that match production usage
+- Name benchmark methods clearly: `Encode`, `Decode`, `EncodeCompressed`, not `Test1`, `Bench_A`
+
+### Running Benchmarks
 
 ```bash
-dotnet test --configuration Release
+# Build in Release mode (required)
+dotnet build HdrHistogram.Benchmarking/ -c Release
+
+# Run specific benchmarks
+dotnet run -c Release --project HdrHistogram.Benchmarking/ -- --filter '*ClassName*'
+
+# Export results as JSON for comparison
+dotnet run -c Release --project HdrHistogram.Benchmarking/ -- --filter '*ClassName*' --exporters json
+
+# Quick iteration (fewer iterations, faster feedback)
+dotnet run -c Release --project HdrHistogram.Benchmarking/ -- --filter '*ClassName*' --job short
 ```
 
-Separate benchmarking project (`HdrHistogram.Benchmarking`) uses BenchmarkDotNet for micro-benchmarks.
+### Existing Benchmarks
+
+Check `HdrHistogram.Benchmarking/` for the current set of benchmark classes.
+Each subdirectory groups benchmarks by area (e.g. `LeadingZeroCount/`, `Recording/`).
 
 ## Test Organization Guidelines
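The "save the results table" step in the workflow above can be automated with a simple line filter, since BenchmarkDotNet prints its summary as pipe-delimited markdown rows. The console output below is a synthetic imitation, and all column values are invented:

```shell
#!/bin/sh
# Synthetic stand-in for BenchmarkDotNet console output; only the
# pipe-prefixed summary-table rows are kept for baseline.md.
dir=$(mktemp -d)
cat > "$dir/bdn-output.txt" <<'EOF'
// * Summary *

| Method | Mean     | StdDev | Allocated |
|------- |---------:|-------:|----------:|
| Encode | 152.3 ns | 1.2 ns | 0 B       |
| Decode | 210.0 ns | 2.0 ns | 32 B      |
EOF
grep '^|' "$dir/bdn-output.txt" > "$dir/baseline.md"
cat "$dir/baseline.md"
```

Saving only the table keeps baseline.md and post-change.md structurally identical, which makes the later side-by-side comparison straightforward.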
