|
| 1 | +# Issue #141: ByteBuffer — Reduce Allocations for Serialisation Path |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +`ByteBuffer.PutInt`, `PutLong`, `GetInt`, and `GetLong` originally called `BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value))`, which allocates a new `byte[]` on every call and immediately copies it into the buffer. |
| 6 | +The fix replaces all such paths with `BinaryPrimitives.WriteInt##BigEndian` / `ReadInt##BigEndian` operating directly on `_internalBuffer.AsSpan(Position)`, achieving zero intermediate heap allocations. |
| 7 | +The `IPAddress.HostToNetworkOrder` / `NetworkToHostOrder` dance exists solely for big-endian byte ordering; `BinaryPrimitives` handles this directly. |
| 8 | +The same allocation pattern applies to `PutDouble` / `GetDouble` (via `BitConverter.GetBytes` + `Array.Reverse`) and `GetShort`. |
| 9 | + |
| 10 | +## Affected Files |
| 11 | + |
| 12 | +| File | Change | |
| 13 | +|------|--------| |
| 14 | +| `HdrHistogram/Utilities/ByteBuffer.cs` | Replace all `BitConverter`/`IPAddress` paths with `BinaryPrimitives`; remove dead helper methods | |
| 15 | +| `HdrHistogram.UnitTests/Utilities/ByteBufferTests.cs` | Add round-trip unit tests for all read/write methods | |
| 16 | +| `HdrHistogram.Benchmarking/ByteBuffer/ByteBufferBenchmark.cs` | New micro-benchmark for `PutLong` / `GetLong` (1 000 iterations, `[MemoryDiagnoser]`) | |
| 17 | +| `HdrHistogram.Benchmarking/Serialisation/SerialisationBenchmark.cs` | New end-to-end benchmark for `Encode` / `Decode` / `EncodeCompressed` / `DecodeCompressed` | |
| 18 | + |
| 19 | +## Acceptance Criteria |
| 20 | + |
| 21 | +- All `ByteBuffer` read/write methods use `BinaryPrimitives` with `.AsSpan()` — zero intermediate heap allocations per call. |
| 22 | +- No `System.Net.IPAddress` references remain in `ByteBuffer.cs`. |
| 23 | +- No `BitConverter.GetBytes` or `Array.Reverse` calls remain in `ByteBuffer.cs`. |
| 24 | +- Dead private helper methods (`Int64BitsToDouble`, `ToInt64`, `CheckedFromBytes`, `FromBytes`, `CheckByteArgument`) are removed. |
| 25 | +- All pre-existing tests (including the Issue #99 stream-read regression test) continue to pass. |
| 26 | +- New round-trip unit tests cover `PutInt`/`GetInt`, indexed `PutInt`, `PutLong`/`GetLong`, `PutDouble`/`GetDouble`, and `GetShort`. |
| 27 | +- Benchmark classes exist with `[MemoryDiagnoser]` and report zero managed allocations for `PutLong` / `GetLong`. |
| 28 | +- Project builds and tests pass on all target frameworks: `net8.0`, `net9.0`, `net10.0`, `netstandard2.0`. |
| 29 | + |
| 30 | +## Test Strategy |
| 31 | + |
| 32 | +### Unit Tests |
| 33 | + |
| 34 | +Add a new test class `ByteBufferReadWriteTests` in `HdrHistogram.UnitTests/Utilities/ByteBufferTests.cs`: |
| 35 | + |
| 36 | +- `PutInt_and_GetInt_roundtrip` — `[Theory]` with `[InlineData(42, -1, int.MaxValue)]` |
| 37 | +- `PutInt_at_index_and_GetInt_roundtrip` — verifies indexed write does not advance `Position` |
| 38 | +- `PutLong_and_GetLong_roundtrip` — `[Theory]` with `[InlineData(100L, -1L, long.MaxValue)]` |
| 39 | +- `PutDouble_and_GetDouble_roundtrip` — `[Theory]` covering `0.0`, `PositiveInfinity`, and `3.14159…` |
| 40 | +- `PutDouble_and_GetDouble_roundtrip_NaN` — separate `[Fact]` (NaN equality requires `double.IsNaN`) |
| 41 | +- `GetShort_returns_big_endian_value` — verifies byte-order correctness via raw byte setup |
| 42 | + |
| 43 | +Use `FluentAssertions` for assertions. |
| 44 | +Access `_internalBuffer` via a `ByteBufferTestHelper` reflection helper where needed. |
| 45 | + |
| 46 | +### Existing Tests |
| 47 | + |
| 48 | +The existing `ByteBufferTests.ReadFrom_returns_all_bytes_when_stream_returns_partial_reads` test (Issue #99 regression) must continue to pass unchanged. |
| 49 | +Full-stack encoding/decoding tests in `HdrHistogram.UnitTests/Persistence/HistogramEncodingTestBase.cs` exercise all callers and must continue to pass. |
| 50 | + |
| 51 | +## Category |
| 52 | + |
| 53 | +`performance` |
| 54 | + |
| 55 | +## Benchmark Strategy |
| 56 | + |
| 57 | +### Benchmark-Driven Development Process |
| 58 | + |
| 59 | +Follow the benchmark-driven process from `spec/tech-standards/testing-standards.md`: |
| 60 | + |
| 61 | +1. Add benchmarks **before** (or alongside) implementation changes so before/after results can be captured. |
| 62 | +2. Run benchmarks in Release configuration: `dotnet run -c Release --project HdrHistogram.Benchmarking`. |
| 63 | +3. Use `[MemoryDiagnoser]` to capture `Allocated` column alongside mean throughput. |
| 64 | +4. Record results in the PR description. |
| 65 | + |
| 66 | +### Existing Relevant Benchmarks |
| 67 | + |
| 68 | +- `HdrHistogram.Benchmarking/Recording/Recording32BitBenchmark.cs` — recording throughput across histogram types; not directly affected. |
| 69 | +- `HdrHistogram.Benchmarking/LeadingZeroCount/` — bit-manipulation benchmarks; not affected. |
| 70 | + |
| 71 | +### New Micro-Benchmarks (`ByteBufferBenchmark.cs`) |
| 72 | + |
| 73 | +Class: `ByteBufferBenchmark` in `HdrHistogram.Benchmarking/ByteBuffer/` |
| 74 | + |
| 75 | +| Benchmark | What it measures | |
| 76 | +|-----------|-----------------| |
| 77 | +| `PutLong_After` | Writes 1 000 `long` values sequentially into a pre-allocated buffer | |
| 78 | +| `GetLong_After` | Reads 1 000 `long` values sequentially from a pre-populated buffer | |
| 79 | + |
| 80 | +Setup: allocate two `ByteBuffer` instances of `1000 * sizeof(long)` bytes; populate the read buffer with `i * 12345678L` values. |
| 81 | +Expected result after optimisation: `Allocated = 0 B` for both benchmarks. |
| 82 | + |
| 83 | +### New End-to-End Benchmarks (`SerialisationBenchmark.cs`) |
| 84 | + |
| 85 | +Class: `SerialisationBenchmark` in `HdrHistogram.Benchmarking/Serialisation/` |
| 86 | + |
| 87 | +| Benchmark | What it measures | |
| 88 | +|-----------|-----------------| |
| 89 | +| `Encode` | Encodes a 10 000-value `LongHistogram` to an uncompressed `ByteBuffer` | |
| 90 | +| `Decode` | Decodes an uncompressed `ByteBuffer` back to a `HistogramBase` | |
| 91 | +| `EncodeCompressed` | Encodes to a DEFLATE-compressed `ByteBuffer` | |
| 92 | +| `DecodeCompressed` | Decodes from a DEFLATE-compressed `ByteBuffer` | |
| 93 | + |
| 94 | +Setup: create `LongHistogram(3600_000_000L, 3)` with 10 000 recorded values spanning the range; pre-encode buffers for decode benchmarks. |
| 95 | + |
| 96 | +### Metrics That Matter |
| 97 | + |
| 98 | +| Metric | Why | |
| 99 | +|--------|-----| |
| 100 | +| `Mean` (ns) | Raw throughput improvement | |
| 101 | +| `Allocated` (B) | Must be `0 B` for `PutLong`/`GetLong` after the fix | |
| 102 | +| GC collections (`Gen0`) | Reduced allocation removes Gen 0 pressure on serialisation-heavy workloads | |
| 103 | + |
| 104 | +## Risks and Open Questions |
| 105 | + |
| 106 | +- **`netstandard2.0` compatibility**: `System.Buffers.Binary.BinaryPrimitives` is available via the `System.Memory` NuGet package on `netstandard2.0`. |
| 107 | + Verify that `HdrHistogram.csproj` references `System.Memory` (or that it is pulled in transitively) when targeting `netstandard2.0`. |
| 108 | +- **`BitConverter.Int64BitsToDouble` on `netstandard2.0`**: Available since .NET Standard 1.1; no compatibility risk. |
| 109 | +- **`double` NaN round-trip**: `BinaryPrimitives.ReadInt64BigEndian` + `BitConverter.Int64BitsToDouble` preserves all NaN bit patterns; test explicitly. |
| 110 | +- **No `Span<T>`-based public API**: The issue mentions refactoring `ByteBuffer` to accept `Memory<byte>` or `Span<byte>` from callers. |
| 111 | + This is a larger structural change and is out of scope for this issue; note it as a follow-up. |
0 commit comments