|
| 1 | +# Issue \#141: ByteBuffer — Massive Allocation Waste on Hot Serialisation Path |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +`ByteBuffer.cs` is the core serialisation primitive used throughout histogram encoding and decoding. |
| 6 | +Every call to `PutInt`, `PutLong`, `GetInt`, `GetLong`, `GetShort`, `PutDouble`, and `GetDouble` currently incurs one or more heap allocations: |
| 7 | + |
| 8 | +- `PutInt` and `PutLong` call `BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value))`, which allocates a `byte[]`, then immediately discards it after `Array.Copy`. |
| 9 | +- `GetInt`, `GetLong`, and `GetShort` call `IPAddress.HostToNetworkOrder(BitConverter.ToInt32/64(...))`, which also performs unnecessary work. |
| 10 | +- `PutDouble` calls `BitConverter.GetBytes` and `Array.Reverse`, allocating and mutating a temporary array. |
| 11 | +- `GetDouble` delegates to a private `ToInt64 → CheckedFromBytes → FromBytes` chain that manually loops over bytes when a direct API call is available. |
| 12 | + |
| 13 | +The `IPAddress` host/network order functions exist solely for byte-order conversion; they are a networking API being misused as an endianness utility. |
| 14 | +`System.Buffers.Binary.BinaryPrimitives` (available since `netstandard2.0` via the `System.Memory` package) provides `WriteInt64BigEndian`, `ReadInt64BigEndian`, and equivalents for all required widths, writing directly into a `Span<byte>` with zero allocation. |
| 15 | + |
| 16 | +On serialisation-heavy workloads (encoding thousands of histogram snapshots) this reduces GC pressure materially. |
| 17 | + |
| 18 | +## Affected Files |
| 19 | + |
| 20 | +| File | Change | |
| 21 | +|---|---| |
| 22 | +| `HdrHistogram/Utilities/ByteBuffer.cs` | Replace allocation-heavy implementations with `BinaryPrimitives` equivalents | |
| 23 | +| `HdrHistogram/HdrHistogram.csproj` | Add `System.Memory` package reference for `netstandard2.0` target (if not already present) | |
| 24 | +| `HdrHistogram.UnitTests/Utilities/ByteBufferTests.cs` | Add round-trip tests for `PutInt`/`GetInt`, `PutLong`/`GetLong`, `PutDouble`/`GetDouble`, and the positioned `PutInt(index, value)` overload | |
| 25 | +| `HdrHistogram.Benchmarking/` | Add a new `ByteBufferBenchmark` class to provide before/after evidence | |
| 26 | + |
| 27 | +## Required Code Changes |
| 28 | + |
| 29 | +### `PutLong` (line 267–272) |
| 30 | + |
| 31 | +```csharp |
| 32 | +// Before |
| 33 | +var longAsBytes = BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value)); |
| 34 | +Array.Copy(longAsBytes, 0, _internalBuffer, Position, longAsBytes.Length); |
| 35 | +Position += longAsBytes.Length; |
| 36 | + |
| 37 | +// After |
| 38 | +BinaryPrimitives.WriteInt64BigEndian(_internalBuffer.AsSpan(Position), value); |
| 39 | +Position += sizeof(long); |
| 40 | +``` |
| 41 | + |
| 42 | +### `GetLong` (line 131–136) |
| 43 | + |
| 44 | +```csharp |
| 45 | +// Before |
| 46 | +var longValue = IPAddress.HostToNetworkOrder(BitConverter.ToInt64(_internalBuffer, Position)); |
| 47 | +Position += sizeof(long); |
| 48 | +return longValue; |
| 49 | + |
| 50 | +// After |
| 51 | +var longValue = BinaryPrimitives.ReadInt64BigEndian(_internalBuffer.AsSpan(Position)); |
| 52 | +Position += sizeof(long); |
| 53 | +return longValue; |
| 54 | +``` |
| 55 | + |
| 56 | +### `PutInt(int value)` (line 241–246) and `PutInt(int index, int value)` (line 256–261) |
| 57 | + |
| 58 | +Replace `BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value))` + `Array.Copy` with `BinaryPrimitives.WriteInt32BigEndian`. |
| 59 | + |
| 60 | +### `GetInt` (line 120–125) and `GetShort` (line 109–114) |
| 61 | + |
| 62 | +Replace `IPAddress.HostToNetworkOrder(BitConverter.ToInt32/16(...))` with `BinaryPrimitives.ReadInt32BigEndian` / `ReadInt16BigEndian`. |
| 63 | + |
| 64 | +### `PutDouble` (line 278–285) |
| 65 | + |
| 66 | +```csharp |
| 67 | +// After — no allocation, no Array.Reverse |
| 68 | +BinaryPrimitives.WriteInt64BigEndian(_internalBuffer.AsSpan(Position), BitConverter.DoubleToInt64Bits(value)); |
| 69 | +Position += sizeof(double); |
| 70 | +``` |
| 71 | + |
| 72 | +### `GetDouble` (line 142–147) |
| 73 | + |
| 74 | +```csharp |
| 75 | +// After — replaces ToInt64/CheckedFromBytes/FromBytes/CheckByteArgument chain |
| 76 | +var longBits = BinaryPrimitives.ReadInt64BigEndian(_internalBuffer.AsSpan(Position)); |
| 77 | +Position += sizeof(double); |
| 78 | +return BitConverter.Int64BitsToDouble(longBits); |
| 79 | +``` |
| 80 | + |
| 81 | +Using `BitConverter.DoubleToInt64Bits` / `Int64BitsToDouble` with `BinaryPrimitives.WriteInt64BigEndian` / `ReadInt64BigEndian` is compatible with `netstandard2.0`, avoiding the need for `BinaryPrimitives.WriteDoubleBigEndian` which requires .NET 5+. |
| 82 | + |
| 83 | +Once `GetDouble` is rewritten the following private helpers become dead code and should be deleted: |
| 84 | + |
| 85 | +- `Int64BitsToDouble` (line 156–159) |
| 86 | +- `ToInt64` (line 167–170) |
| 87 | +- `CheckedFromBytes` (line 180–184) |
| 88 | +- `CheckByteArgument` (line 196–208) |
| 89 | +- `FromBytes` (line 218–226) |
| 90 | + |
| 91 | +The `using System.Net;` import should also be removed once `IPAddress` is no longer referenced. |
| 92 | + |
| 93 | +## Acceptance Criteria |
| 94 | + |
| 95 | +1. All public read/write methods (`GetShort`, `GetInt`, `PutInt`, `PutInt(index,value)`, `GetLong`, `PutLong`, `GetDouble`, `PutDouble`) use `BinaryPrimitives` with `AsSpan`, performing zero intermediate heap allocations. |
| 96 | +2. No references to `IPAddress`, `IPAddress.HostToNetworkOrder`, or `IPAddress.NetworkToHostOrder` remain in `ByteBuffer.cs`. |
| 97 | +3. No references to `BitConverter.GetBytes` or `Array.Reverse` remain in `ByteBuffer.cs`. |
| 98 | +4. The dead private helpers (`ToInt64`, `CheckedFromBytes`, `FromBytes`, `CheckByteArgument`, `Int64BitsToDouble`) are removed. |
| 99 | +5. All existing tests pass unchanged. |
| 100 | +6. New round-trip unit tests cover: `PutInt`/`GetInt`, `PutLong`/`GetLong`, `PutDouble`/`GetDouble`, and `PutInt(index, value)`. |
| 101 | +7. A new benchmark class exists in `HdrHistogram.Benchmarking/` demonstrating the allocation difference. |
| 102 | +8. The project builds and tests pass on all target frameworks: `net8.0`, `net9.0`, `net10.0`, `netstandard2.0`. |
| 103 | +9. `dotnet format` passes with no warnings. |
| 104 | + |
| 105 | +## Test Strategy |
| 106 | + |
| 107 | +### Unit tests to add (`ByteBufferTests.cs`) |
| 108 | + |
| 109 | +Add a new test class `ByteBufferReadWriteTests` (or extend the existing class) with: |
| 110 | + |
| 111 | +- `PutInt_and_GetInt_roundtrip` — write a known `int`, reset position, read it back, assert equality. Cover positive, negative, and `int.MaxValue`. |
| 112 | +- `PutInt_at_index_and_GetInt_roundtrip` — write to a specific index without advancing position; read from that index; assert equality. |
| 113 | +- `PutLong_and_GetLong_roundtrip` — same pattern for `long`. |
| 114 | +- `PutDouble_and_GetDouble_roundtrip` — same pattern for `double`. Include `double.NaN`, `double.PositiveInfinity`, and `0.0`. |
| 115 | +- `GetShort_returns_big_endian_value` — write known bytes in big-endian order into the raw buffer, call `GetShort`, assert result. |
| 116 | + |
| 117 | +All tests should use xUnit `[Theory]` with `[InlineData]` where multiple values are exercised. |
| 118 | + |
| 119 | +### Existing tests |
| 120 | + |
| 121 | +The single existing test (`ReadFrom_returns_all_bytes_when_stream_returns_partial_reads`) must continue to pass unmodified; it exercises a different code path and is unaffected by this change. |
| 122 | + |
| 123 | +### Integration / regression |
| 124 | + |
| 125 | +The existing histogram encoding and decoding tests (round-trip encode/decode of `LongHistogram` via `HistogramEncoderV2`) exercise the full stack and serve as integration regression coverage. These should be confirmed passing. |
| 126 | + |
| 127 | +## Benchmark |
| 128 | + |
| 129 | +Add `HdrHistogram.Benchmarking/ByteBuffer/ByteBufferBenchmark.cs` with: |
| 130 | + |
| 131 | +- `PutLong_Before` / `PutLong_After` benchmarks (or a single parameterised benchmark switching on implementation) |
| 132 | +- `GetLong_Before` / `GetLong_After` |
| 133 | +- Configured with `[MemoryDiagnoser]` to surface allocation counts |
| 134 | + |
| 135 | +The issue requires before/after benchmark results to accompany the PR. Because the "before" code will be replaced, record baseline numbers from the original code prior to the change, and include them in the PR description. |
| 136 | + |
| 137 | +## Risks and Open Questions |
| 138 | + |
| 139 | +1. **`netstandard2.0` compatibility** — `BinaryPrimitives` is in `System.Buffers.Binary` and `AsSpan()` on arrays requires `System.Memory`. These are available in `netstandard2.0` via the `System.Memory` NuGet package (version 4.5.x). Verify whether `HdrHistogram.csproj` already references this package; add it if not. |
| 140 | + |
| 141 | +2. **`BinaryPrimitives.WriteDoubleBigEndian` not available on `netstandard2.0`** — Mitigated by using `BinaryPrimitives.WriteInt64BigEndian(span, BitConverter.DoubleToInt64Bits(value))` instead, which is available across all target frameworks. |
| 142 | + |
| 143 | +3. **Byte-order correctness** — `IPAddress.HostToNetworkOrder` converts from host byte order (typically little-endian on x86/x64) to big-endian, and `BinaryPrimitives.WriteInt64BigEndian` writes in big-endian unconditionally. The replacement is semantically equivalent. This must be confirmed by the round-trip unit tests on a little-endian host. |
| 144 | + |
| 145 | +4. **`GetShort` semantics** — The current implementation calls `IPAddress.HostToNetworkOrder` on a value read with `BitConverter.ToInt16`, which means it reads the buffer as little-endian and converts. The replacement `BinaryPrimitives.ReadInt16BigEndian` reads directly as big-endian, which is correct. Verify by tracing the callers of `GetShort` (currently only `HistogramDecoder` variants). |
| 146 | + |
| 147 | +5. **`Memory<byte>` / `Span<byte>` refactor** — The issue mentions refactoring `ByteBuffer` to work over `Memory<byte>` or `Span<byte>` to allow caller-supplied pooled memory. This is noted as a secondary suggestion. It is a larger architectural change and should be treated as a separate issue rather than included here, to keep this PR focused and reviewable. |
0 commit comments