| 1 | +--- |
| 2 | +title: "To the Collider!" |
| 3 | +date: 2026-03-05 |
| 4 | +description: "Benchmarking isn't hard. One attribute, one run, and you know exactly what's faster. Putting it into practice with Testo." |
| 5 | +image: /blog/collider/img-00.jpg |
| 6 | +author: Aleksei Gagarin |
| 7 | +--- |
| 8 | + |
| 9 | +::: info 🤔 The Problem |
| 10 | +Write a function that calculates the sum of all numbers from `$a` to `$b`. |
| 11 | + |
| 12 | +For example, if `$a = 1` and `$b = 5`, the result is `1 + 2 + 3 + 4 + 5 = 15`. |
| 13 | +::: |
| 14 | + |
| 15 | +1. The simplest solution that comes to mind: |
| 16 | + iteratively add every value from `$a` to `$b` to a running total in a `for` loop, but that's too obvious. |
| 17 | +2. Imagination takes over and you want to solve it with arrays: |
| 18 | + fill an array with values from `$a` to `$b` and pass it to `array_sum()`. |
| 19 | + |
| 20 | + |
| 21 | + |
| 22 | +## Comparing the Solutions |
| 23 | + |
| 24 | +PHP performs very well in synthetic benchmarks, outpacing Python and all that. With JIT enabled and a few tricks, it can even catch up to C++. |
| 25 | + |
| 26 | +We won't be comparing PHP with other languages right now — instead, let's just compare these two approaches against each other. |
| 27 | + |
| 28 | +Here's my reasoning: |
| 29 | + |
| 30 | +> The array solution should be slower than the `for` loop, since extra resources go into computing hashes for the hash table when creating the array, and more memory is needed for intermediate values. |
| 31 | + |
| 32 | +Let's verify this: we'll write the functions and add the `#[BenchWith]` attribute to one of them. |
| 33 | + |
| 34 | +```php |
| 35 | +#[BenchWith( |
| 36 | + callables: [ |
| 37 | + 'in_array' => [self::class, 'sumInArray'], |
| 38 | + ], |
| 39 | + arguments: [1, 5_000], |
| 40 | + calls: 100, |
| 41 | + iterations: 1, |
| 42 | +)] |
| 43 | +public static function sumInCycle(int $a, int $b): int |
| 44 | +{ |
| 45 | + $result = 0; |
| 46 | + for ($i = $a; $i <= $b; ++$i) { |
| 47 | + $result += $i; |
| 48 | + } |
| 49 | + |
| 50 | + return $result; |
| 51 | +} |
| 52 | + |
| 53 | +public static function sumInArray(int $a, int $b): int |
| 54 | +{ |
| 55 | + return \array_sum(\range($a, $b)); |
| 56 | +} |
| 57 | +``` |
| 58 | + |
| 59 | +With the `#[BenchWith]` attribute, we're telling Testo that: |
| 60 | +- we want to compare the performance of the current function (`sumInCycle`) with another function (`sumInArray`); |
| 61 | +- both functions will receive the same arguments: `1` and `5_000`; |
| 62 | +- to measure execution time, each function will be called 100 times in a row (`calls: 100`). |
| 63 | + |
| 64 | +Place your bets and let's run it. |
| 65 | + |
| 66 | +``` |
| 67 | +Summary: |
| 68 | ++---+-------------+-------+-------+--------+-------------------+ |
| 69 | +| # | Name | Iters | Calls | Memory | Avg Time | |
| 70 | ++---+-------------+-------+-------+--------+-------------------+ |
| 71 | +| 2 | current | 1 | 100 | 0 | 38.921ms | |
| 72 | +| 1 | sumInArray | 1 | 100 | 0 | 21.472ms (-44.8%) | |
| 73 | ++---+-------------+-------+-------+--------+-------------------+ |
| 74 | +``` |
| 75 | + |
| 76 | +`sumInArray` takes first place, completing the task almost twice as fast as `sumInCycle`. |
| 77 | + |
| 78 | +> Wait, what? The array-based function won by a wide margin?! |
| 79 | + |
| 80 | + |
| 81 | + |
| 82 | + |
| 83 | +## Statistical Artifact |
| 84 | + |
| 85 | +Indeed, this could just be a "statistical artifact." |
| 86 | + |
| 87 | +Each benchmark rerun produces different results, sometimes varying significantly from the previous ones. |
| 88 | +This can be caused by background tasks, user activity, or other phenomena that affect performance at the moment. |
| 89 | + |
| 90 | +::: warning ⚠️ We need guarantees that we're comparing genuinely stable results, not just random outliers. |
| 91 | +::: |
| 92 | + |
| 93 | +Statistics comes to the rescue with the [coefficient of variation](https://en.wikipedia.org/wiki/Coefficient_of_variation), which measures the relative variability of data. |
| 94 | +The smaller this coefficient, the more stable the results. |
| 95 | + |
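For the curious, the coefficient of variation is just the standard deviation divided by the mean. A minimal sketch in plain PHP follows; `relativeStdDev()` is a hypothetical helper, and using the sample standard deviation (dividing by `n - 1`) is an assumption here, not a statement about how Testo computes `RStDev`.

```php
<?php

declare(strict_types=1);

/**
 * Relative standard deviation (coefficient of variation) in percent:
 * sample standard deviation divided by the mean.
 *
 * @param list<int|float> $samples at least two measurements with a non-zero mean
 */
function relativeStdDev(array $samples): float
{
    $n = \count($samples);
    if ($n < 2) {
        throw new \InvalidArgumentException('At least two samples are required.');
    }

    $mean = \array_sum($samples) / $n;

    $squares = 0.0;
    foreach ($samples as $x) {
        $squares += ($x - $mean) ** 2;
    }

    // Bessel's correction (n - 1): the samples are a subset of all possible runs.
    return \sqrt($squares / ($n - 1)) / $mean * 100;
}

// Hypothetical per-iteration timings in milliseconds, made up for illustration.
echo \round(relativeStdDev([10.0, 10.2, 9.8, 10.1, 9.9]), 2), "%\n"; // about 1.58%
```

Because the deviation is expressed relative to the mean, the same threshold works for functions that take milliseconds and for those that take microseconds.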
| 96 | +All we need to do is collect more data spread over time — that is, rerun the benchmarks multiple times. |
| 97 | +The `#[BenchWith]` attribute has an `iterations` parameter responsible for the number of benchmark reruns. |
| 98 | + |
| 99 | +Let's set `iterations: 10` and rerun: |
| 100 | + |
| 101 | +``` |
| 102 | +Summary: |
| 103 | ++---+-------------+-------+-------+--------+-------------------+---------+ |
| 104 | +| # | Name | Iters | Calls | Memory | Avg Time | RStDev | |
| 105 | ++---+-------------+-------+-------+--------+-------------------+---------+ |
| 106 | +| 2 | current | 10 | 100 | 0 | 38.474ms | ±2.86% | |
| 107 | +| 1 | sumInArray | 10 | 100 | 0 | 12.501ms (-67.5%) | ±27.20% | |
| 108 | ++---+-------------+-------+-------+--------+-------------------+---------+ |
| 109 | +``` |
| 110 | + |
| 111 | +Now `sumInArray` runs 3x faster, but the coefficient of variation (`RStDev` column) is 27.2%, which is quite high. |
| 112 | +To claim stable results, you typically aim for RStDev < 2%. |
| 113 | + |
| 114 | +Let's think about how to reduce this variation. Our functions execute quite fast, and even small performance fluctuations can heavily impact the results, especially with a low number of runs. |
| 115 | +For fast code like ours, increasing the number of `calls` per iteration can help. Let's bump it to 2000: |
| 116 | + |
| 117 | +``` |
| 118 | +Summary: |
| 119 | ++---+-------------+-------+-------+--------+-------------------+--------+ |
| 120 | +| # | Name | Iters | Calls | Memory | Avg Time | RStDev | |
| 121 | ++---+-------------+-------+-------+--------+-------------------+--------+ |
| 122 | +| 2 | current | 10 | 2000 | 0 | 37.888ms | ±1.38% | |
| 123 | +| 1 | sumInArray | 10 | 2000 | 0 | 11.395ms (-69.9%) | ±1.72% | |
| 124 | ++---+-------------+-------+-------+--------+-------------------+--------+ |
| 125 | +``` |
| 126 | + |
| 127 | +As you can see, with a 3x performance difference, even ±27.2% variation wouldn't have saved the `for` loop from defeat. But now we can confidently claim the results are stable (RStDev < 2%). |
| 128 | + |
| 129 | + |
| 130 | + |
| 131 | + |
| 132 | +By the way, did you notice that the `Memory` column shows `0` in both cases? This means no additional memory was allocated for the arrays: what was already allocated at benchmark startup was enough. |
| 133 | + |
| 134 | +Of course, you could experiment with a larger range, enable JIT, and prove that in some cases the loop would be faster, |
| 135 | +but I want to draw your attention to how quick it is to benchmark something now! |
| 136 | + |
| 137 | + |
| 138 | +## BenchWith |
| 139 | + |
| 140 | + |
| 141 | + |
| 142 | +Benchmarking right in your code without extra boilerplate. Like [inline tests](/docs/inline-tests.md), but for benchmarks. |
| 143 | + |
| 144 | +[Dragon Code](https://github.com/TheDragonCode/benchmark) once showed that benchmarks can be simple and convenient: instead of tons of boilerplate, just call a single class and pass closures for comparison. |
| 145 | +Testo takes this to the next level: from intent to result in just one attribute. |
| 146 | + |
| 147 | +Run it with a single click in your IDE: |
| 148 | + |
| 149 | + |
| 150 | +But that's not all. Behind the simplicity on the surface lie serious algorithms backed by statistics. |
| 151 | + |
| 152 | +Testo automatically detects deviations in the data, discards outliers, and produces metrics that help you understand how stable the results are. |
| 153 | +For those who don't find raw numbers very telling, there's a summary with recommendations and alerts. |
| 154 | + |
| 155 | +Here's what it looks like right now: |
| 156 | + |
| 157 | +``` |
| 158 | +Results for calcFoo: |
| 159 | ++----------------------------+-------------------------------------------------+------------------------------------+--------------------------------------------------------------+ |
| 160 | +| BENCHMARK SETUP | TIME RESULTS | FILTERED RESULTS | SUMMARY | |
| 161 | +| Name | Iters | Calls | Mean | Median | RStDev | Rej. | Mean* | RStDev* | Place | Warnings | |
| 162 | ++------------+-------+-------+-------------------+-------------------+---------+------+-------------------+---------+-------+------------------------------------------------------+ |
| 163 | +| current | 10 | 20 | 44.03µs | 43.68µs | ±2.35% | 1 | 43.69µs | ±0.42% | 3rd | | |
| 164 | +| calcBar | 10 | 20 | 13.72µs (-68.8%) | 13.26µs (-69.6%) | ±7.77% | 2 | 13.23µs (-69.7%) | ±0.52% | 2nd | | |
| 165 | +| calcBaz | 10 | 20 | 110.50ns (-99.7%) | 105.00ns (-99.8%) | ±16.50% | 1 | 106.11ns (-99.8%) | ±12.52% | 1st | High variance, low iter time. Insufficient iter time | |
| 166 | ++------------+-------+-------+-------------------+-------------------+---------+------+-------------------+---------+-------+------------------------------------------------------+ |
| 167 | +Recommendations: |
| 168 | + ⚠ High variance, low iter time: Measurement overhead may dominate — increase calls per iteration. |
| 169 | + ⚠ Insufficient iter time: Timer jitter exceeds useful signal — increase calls per iteration. |
| 170 | +``` |
| 171 | + |
| 172 | +I know, it looks overwhelming, but this isn't the release version yet. In the future I envision it being even simpler: an attribute with automatic settings, no need to dive into the details. |
| 173 | + |
| 174 | +## Back to the Collider |
| 175 | + |
| 176 | +Alright, you obviously know that the range sum problem can be solved in `O(1)` using a simple mathematical formula. |
| 177 | +I'll deprive you of the pleasure of pointing this out in the comments. |
| 178 | + |
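For reference, the formula in question is the arithmetic series sum. Writing $d = b - a + 1$ for the number of terms in the range:

$$
\sum_{i=a}^{b} i \;=\; \sum_{k=0}^{d-1} (a + k) \;=\; a\,d + \sum_{k=0}^{d-1} k \;=\; a\,d + \frac{(d-1)\,d}{2}
$$

For $a = 1$ and $b = 5$ this gives $d = 5$ and $1 \cdot 5 + \frac{4 \cdot 5}{2} = 15$, the same result as in the example at the top.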
| 179 | +Here's the function and a benchmark against the previous solutions: |
| 180 | + |
| 181 | +```php |
| 182 | +public static function sumLinear(int $a, int $b): int |
| 183 | +{ |
| 184 | + $d = $b - $a + 1; |
| 185 | + return (int) (($d - 1) * $d / 2) + $a * $d; |
| 186 | +} |
| 187 | +``` |
| 188 | + |
| 189 | +``` |
| 190 | +Summary: |
| 191 | ++---+-------------+-------+-------+-------------------+--------+ |
| 192 | +| # | Name | Iters | Calls | Avg Time | RStDev | |
| 193 | ++---+-------------+-------+-------+-------------------+--------+ |
| 194 | +| 4 | current | 10 | 2000 | 40.102ms | ±1.09% | |
| 195 | +| 2 | sumInArray | 10 | 2000 | 12.232ms (-69.5%) | ±0.93% | |
| 196 | +| 1 | sumLinear | 10 | 2000 | 77.065µs (-99.8%) | ±3.05% | |
| 197 | ++---+-------------+-------+-------+-------------------+--------+ |
| 198 | +``` |
| 199 | + |
| 200 | +Microseconds instead of milliseconds. Pretty cool, right? |
| 201 | + |
| 202 | +And even here there's room for optimization. You've probably heard that division isn't always the fastest operation. |
| 203 | +Dividing by 2 can be replaced with multiplying by 0.5. |
| 204 | + |
| 205 | + |
| 206 | + |
| 207 | +```php |
| 208 | +public static function multi(int $a, int $b): int |
| 209 | +{ |
| 210 | + $d = $b - $a + 1; |
| 211 | + return (int) (($d - 1) * $d * 0.5) + $a * $d; |
| 212 | +} |
| 213 | +``` |
| 214 | + |
| 215 | +``` |
| 216 | ++---+---------+-------+---------+--------+------------------+--------+ |
| 217 | +| # | Name | Iters | Calls | Memory | Avg Time | RStDev | |
| 218 | ++---+---------+-------+---------+--------+------------------+--------+ |
| 219 | +| 1 | current | 10 | 2000000 | 0 | 75.890µs | ±0.79% | |
| 220 | +| 2 | multi | 10 | 2000000 | 0 | 78.821µs (+3.9%) | ±0.47% | |
| 221 | ++---+---------+-------+---------+--------+------------------+--------+ |
| 222 | +``` |
| 223 | + |
| 224 | + |
| 225 | +::: info Division is faster than multiplication (╯°□°)╯︵ ┻━━┻ |
| 226 | + |
| 227 | +Expectations don't always match reality, and optimizations don't always work the way we think. |
| 228 | +::: |
| 229 | + |
| 230 | +Also, remembering that we're working with positive integers in binary, we can replace division with a bit shift, which in theory should be even faster. |
| 231 | + |
| 232 | + |
| 233 | + |
| 234 | + |
| 235 | +```php |
| 236 | +public static function shift(int $a, int $b): int |
| 237 | +{ |
| 238 | + $d = $b - $a + 1; |
| 239 | + return ((($d - 1) * $d) >> 1) + $a * $d; |
| 240 | +} |
| 241 | +``` |
| 242 | + |
| 243 | +``` |
| 244 | ++---+---------+-------+---------+--------+------------------+--------+ |
| 245 | +| # | Name | Iters | Calls | Memory | Avg Time | RStDev | |
| 246 | ++---+---------+-------+---------+--------+------------------+--------+ |
| 247 | +| 2 | current | 10 | 2000000 | 0 | 75.890µs | ±0.79% | |
| 248 | +| 1 | shift | 10 | 2000000 | 0 | 70.559µs (-7.0%) | ±0.70% | |
| 249 | ++---+---------+-------+---------+--------+------------------+--------+ |
| 250 | +``` |
| 251 | + |
| 252 | +At least the bit shift didn't let us down. |
| 253 | + |
| 254 | +Note that the 7% improvement doesn't mean bit shifting is exactly 7% faster than division. |
| 255 | +The function contains several other mathematical operations, and the function call itself takes some time. |
| 256 | +So 7% is the difference between two functions, not between two specific operations. |
| 257 | + |
| 258 | +::: info 💡 It's always important to understand what exactly is being compared, so you can correctly interpret the results. |
| 259 | +::: |
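Part of knowing what's being compared is checking that the functions actually return the same thing. A minimal sketch, with the article's implementations copied as plain functions; `sumShift` and `implementationsAgree` are names made up here, and the check assumes `$a <= $b`.

```php
<?php

declare(strict_types=1);

// The article's implementations, copied as plain functions.
function sumInCycle(int $a, int $b): int
{
    $result = 0;
    for ($i = $a; $i <= $b; ++$i) {
        $result += $i;
    }

    return $result;
}

function sumInArray(int $a, int $b): int
{
    return \array_sum(\range($a, $b));
}

function sumLinear(int $a, int $b): int
{
    $d = $b - $a + 1;
    return (int) (($d - 1) * $d / 2) + $a * $d;
}

function sumShift(int $a, int $b): int
{
    $d = $b - $a + 1;
    return ((($d - 1) * $d) >> 1) + $a * $d;
}

/** Hypothetical helper: true when all four implementations return the same sum. */
function implementationsAgree(int $a, int $b): bool
{
    $expected = sumInCycle($a, $b);

    return sumInArray($a, $b) === $expected
        && sumLinear($a, $b) === $expected
        && sumShift($a, $b) === $expected;
}

// Ranges picked for illustration, all with $a <= $b.
foreach ([[1, 5], [1, 5_000], [-10, 10], [7, 7]] as [$a, $b]) {
    \printf("%d..%d: %s\n", $a, $b, implementationsAgree($a, $b) ? 'ok' : 'MISMATCH');
}
```

For `$a > $b` the implementations stop agreeing: `range(5, 1)` counts downwards and still sums to 15, while the loop body never runs and returns 0. A benchmark over such arguments would be comparing different work.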
| 260 | + |
| 261 | + |
| 262 | + |
| 263 | + |
| 264 | +Use benchmarks, verify your assumptions, and find optimal solutions. |