Commit e8c634e
Rollup merge of #151611 - bonega:improve-is-slice-is-ascii-performance, r=folkertdev
Improve is_ascii performance on x86_64 with explicit SSE2 intrinsics
# Summary
Improves `slice::is_ascii` performance on SSE2 targets by roughly 1.5-2x on larger inputs.
AVX-512 retains similar performance characteristics.
This builds on the work already merged in rust-lang/rust#151259.
In particular, this PR improves the default SSE2 performance; I no longer consider this a temporary fix.
Thanks to @folkertdev for suggesting I reconsider `as_chunk`.
# The implementation:
- Uses 64-byte chunks with 4x 16-byte SSE2 loads OR'd together
- Extracts the MSB mask with a single `pmovmskb` instruction
- Falls back to usize-at-a-time SWAR for inputs < 64 bytes
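A minimal sketch of the three steps above, using `std::arch::x86_64` intrinsics (`_mm_movemask_epi8` is the intrinsic that compiles to `pmovmskb`). This is not the code merged in this PR: the names `is_ascii_fast` and `is_ascii_swar` are hypothetical, it uses `chunks_exact(64)` in place of `as_chunk`, and routing the tail after the last full 64-byte chunk through the SWAR loop is an assumption.

```rust
/// Hypothetical SWAR fallback: test one `usize` word at a time for any byte with its high bit set.
fn is_ascii_swar(bytes: &[u8]) -> bool {
    const WORD: usize = std::mem::size_of::<usize>();
    // 0x8080...80: the high bit of every byte in a word.
    const HIGH_BITS: usize = usize::MAX / 0xFF * 0x80;
    let mut words = bytes.chunks_exact(WORD);
    for w in &mut words {
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(w);
        if usize::from_ne_bytes(buf) & HIGH_BITS != 0 {
            return false;
        }
    }
    // Leftover bytes (fewer than one word) are checked individually.
    words.remainder().iter().all(|b| b.is_ascii())
}

/// Hypothetical SSE2 path: 64-byte chunks, four 16-byte loads OR'd together, one movemask.
#[cfg(target_arch = "x86_64")]
fn is_ascii_fast(bytes: &[u8]) -> bool {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8, _mm_or_si128};

    const CHUNK: usize = 64;
    // Inputs shorter than one chunk skip the SIMD path entirely.
    if bytes.len() < CHUNK {
        return is_ascii_swar(bytes);
    }
    let mut chunks = bytes.chunks_exact(CHUNK);
    for chunk in &mut chunks {
        let p = chunk.as_ptr() as *const __m128i;
        // SAFETY: `chunk` is exactly 64 bytes, so four unaligned 16-byte loads stay
        // in bounds, and SSE2 is part of the x86_64 baseline.
        let mask = unsafe {
            let a = _mm_loadu_si128(p);
            let b = _mm_loadu_si128(p.add(1));
            let c = _mm_loadu_si128(p.add(2));
            let d = _mm_loadu_si128(p.add(3));
            // OR the four vectors: a set high bit in any byte survives the reduction.
            let any = _mm_or_si128(_mm_or_si128(a, b), _mm_or_si128(c, d));
            // One pmovmskb gathers the 16 per-byte high bits into an i32 mask.
            _mm_movemask_epi8(any)
        };
        if mask != 0 {
            return false;
        }
    }
    // Assumption: the tail (under 64 bytes) goes through the SWAR loop.
    is_ascii_swar(chunks.remainder())
}

/// Non-x86_64 targets in this sketch just use the SWAR loop.
#[cfg(not(target_arch = "x86_64"))]
fn is_ascii_fast(bytes: &[u8]) -> bool {
    is_ascii_swar(bytes)
}
```

ORing the four loads before a single mask extraction keeps the hot loop to one `pmovmskb` and one branch per 64 bytes; only the high bit of each byte matters, so a non-ASCII byte anywhere in the chunk still shows up in the combined mask.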
# Performance impact (vs before rust-lang/rust#151259):
- AVX-512: 34-48x faster
- SSE2: 1.5-2x faster
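Because each row of the benchmark tables is normalized to the fastest variant at that input size, the speedup of a new variant over its old counterpart is simply `old / new`. The sketch below applies this to the `pure_ascii` rows of 1024 bytes and up; the numbers are copied from that table, and treating those rows as the "larger inputs" is an assumption.

```rust
/// Relative times from the `pure_ascii` table (1.00 = fastest in the row; higher = slower).
/// Columns, in table order: (input size in bytes, new_avx512, new_sse2, old_avx512, old_sse2).
const PURE_ASCII_LARGE: [(usize, f64, f64, f64, f64); 6] = [
    (1024, 1.00, 1.02, 41.34, 1.84),
    (4096, 1.02, 1.00, 45.61, 1.98),
    (16384, 1.01, 1.00, 48.67, 2.04),
    (65536, 1.00, 1.03, 43.86, 1.77),
    (262144, 1.00, 1.06, 41.44, 1.79),
    (1048576, 1.02, 1.00, 35.36, 1.44),
];

/// Both values share the row's baseline, so their ratio is the raw-time ratio.
fn speedup(old: f64, new: f64) -> f64 {
    old / new
}

fn avx512_speedups() -> Vec<f64> {
    PURE_ASCII_LARGE.iter().map(|r| speedup(r.3, r.1)).collect()
}

fn sse2_speedups() -> Vec<f64> {
    PURE_ASCII_LARGE.iter().map(|r| speedup(r.4, r.2)).collect()
}

/// Smallest and largest value in a slice.
fn min_max(xs: &[f64]) -> (f64, f64) {
    let mut lo = f64::INFINITY;
    let mut hi = f64::NEG_INFINITY;
    for &x in xs {
        lo = lo.min(x);
        hi = hi.max(x);
    }
    (lo, hi)
}
```

With these rows, the AVX-512 ratio ranges from about 34.7x (1048576 bytes) to about 48.2x (16384 bytes), in line with the 34-48x figure above; the SSE2 ratio ranges from 1.44x to 2.04x.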
<details>
<summary>Benchmark Results (click to expand)</summary>
Benchmarked on an AMD Ryzen 9 9950X (AVX-512 capable). Values are relative times normalized per row (1.00 = fastest; higher is slower).
Throughput tops out at 139 GB/s for large inputs.
### early_non_ascii
| Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
|------------|------------|----------|------------|----------|
| 64 | 1.01 | **1.00** | 13.45 | 1.13 |
| 1024 | 1.01 | **1.00** | 13.53 | 1.14 |
| 65536 | 1.01 | **1.00** | 13.99 | 1.12 |
| 1048576 | 1.02 | **1.00** | 13.29 | 1.12 |
### late_non_ascii
| Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
|------------|------------|----------|------------|----------|
| 64 | **1.00** | 1.01 | 13.37 | 1.13 |
| 1024 | 1.10 | **1.00** | 42.42 | 1.95 |
| 65536 | **1.00** | 1.06 | 42.22 | 1.73 |
| 1048576 | **1.00** | 1.03 | 34.73 | 1.46 |
### pure_ascii
| Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
|------------|------------|----------|------------|----------|
| 4 | 1.03 | **1.00** | 1.75 | 1.32 |
| 8 | **1.00** | 1.14 | 3.89 | 2.06 |
| 16 | **1.00** | 1.04 | 1.13 | 1.62 |
| 32 | 1.07 | 1.19 | 5.11 | **1.00** |
| 64 | **1.00** | 1.13 | 13.32 | 1.57 |
| 128 | **1.00** | 1.01 | 19.97 | 1.55 |
| 256 | **1.00** | 1.02 | 27.77 | 1.61 |
| 1024 | **1.00** | 1.02 | 41.34 | 1.84 |
| 4096 | 1.02 | **1.00** | 45.61 | 1.98 |
| 16384 | 1.01 | **1.00** | 48.67 | 2.04 |
| 65536 | **1.00** | 1.03 | 43.86 | 1.77 |
| 262144 | **1.00** | 1.06 | 41.44 | 1.79 |
| 1048576 | 1.02 | **1.00** | 35.36 | 1.44 |
</details>
## Reproduction / Test Projects
Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation
- `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
- `fuzz/` - Compares old/new implementations with libfuzzer
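The `fuzz/` project compares implementations under libfuzzer. Without assuming anything about that harness, the same differential idea can be sketched with only the standard library: a hypothetical portable candidate (`candidate_is_ascii`, which OR-reduces 64-byte chunks without intrinsics) is checked against `<[u8]>::is_ascii` on pseudo-random, mostly-ASCII buffers from a small xorshift generator. All names here are illustrative.

```rust
/// Hypothetical candidate under test: OR-reduce each 64-byte chunk, then check the high bit.
fn candidate_is_ascii(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(64);
    for chunk in &mut chunks {
        let acc = chunk.iter().fold(0u8, |acc, &b| acc | b);
        if acc & 0x80 != 0 {
            return false;
        }
    }
    chunks.remainder().iter().all(|b| b.is_ascii())
}

/// Tiny xorshift64 PRNG so the sketch needs no external crates (seed must be nonzero).
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Runs `iterations` random inputs through both implementations; returns the mismatch count.
fn differential_check(iterations: usize, seed: u64) -> usize {
    let mut rng = XorShift64(seed);
    let mut mismatches = 0;
    for _ in 0..iterations {
        let len = (rng.next() % 300) as usize;
        // Start from pure ASCII so that both `true` and `false` answers occur often.
        let mut buf: Vec<u8> = (0..len).map(|_| (rng.next() % 128) as u8).collect();
        // In roughly half the cases, flip one random byte to non-ASCII.
        if len > 0 && rng.next() % 2 == 0 {
            let pos = (rng.next() % len as u64) as usize;
            buf[pos] |= 0x80;
        }
        if candidate_is_ascii(&buf) != buf.is_ascii() {
            mismatches += 1;
        }
    }
    mismatches
}
```

A coverage-guided fuzzer explores inputs far more systematically than this fixed-seed loop, but the oracle is the same: any disagreement with the reference implementation is a bug.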
Relates to: llvm/llvm-project#176906