x86: use generic x86-64-v{N} mcpu and explicit mattrs #9134
Merged
The x86 backend previously assumed AVX-512 microarchitectures form a
strict feature hierarchy (Zen4 implies Cannonlake implies Skylake-AVX512
implies AVX512). It then drove `-mcpu` off the highest level in that
chain, picking vendor-specific CPU names like `znver4` or `cannonlake`.
That assumption isn't quite right: vendor-specific mcpu choices enable
vendor-specific features for us (e.g. `-mcpu=znver4` turns on `sse4a`,
which Cannonlake doesn't have). A Halide flag higher in the hierarchy
was therefore not actually a strict superset of one below it -- code
built for `AVX512_Zen4` could use SSE4a and fail to run on a Cannonlake
CPU even though Halide treats Zen4 as "Cannonlake and above".
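The mismatch described above can be sketched as a set difference between what a vendor CPU name switches on and what the Halide flag is meant to stand for. This is only an illustration: the helper names (`untracked_extras`, `example_tracked_zen4`, `example_implied_by_znver4`) are hypothetical, and both feature lists are abridged assumptions written for this example, not copied from Halide or from LLVM's CPU tables.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

using FeatureSet = std::set<std::string>;

// Features that the -mcpu choice turns on but that the Halide flag does not
// track. Anything returned here is enabled "behind Halide's back".
FeatureSet untracked_extras(const FeatureSet &implied_by_mcpu,
                            const FeatureSet &tracked_by_halide) {
    FeatureSet extras;
    std::set_difference(implied_by_mcpu.begin(), implied_by_mcpu.end(),
                        tracked_by_halide.begin(), tracked_by_halide.end(),
                        std::inserter(extras, extras.end()));
    return extras;
}

// ASSUMPTION: abridged, illustrative list of what an AVX512_Zen4-style flag
// is meant to cover.
FeatureSet example_tracked_zen4() {
    return {"avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl",
            "avx512ifma", "avx512vbmi", "avx512vnni", "avx512bf16"};
}

// ASSUMPTION: abridged, illustrative list of what -mcpu=znver4 enables.
// The only point of interest is the extra "sse4a" entry.
FeatureSet example_implied_by_znver4() {
    FeatureSet f = example_tracked_zen4();
    f.insert("sse4a");
    return f;
}
```

Calling `untracked_extras(example_implied_by_znver4(), example_tracked_zen4())` on these sample lists yields a set containing only `sse4a`, which is the kind of silently enabled vendor feature the description refers to.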
Switch `mcpu_target()` to pick only generic `x86-64-v{N}` levels, and
have `mattrs()` explicitly enable every feature Halide tracks on top of
that baseline. The Halide feature flags keep their existing meaning,
but a level like `AVX512_Zen4` now produces code that runs on the
intersection of Zen4 and Cannonlake rather than the union, preserving
the "each level is a superset of the level below" invariant the rest of
the backend depends on. Per-CPU tuning via `mcpu_tune()` is untouched,
so users who want znver4/znver5-specific scheduling still get it.
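A minimal sketch of the split described above, assuming a target is represented as a set of LLVM attribute names: pick the highest generic level whose features are all enabled, then spell out every enabled feature explicitly. The functions `pick_generic_cpu` and `explicit_attrs` are hypothetical stand-ins, not the actual bodies of `mcpu_target()` and `mattrs()`, and the per-level feature lists are abridged.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <utility>
#include <vector>

using FeatureSet = std::set<std::string>;

// ASSUMPTION: abridged feature lists for the generic levels, highest first.
// Each level includes everything from the levels below it.
static std::vector<std::pair<std::string, FeatureSet>> generic_levels() {
    const FeatureSet v2 = {"sse3", "ssse3", "sse4.1", "sse4.2", "popcnt"};
    FeatureSet v3 = v2;
    v3.insert({"avx", "avx2", "bmi", "bmi2", "f16c", "fma", "lzcnt", "movbe"});
    FeatureSet v4 = v3;
    v4.insert({"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"});
    return {{"x86-64-v4", v4}, {"x86-64-v3", v3}, {"x86-64-v2", v2}};
}

// Hypothetical stand-in for the CPU choice: return the highest generic level
// whose features are all enabled, never a vendor-specific CPU name.
std::string pick_generic_cpu(const FeatureSet &enabled) {
    for (const auto &level : generic_levels()) {
        const FeatureSet &needed = level.second;
        if (std::includes(enabled.begin(), enabled.end(),
                          needed.begin(), needed.end())) {
            return level.first;
        }
    }
    return "x86-64";
}

// Hypothetical stand-in for the attribute string: list every enabled feature
// as "+name", comma separated, so nothing is enabled implicitly.
std::string explicit_attrs(const FeatureSet &enabled) {
    std::string out;
    for (const auto &feature : enabled) {
        if (!out.empty()) {
            out += ",";
        }
        out += "+" + feature;
    }
    return out;
}
```

With this shape, a vendor-only feature such as `sse4a` can appear in the generated code only if it is in the enabled set, because none of the generic levels implies it.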
Verified with `make correctness_simd_op_check_x86` (passes) and by
running `correctness_vector_reductions` under Intel SDE's
`-chip_check_die` for pnr/snb/hsw/skx/cnl/icx/spr with matching Halide
targets (all pass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alexreinking
approved these changes
May 12, 2026
Codecov Report ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #9134 +/- ##
=======================================
Coverage ? 69.77%
=======================================
Files ? 255
Lines ? 77525
Branches ? 18534
=======================================
Hits ? 54094
Misses ? 17953
Partials ? 5478
slomp
approved these changes
May 12, 2026