x86: use generic x86-64-v{N} mcpu and explicit mattrs #9134

Merged: abadams merged 1 commit into main from abadams/avx512_nesting on May 12, 2026.

abadams (Member) commented May 12, 2026

The x86 backend previously assumed AVX-512 microarchitectures form a strict feature hierarchy (Zen4 implies Cannonlake implies Skylake-AVX512 implies AVX512). It then derived `-mcpu` from the highest level in that chain, picking vendor-specific CPU names like `znver4` or `cannonlake`.
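
To make the old policy concrete, here is a minimal C++ sketch of "walk the chain from the top and pick a vendor CPU name". The `Feature` enum, `old_mcpu_target()`, and the `x86-64` fallback are hypothetical stand-ins loosely modeled on Halide's `Target::Feature` flags; this is an illustration, not the code removed from `src/CodeGen_X86.cpp`.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins for three of Halide's AVX-512 feature flags.
enum class Feature { AVX512_Skylake, AVX512_Cannonlake, AVX512_Zen4 };

// Sketch of the old policy: assume the flags form a strict chain
// (Zen4 > Cannonlake > Skylake-AVX512) and map the highest one present
// to a vendor-specific LLVM CPU name for -mcpu.
std::string old_mcpu_target(const std::set<Feature> &features) {
    if (features.count(Feature::AVX512_Zen4)) {
        return "znver4";  // AMD-specific: brings AMD-only extras with it
    }
    if (features.count(Feature::AVX512_Cannonlake)) {
        return "cannonlake";  // Intel-specific
    }
    if (features.count(Feature::AVX512_Skylake)) {
        return "skylake-avx512";
    }
    return "x86-64";  // placeholder for the lower levels elided here
}

// Usage: a target carrying all three flags gets -mcpu=znver4, i.e.
//   old_mcpu_target({Feature::AVX512_Skylake, Feature::AVX512_Cannonlake,
//                    Feature::AVX512_Zen4}) == "znver4"
```

Because only the highest flag is consulted, everything the vendor CPU name implies comes along with it, including features that no Halide flag tracks.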

That assumption isn't quite right: a vendor-specific `-mcpu` also enables vendor-specific features (e.g. `-mcpu=znver4` turns on `sse4a`, which Cannonlake doesn't have). A Halide flag higher in the hierarchy was therefore not actually a strict superset of the one below it; code built for `AVX512_Zen4` could use SSE4a and fail to run on a Cannonlake CPU even though Halide treats Zen4 as "Cannonlake and above".
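
The superset argument can be checked mechanically by modeling feature sets as sorted sets of LLVM feature names. In this hedged sketch, `FeatureSet`, `kCannonlake`, `zen4_tracked()`, `old_zen4_build()`, and `can_run()` are hypothetical helpers, and the feature lists are illustrative subsets rather than Halide's actual ones. The only vendor extra modeled for `-mcpu=znver4` is `sse4a`, the example named above.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Feature sets as sorted sets of LLVM feature names (illustrative subsets).
using FeatureSet = std::set<std::string>;

// Hypothetical: what Halide's AVX512_Cannonlake flag promises.
const FeatureSet kCannonlake = {"avx512f",  "avx512cd",   "avx512bw", "avx512dq",
                                "avx512vl", "avx512ifma", "avx512vbmi"};

// Hypothetical: what AVX512_Zen4 promises, i.e. Cannonlake plus more.
FeatureSet zen4_tracked() {
    FeatureSet s = kCannonlake;
    s.insert({"avx512vnni", "avx512bf16", "avx512vbmi2", "avx512bitalg",
              "avx512vpopcntdq"});
    return s;
}

// Old scheme: building with -mcpu=znver4 also enables vendor extras that no
// Halide flag tracks; sse4a stands in for them here.
FeatureSet old_zen4_build() {
    FeatureSet s = zen4_tracked();
    s.insert("sse4a");
    return s;
}

// A CPU can run the code only if it has every feature the code may use,
// i.e. `required` is a subset of `cpu`. Both sets are sorted, as
// std::includes requires.
bool can_run(const FeatureSet &cpu, const FeatureSet &required) {
    return std::includes(cpu.begin(), cpu.end(), required.begin(), required.end());
}
```

For a hypothetical CPU that has every tracked Zen4 feature but no SSE4a, `can_run(cpu, zen4_tracked())` succeeds while `can_run(cpu, old_zen4_build())` fails: the untracked vendor extra is exactly what breaks the guarantee the flags are supposed to give.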

Switch `mcpu_target()` to pick only generic `x86-64-v{N}` levels, and have `mattrs()` explicitly enable every feature Halide tracks on top of that baseline. The Halide feature flags keep their existing meaning, but a level like `AVX512_Zen4` now produces code that runs on the intersection of Zen4 and Cannonlake rather than the union, preserving the "each level is a superset of the level below" invariant the rest of the backend depends on. Per-CPU tuning via `mcpu_tune()` is untouched, so users who want znver4/znver5-specific scheduling still get it.
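
A minimal sketch of the new policy, assuming a simplified ordered `Level` enum in place of Halide's independent feature flags. The `mcpu_target()` and `mattrs()` below are hypothetical illustrations that share names with the backend methods but are not their implementation; the `x86-64-v3` threshold and the feature lists are assumptions. The `x86-64-v{N}` CPU names and the comma-separated `+feature` string follow LLVM's conventions.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical simplification: the Halide x86 flags discussed in this PR,
// collapsed into one ordered level (real Halide uses independent flags).
enum class Level { Baseline, AVX2, AVX512_Skylake, AVX512_Cannonlake, AVX512_Zen4 };

// Pick only generic x86-64 psABI levels, never a vendor CPU name.
// The thresholds are assumptions for illustration.
std::string mcpu_target(Level level) {
    if (level >= Level::AVX512_Skylake) {
        return "x86-64-v4";  // AVX512F/BW/CD/DQ/VL
    }
    if (level >= Level::AVX2) {
        return "x86-64-v3";  // assumption: AVX2 targets meet all of v3
    }
    return "x86-64";
}

// Explicitly enable every tracked feature on top of the generic baseline,
// producing an LLVM-style "+feat,+feat" string. Lists are illustrative.
std::string mattrs(Level level) {
    std::string result;
    auto enable = [&result](std::initializer_list<const char *> features) {
        for (const char *f : features) {
            if (!result.empty()) {
                result += ",";
            }
            result += "+";
            result += f;
        }
    };
    if (level >= Level::AVX2) {
        enable({"avx", "avx2", "fma", "f16c"});
    }
    if (level >= Level::AVX512_Skylake) {
        enable({"avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl"});
    }
    if (level >= Level::AVX512_Cannonlake) {
        enable({"avx512ifma", "avx512vbmi"});
    }
    if (level >= Level::AVX512_Zen4) {
        // Only features Halide tracks; no AMD-only extras such as sse4a.
        enable({"avx512vnni", "avx512bf16", "avx512vbmi2", "avx512bitalg",
                "avx512vpopcntdq"});
    }
    return result;
}
```

Because every entry in the attribute string is tied to a Halide level, raising the level can only add entries, which is the superset invariant. Vendor-specific scheduling is a separate concern and, per the description above, stays with `mcpu_tune()` rather than `-mcpu`.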

Verified with `make correctness_simd_op_check_x86` (passes) and by running `correctness_vector_reductions` under Intel SDE's `-chip_check_die` for pnr/snb/hsw/skx/cnl/icx/spr with matching Halide targets (all pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov (bot) commented May 12, 2026

Codecov Report

❌ Patch coverage is 87.50000% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@dd187a2).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/CodeGen_X86.cpp | 87.50% | 2 Missing and 2 partials ⚠️ |

```diff
@@           Coverage Diff           @@
##             main    #9134   +/-   ##
=======================================
  Coverage        ?   69.77%           
=======================================
  Files           ?      255           
  Lines           ?    77525           
  Branches        ?    18534           
=======================================
  Hits            ?    54094           
  Misses          ?    17953           
  Partials        ?     5478           
```

abadams merged commit 23b1f4f into main on May 12, 2026 (25 of 27 checks passed).