x86: use generic x86-64-v{N} mcpu and explicit mattrs by abadams · Pull Request #9134 · halide/Halide

abadams · 2026-05-12T19:10:10Z

The x86 backend previously assumed AVX-512 microarchitectures form a strict feature hierarchy (Zen4 implies Cannonlake implies Skylake-AVX512 implies AVX512). It then drove `-mcpu` off the highest level in that chain, picking vendor-specific CPU names like `znver4` or `cannonlake`.

That assumption isn't quite right: vendor-specific mcpu choices enable vendor-specific features for us (e.g. `-mcpu=znver4` turns on `sse4a`, which Cannonlake doesn't have). A Halide flag higher in the hierarchy was therefore not actually a strict superset of one below it -- code built for `AVX512_Zen4` could use SSE4a and fail to run on a Cannonlake CPU even though Halide treats Zen4 as "Cannonlake and above".
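As a rough illustration of the leak described above, the sketch below compares the features a build enables against the features Halide tracks for a level. The feature lists are abbreviated and hypothetical, and the helper names are invented for this example. Only the `sse4a` detail comes from the description.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

using FeatureSet = std::set<std::string>;

// Features present in `enabled` that are absent from `tracked`.
inline FeatureSet untracked_features(const FeatureSet &enabled, const FeatureSet &tracked) {
    FeatureSet extra;
    std::set_difference(enabled.begin(), enabled.end(),
                        tracked.begin(), tracked.end(),
                        std::inserter(extra, extra.begin()));
    return extra;
}

// Hypothetical, abbreviated list of what Halide tracks at the Zen4 level.
inline FeatureSet halide_tracked_zen4_level() {
    return {"avx512f", "avx512cd", "avx512vl", "avx512bw", "avx512dq",
            "avx512ifma", "avx512vbmi", "avx512vnni", "avx512bf16"};
}

// Hypothetical model of a vendor mcpu: the tracked features plus the
// vendor-specific sse4a that -mcpu=znver4 turns on implicitly.
inline FeatureSet enabled_by_vendor_mcpu() {
    FeatureSet f = halide_tracked_zen4_level();
    f.insert("sse4a");
    return f;
}
```

Under these assumptions, `untracked_features(enabled_by_vendor_mcpu(), halide_tracked_zen4_level())` contains only `sse4a`, which is the feature that escapes the hierarchy Halide reasons about.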

Switch `mcpu_target()` to pick only generic `x86-64-v{N}` levels, and have `mattrs()` explicitly enable every feature Halide tracks on top of that baseline. The Halide feature flags keep their existing meaning, but a level like `AVX512_Zen4` now produces code that runs on the intersection of Zen4 and Cannonlake rather than the union, preserving the "each level is a superset of the level below" invariant the rest of the backend depends on. Per-CPU tuning via `mcpu_tune()` is untouched, so users who want znver4/znver5-specific scheduling still get it.
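The shape of that change might look roughly like the sketch below. It is not the code from `src/CodeGen_X86.cpp`: the enum, the function names, the level-to-`x86-64-v{N}` mapping, and the attribute lists are all assumptions chosen for illustration, and it further assumes the AVX-512 levels imply everything in `x86-64-v3`.

```cpp
#include <string>

// Hypothetical stand-in for Halide's nested AVX-512 flags, ordered low to high.
enum class Avx512Level { None, AVX512, Skylake, Cannonlake, Zen4 };

// Assumed mapping: return only a generic level, never a vendor CPU name.
// x86-64-v4 includes AVX512F/CD/BW/DQ/VL, so this sketch uses it only from
// the Skylake level up and otherwise falls back to the plain baseline.
inline std::string sketch_mcpu_target(Avx512Level level) {
    return level >= Avx512Level::Skylake ? "x86-64-v4" : "x86-64";
}

// Explicitly list every tracked feature. Each level appends to the level
// below it, so the attribute strings nest by construction.
inline std::string sketch_mattrs(Avx512Level level) {
    std::string attrs;
    auto add = [&attrs](const char *s) {
        if (!attrs.empty()) attrs += ",";
        attrs += s;
    };
    if (level >= Avx512Level::AVX512) add("+avx512f,+avx512cd");
    if (level >= Avx512Level::Skylake) add("+avx512vl,+avx512bw,+avx512dq");
    if (level >= Avx512Level::Cannonlake) add("+avx512ifma,+avx512vbmi");
    if (level >= Avx512Level::Zen4) add("+avx512vnni,+avx512bf16");
    return attrs;
}
```

Because each level only appends attributes, the string for a higher level always begins with the string for the level below it, and nothing vendor-specific such as `sse4a` can appear unless it is listed explicitly.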

Verified with `make correctness_simd_op_check_x86` (passes) and by running `correctness_vector_reductions` under Intel SDE's `-chip_check_die` for pnr/snb/hsw/skx/cnl/icx/spr with matching Halide targets (all pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov · 2026-05-12T20:28:07Z

Codecov Report

❌ Patch coverage is 87.50000% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@dd187a2). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/CodeGen_X86.cpp | 87.50% | 2 Missing and 2 partials ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #9134   +/-   ##
=======================================
  Coverage        ?   69.77%           
=======================================
  Files           ?      255           
  Lines           ?    77525           
  Branches        ?    18534           
=======================================
  Hits            ?    54094           
  Misses          ?    17953           
  Partials        ?     5478

alexreinking approved these changes May 12, 2026

slomp approved these changes May 12, 2026

abadams merged commit 23b1f4f into main May 12, 2026
25 of 27 checks passed

abadams merged 1 commit into main from abadams/avx512_nesting
