Add split-dim seed search #511

Open: AlbedoWang wants to merge 1 commit into main from kaijian/split-dim-seed

AlbedoWang commented Jun 29, 2026

Mesh-discovery sweeps can make full strategy enumeration expensive because every mesh dimension multiplies the placement search space. This PR adds a small split-dim seed primitive: solve one one-dimensional sharding problem per mesh dimension, stitch the per-node placements into a full-mesh seed, then let the normal full-mesh optimizer search only a Hamming ball around that seed.
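As a rough illustration of the seed-and-ball idea, the sketch below stitches per-dimension placements into a full-mesh seed and then filters candidate placements by Hamming distance. The names `stitch_seed`, `hamming_distance`, and `within_ball`, the string placement labels, and the choice to count distance per node across mesh dimensions are all assumptions made for this example; they are not taken from the PR's code.

```python
from itertools import product

# Hypothetical placement labels: "R" = replicate, "S0"/"S1" = shard tensor dim 0/1.

def stitch_seed(per_dim_solutions):
    """Combine one 1D solution per mesh dimension into a full-mesh seed.

    per_dim_solutions[d][node] is the placement picked for `node` when only
    mesh dimension d is solved. The result maps each node to a tuple with
    one placement per mesh dimension.
    """
    nodes = per_dim_solutions[0].keys()
    return {n: tuple(sol[n] for sol in per_dim_solutions) for n in nodes}


def hamming_distance(a, b):
    """Number of mesh dimensions on which two placements disagree."""
    return sum(x != y for x, y in zip(a, b))


def within_ball(candidates, seed_placement, radius):
    """Keep only candidates at most `radius` changes away from the seed."""
    return [c for c in candidates if hamming_distance(c, seed_placement) <= radius]


# Toy per-dimension solutions for a 2D mesh (assumed data).
dim0 = {"matmul": "S0", "relu": "S0"}
dim1 = {"matmul": "S1", "relu": "R"}
seed = stitch_seed([dim0, dim1])

# Full enumeration has 3**2 = 9 options per node; radius 1 keeps 5 of them.
all_options = list(product(["R", "S0", "S1"], repeat=2))
kept = within_ball(all_options, seed["matmul"], radius=1)
```

With a radius of 0 only the seed itself survives, so the radius acts as the knob trading search cost against how far the optimizer may move from the stitched seed.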

The seeded path keeps the existing optimizer lifecycle. Strategy generation applies the Hamming-ball restriction, then the usual PuLP decision variables, costs, and constraints are built. get_solution() still solves the restricted problem as an ILP. The new solve_lp_relaxation() solves the continuous relaxation of the same restricted PuLP problem, reports objective, status, and fractionality diagnostics, and extracts a solution when the relaxation is integral.
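To make "fractionality diagnostics" concrete, here is a minimal sketch of how such a report could be computed from relaxed variable values. It assumes the relaxed solution is available as a plain mapping from variable name to value; `fractionality_report`, `extract_choice`, and the report fields are hypothetical names for this example and may not match what solve_lp_relaxation() actually returns.

```python
def fractionality_report(values, tol=1e-6):
    """Summarize how far relaxed 0/1 variables are from integer values."""
    # Distance of each value from its nearest integer.
    frac = {name: abs(v - round(v)) for name, v in values.items()}
    fractional = sorted(name for name, f in frac.items() if f > tol)
    return {
        "max_fractionality": max(frac.values(), default=0.0),
        "num_fractional": len(fractional),
        "is_integral": not fractional,
        "fractional_vars": fractional,
    }


def extract_choice(values, tol=1e-6):
    """Return the selected variables, or None if the relaxation is fractional."""
    if not fractionality_report(values, tol)["is_integral"]:
        return None
    return sorted(name for name, v in values.items() if round(v) == 1)


# Toy relaxed solutions (assumed data).
integral = {"x_matmul_S0": 1.0, "x_matmul_R": 0.0}
split = {"x_matmul_S0": 0.5, "x_matmul_R": 0.5}

integral_report = fractionality_report(integral)
split_report = fractionality_report(split)
```

An integral relaxation is also optimal for the ILP over the same restricted problem, which is why extracting a solution only in that case is safe.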

For fabric-aware seed generation, each one-dimensional solve can reuse the NCCL topology derived from the corresponding full-mesh dimension. NCCLTopoConfig.mesh_dim_topo_override is limited to 1D dim0 so this override cannot silently change the general topology derivation path.

Review order

  1. autoparallel/mesh_search.py: build_split_dim_seed(...), one-dimensional solve cache keys, fabric-aware per-dimension cost model handling, and seed construction for FX nodes plus input/output companion nodes.
  2. autoparallel/shardings/propagation_rules.py and autoparallel/shardings/placement_options.py: active seed state, Hamming-ball filtering, and cache-key isolation so seeded and unseeded placement options do not reuse each other.
  3. autoparallel/optimize_sharding.py: strategy_seed / strategy_radius wiring during metadata construction and the LP-relaxation solve path.
  4. autoparallel/cost_models/nccl_cost_model.py: one-dimensional mesh topology override for fabric-aware seed solves.
  5. Docs and tests: split-dim seed docs plus tests/test_split_dim_seed.py covering seed construction and restricted ILP/LP solve behavior.

Two compatibility fixes are included because they are exercised by the same CUDA CI matrix: local_map HOP target names are accepted by family instead of the older exact call_local_map* spelling, and the fake CUDA device properties used by tests include L2_cache_size for newer Inductor autotuning code.

Test plan

  • Lint checks matching CI: isort, black, mypy, flake8
  • CUDA 13 nightly pytest tests — 517 passed, 1 xfailed, 4 xpassed
  • CUDA examples: example_autoparallel.py, example_llama3.py, example_local_map.py
  • 4-GPU CUDA checks: tests/test_dcp_roundtrip.py -v and torchrun --standalone --nproc-per-node 4 examples/example_ds3_local_map.py

Authored with Codex.

meta-cla Bot added the CLA Signed label on Jun 29, 2026
AlbedoWang force-pushed the kaijian/split-dim-seed branch 2 times, most recently from 0de7611 to edf7bb8, on June 29, 2026 21:09
AlbedoWang force-pushed the kaijian/split-dim-seed branch from edf7bb8 to d59fc43 on June 29, 2026 23:09