Add split-dim seed search #511
Open
AlbedoWang wants to merge 1 commit into
Authored with Codex.
Mesh-discovery sweeps can make full strategy enumeration expensive because every mesh dimension multiplies the placement search space. This PR adds a small split-dim seed primitive: solve one one-dimensional sharding problem per mesh dimension, stitch the per-node placements into a full-mesh seed, then let the normal full-mesh optimizer search only a Hamming ball around that seed.
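A minimal sketch of the seed idea, assuming each mesh dimension has `k` interchangeable per-dim placement options written as plain strings (`"R"`, `"S(0)"`, `"S(1)"` are hypothetical stand-ins for DTensor placements). `stitch_seed`, `hamming_ball`, and `ball_size` are illustrative names, not the PR's API. The sketch shows how per-dim 1-D solutions become a full-mesh tuple per node, and how much a radius-`r` Hamming ball shrinks the per-node search space compared with the full `k**ndim` product.

```python
from itertools import combinations, product
from math import comb

# Hypothetical per-mesh-dim placement labels (Replicate, Shard on tensor dim 0 or 1).
# The real code works with DTensor placements; plain strings keep this sketch self-contained.
PER_DIM_OPTIONS = ("R", "S(0)", "S(1)")


def stitch_seed(per_dim_solutions):
    """Combine one 1-D solution per mesh dimension into a full-mesh seed.

    per_dim_solutions[d][node] is the placement chosen for `node` when only
    mesh dimension d is solved. The seed for a node is the tuple of those choices.
    """
    nodes = per_dim_solutions[0].keys()
    return {node: tuple(sol[node] for sol in per_dim_solutions) for node in nodes}


def hamming(a, b):
    """Number of mesh dimensions on which two full-mesh placements differ."""
    return sum(x != y for x, y in zip(a, b))


def hamming_ball(seed, radius, options=PER_DIM_OPTIONS):
    """Yield every full-mesh placement that differs from `seed` in at most `radius` dims."""
    for r in range(radius + 1):
        for dims in combinations(range(len(seed)), r):
            # Each changed dim must take a value other than the seed's value there.
            alternatives = [[o for o in options if o != seed[d]] for d in dims]
            for values in product(*alternatives):
                candidate = list(seed)
                for d, v in zip(dims, values):
                    candidate[d] = v
                yield tuple(candidate)


def ball_size(ndim, k, radius):
    """Closed form: sum over i <= radius of C(ndim, i) * (k - 1)**i (full space is k**ndim)."""
    return sum(comb(ndim, i) * (k - 1) ** i for i in range(radius + 1))
```

For a 3-D mesh with 3 options per dimension, radius 1 keeps 1 + 3·2 = 7 of the 27 full-mesh placements per node, and the gap widens quickly as mesh dimensions are added.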
The seeded path keeps the existing optimizer lifecycle. Strategy generation applies the Hamming-ball restriction, then the usual PuLP decision variables, costs, and constraints are built.
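The ordering described above (filter strategies first, then build one decision variable per surviving strategy) can be sketched as follows, assuming strategies are tuples of per-dim placements. `placement_options`, `CANDIDATES`, and `decision_variable_names` are hypothetical names; the sketch also shows one way to get the cache-key isolation mentioned in the review order: the seed and radius are part of the cache key, so seeded and unseeded calls never share an entry.

```python
from functools import lru_cache
from itertools import product
from typing import Optional, Tuple

# Hypothetical candidate strategies for one node on a 2-D mesh: every pair of
# per-dim placements. Real strategy generation comes from propagation rules.
CANDIDATES = {"mm": tuple(product(("R", "S(0)", "S(1)"), repeat=2))}


@lru_cache(maxsize=None)
def placement_options(
    node: str,
    seed: Optional[Tuple[str, ...]] = None,
    radius: Optional[int] = None,
):
    """Return the strategies kept for `node`.

    `seed` and `radius` are part of the lru_cache key, so a seeded call can never
    return an entry cached by an unseeded call (or by a different seed/radius).
    """
    options = CANDIDATES[node]
    if seed is None:
        return options
    # Hamming-ball restriction: keep strategies within `radius` mesh dims of the seed.
    return tuple(o for o in options if sum(a != b for a, b in zip(o, seed)) <= radius)


def decision_variable_names(node, options):
    """One 0/1 decision variable per surviving strategy (names are illustrative)."""
    return [f"x_{node}_{i}" for i in range(len(options))]
```

With seed `("S(0)", "R")` and radius 1, the node keeps 5 of its 9 strategies, so the ILP gets 5 decision variables for it instead of 9.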
`get_solution()` still solves the restricted problem as an ILP, and the new `solve_lp_relaxation()` solves the same restricted PuLP problem as a continuous relaxation and reports objective, status, and fractionality diagnostics, extracting a solution when the relaxation is integral. For fabric-aware seed generation, each one-dimensional solve can reuse the NCCL topology derived from the corresponding full-mesh dimension.
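A sketch of what fractionality diagnostics on a relaxed 0/1 selection problem can look like, assuming the relaxed values are available as a dict keyed by `(node, strategy_index)`. `RelaxationReport`, `summarize_relaxation`, and `extract_assignment` are hypothetical names, and the status string is just a label here. Extracting a solution only when the relaxation is integral is sound: for a minimization, the relaxed optimum is a lower bound on the ILP optimum, so an integral relaxed optimum is also ILP-optimal.

```python
import math
from dataclasses import dataclass


@dataclass
class RelaxationReport:
    objective: float
    status: str
    num_fractional: int
    max_fractionality: float
    is_integral: bool


def fractionality(x):
    """Distance from x to the nearest integer (0.0 for integral values)."""
    return min(x - math.floor(x), math.ceil(x) - x)


def summarize_relaxation(values, objective, status, tol=1e-6):
    """values maps (node, strategy_index) -> relaxed value of the 0/1 selection variable."""
    fracs = [fractionality(v) for v in values.values()]
    num_fractional = sum(f > tol for f in fracs)
    return RelaxationReport(
        objective=objective,
        status=status,
        num_fractional=num_fractional,
        max_fractionality=max(fracs, default=0.0),
        is_integral=num_fractional == 0,
    )


def extract_assignment(values, tol=1e-6):
    """Read a per-node strategy choice; only valid when the relaxation is integral."""
    if any(fractionality(v) > tol for v in values.values()):
        raise ValueError("relaxation is fractional; no integral assignment to extract")
    return {node: idx for (node, idx), v in values.items() if v > 0.5}
```

A split such as `0.5 / 0.5` between two strategies of the same node is the typical fractional case; the report counts both variables and sets `max_fractionality` to 0.5.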
`NCCLTopoConfig.mesh_dim_topo_override` is limited to 1-D meshes (dim 0), so this override cannot silently change the general topology derivation path.

Review order
1. `autoparallel/mesh_search.py`: `build_split_dim_seed(...)`, one-dimensional solve cache keys, fabric-aware per-dimension cost model handling, and seed construction for FX nodes plus input/output companion nodes.
2. `autoparallel/shardings/propagation_rules.py` and `autoparallel/shardings/placement_options.py`: active seed state, Hamming-ball filtering, and cache-key isolation so seeded and unseeded placement options do not reuse each other.
3. `autoparallel/optimize_sharding.py`: `strategy_seed`/`strategy_radius` wiring during metadata construction, and the LP-relaxation solve path.
4. `autoparallel/cost_models/nccl_cost_model.py`: one-dimensional mesh topology override for fabric-aware seed solves.
5. `tests/test_split_dim_seed.py`: coverage for seed construction and restricted ILP/LP solve behavior.

Two compatibility fixes are included because they are exercised by the same CUDA CI matrix: local_map HOP target names are accepted by family instead of by the older exact `call_local_map*` spelling, and the fake CUDA device properties used by tests include `L2_cache_size` for newer Inductor autotuning code.

Test plan
- `pytest tests`: 517 passed, 1 xfailed, 4 xpassed
- `example_autoparallel.py`, `example_llama3.py`, `example_local_map.py`
- `tests/test_dcp_roundtrip.py -v` and `torchrun --standalone --nproc-per-node 4 examples/example_ds3_local_map.py`