feat: parallel candidate evaluation via worktrees #2121
Closed
Force-pushed from e68f7e2 to 9677b56.
Move OptimizedCandidateSource and BatchRefiner models from models.py to shared_types.py to avoid a circular dependency between models.py and function_optimizer.py.
Add WorktreePool, a pool of N git worktree slots with async acquire/release semantics. Each slot provides write_candidate() and mirror() so that candidate files stay isolated per slot.
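A minimal sketch of the acquire/release semantics, assuming free slots are handed out through an asyncio.Queue. The PR itself builds on anyio (it adds the anyio dependency and closes send/receive streams in cleanup), and its slots are real `git worktree` checkouts; here `Slot`, `SlotPool`, and `demo` are hypothetical names, and each slot is just a temporary directory so the example runs without git.

```python
from __future__ import annotations

import asyncio
import tempfile
from contextlib import asynccontextmanager
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Slot:
    """One isolated working directory (hypothetical stand-in for a git worktree)."""

    index: int
    root: Path

    def write_candidate(self, rel_path: str, code: str) -> Path:
        # Write candidate code into this slot only; other slots never see it.
        target = self.root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(code)
        return target


class SlotPool:
    """Hands out at most `size` slots at a time (hypothetical name)."""

    def __init__(self, size: int) -> None:
        # Construct inside a running event loop; the queue is the free list.
        self._tmp = tempfile.TemporaryDirectory()
        self._free = asyncio.Queue()
        for i in range(size):
            root = Path(self._tmp.name) / f"slot-{i}"
            root.mkdir()
            self._free.put_nowait(Slot(i, root))

    @asynccontextmanager
    async def acquire(self):
        slot = await self._free.get()  # waits while every slot is in use
        try:
            yield slot
        finally:
            self._free.put_nowait(slot)  # returned even on error or cancellation

    def cleanup(self) -> None:
        self._tmp.cleanup()


async def demo(n_candidates: int = 5, size: int = 2) -> tuple[int, int]:
    pool = SlotPool(size)
    active = peak = 0

    async def evaluate(i: int) -> None:
        nonlocal active, peak
        async with pool.acquire() as slot:
            active += 1
            peak = max(peak, active)
            slot.write_candidate("pkg/mod.py", f"# candidate {i}\n")
            await asyncio.sleep(0.01)  # stands in for running tests in the slot
            active -= 1

    await asyncio.gather(*(evaluate(i) for i in range(n_candidates)))
    free_after = pool._free.qsize()
    pool.cleanup()
    return peak, free_after
```

Running `asyncio.run(demo())` evaluates five candidates through two slots: never more than two run at once, and both slots are back in the free list afterwards.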
Add async_execute_test_subprocess(), which runs pytest via asyncio.create_subprocess_exec with stdout/stderr capture and a timeout. Add the --parallel-candidates N CLI argument and the anyio dependency.
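The capture-with-timeout pattern can be sketched with the standard library alone. This is a generic illustration, not the PR's async_execute_test_subprocess(); `SubprocessResult` and `run_with_timeout` are hypothetical names, and the choice to kill the child and return empty output on timeout is an assumption.

```python
from __future__ import annotations

import asyncio
import sys
from dataclasses import dataclass


@dataclass
class SubprocessResult:
    returncode: int | None  # None when the process was killed after a timeout
    stdout: str
    stderr: str
    timed_out: bool


async def run_with_timeout(cmd: list[str], cwd: str | None = None, timeout: float = 60.0) -> SubprocessResult:
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        cwd=cwd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # communicate() drains both pipes, so a chatty test run cannot fill a pipe and hang.
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        try:
            proc.kill()
        except ProcessLookupError:
            pass  # the child exited just as the timeout fired
        await proc.wait()  # reap the killed child
        return SubprocessResult(None, "", "", timed_out=True)
    return SubprocessResult(
        proc.returncode,
        out.decode(errors="replace"),
        err.decode(errors="replace"),
        timed_out=False,
    )


if __name__ == "__main__":
    # In the PR the command would be a pytest invocation inside a worktree slot.
    result = asyncio.run(run_with_timeout([sys.executable, "-c", "print('hello')"], timeout=30))
    print(result.returncode, result.stdout.strip(), result.timed_out)
```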
Phase 1 (concurrent): behavioral correctness tests run in parallel. Failed candidates release their worktree slot immediately. Phase 2 (sequential): only passing candidates get benchmarked, one at a time, for accurate timing without CPU contention. EvalFailure carries test diffs for repair context.
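The two-phase flow can be sketched as follows, assuming an asyncio.Semaphore stands in for the worktree pool and that an empty diff list means the behavioral tests passed. `evaluate_two_phase`, `BehavioralPass`, and the fake test/benchmark coroutines are hypothetical; only the EvalFailure name comes from the commit message. Because each candidate gives its slot back after Phase 1 and re-acquires one for Phase 2, the number of passing candidates can exceed the pool size without deadlocking.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class EvalFailure:
    candidate_id: str
    test_diffs: list[str]  # handed to the repair step as context


@dataclass
class BehavioralPass:
    candidate_id: str


async def evaluate_two_phase(
    candidates: list[str],
    pool_size: int,
    run_behavioral: Callable[[str], Awaitable[list[str]]],
    run_benchmark: Callable[[str], Awaitable[float]],
) -> tuple[dict[str, float], list[EvalFailure]]:
    slots = asyncio.Semaphore(pool_size)  # stands in for the worktree pool

    async def behavioral(cid: str) -> EvalFailure | BehavioralPass:
        async with slots:  # the slot is released as soon as the tests finish
            diffs = await run_behavioral(cid)
        return EvalFailure(cid, diffs) if diffs else BehavioralPass(cid)

    # Phase 1: behavioral tests for all candidates, at most pool_size at once.
    results = await asyncio.gather(*(behavioral(c) for c in candidates))
    failures = [r for r in results if isinstance(r, EvalFailure)]

    # Phase 2: benchmark passing candidates one at a time, re-acquiring a slot each time.
    timings: dict[str, float] = {}
    for r in results:
        if isinstance(r, BehavioralPass):
            async with slots:
                timings[r.candidate_id] = await run_benchmark(r.candidate_id)
    return timings, failures


async def fake_behavioral(cid: str) -> list[str]:
    await asyncio.sleep(0)
    return ["- expected 1", "+ got 2"] if cid.startswith("bad") else []


async def fake_benchmark(cid: str) -> float:
    await asyncio.sleep(0)
    return float(len(cid))
```

Calling `asyncio.run(evaluate_two_phase(["a", "bb", "bad1"], 2, fake_behavioral, fake_benchmark))` benchmarks only "a" and "bb" and returns one EvalFailure for "bad1" carrying its diffs.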
Add an API method that submits multiple candidates for refinement in a single request; the parallel evaluator uses it to dispatch refinement/repair after evaluation completes.
The line profiler needs @profile instrumentation in the main tree, so it must run after candidate selection rather than inside a worktree. Add _run_line_profiler_for_winner, which handles write → profile → restore for the parallel path. Also add # mypy: ignore-errors, since this file has 181 pre-existing mypy errors unrelated to this PR.
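The write → profile → restore sequence boils down to a try/finally around the main-tree file. A minimal sketch, assuming a single file and a caller-supplied `action` that performs the profiling; `with_candidate_in_main_tree` and `demo` are hypothetical names, not the PR's method.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_candidate_in_main_tree(path: Path, candidate_code: str, action: Callable[[], T]) -> T:
    """Write the winning candidate, run `action` (e.g. the profiler), then restore."""
    original = path.read_text()
    path.write_text(candidate_code)
    try:
        return action()
    finally:
        # Runs on success and on failure, so the main tree is never left modified.
        path.write_text(original)


def demo() -> tuple[str, str, str]:
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "mod.py"
        target.write_text("def f():\n    return 1\n")
        seen = with_candidate_in_main_tree(target, "def f():\n    return 2\n", target.read_text)
        after_success = target.read_text()
        try:
            with_candidate_in_main_tree(target, "broken", lambda: 1 / 0)
        except ZeroDivisionError:
            pass
        return seen, after_success, target.read_text()
```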
Wires the parallel evaluation path into _evaluate_candidates:
- Checks the --parallel-candidates flag to branch between sequential and parallel evaluation
- Batches candidates with dedup/normalization gating
- Dispatches repair and refinement futures from evaluation results
- Calls _run_line_profiler_for_winner after selection
New methods: _evaluate_candidates_parallel, _dispatch_refinement, _dispatch_repair_if_possible.
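The commits do not say how candidates are normalized before deduplication. One plausible approach, shown here purely as an assumption, is an `ast.parse`/`ast.unparse` round trip, which discards comments and formatting so that cosmetically different candidates collapse to one key; `normalize` and `gate_candidates` are hypothetical names.

```python
from __future__ import annotations

import ast


def normalize(code: str) -> str | None:
    """Canonical form that ignores comments and formatting; None if the code does not parse."""
    try:
        return ast.unparse(ast.parse(code))  # ast.unparse needs Python 3.9+
    except SyntaxError:
        return None


def gate_candidates(candidates: list[str]) -> list[str]:
    """Keep the first candidate of each normalized form and drop unparsable code."""
    seen: set[str] = set()
    kept: list[str] = []
    for code in candidates:
        key = normalize(code)
        if key is None or key in seen:
            continue
        seen.add(key)
        kept.append(code)
    return kept
```

With this gating, `return x+1` and `return (x + 1)  # comment` count as the same candidate, so only one of them reaches a worktree slot.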
Tests cover the full stack: pool lifecycle/cleanup, file isolation between slots, subprocess stdout/stderr/timeout, and evaluator logic (failure with diffs, success routing, concurrent multi-candidate evaluation).
… evaluator
Critical fixes from code review:
- Deadlock: slots are now released after behavioral tests (Phase 1) and re-acquired for benchmarking (Phase 2). Previously, holding slots across phases caused a deadlock when passes >= pool_size.
- Pydantic ValidationError: behavior_test_results is now stored in _BehavioralPass and passed through to OptimizedCandidateResult.
- Slot leak on cancellation: catch BaseException in _behavioral_phase.
WorktreePool improvements:
- Graceful partial creation failure (one slot failing doesn't crash the pool).
- Cleanup resilience (one rmtree failure doesn't abort the others).
- Stream lifecycle: close the send/receive streams in cleanup().
- Async-safe: use anyio.Path for exists() checks.
- Python 3.12+: use onexc instead of the deprecated onerror for rmtree.
- Remove dead code: PID file, unused restore_file method.
Other fixes:
- _run_line_profiler_for_winner: catch all exceptions.
- _dispatch_repair_if_possible: skip when diffs are empty.
- aiservice.py: pass language to _get_valid_candidates in the batch path.
- Remove the unused AIServiceBatchRefinerRequest dataclass.
- Fix result file path collision: include slot.index in the filename.
- Remove _code_replace_lock (no longer needed since slots are released immediately and _replace_and_capture is serialized by the GIL).
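The rmtree change and the "one failure doesn't abort the others" cleanup can be combined in a small sketch. `shutil.rmtree` accepts `onexc` (which receives the exception instance) from Python 3.12 onward, while older versions only accept `onerror` (which receives a `sys.exc_info()` tuple); the helper names `remove_tree`, `cleanup_slots`, and `demo` are hypothetical.

```python
from __future__ import annotations

import shutil
import sys
import tempfile
from pathlib import Path


def remove_tree(path: Path, errors: list[str]) -> None:
    """Delete one slot directory, recording failures instead of raising."""

    def record(func, failed_path, exc_or_info) -> None:
        # onexc (3.12+) passes the exception; onerror passes a sys.exc_info() tuple.
        exc = exc_or_info if isinstance(exc_or_info, BaseException) else exc_or_info[1]
        errors.append(f"{getattr(func, '__name__', func)}({failed_path}): {exc}")

    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=record)
    else:
        shutil.rmtree(path, onerror=record)


def cleanup_slots(paths: list[Path]) -> list[str]:
    errors: list[str] = []
    for p in paths:
        remove_tree(p, errors)  # a failure on one slot does not stop the others
    return errors


def demo() -> tuple[list[str], bool]:
    with tempfile.TemporaryDirectory() as d:
        real = Path(d) / "slot-0"
        (real / "pkg").mkdir(parents=True)
        (real / "pkg" / "mod.py").write_text("x = 1\n")
        errors = cleanup_slots([Path(d) / "missing-slot", real])
        return errors, real.exists()
```

In `demo()`, removing the missing directory is recorded as an error, yet the real slot is still deleted.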
…ession test
- The parallel path now checks whether a successful candidate was previously refined (via path_to_root ancestry). If so, it dispatches adaptive optimization instead of batch refinement, matching sequential behavior.
- Adds a regression test: 6 candidates with pool_size=2 all pass, showing that no deadlock occurs when passes exceed available slots.
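The ancestry check can be pictured as a walk up parent links. This sketch assumes each candidate records a parent id and a source label, and treats a winner as "previously refined" if any node on its path to the root came from refinement; `CandidateNode`, the label strings, and `follow_up_for` are hypothetical, and only the name path_to_root appears in the commit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CandidateNode:
    node_id: str
    source: str  # hypothetical labels: "optimize", "refine", "repair"
    parent_id: str | None = None


def path_to_root(nodes: dict[str, CandidateNode], node_id: str) -> list[str]:
    """Walk parent links from a candidate back to its root (assumes no cycles)."""
    path: list[str] = []
    current: str | None = node_id
    while current is not None:
        path.append(current)
        current = nodes[current].parent_id
    return path


def follow_up_for(nodes: dict[str, CandidateNode], winner_id: str) -> str:
    # A winner with any refined ancestor goes to adaptive optimization;
    # otherwise it is queued for batch refinement.
    refined = any(nodes[n].source == "refine" for n in path_to_root(nodes, winner_id))
    return "adaptive_optimization" if refined else "batch_refinement"
```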
- Add replace_lock to serialize main-tree access in _replace_and_capture
- Fix Phase 2 benchmark not writing candidate code to the fresh worktree slot
- Add _closed flag and ClosedResourceError suppression in pool release
- Broaden exception handling and protect the finally restore block
- Remove unused eval_ctx/exp_type params from run_parallel_evaluation
- Add tests for re-staging, partial pool init, restore-on-failure, and empty candidates
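Why a lock around the main tree matters can be shown with a sketch in which several tasks swap code into one shared file. This assumes an async `replace_and_capture` guarded by an asyncio.Lock; the PR's actual _replace_and_capture signature and lock type are not shown, so every name here is hypothetical. Without the lock, a task could read another task's candidate as the "original" and restore the wrong content.

```python
from __future__ import annotations

import asyncio
import tempfile
from pathlib import Path
from typing import Callable


async def replace_and_capture(
    lock: asyncio.Lock, path: Path, candidate: str, capture: Callable[[Path], str]
) -> str:
    """Swap a candidate into the shared main tree, capture something, then restore."""
    async with lock:  # only one task holds the main tree between write and restore
        original = path.read_text()
        path.write_text(candidate)
        try:
            await asyncio.sleep(0)  # yield point where, without the lock, tasks could interleave
            return capture(path)
        finally:
            path.write_text(original)


async def demo(n: int = 4) -> tuple[list[str], str]:
    lock = asyncio.Lock()
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "mod.py"
        target.write_text("original\n")
        captured = await asyncio.gather(
            *(replace_and_capture(lock, target, f"candidate {i}\n", Path.read_text) for i in range(n))
        )
        return list(captured), target.read_text()
```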
Summary
- --parallel-candidates N CLI flag that evaluates optimization candidates in isolated git worktrees concurrently via a WorktreePool
- … ThreadPoolExecutor (no async client needed)
Key design decisions
- … @profile instrumentation)
- … CandidateProcessor's concurrent.futures.Future expectations
Test plan
- … topological_sort.py with --parallel-candidates 4
- … WorktreePool and async subprocess execution
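The summary's design notes mention a ThreadPoolExecutor and CandidateProcessor's concurrent.futures.Future expectations. One way to reconcile async evaluation with callers that expect plain futures, offered only as an assumption about how such a bridge could look, is to run the coroutine with asyncio.run() inside an executor worker. `submit_coroutine` and `evaluate_batch` are hypothetical names.

```python
from __future__ import annotations

import asyncio
import concurrent.futures
from typing import Any, Callable, Coroutine


def submit_coroutine(
    executor: concurrent.futures.ThreadPoolExecutor,
    coro_fn: Callable[..., Coroutine[Any, Any, Any]],
    *args: Any,
) -> concurrent.futures.Future:
    """Run an async function to completion on a worker thread.

    asyncio.run() gives the worker its own event loop, so the caller needs no
    running loop and receives an ordinary concurrent.futures.Future.
    """
    return executor.submit(lambda: asyncio.run(coro_fn(*args)))


async def evaluate_batch(candidate_ids: list[str]) -> dict[str, int]:
    # Hypothetical stand-in for the parallel evaluation coroutine.
    await asyncio.sleep(0)
    return {cid: len(cid) for cid in candidate_ids}
```

Since each submitted call owns its event loop, a whole batch should be evaluated inside one call rather than spreading one batch's coroutines across several workers.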