feat: parallel candidate evaluation via worktrees #2121

Closed
KRRT7 wants to merge 11 commits into main from codeflash-agent

KRRT7 commented May 6, 2026

Summary

  • Adds --parallel-candidates N CLI flag that evaluates optimization candidates in isolated git worktrees concurrently via a WorktreePool
  • Behavioral tests and performance benchmarks run in parallel per-candidate, with pass/fail gating before benchmarking
  • Refinement and repair are dispatched immediately via the existing ThreadPoolExecutor (no async client needed)
  • Line profiler runs on the winning candidate after selection
  • Includes async worktree subprocess execution, instrumented test file copying, and XML result parsing in worktrees

Key design decisions

  • Worktree isolation: Each candidate gets its own worktree slot — no shared file state between concurrent evaluations
  • pass_fail_only=True: Parallel path compares pass/fail status only (return values are stored in SQLite relative to main tree). Serial path handles deeper comparison if needed
  • Line profiler after selection: Only the winner gets line-profiled (requires writing to main tree for @profile instrumentation)
  • ThreadPoolExecutor for refinement/repair: Integrates naturally with CandidateProcessor's concurrent.futures.Future expectations
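As a rough illustration of the pass_fail_only decision above, the sketch below compares two sets of test outcomes either by status alone or by status plus return value. `Outcome` and `same_behavior` are hypothetical names introduced for this example; they are not taken from the PR, and the real comparison may differ.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class Outcome:
    """Hypothetical record of one test invocation (not the PR's model)."""
    test_id: str
    passed: bool
    return_value: Any = None


def same_behavior(baseline: Iterable[Outcome], candidate: Iterable[Outcome],
                  pass_fail_only: bool = True) -> bool:
    """Return True when the candidate behaves like the baseline."""
    base = {o.test_id: o for o in baseline}
    cand = {o.test_id: o for o in candidate}
    if base.keys() != cand.keys():
        return False  # a test is missing or extra
    for test_id, expected in base.items():
        actual = cand[test_id]
        if expected.passed != actual.passed:
            return False
        # The deeper check is skipped in pass/fail-only mode.
        if not pass_fail_only and expected.return_value != actual.return_value:
            return False
    return True


baseline = [Outcome("test_a", True, 1), Outcome("test_b", True, [1, 2])]
candidate = [Outcome("test_a", True, 1), Outcome("test_b", True, [2, 1])]
shallow_match = same_behavior(baseline, candidate, pass_fail_only=True)
deep_match = same_behavior(baseline, candidate, pass_fail_only=False)
```

Here the candidate matches on status alone but not on return values, which is the kind of difference the description leaves to the serial path.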

Test plan

  • End-to-end validation on topological_sort.py with --parallel-candidates 4
  • Unit tests for WorktreePool and async subprocess execution
  • Integration test for parallel evaluator
  • CI green on this branch

KRRT7 force-pushed the codeflash-agent branch 2 times, most recently from e68f7e2 to 9677b56 (May 7, 2026 00:25)
KRRT7 added 8 commits May 6, 2026 19:28
Move OptimizedCandidateSource and BatchRefiner models from models.py
to shared_types.py to avoid a circular dependency between models.py
and function_optimizer.py.

Pool of N git worktree slots with async acquire/release semantics.
Each slot provides write_candidate() and mirror() for file isolation.
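A minimal sketch of a fixed-size slot pool with async acquire/release, assuming an `asyncio.Queue` as the free list and plain temporary directories standing in for git worktrees. `SlotPool` and `Slot` are hypothetical stand-ins, not the PR's `WorktreePool` (which, judging by later commits, is built on anyio streams), and `mirror()` is omitted.

```python
import asyncio
import tempfile
from contextlib import asynccontextmanager
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Slot:
    """Hypothetical slot: a private directory standing in for a git worktree."""
    index: int
    root: Path

    def write_candidate(self, relative_path: str, code: str) -> Path:
        # Write candidate source into this slot only; other slots are untouched.
        target = self.root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(code, encoding="utf-8")
        return target


class SlotPool:
    """Fixed-size pool; acquire() waits until a slot is free."""

    def __init__(self, size: int, base_dir: Path) -> None:
        self._free = asyncio.Queue()
        for index in range(size):
            root = base_dir / f"slot-{index}"
            root.mkdir(parents=True, exist_ok=True)
            self._free.put_nowait(Slot(index, root))

    @asynccontextmanager
    async def acquire(self):
        slot = await self._free.get()
        try:
            yield slot
        finally:
            # Return the slot even if the evaluation raised or was cancelled.
            self._free.put_nowait(slot)


async def demo() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        pool = SlotPool(size=2, base_dir=Path(tmp))
        in_use = 0
        peak = 0

        async def evaluate(n: int) -> None:
            nonlocal in_use, peak
            async with pool.acquire() as slot:
                in_use += 1
                peak = max(peak, in_use)
                slot.write_candidate("pkg/module.py", f"VALUE = {n}\n")
                await asyncio.sleep(0.01)  # stand-in for running tests
                in_use -= 1

        await asyncio.gather(*(evaluate(n) for n in range(4)))
        return peak


peak_slots_in_use = asyncio.run(demo())
```

With four evaluations and two slots, at most two run at once; the rest wait in `acquire()` until a slot is returned.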
Add async_execute_test_subprocess() for running pytest via
asyncio.create_subprocess_exec with stdout/stderr capture and timeout.
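The general shape of such a helper might look like the sketch below. `run_captured` is a hypothetical name and signature, not the PR's `async_execute_test_subprocess()`, and the demo launches the Python interpreter rather than pytest so that it runs without extra dependencies.

```python
import asyncio
import sys
from typing import List, Optional, Tuple


async def run_captured(cmd: List[str], timeout: float) -> Tuple[Optional[int], str, str]:
    """Run cmd, capture stdout/stderr, and give up after `timeout` seconds."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        # Kill and reap the child so it does not linger after the timeout.
        proc.kill()
        await proc.wait()
        return None, "", ""
    return proc.returncode, stdout.decode(), stderr.decode()


async def demo():
    ok = await run_captured([sys.executable, "-c", "print('ok')"], timeout=30)
    slow = await run_captured(
        [sys.executable, "-c", "import time; time.sleep(30)"], timeout=0.5
    )
    return ok, slow


ok_result, slow_result = asyncio.run(demo())
```

Returning `None` as the exit code on timeout is a choice made for this sketch; a real helper could raise instead or return a richer result object.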

Add --parallel-candidates N CLI argument. Add anyio dependency.
Phase 1 (concurrent): behavioral correctness tests run in parallel.
  Failed candidates release their worktree slot immediately.
Phase 2 (sequential): only passing candidates get benchmarked, one
  at a time, for accurate timing without CPU contention.

EvalFailure carries test diffs for repair context.
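The two phases described above can be sketched roughly as follows, assuming an `asyncio.Semaphore` in place of the worktree pool and fake test and benchmark functions. All names here are hypothetical and the timings are invented for the demo.

```python
import asyncio
from typing import Dict, List


async def behavioral_test(candidate: str) -> bool:
    await asyncio.sleep(0.01)  # stand-in for running the test suite
    return "bug" not in candidate  # fake verdict for the demo


async def benchmark(candidate: str) -> float:
    await asyncio.sleep(0.01)  # stand-in for a timed run
    return float(len(candidate))  # fake runtime for the demo


async def evaluate(candidates: List[str], pool_size: int) -> Dict[str, float]:
    slots = asyncio.Semaphore(pool_size)

    async def phase1(candidate: str):
        # The slot is held only for the duration of the behavioral tests.
        async with slots:
            return candidate, await behavioral_test(candidate)

    # Phase 1: correctness checks run concurrently, bounded by pool_size.
    results = await asyncio.gather(*(phase1(c) for c in candidates))
    passing = [c for c, ok in results if ok]

    # Phase 2: only passing candidates are benchmarked, one at a time.
    timings: Dict[str, float] = {}
    for candidate in passing:
        async with slots:
            timings[candidate] = await benchmark(candidate)
    return timings


timings = asyncio.run(evaluate(["fast", "buggy", "slower"], pool_size=2))
best = min(timings, key=timings.get)
```

Benchmarking sequentially trades throughput for timing accuracy, which matches the stated goal of avoiding CPU contention.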
Adds the API method for submitting multiple candidates for
refinement in a single request — used by the parallel evaluator
to dispatch refinement/repair after evaluation completes.

Line profiler needs @profile instrumented in the main tree, so it
must run after candidate selection rather than inside the worktree.
This method handles write → profile → restore for the parallel path.
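A write → profile → restore sequence of this kind is commonly wrapped in try/finally so the original file comes back even if profiling fails. The sketch below assumes a single file and a hypothetical `temporarily_replaced` context manager; it is not the PR's method.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporarily_replaced(path: Path, new_source: str):
    """Swap in new_source for the duration of the block, then restore."""
    original = path.read_text(encoding="utf-8")
    path.write_text(new_source, encoding="utf-8")
    try:
        yield path
    finally:
        # Runs on success, on exceptions, and on cancellation.
        path.write_text(original, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    module = Path(tmp) / "module.py"
    module.write_text("def f():\n    return 1\n", encoding="utf-8")
    try:
        with temporarily_replaced(module, "def f():\n    return 2\n"):
            seen_during = module.read_text(encoding="utf-8")
            raise RuntimeError("simulated profiler crash")
    except RuntimeError:
        pass
    seen_after = module.read_text(encoding="utf-8")
```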

Also adds # mypy: ignore-errors — this file has 181 pre-existing
mypy errors unrelated to this PR.

Wires the parallel evaluation path into _evaluate_candidates:
- Checks --parallel-candidates flag to branch between sequential/parallel
- Batches candidates with dedup/normalization gating
- Dispatches repair and refinement futures from evaluation results
- Calls _run_line_profiler_for_winner after selection

New methods: _evaluate_candidates_parallel, _dispatch_refinement,
_dispatch_repair_if_possible.

Covers the full stack: pool lifecycle/cleanup, file isolation between
slots, subprocess stdout/stderr/timeout, and evaluator logic (failure
with diffs, success routing, concurrent multi-candidate).
KRRT7 force-pushed the codeflash-agent branch from 9677b56 to 96fd1ca (May 7, 2026 00:32)
KRRT7 added 3 commits May 6, 2026 19:46
… evaluator

Critical fixes from code review:
- Deadlock: slots are now released after behavioral tests (Phase 1),
  re-acquired for benchmarking (Phase 2). Previously, holding slots
  across phases caused deadlock when passes >= pool_size.
- Pydantic ValidationError: behavior_test_results is now stored in
  _BehavioralPass and passed through to OptimizedCandidateResult.
- Slot leak on cancellation: catch BaseException in _behavioral_phase.
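To illustrate the slot-leak fix: `asyncio.CancelledError` derives from `BaseException` (since Python 3.8), so an `except Exception` handler never sees a cancellation and a slot released only there would be lost. The sketch below uses a hypothetical `behavioral_phase` with an `asyncio.Queue` as the free list; it mirrors the idea, not the PR's code.

```python
import asyncio


async def behavioral_phase(free_slots: asyncio.Queue, started: asyncio.Event) -> None:
    slot = await free_slots.get()
    try:
        started.set()
        await asyncio.sleep(60)  # stand-in for a long test run
    except BaseException:
        # Catches CancelledError too, which `except Exception` would miss.
        free_slots.put_nowait(slot)
        raise
    else:
        free_slots.put_nowait(slot)


async def demo() -> int:
    free_slots = asyncio.Queue()
    free_slots.put_nowait("slot-0")
    started = asyncio.Event()

    task = asyncio.create_task(behavioral_phase(free_slots, started))
    await started.wait()  # the task now holds the only slot
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return free_slots.qsize()


slots_after_cancel = asyncio.run(demo())
```

A `try`/`finally` block would achieve the same release with less code; the explicit `except BaseException` form is shown because it is what the commit message names.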

WorktreePool improvements:
- Graceful partial creation failure (one slot failing doesn't crash pool).
- Cleanup resilience (one rmtree failure doesn't abort others).
- Stream lifecycle: close send/receive in cleanup().
- Async-safe: use anyio.Path for exists() checks.
- Python 3.12+: use onexc instead of deprecated onerror for rmtree.
- Remove dead code: PID file, unused restore_file method.
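For the rmtree points above, a version-aware cleanup could look like the following sketch. `shutil.rmtree` accepts `onexc` from Python 3.12, where `onerror` is deprecated; both handlers take three positional arguments. `remove_all` is a hypothetical helper, not the pool's actual `cleanup()`.

```python
import shutil
import sys
import tempfile
from pathlib import Path
from typing import Iterable, List


def remove_all(paths: Iterable[Path]) -> List[str]:
    """Remove every tree, recording failures instead of aborting."""
    failures: List[str] = []

    def record(function, failed_path, exc) -> None:
        # Third argument: an exception (onexc) or an exc_info tuple (onerror).
        failures.append(str(failed_path))

    for path in paths:
        if sys.version_info >= (3, 12):
            shutil.rmtree(path, onexc=record)
        else:
            shutil.rmtree(path, onerror=record)
    return failures


first = Path(tempfile.mkdtemp())
missing = first.parent / "worktree-slot-that-does-not-exist"
last = Path(tempfile.mkdtemp())
(last / "module.py").write_text("VALUE = 1\n", encoding="utf-8")

# The missing directory in the middle does not stop `last` from being removed.
failed_paths = remove_all([first, missing, last])
```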

Other fixes:
- _run_line_profiler_for_winner: catch all exceptions.
- _dispatch_repair_if_possible: skip when diffs are empty.
- aiservice.py: pass language to _get_valid_candidates in batch path.
- Remove unused AIServiceBatchRefinerRequest dataclass.
- Fix result file path collision: include slot.index in filename.
- Remove _code_replace_lock (no longer needed since slots are released
  immediately and _replace_and_capture is serialized by GIL).
…ession test

- Parallel path now checks if a successful candidate was previously
  refined (via path_to_root ancestry). If so, dispatches adaptive
  optimization instead of batch refinement — matching sequential behavior.
- Adds regression test: 6 candidates with pool_size=2 all pass, proving
  no deadlock occurs when passes exceed available slots.
- Add replace_lock to serialize main-tree access in _replace_and_capture
- Fix Phase 2 benchmark not writing candidate code to fresh worktree slot
- Add _closed flag and ClosedResourceError suppression in pool release
- Broaden exception handling and protect finally restore block
- Remove unused eval_ctx/exp_type params from run_parallel_evaluation
- Add tests for re-staging, partial pool init, restore-on-failure, empty candidates

KRRT7 commented May 7, 2026

superseded by stacked PRs #2124, #2125, #2126, #2127

KRRT7 closed this May 7, 2026