Commit 4d8288a
Add local eval mode and switch to CUDA graph capture/replay (#123)
- Add _do_bench_cudagraph() for stable kernel timing using captured
CUDA graphs with L2 cache clearing and overhead subtraction
- Add _copy_data_inplace() to feed new inputs into graph buffers
without recapturing, used during recheck correctness passes
- Capture kernels in CUDA graphs during testing (_run_single_test)
to validate that submissions are graph-capturable
- Add run_local() for local eval without Popcorn infrastructure
(usage: python eval.py <mode> <problem_dir>)
- Defer imports of problem-directory modules (reference, submission,
utils, task) to runtime instead of module level
1 parent 8307b17 commit 4d8288a
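
A minimal sketch of the deferred-import idea from the last bullet, not the code in this commit. It assumes the problem directory is put on sys.path at call time; load_problem_modules(), ref_kernel() and custom_kernel() are hypothetical names used only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_problem_modules(problem_dir, names=("reference", "submission")):
    """Import problem-directory modules at call time rather than at module level.

    Hypothetical helper: the module names mirror those listed in the commit
    message, but the function itself is an assumption.
    """
    problem_dir = str(Path(problem_dir).resolve())
    if problem_dir not in sys.path:
        sys.path.insert(0, problem_dir)
    # Make sure freshly written files are visible to the import system.
    importlib.invalidate_caches()
    modules = {}
    for name in names:
        # Drop any copy cached from a previously loaded problem directory.
        sys.modules.pop(name, None)
        modules[name] = importlib.import_module(name)
    return modules


def demo():
    """Build a throwaway problem directory and load it at runtime."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "reference.py").write_text(
            "def ref_kernel(x):\n    return x * 2\n"
        )
        Path(tmp, "submission.py").write_text(
            "def custom_kernel(x):\n    return x + x\n"
        )
        mods = load_problem_modules(tmp)
        return (
            mods["reference"].ref_kernel(21),
            mods["submission"].custom_kernel(21),
        )


if __name__ == "__main__":
    print(demo())
```

Because nothing from the problem directory is imported at module level, the evaluator itself can be imported before a problem directory is known, which is what a local invocation with a <problem_dir> argument needs.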
1 file changed
Lines changed: 273 additions & 74 deletions