Commit 4d8288a

Add local eval mode and switch to CUDA graph capture/replay (#123)
- Add _do_bench_cudagraph() for stable kernel timing using captured CUDA graphs with L2 cache clearing and overhead subtraction
- Add _copy_data_inplace() to feed new inputs into graph buffers without recapturing, used during recheck correctness passes
- Capture kernels in CUDA graphs during testing (_run_single_test) to validate that submissions are graph-capturable
- Add run_local() for local eval without Popcorn infrastructure (usage: python eval.py <mode> <problem_dir>)
- Defer imports of problem-directory modules (reference, submission, utils, task) to runtime instead of module level
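
The last two points can be illustrated with a minimal sketch, which is not the code from this commit. It assumes the problem directory holds plain reference.py, submission.py, utils.py and task.py files. The helper name load_problem_modules is hypothetical, and the run_local signature shown here is a guess at the shape implied by the usage string above.

```python
import importlib
import sys
from pathlib import Path


def load_problem_modules(problem_dir, names=("reference", "submission", "utils", "task")):
    """Hypothetical helper: import problem-directory modules when called,
    not when eval.py itself is imported."""
    problem_path = str(Path(problem_dir).resolve())
    # Put the problem directory first so its modules shadow same-named ones elsewhere.
    if problem_path not in sys.path:
        sys.path.insert(0, problem_path)
    importlib.invalidate_caches()

    modules = {}
    for name in names:
        # Drop any cached copy so a different problem_dir is honored on a later call.
        sys.modules.pop(name, None)
        modules[name] = importlib.import_module(name)
    return modules


def run_local(argv):
    """Assumed shape of a local entry point: python eval.py <mode> <problem_dir>."""
    if len(argv) != 3:
        raise SystemExit("usage: python eval.py <mode> <problem_dir>")
    _, mode, problem_dir = argv
    # The imports happen here, at runtime, once the problem directory is known.
    modules = load_problem_modules(problem_dir)
    return mode, modules
```

Calling run_local(sys.argv) from the script's entry point would resolve the modules only after the problem directory has been read from the command line, which is what makes a local run possible without the modules being importable at module load time.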
1 parent 8307b17 commit 4d8288a

1 file changed

Lines changed: 273 additions & 74 deletions
