Commit df582dd

Author: Mark Saroufim
Fix moe-mxfp4 check_implementation: avoid cloned weight comparison
aiter's fused_moe produces different results when weight tensors are cloned (same values, different memory). The eval harness clones data before passing to the submission, so comparing cloned-weight output against original-weight output always fails. Since fused_moe doesn't mutate inputs, we use a custom check_implementation that compares the submission output against a fresh ref_kernel run on the original (un-cloned) data.
1 parent 55af3b9 commit df582dd
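The "same values, different memory" failure mode described in the commit message can be sketched in NumPy (used here as a stand-in for the torch tensors in the actual harness; `copy()` plays the role of `clone()`):

```python
import numpy as np

# A weight matrix and a deep copy of it: the values compare equal,
# but the copy lives at a different memory address. A fused kernel whose
# tiling or reduction order depends on pointer alignment can then produce
# numerically different (though individually valid) outputs for the two,
# which is why comparing cloned-weight output against original-weight
# output always fails.
w = np.arange(12.0).reshape(3, 4)
w_clone = w.copy()

assert np.array_equal(w, w_clone)            # identical values
assert w.ctypes.data != w_clone.ctypes.data  # distinct memory
```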

1 file changed: problems/amd_202602/moe-mxfp4/reference.py

Lines changed: 13 additions & 2 deletions
@@ -1,4 +1,4 @@
-from utils import make_match_reference
+from utils import make_match_reference, verbose_allclose
 from task import input_t, output_t
 import torch
 import torch.nn.functional as F
@@ -296,4 +296,15 @@ def ref_kernel(data: input_t) -> output_t:
 
 
 
-check_implementation = make_match_reference(ref_kernel, rtol=5e-2, atol=5e-2)
+def check_implementation(data, submission_output):
+    """
+    Custom check that re-runs ref_kernel on the ORIGINAL (un-cloned) data.
+
+    aiter's fused_moe is sensitive to weight tensor memory layout — cloned
+    weight tensors (as produced by the eval harness's _clone_data) yield
+    different results even though the values are identical. Because fused_moe
+    does NOT mutate its inputs, comparing the submission output against a fresh
+    ref_kernel run on the same data object is safe and correct.
+    """
+    expected = ref_kernel(data)
+    return verbose_allclose(submission_output, expected, rtol=5e-2, atol=5e-2)
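For reference, the rtol/atol comparison that `verbose_allclose` performs can be sketched with the standard elementwise closeness criterion, `|actual - expected| <= atol + rtol * |expected|`. This is an assumption about its semantics; the helper name and message format below are illustrative, and NumPy is used so the sketch stands alone:

```python
import numpy as np

def sketch_allclose(actual, expected, rtol=5e-2, atol=5e-2):
    # Elementwise tolerance test: |actual - expected| <= atol + rtol * |expected|.
    # Returns None on success, or a short message describing the first mismatch,
    # mirroring the message-or-None convention a verbose checker might use.
    actual = np.asarray(actual, dtype=np.float64)
    expected = np.asarray(expected, dtype=np.float64)
    if actual.shape != expected.shape:
        return f"shape mismatch: {actual.shape} vs {expected.shape}"
    diff = np.abs(actual - expected)
    bound = atol + rtol * np.abs(expected)
    bad = diff > bound
    if bad.any():
        # Report the first offending element.
        idx = tuple(int(i[0]) for i in np.nonzero(bad))
        return f"mismatch at {idx}: {actual[idx]} vs {expected[idx]}"
    return None
```

With rtol=atol=5e-2, an element within `0.05 + 0.05 * |expected|` of the reference passes, which is the loose tolerance appropriate for low-precision MXFP4 MoE outputs.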
