problems/princeton/cross_entropy.py

+ #!POPCORN leaderboard princeton_cross_entropy
+
  """
  Baseline submission for the cross-entropy problem.

  Replace these functions with a faster implementation.

+ The evaluator uses:
+ - B = 4096
+ - V in {32000, 50264, 128256}
+ - V % 8 == 0
+ - finite real-valued logits (no masking with -inf)
+
  Example local bandwidth calculation for the three ranked shapes:

      def print_max_bw(batch_size, vocab_size, combined_ms):
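The diff truncates the body of this example. As a rough sketch only (not the repository's actual helper): if one assumes the dominant memory traffic is a single bf16 read of the logits in the forward pass plus a single bf16 write of the gradients in the backward pass, the effective bandwidth for a combined timing could be estimated as below. The traffic model and the timings in the loop are illustrative assumptions.

    def print_max_bw(batch_size, vocab_size, combined_ms):
        # Assumed traffic model: read the bf16 logits once (forward) and write
        # the bf16 grad_logits once (backward), 2 bytes per element each;
        # targets, per-row losses, and grad_output are negligible next to that.
        bytes_moved = 2 * (2 * batch_size * vocab_size)
        gb_per_s = bytes_moved / (combined_ms / 1e3) / 1e9
        print(f"B={batch_size}, V={vocab_size}: ~{gb_per_s:.0f} GB/s effective")

    # Hypothetical combined forward+backward timings, only to show the call pattern:
    for vocab, ms in [(32000, 0.5), (50264, 0.8), (128256, 2.0)]:
        print_max_bw(4096, vocab, ms)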
@@ -13,19 +13,27 @@ description: |
    - cross_entropy_backward(logits, targets, grad_output) -> grad_logits

    Inputs:
-   - logits: torch.bfloat16 tensor of shape (B, V)
+   - logits: torch.bfloat16 tensor of real-valued, finite logits with shape (B, V)
    - targets: torch.int64 tensor of shape (B,)
    - grad_output: torch.float32 tensor of shape (B,)

    Outputs:
    - forward output: torch.float32 tensor of shape (B,)
    - backward output: torch.bfloat16 tensor of shape (B, V)

+   Assumptions used by the evaluator and benchmark:
+   - batch size is fixed at B = 4096
+   - vocab sizes are V in {32000, 50264, 128256}
+   - vocab size is guaranteed to be divisible by 8
+   - logits are ordinary real numbers; masked values such as -inf are not used
+
  config:
    main: "eval.py"

  tests:
    - {"vocab_size": 32000}
+   - {"vocab_size": 50264}
+   - {"vocab_size": 128256}

  benchmarks:
    - {"vocab_size": 32000}