Add tile_m/tile_k/tile_n overrides to SwiGLUPrefill #106
Open
albiol2004 wants to merge 1 commit into amd:devel
SwiGLUPrefill currently uses GEMM's default tile triple (64/64/64), which forces min_M = tile_m * num_aie_rows = 256. Real-world prefill batch sizes from decoder-model runtimes (llama.cpp ubatch=32/64/128) fall well below that threshold, leaving the fused SwiGLU path unreachable in practice. Add optional tile_m/tile_k/tile_n kwargs that pass through to both inner GEMMs. When None (default), each falls back to GEMM's native default, so existing callers and the existing (256, 2048, 2048) test are unchanged. Add a small-M test case (M=64, K=1024, N=3584, tile_m=16) that exercises the override path at the Qwen3.5-0.8B FFN shape.
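The pass-through and the resulting `min_M` arithmetic can be sketched as follows. This is a minimal illustration, not the code in this PR: `GemmSketch` and `SwiGLUPrefillSketch` are hypothetical stand-ins for the real operators, the value of 4 AIE rows is inferred from `min_M = 256` at `tile_m = 64`, the shapes given to the two stages are assumed, and "reachable" is assumed to mean `M >= min_M`.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: 4 AIE rows, inferred from min_M = tile_m * num_aie_rows = 256 at tile_m = 64.
NUM_AIE_ROWS = 4


@dataclass
class GemmSketch:
    """Hypothetical stand-in for the inner GEMM operator (not the real class)."""
    M: int
    K: int
    N: int
    tile_m: int = 64  # default tile triple 64/64/64, as stated in the description
    tile_k: int = 64
    tile_n: int = 64

    @property
    def min_m(self) -> int:
        # Smallest M this tiling supports.
        return self.tile_m * NUM_AIE_ROWS


class SwiGLUPrefillSketch:
    """Hypothetical stand-in showing only the tile kwarg pass-through."""

    def __init__(
        self,
        seq_len: int,
        embedding_dim: int,
        hidden_dim: int,
        tile_m: Optional[int] = None,
        tile_k: Optional[int] = None,
        tile_n: Optional[int] = None,
    ) -> None:
        overrides = {"tile_m": tile_m, "tile_k": tile_k, "tile_n": tile_n}
        # Omit any kwarg left at None so the GEMM's own default applies.
        tile_kwargs = {k: v for k, v in overrides.items() if v is not None}
        # Assumed stage shapes; both stages receive the same tile triple.
        self.gemm_1 = GemmSketch(seq_len, embedding_dim, hidden_dim, **tile_kwargs)
        self.gemm_2 = GemmSketch(seq_len, hidden_dim, embedding_dim, **tile_kwargs)

    @property
    def min_m(self) -> int:
        return max(self.gemm_1.min_m, self.gemm_2.min_m)


def reachable(ubatches: List[int], min_m: int) -> List[int]:
    # Assumption: a batch can use the fused path when M >= min_M.
    return [m for m in ubatches if m >= min_m]


default_op = SwiGLUPrefillSketch(256, 2048, 2048)
small_op = SwiGLUPrefillSketch(64, 1024, 3584, tile_m=16)

print(default_op.min_m)                              # 256
print(small_op.min_m)                                # 64
print(reachable([32, 64, 128], default_op.min_m))    # []
print(reachable([32, 64, 128], small_op.min_m))      # [64, 128]
```

Under these assumptions, `tile_m=16` makes the 64 and 128 ubatch sizes eligible, while 32 still falls below `min_M`. Filtering out `None` values, rather than forwarding them, keeps the defaults in one place (the GEMM itself).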
Adds `tile_m`/`tile_k`/`tile_n` kwargs to `SwiGLUPrefill`, threading them through to the two inner `GEMM` operators.

`SwiGLUPrefill` currently uses GEMM's default tile triple (64/64/64), which forces `min_M = tile_m * 4 = 256`. Real prefill batches from decoder-model runtimes (llama.cpp ubatch=32/64/128) fall below that threshold, so the fused SwiGLU path is unreachable in practice for the M range it was designed for. Passing `tile_m=16` drops `min_M` to 64.

Added
- `tile_m`/`tile_k`/`tile_n` parameters on `SwiGLUPrefill.__init__` (default `None`) that pass through to both inner GEMMs. Both stages receive the same tile triple.
- Test case `(seq_len=64, embedding_dim=1024, hidden_dim=3584, tile_m=16, tile_k=64, tile_n=64)` covering a decode-runtime-sized prefill at the Qwen3.5-0.8B FFN shape. Existing `(256, 2048, 2048)` case unchanged.

Changed
- When any of `tile_m`/`tile_k`/`tile_n` is `None`, the corresponding kwarg is omitted from the GEMM constructor call, preserving the previous behavior for existing callers.

Removed
PR Merge Checklist
`devel` commit and pointing to `devel`.