Multiply dense contraction results directly into the destination #185
Merged
259dbbb to e545d5e
Matricize the destination and multiply the gemm result into it in place with `mul!`, writing a detached copy back with `unmatricize!` only when the matricized destination does not alias it (a runtime `Base.mightalias` check, so the view-or-copy decision stays correct for every fusion style by observing what `matricize` actually returned).

On aligned and transposed outputs this drops the product temporary and the permuted scatter. With the scatter gone, the `unmatricizeadd!` interface it relied on has no caller and is removed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
e545d5e to 67fa053
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #185      +/-   ##
==========================================
+ Coverage   79.10%   79.54%   +0.44%
==========================================
  Files          20       20
  Lines         670      665       -5
==========================================
- Hits          530      529       -1
+ Misses        140      136       -4
Summary
Multiplies the dense contraction result straight into its destination instead of through a separate temporary. The matricize-then-gemm path used to allocate the matrix product into a fresh array and then permute-and-accumulate it into the destination. It now matricizes the destination, multiplies into it in place with `mul!`, and writes back only when the matricized destination is a detached copy.

That decision is a runtime `Base.mightalias` check on the result. Whether `matricize` returns a view or a copy depends on the fusion style, the array type, and the permutation, so the alias check observes what `matricize` actually returned and stays correct for every fusion style without having to predict it. An aligned or transposed dense output matricizes to a view, and `mul!` writes through it with no product temporary. A permuted output, or any graded gather, matricizes to a fresh copy seeded with the destination's current contents, so the `β` accumulation rides on the `mul!` and the write-back is a plain overwrite.

For aligned and transposed contractions this removes both the product temporary and the scatter, dropping allocation to roughly the result alone. A square matmul at bond dimension 64 falls from about 71 KiB to 38.5 KiB per call. Genuinely permuted outputs are unchanged.
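
To make the view-or-copy decision concrete, here is a minimal sketch of the pattern on plain `Array`s. It is not the code in this PR: `toy_matricize` and `toy_contract!` are hypothetical stand-ins for the package's `matricize` and contraction path, and the sketch only handles a rank-3 destination whose first two (permuted) dimensions fuse into rows. Only `mul!`, `Base.mightalias`, `reshape`, `permutedims`, `permutedims!`, and `invperm` are real Julia functions.

```julia
using LinearAlgebra

# Hypothetical stand-in for the package's `matricize`: fuse the first two
# (permuted) dimensions into rows and keep the third as columns.
function toy_matricize(dest::Array{T,3}, perm::NTuple{3,Int}) where {T}
    if perm == (1, 2, 3)
        # Aligned case: `reshape` of an `Array` shares memory with `dest`.
        return reshape(dest, size(dest, 1) * size(dest, 2), size(dest, 3))
    end
    # Permuted case: `permutedims` allocates a copy seeded with dest's contents.
    p = permutedims(dest, perm)
    return reshape(p, size(p, 1) * size(p, 2), size(p, 3))
end

# Computes matricized(dest) = α * a * b + β * matricized(dest), in place.
function toy_contract!(dest::Array{T,3}, perm::NTuple{3,Int}, a, b, α, β) where {T}
    dest_mat = toy_matricize(dest, perm)
    mul!(dest_mat, a, b, α, β)  # the β accumulation happens inside the multiply
    if !Base.mightalias(dest_mat, dest)
        # Detached copy: it already holds the accumulated result, so the
        # write-back is a plain overwrite through the inverse permutation.
        permuted_size = map(i -> size(dest, i), perm)
        permutedims!(dest, reshape(dest_mat, permuted_size), invperm(perm))
    end
    return dest
end

α, β = 2.0, 0.5

# Aligned destination: the matricized form is a view, no write-back needed.
dest_aligned = rand(2, 3, 4)
a1, b1 = rand(6, 5), rand(5, 4)
expected_aligned = reshape(α .* (a1 * b1) .+ β .* reshape(dest_aligned, 6, 4), 2, 3, 4)
toy_contract!(dest_aligned, (1, 2, 3), a1, b1, α, β)

# Permuted destination: the matricized form is a copy, written back afterwards.
perm = (3, 1, 2)
dest_permuted = rand(2, 3, 4)
a2, b2 = rand(8, 5), rand(5, 3)
permuted_mat = reshape(permutedims(dest_permuted, perm), 8, 3)
expected_permuted = permutedims(
    reshape(α .* (a2 * b2) .+ β .* permuted_mat, 4, 2, 3), invperm(perm)
)
toy_contract!(dest_permuted, perm, a2, b2, α, β)
```

The point of checking `Base.mightalias` on the returned object, rather than branching on `perm` a second time, is that the caller does not need to know which branch `toy_matricize` took. The check is conservative: it can report aliasing where none exists, in which case the write-back is skipped, so this pattern assumes the matricizing function only ever returns either a true view of the destination or an unrelated copy.
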
With the scatter gone, the `unmatricizeadd!` interface it relied on has no remaining caller and is removed. It was an unexported internal name, so this is non-breaking.

This builds on the maybe-view `matricize` from #183, which is what lets `mul!` write through an aligned or transposed destination.
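
The source of the allocation saving on aligned outputs can be reproduced in isolation with plain matrices. The sketch below is not the benchmark behind the figures quoted above, and the function names are made up for illustration. It only contrasts forming the product in a temporary and then accumulating it against letting the five-argument `mul!` accumulate straight into the destination.

```julia
using LinearAlgebra

# Old shape of the computation: `a * b` allocates a fresh matrix, which is
# then accumulated into `c`.
function accumulate_via_temporary!(c, a, b)
    c .+= a * b
    return c
end

# New shape: the multiply accumulates straight into `c`
# (c = a * b * true + c * true), with no product temporary.
accumulate_in_place!(c, a, b) = mul!(c, a, b, true, true)

n = 64
a, b = rand(n, n), rand(n, n)
c_temp = zeros(n, n)
c_inplace = zeros(n, n)

# Warm up once so compilation is not counted in the measurements.
accumulate_via_temporary!(c_temp, a, b)
accumulate_in_place!(c_inplace, a, b)

temp_bytes = @allocated accumulate_via_temporary!(c_temp, a, b)
inplace_bytes = @allocated accumulate_in_place!(c_inplace, a, b)

println("via temporary: ", temp_bytes, " bytes; in place: ", inplace_bytes, " bytes")
```

At `n = 64` with `Float64` elements, the temporary alone is at least `64 * 64 * 8` bytes, which is the order of magnitude of the per-call saving reported in the summary. The exact byte counts depend on the Julia version and on what else the real contraction path allocates.
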