Multiply dense contraction results directly into the destination #185

Merged: mtfishman merged 1 commit into main from mf/mul-into-dest on Jun 27, 2026

mtfishman (Member) commented Jun 27, 2026

Summary

Multiplies the dense contraction result straight into its destination instead of through a separate temporary. The matricize-then-gemm path used to allocate the matrix product into a fresh array and then permute-and-accumulate it into the destination. It now matricizes the destination, multiplies into it in place with `mul!`, and writes back only when the matricized destination is a detached copy.
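
As a rough illustration of the difference between the two paths, the sketch below contrasts a product-then-accumulate update with the 5-argument `mul!` from `LinearAlgebra`. The names `contract_via_temporary!` and `contract_in_place!` are hypothetical and operate on plain matrices; they are not the package's internals.

```julia
using LinearAlgebra

# Old-style path (sketch): allocate the product, then accumulate it.
function contract_via_temporary!(C, A, B, α, β)
    product = A * B                  # fresh temporary for the matrix product
    C .= α .* product .+ β .* C      # accumulate into the destination
    return C
end

# New-style path (sketch): 5-argument mul! computes C = α*A*B + β*C in place.
function contract_in_place!(C, A, B, α, β)
    mul!(C, A, B, α, β)
    return C
end

A = rand(4, 3)
B = rand(3, 5)
C0 = rand(4, 5)

C_old = contract_via_temporary!(copy(C0), A, B, 2.0, 0.5)
C_new = contract_in_place!(copy(C0), A, B, 2.0, 0.5)
```

Both functions produce the same destination up to floating-point rounding; only the second avoids the intermediate `product` array.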

The write-back decision is a runtime `Base.mightalias` check on the matricized result. Whether `matricize` returns a view or a copy depends on the fusion style, the array type, and the permutation, so the alias check observes what `matricize` actually returned and stays correct for every fusion style without having to predict it. An aligned or transposed dense output matricizes to a view, and `mul!` writes through it with no product temporary. A permuted output, or any graded gather, matricizes to a fresh copy seeded with the destination's current contents, so the β accumulation happens inside the `mul!` call and the write-back is a plain overwrite.
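
A minimal sketch of the alias-gated write-back, assuming a toy stand-in for `matricize` on a 3-index dense array. The helpers `toy_matricize` and `toy_contract!` are hypothetical and only mimic the view-or-copy behavior described above; the real `matricize` and `unmatricize!` handle fusion styles and graded arrays that this sketch ignores.

```julia
using LinearAlgebra

# Hypothetical stand-in for matricize: fuse the last two (permuted) dimensions
# into one column dimension. The identity permutation reshapes without copying
# (shares memory); any other permutation makes a permuted copy.
function toy_matricize(dest::Array{T,3}, perm) where {T}
    a = perm == (1, 2, 3) ? dest : permutedims(dest, perm)
    return reshape(a, size(a, 1), size(a, 2) * size(a, 3))
end

# dest = α * unmatricize(A * B) + β * dest, writing back only for a copy.
function toy_contract!(dest::Array{T,3}, perm, A, B, α, β) where {T}
    dest_mat = toy_matricize(dest, perm)
    # A copy was seeded with dest's contents, so β is applied by mul! itself.
    mul!(dest_mat, A, B, α, β)
    if !Base.mightalias(dest_mat, dest)
        # Detached copy: undo the permutation and overwrite the destination.
        permuted_size = ntuple(i -> size(dest, perm[i]), 3)
        permutedims!(dest, reshape(dest_mat, permuted_size), invperm(perm))
    end
    return dest
end

dest0 = rand(2, 3, 4)
A1, B1 = rand(2, 5), rand(5, 12)   # aligned case: rows = dim 1, cols = dims 2 and 3
A2, B2 = rand(3, 5), rand(5, 8)    # permuted case: rows = dim 2, cols = dims 1 and 3

aligned = toy_contract!(copy(dest0), (1, 2, 3), A1, B1, 2.0, 0.5)
permuted = toy_contract!(copy(dest0), (2, 1, 3), A2, B2, 2.0, 0.5)
```

In the aligned call the reshaped destination shares memory with `dest`, so `mul!` writes through and the branch is skipped. In the permuted call the matricized array is a copy, so the branch runs once and overwrites `dest`.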

For aligned and transposed contractions this removes both the product temporary and the scatter, dropping allocation to roughly the result alone. A square matmul at bond dimension 64 falls from about 71 KiB to 38.5 KiB per call. Genuinely permuted outputs are unchanged.
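
The size of the saving can be sanity-checked with plain matrices. The sketch below assumes `Float64` elements (the element type is not stated above) and uses hypothetical helper names. Under that assumption one 64×64 matrix occupies 32 KiB, which is close to the reported drop of roughly 32.5 KiB. It does not reproduce the package's own benchmark.

```julia
using LinearAlgebra

n = 64                                   # bond dimension from the text
A = rand(n, n)
B = rand(n, n)
C = zeros(n, n)

with_temporary(A, B) = A * B             # allocates the product
in_place!(C, A, B) = mul!(C, A, B, 1.0, 1.0)   # accumulates into C

# Warm up so compilation is not counted in the measurements.
with_temporary(A, B)
in_place!(C, A, B)

bytes_temporary = @allocated with_temporary(A, B)
bytes_in_place = @allocated in_place!(C, A, B)
product_bytes = n * n * sizeof(Float64)  # size of one product matrix

println("product temporary: ", bytes_temporary, " bytes")
println("in-place mul!:     ", bytes_in_place, " bytes")
println("one product:       ", product_bytes, " bytes")
```

Exact byte counts vary with the Julia version and BLAS backend, so treat the printed numbers as indicative only.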

With the scatter gone, the `unmatricizeadd!` interface it relied on has no remaining caller and is removed. It was an unexported internal name, so this is non-breaking.

This builds on the maybe-view `matricize` from #183, which is what lets `mul!` write through an aligned or transposed destination.

Matricize the destination and multiply the gemm result into it in place with
`mul!`, writing a detached copy back with `unmatricize!` only when the matricized
destination does not alias it (a runtime `Base.mightalias` check, so the
view-or-copy decision stays correct for every fusion style by observing what
`matricize` actually returned). On aligned and transposed outputs this drops the
product temporary and the permuted scatter. With the scatter gone, the
`unmatricizeadd!` interface it relied on has no caller and is removed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jun 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.54%. Comparing base (8137898) to head (67fa053).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #185      +/-   ##
==========================================
+ Coverage   79.10%   79.54%   +0.44%     
==========================================
  Files          20       20              
  Lines         670      665       -5     
==========================================
- Hits          530      529       -1     
+ Misses        140      136       -4     
| Flag | Coverage Δ | |
|---|---|---|
| docs | 29.69% <80.00%> (-0.73%) | ⬇️ |

mtfishman merged commit 8324c9d into main on Jun 27, 2026
26 checks passed
mtfishman deleted the mf/mul-into-dest branch on June 27, 2026 22:30