Commit be3cb5d (parent: 7abc53a)

Update Muon blog future plan: mark ZeRO stage 3 and Gram NS as done

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>

1 file changed: blogs/muon-optimizer/README.md (2 additions, 2 deletions)
@@ -95,10 +95,10 @@ Muon reduces per-GPU memory by approximately 3 GiB (9%) compared to AdamW. The
 ## Future plan
 The Muon optimizer is getting more and more attention and has been verified by production-level open LLMs such as Kimi-K2, which has 1T parameters. This makes Muon a strong second choice and a potential replacement for the Adam optimizer. To make the Muon optimizer more accessible in production environments, the following features are needed:
 
-- [ ] Muon optimizer with ZeRO stage 3
+- [x] Muon optimizer with ZeRO stage 3
 - [ ] CPU Offloading support
 - [ ] MuonClip support
-- [ ] Performance optimization to make Muon optimizer more efficient
+- [x] Performance optimization with Gram-Schmidt based Newton-Schulz iteration (in review)
 
 If you have thoughts, feedback, or contributions on the Muon optimizer, feel free to open an issue for discussion or submit a PR to DeepSpeed. Let's make the Muon optimizer rock solid and lightning fast in DeepSpeed!
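For context on the Newton-Schulz iteration mentioned in the diff: Muon's core step approximately orthogonalizes the momentum matrix with a fixed number of quintic Newton-Schulz iterations. Below is a minimal NumPy sketch of that iteration, using the coefficients from Keller Jordan's public Muon reference implementation; the function name and this standalone form are illustrative, not DeepSpeed's actual code, and the Gram-Schmidt variant under review is not shown here.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G via a quintic Newton-Schulz iteration.

    Illustrative sketch of the iteration used by Muon (coefficients from the
    public reference implementation); not DeepSpeed's production kernel.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    # Frobenius normalization bounds the spectral norm by 1,
    # which the iteration needs to converge.
    X = G / (np.linalg.norm(G) + eps)
    # Work with the smaller Gram matrix when G is tall.
    transposed = G.shape[0] > G.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        # Quintic polynomial update: a*X + b*(XX^T)X + c*(XX^T)^2 X
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X
```

After a few steps, the singular values of the output cluster near 1, so the result behaves like the nearest semi-orthogonal matrix without computing an exact (and expensive) SVD.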
