Commit be3cb5d (parent: 7abc53a)

Update Muon blog future plan: mark ZeRO stage 3 and Gram NS as done

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>

1 file changed: blogs/muon-optimizer/README.md (2 additions, 2 deletions)
@@ -95,10 +95,10 @@ Muon reduces per-GPU memory by approximately 3 GiB (9%) compared to AdamW. The
 ## Future plan
 The Muon optimizer is getting more and more attention and has been verified by production-level open LLMs such as Kimi-K2, which has 1T parameters. This makes Muon a strong second choice and a potential replacement for the Adam optimizer. To make the Muon optimizer more accessible in production environments, the following features are needed:
 
-- [ ] Muon optimizer with ZeRO stage 3
+- [x] Muon optimizer with ZeRO stage 3
 - [ ] CPU Offloading support
 - [ ] MuonClip support
-- [ ] Performance optimization to make Muon optimizer more efficient
+- [x] Performance optimization with Gram-Schmidt based Newton-Schulz iteration (in review)
 
 If you have thoughts, feedback, or contributions on the Muon optimizer, feel free to open an issue for discussion or submit a PR to DeepSpeed. Let's make the Muon optimizer rock solid and lightning fast in DeepSpeed!
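For context on the Newton-Schulz iteration mentioned in the diff: Muon's core step approximately orthogonalizes the momentum matrix with a fixed number of quintic Newton-Schulz iterations. Below is a minimal NumPy sketch of that iteration, using the coefficients from Keller Jordan's public Muon reference implementation; the function name and this standalone form are illustrative, not DeepSpeed's actual code, and the Gram-Schmidt variant under review is not shown here.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G via a quintic Newton-Schulz iteration.

    Illustrative sketch of the iteration used by Muon (coefficients from the
    public reference implementation); not DeepSpeed's production kernel.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    # Frobenius normalization bounds the spectral norm by 1,
    # which the iteration needs to converge.
    X = G / (np.linalg.norm(G) + eps)
    # Work with the smaller Gram matrix when G is tall.
    transposed = G.shape[0] > G.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        # Quintic polynomial update: a*X + b*(XX^T)X + c*(XX^T)^2 X
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X
```

After a few steps, the singular values of the output cluster near 1, so the result behaves like the nearest semi-orthogonal matrix without computing an exact (and expensive) SVD.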
