
Commit 8d9433e

Add training configuration caption to convergence chart

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>

1 parent 7c57422

1 file changed

Lines changed: 1 addition & 0 deletions

File tree

blogs/muon-optimizer/README.md

@@ -23,6 +23,7 @@ cd deepspeed_finetune_demo
 We compared Muon optimizer with AdamW optimizer by finetuning a Qwen2.5-3B model on the tatsu-lab/alpaca dataset with the same learning rate.
 
 ![Muon vs AdamW convergence on Qwen2.5-3B](images/adamw_vs_muon_3b.png)
+*Training configuration: Qwen2.5-3B, tatsu-lab/alpaca dataset, ZeRO Stage 2, bf16, batch_size=8, lr=2e-5, 1 epoch, 8 GPUs.*
 
 In one epoch, Muon optimizer achieved approximately 19% lower loss compared to AdamW optimizer. Moreover, Muon optimizer did not show overfitting while AdamW optimizer exhibited overfitting behavior.
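For readers who want to reproduce the setup described in the caption, a minimal DeepSpeed config sketch is below. The ZeRO stage, bf16, batch size, and learning rate come straight from the caption; the `"Muon"` optimizer type string is an assumption for illustration — check the DeepSpeed Muon documentation for the exact spelling your version expects.

```python
import json

# Sketch of a DeepSpeed JSON config matching the chart caption:
# ZeRO Stage 2, bf16, batch_size=8, lr=2e-5.
ds_config = {
    "train_batch_size": 8,              # global batch size from the caption
    "bf16": {"enabled": True},          # bf16 mixed precision
    "zero_optimization": {"stage": 2},  # ZeRO Stage 2 optimizer-state partitioning
    "optimizer": {
        "type": "Muon",                 # assumed type string; verify against DeepSpeed docs
        "params": {"lr": 2e-5},
    },
}

# DeepSpeed consumes this as a JSON file, e.g. passed via --deepspeed_config.
print(json.dumps(ds_config, indent=2))
```

Swapping `"type"` to `"AdamW"` with the same `lr` gives the baseline configuration used for the comparison.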
