Feat/support moe #153

Open
kilinchange wants to merge 11 commits into master from feat/support_moe

@kilinchange kilinchange commented May 18, 2026

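Introduces a Megatron-style basic MoE training path and provides tiny_mixtral as an end-to-end validation example. This gets single-GPU MoE training working and lays the groundwork for later AllToAll EP, Grouped GEMM experts, and real Mixtral weight conversion.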

Main changes

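  1. New MoE structural modules:
  • TopKRouter module: logits -> score_function -> topk -> scatter. Outputs a dense routing_probs and a boolean routing_map. Configurable options include top-k routing, top-k weight normalization, and a scaling factor
  • Experts module: the expert computation layer
    • SequentialMLP: the most straightforward baseline implementation, which runs the experts one at a time in sequence
  • MoETokenDispatcher module
    • MoEAllGatherTokenDispatcher: the degenerate single-GPU path
  • MoELayer module: assembles the TopKRouter, token dispatcher, and experts modules, and replaces the MLP layer computation of the original dense network
  2. New tiny_mixtral example:
  • File layout mirrors example/llama3
  • Supports loading weights from an LLMC .bin file
  • Trains on real token data
  • The config stays as close to Mixtral 8x7B as possible; fields that are not aligned carry a short explanatory comment
  3. Testing and validation:
  • Adds architecture tests for the MoE layer and top-k
  • Integrates tiny_mixtral into the test/profile flow
  • Adds a Megatron tiny Mixtral alignment script (to be merged into the test repository) and archived logs, used to confirm that the loss is aligned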
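
For readers less familiar with the routing flow, here is a minimal pure-Python sketch of the logits -> score_function -> topk -> scatter pipeline, followed by a sequential loop over experts. It is not the code in this PR. The names `topk_route` and `moe_forward`, the choice of softmax as the score function, the order of normalization and scaling, and the use of scalar tokens are all assumptions made for illustration.

```python
import math


def softmax(row):
    # Numerically stable softmax over one token's expert logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def topk_route(logits, k, normalize=True, scaling_factor=None):
    """Hypothetical router sketch.

    logits: list of rows, shape [num_tokens][num_experts].
    Returns (routing_probs, routing_map), both dense [num_tokens][num_experts].
    """
    routing_probs, routing_map = [], []
    for row in logits:
        scores = softmax(row)  # score_function (softmax is an assumption)
        # topk: indices of the k highest-scoring experts for this token
        top = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]
        weights = [scores[e] for e in top]
        if normalize:
            # Renormalize so the selected weights sum to 1
            total = sum(weights)
            weights = [w / total for w in weights]
        if scaling_factor is not None:
            weights = [w * scaling_factor for w in weights]
        # scatter: write the top-k weights back into a dense row
        probs = [0.0] * len(scores)
        mask = [False] * len(scores)
        for e, w in zip(top, weights):
            probs[e] = w
            mask[e] = True
        routing_probs.append(probs)
        routing_map.append(mask)
    return routing_probs, routing_map


def moe_forward(tokens, logits, experts, k):
    """Hypothetical sequential-experts sketch with scalar tokens.

    experts: list of callables, one per expert.
    Each expert sees only the tokens routed to it; outputs are combined
    using the routing probabilities.
    """
    probs, mask = topk_route(logits, k)
    out = [0.0] * len(tokens)
    for e, expert in enumerate(experts):  # one expert at a time
        for t, x in enumerate(tokens):
            if mask[t][e]:
                out[t] += probs[t][e] * expert(x)
    return out
```

As a usage example, `topk_route([[1.0, 3.0, 2.0, 0.0]], k=2)` selects experts 1 and 2 for the single token, so the returned map is `[[False, True, True, False]]` and the two non-zero probabilities sum to 1. A real implementation would operate on batched tensors rather than Python lists.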

@kilinchange kilinchange force-pushed the feat/support_moe branch 3 times, most recently from a36393d to f0835b2 Compare May 26, 2026 12:59
@kilinchange kilinchange force-pushed the feat/support_moe branch 5 times, most recently from b23e9c9 to dc65b98 Compare June 2, 2026 07:44
@kilinchange kilinchange changed the title from "[WIP] Feat/support moe" to "Feat/support moe" Jun 4, 2026
Comment thread scripts/test_config.json
"cmd": "cmake -DUSE_CUDA=ON -DUSE_NCCL=ON -DPROFILE_MODE=ON .. && make -j"
}
],
"test_groups": [

Make moe a tag under test_groups, and do the filtering in the .sh script.
