[WIP] feat: support MLA and refactor MHA by Chamberlain0w0 · Pull Request #163 · InfiniTensor/InfiniTrain

Chamberlain0w0 · 2026-05-29T06:24:29Z

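Modify the current MHA implementation
a. The original TransformerConfig::attention_type = kStandard / kRoPE is not a good fit: Megatron and other open-source implementations usually split attn_type into self/cross. The field is renamed to follow Megatron's --position-embedding-type, with possible values learned_absolute / rope / yarn / mrope / relative / none. The conditions for creating the WPE and applying RoPE are updated accordingly.
b. Removed the two branches CausalSelfAttention::ForwardStandard and ForwardWithRoPE and merged them into a single unified Forward. GQA is also folded into the unified path.
c. ApplyRotaryEmbedding is moved out of CausalSelfAttention (where it was a member function) into transformer utils.cc.
d. The causal mask buffer is now initialized for both learned absolute and RoPE; if the caller does not pass a mask, Forward falls back to the internal causal mask. This is a small behavior unification for the case where the RoPE path is called directly without a mask.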
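As a rough illustration of items (a) and (d), here is a minimal sketch of how the renamed setting could gate WPE creation and RoPE application, and how the mask fallback could look. All names in it (PositionEmbeddingType, ParsePositionEmbeddingType, NeedsLearnedPositionTable, AppliesRotaryEmbedding, MakeCausalMask, SelectMask) are hypothetical and not taken from the InfiniTrain source. Treating yarn and mrope as rotary variants is an assumption based on how Megatron groups them; the PR may gate them differently.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical enum mirroring the values listed in the PR description.
enum class PositionEmbeddingType { kLearnedAbsolute, kRope, kYarn, kMrope, kRelative, kNone };

// Maps a --position-embedding-type style string to the enum; throws on unknown input.
inline PositionEmbeddingType ParsePositionEmbeddingType(const std::string &name) {
    if (name == "learned_absolute") return PositionEmbeddingType::kLearnedAbsolute;
    if (name == "rope") return PositionEmbeddingType::kRope;
    if (name == "yarn") return PositionEmbeddingType::kYarn;
    if (name == "mrope") return PositionEmbeddingType::kMrope;
    if (name == "relative") return PositionEmbeddingType::kRelative;
    if (name == "none") return PositionEmbeddingType::kNone;
    throw std::invalid_argument("unknown position embedding type: " + name);
}

// A learned position table (WPE) is only needed for learned absolute positions.
inline bool NeedsLearnedPositionTable(PositionEmbeddingType type) {
    return type == PositionEmbeddingType::kLearnedAbsolute;
}

// Assumption: rope, yarn and mrope all rotate Q/K inside attention.
inline bool AppliesRotaryEmbedding(PositionEmbeddingType type) {
    return type == PositionEmbeddingType::kRope || type == PositionEmbeddingType::kYarn
        || type == PositionEmbeddingType::kMrope;
}

// Stand-in for a mask tensor: mask[i][j] == true means query i may attend to key j.
using Mask = std::vector<std::vector<bool>>;

// Builds a lower-triangular causal mask, independent of the position embedding type.
inline Mask MakeCausalMask(std::size_t seq_len) {
    Mask mask(seq_len, std::vector<bool>(seq_len, false));
    for (std::size_t i = 0; i < seq_len; ++i) {
        for (std::size_t j = 0; j <= i; ++j) { mask[i][j] = true; }
    }
    return mask;
}

// Uses the caller's mask when one is given, otherwise the internal causal mask.
inline const Mask &SelectMask(const Mask *external, const Mask &internal_causal) {
    return external != nullptr ? *external : internal_causal;
}
```

With this shape, a single Forward can call SelectMask once regardless of the position embedding type, which is the behavior unification described in (d).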
Add the MLA module
--TODO--


Chamberlain0w0 added 3 commits May 28, 2026 03:08

feat: add MLASelfAttention Module

9de7f8f

feat: support q_lora/non-q_lora and tp/non-tp variations

87ca357

fix: move mla args into TransformerConfig

dd18b35

Chamberlain0w0 changed the title ~~[WIP] feat: support MLA~~ [WIP] feat: support MLA and refactor MHA Jun 2, 2026

refactor: merge 2 MHA paths, rename attention_type to position_embedding_type

937a71c
