
issue/1167 - feat: add flash-attn via MooreThreads/mate for moore gpu#1168

Open
spike-zhu wants to merge 1 commit into main from issue/1167


@spike-zhu (Contributor) commented May 21, 2026

Adds flash-attn support for Moore Threads GPUs. This depends on the open-source Moore Threads mate library (https://github.com/MooreThreads/mate), version v0.1.3.

Operator tests:
[screenshot]

Performance changes for selected cases: see InfiniTensor/InfiniLM#395

@spike-zhu requested a review from a team May 21, 2026 08:01
@spike-zhu force-pushed the issue/1167 branch 3 times, most recently from 549a616 to cb6a99b, May 21, 2026 08:23
@spike-zhu self-assigned this May 21, 2026
@spike-zhu force-pushed the issue/1167 branch 2 times, most recently from 11cd394 to 6079550, May 22, 2026 04:29
Comment thread third_party/mate
Collaborator

It's best not to casually add platform-specific dependencies as submodules.

Contributor Author

If we don't add the submodule for now and instead document the third-party dependency and how to fetch it in the README, would that approach be acceptable?

Collaborator

Yes, that's how NVIDIA did it before. Alternatively, the approach you mentioned last time, where the dependency is not fetched by default and is only fetched when explicitly requested, should also work.

Comment thread xmake.lua

-- Moore mate: enable Python bridge macro for flash-attn Moore path
if has_config("moore-gpu") and has_config("aten") then
add_defines("ENABLE_MOORE_MATE_FLASH_ATTN")
Collaborator

This feels a bit sloppy.

Contributor Author

By "sloppy", do you mean the name of the ENABLE_MOORE_MATE_FLASH_ATTN macro?

Collaborator

My thinking is this:
On one hand, Moore's flash attention should only need to be built when both Moore and flash attention are requested. Right now, flash attention is built whenever Moore and aten are requested, which is logically a bit heavy-handed.

On the other hand, regarding naming: do we need a separate MOORE_MATE_FLASH_ATTN name, or are the existing Moore and flash-attention names enough?

Contributor Author

Regarding "Moore's flash attention should only need to be built when both Moore and flash attention are requested; building it whenever Moore and aten are requested is logically a bit heavy-handed":

Reply: I've added a flash-attn=y check in xmake. The Moore flash-attn path is now compiled only when moore-gpu, aten, and flash-attn are all enabled, instead of being compiled by default whenever moore-gpu + aten are on. This fixes the overly coarse logic.

Regarding "do we need a separate MOORE_MATE_FLASH_ATTN name, or are the existing Moore and flash-attention names enough?":

Reply: I'm keeping MOORE_MATE_FLASH_ATTN to distinguish the Moore MATE-specific implementation path. Moore does not currently reuse the existing flash-attn C++ interface directly; it calls through the MATE Python wrapper and additionally depends on pybind11, MUSA stream handling, and so on. A separate macro is therefore needed to isolate the Moore-specific logic and avoid affecting other platforms.
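To make the gating concrete, here is a minimal C++ sketch of how source code might branch on these macros. It assumes the build defines ENABLE_MOORE_MATE_FLASH_ATTN only when moore-gpu, aten, and flash-attn are all enabled. AttnBackend, select_attn_backend, and backend_name are hypothetical names used for illustration, not InfiniCore APIs, and checking the Moore macro before ENABLE_FLASH_ATTN is an assumed precedence.

```cpp
#include <string>

// Hypothetical backend tags, used only for this illustration.
enum class AttnBackend { MooreMate, GenericFlashAttn, Fallback };

// Choose the attention path at compile time. The Moore MATE path sits behind
// its own macro (assumed to be defined only when moore-gpu, aten, and
// flash-attn are all enabled), so Moore-only dependencies such as pybind11
// and MUSA stream handling never reach other platforms' builds.
constexpr AttnBackend select_attn_backend() {
#if defined(ENABLE_MOORE_MATE_FLASH_ATTN)
    return AttnBackend::MooreMate;
#elif defined(ENABLE_FLASH_ATTN)
    return AttnBackend::GenericFlashAttn;
#else
    return AttnBackend::Fallback;
#endif
}

// Human-readable name for logging or debugging.
inline std::string backend_name(AttnBackend b) {
    switch (b) {
        case AttnBackend::MooreMate: return "moore-mate";
        case AttnBackend::GenericFlashAttn: return "flash-attn";
        case AttnBackend::Fallback: return "fallback";
    }
    return "unknown";
}
```

Compiled with none of these macros defined, the sketch selects the fallback path. Keeping the Moore branch behind a dedicated macro, rather than reusing ENABLE_FLASH_ATTN, is what keeps its extra dependencies out of other platforms' translation units.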

#endif

#if defined(ENABLE_MOORE_MATE_FLASH_ATTN)
#include "infinicore/adaptor/aten_adaptor.hpp"
Collaborator

No other platform needs this here. Are you sure Moore does?

Contributor Author

Yes, it's needed. The Moore MATE path adds a LocalMUSAStreamGuard, which directly uses c10::musa::MUSAStream, getCurrentMUSAStream(), and setCurrentMUSAStream(), so this translation unit needs the MUSA stream declarations. Under ENABLE_MOORE_API, aten_adaptor.hpp indirectly includes <c10/musa/MUSAStream.h>; without that include, this file fails to compile.
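The implementation of LocalMUSAStreamGuard is not shown in this thread. Assuming it follows the usual RAII stream-guard pattern (save the current stream, switch to a target stream, restore the saved stream on scope exit), a minimal sketch looks like the following. FakeStream, get_current_stream, set_current_stream, and LocalStreamGuard are hypothetical stand-ins for c10::musa::MUSAStream, getCurrentMUSAStream(), and setCurrentMUSAStream(), chosen so that the example compiles without the MUSA toolchain.

```cpp
#include <cstdint>

// Hypothetical stand-in for c10::musa::MUSAStream: just an ID.
struct FakeStream {
    std::int64_t id;
};

// Hypothetical per-thread "current stream" state, standing in for the
// getCurrentMUSAStream()/setCurrentMUSAStream() pair.
inline FakeStream& current_stream_slot() {
    thread_local FakeStream current{0};  // 0 plays the role of the default stream
    return current;
}
inline FakeStream get_current_stream() { return current_stream_slot(); }
inline void set_current_stream(FakeStream s) { current_stream_slot() = s; }

// RAII guard: make `target` current for the guard's lifetime, then restore
// the previously current stream in the destructor, which also runs while an
// exception unwinds the scope.
class LocalStreamGuard {
public:
    explicit LocalStreamGuard(FakeStream target)
        : saved_(get_current_stream()) {
        set_current_stream(target);
    }
    ~LocalStreamGuard() { set_current_stream(saved_); }

    // Copying would restore the same stream twice, so forbid it.
    LocalStreamGuard(const LocalStreamGuard&) = delete;
    LocalStreamGuard& operator=(const LocalStreamGuard&) = delete;

private:
    FakeStream saved_;
};
```

Nested guards unwind in reverse order because each one saves the stream that was current when it was constructed. In the real code, the guard's use of the MUSA stream type and getter/setter is why the translation unit needs the declarations from <c10/musa/MUSAStream.h>.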

Collaborator

Oh, I see: that's because the other platforms include c10/cuda/CUDAGuard.h again here. Could you take a look at what would be reasonable and unify them?

Or, if you think that should be fixed properly in a separate PR, you can follow the same approach as NVIDIA and MetaX for now.

Contributor Author

Currently, the other platforms pull in aten_adaptor.hpp indirectly in mha_kvcache_flashattn.cc via flash_attention_adaptor.hpp, since they need get_cuda_stream and other contents of aten_adaptor.hpp. The Moore MATE path uses its own ENABLE_MOORE_MATE_FLASH_ATTN macro and does not go through the include chain tied to ENABLE_FLASH_ATTN. Including aten_adaptor.hpp directly under ENABLE_MOORE_MATE_FLASH_ATTN is therefore a reasonable way to make this change.

Comment thread .gitmodules
@spike-zhu force-pushed the issue/1167 branch 3 times, most recently from bfc8b82 to 0c37636, May 28, 2026 10:14
@spike-zhu requested a review from wooway777 May 28, 2026 10:41
@wooway777 (Collaborator) left a comment

The rebase went completely wrong. Please check the changes and rebase again.

@spike-zhu (Contributor Author)

What do you mean by the rebase going wrong? After rebasing onto origin/main again, the files don't seem to have changed.

Rebase screenshot attached:
[screenshot]

@spike-zhu spike-zhu requested a review from wooway777 June 5, 2026 12:51
@wooway777 (Collaborator)

My mistake: I mistook the contents of your force push for the PR's changes (:з」∠)


Labels

None yet

Projects

None yet


2 participants