feat: route infiniop gemm through InfiniOps #1197
Open
voltjia wants to merge 10 commits into
….py\` InfiniOps NVIDIA device implementations include `.cuh` headers that use CUDA syntax, so the generated operator files must be compiled with `nvcc`. Change the sync script to output `.cu` files and remove the old `.cc` files to prevent duplicate definitions.
- Add an `infiniops` option for specifying the InfiniOps project root
- Add the InfiniOps include path and source files to the `infiniop` target
- Compile `.cu` operator files with `nvcc` on NVIDIA builds, or as plain C++ on non-NVIDIA builds (the CUDA includes are `#ifdef`-guarded)
- Suppress `-Wunused-but-set-variable` for the NVIDIA target
Move the InfiniOps sync from `before_build` to `on_load` in the `infiniop-nvidia` target so that the generated `.cu` files exist before xmake resolves its file lists. Instead of deleting the original `operator.cc`, the sync now replaces its contents with a stub comment. This prevents duplicate symbols while keeping the glob pattern `src/infiniop/ops/*/operator.cc` valid for operators that are not synced.
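The file handling described in these sync commits can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual sync script: the function name `sync_operator`, the stub text `STUB_COMMENT`, and the one-directory-per-operator layout are all hypothetical. It shows the two rules the commits state: generated sources are written as `.cu`, and an existing `operator.cc` is overwritten with a comment rather than deleted.

```python
from pathlib import Path

# Hypothetical stub text; the real sync script's comment may differ.
STUB_COMMENT = "// Stubbed by the InfiniOps sync: this operator is built from operator.cu.\n"


def sync_operator(op_dir: Path, generated_source: str) -> Path:
    """Write a generated operator as `.cu` and stub out the original `.cc`.

    Hypothetical sketch of the behaviour described in the commit messages.
    """
    op_dir.mkdir(parents=True, exist_ok=True)

    # Use the `.cu` extension so the file is compiled by nvcc on NVIDIA
    # builds, since the InfiniOps `.cuh` headers contain CUDA syntax.
    cu_path = op_dir / "operator.cu"
    cu_path.write_text(generated_source)

    # Keep an existing `operator.cc` on disk (so the `*/operator.cc` glob
    # keeps matching) but empty it of definitions, so it cannot produce
    # symbols that clash with those now defined in `operator.cu`.
    cc_path = op_dir / "operator.cc"
    if cc_path.exists():
        cc_path.write_text(STUB_COMMENT)

    return cu_path
```

Stubbing rather than deleting keeps every operator directory shaped the same way for the build glob; only the source file that actually carries definitions changes.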
…ary targets
The `infiniop` target is `set_kind("shared")`, so xmake ignores
`add_ldflags` during linking. Switch to `add_shflags` with
`--no-as-needed` so the GNU linker keeps `libinfiniops.so` in the
`NEEDED` list even when no direct symbol references exist in
`infiniop`'s own object files.
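One way to confirm that `--no-as-needed` had the intended effect is to inspect the dynamic section of the built library, for example with `readelf -d build/linux/x86_64/release/libinfiniop.so`. The sketch below parses that output and returns the `NEEDED` entries. The helper names are hypothetical, and the parser assumes the GNU binutils format in which each entry reads `(NEEDED) Shared library: [name]`.

```python
import re
import subprocess

# Matches GNU readelf lines such as:
#  0x0000000000000001 (NEEDED)             Shared library: [libinfiniops.so]
_NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libraries(readelf_output: str) -> list[str]:
    """Return the NEEDED entries from `readelf -d` output, in order."""
    return _NEEDED_RE.findall(readelf_output)


def read_dynamic_section(path: str) -> str:
    """Run `readelf -d` on a shared object (requires GNU binutils)."""
    result = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Usage: `"libinfiniops.so" in needed_libraries(read_dynamic_section(path))` should be true after the change, even though `infiniop`'s own object files contain no direct references to InfiniOps symbols.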
Change `infiniop-nvidia`, `infinirt-nvidia`, and `infiniccl-nvidia` from static to shared libraries so that `nvcc` performs proper CUDA device linking within each `.so`. When these were static archives, `g++` linked them into downstream shared libraries without device linking, which corrupted `.nv_fatbin` registration and caused segfaults in `__cudaRegisterLinkedBinary` during `dlopen`. Also replace the no-op `on_install` with a proper `set_installdir` for all four NVIDIA targets (including `flash-attn-nvidia`).
Summary
- `submodules/InfiniOps` as the InfiniOps submodule location.
- `infiniop` GEMM C ABI shim through InfiniOps C++ `Operator<infini::ops::Gemm>::Call`.
- `libinfiniops.so` from the submodule build and restrict the trial InfiniOps wrapper compilation to GEMM only.
- `infiniop` target builds on the remote NVIDIA host.

Validation
On `nvidia:~/InfiniCore-infiniops`:

Result: `xmake build -y infiniop` completed successfully with `build ok, spent 108.409s`.

Also verified that `build/linux/x86_64/release/libinfiniop.so` resolves `libinfiniops.so` from `submodules/InfiniOps/build/src/libinfiniops.so`.
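The library-resolution check in the Validation notes can be scripted by parsing `ldd` output, where each resolved dependency appears as `name => /resolved/path (0xaddress)`. The helpers `resolved_path` and `run_ldd` below are a hypothetical sketch that assumes glibc's `ldd` output format; unresolved entries (`=> not found`) and entries without `=>` are reported as absent.

```python
import re
import subprocess
from typing import Optional

# Matches glibc ldd lines such as:
#   libinfiniops.so => /path/to/libinfiniops.so (0x00007f3a2c000000)
_LDD_RE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)", re.MULTILINE)


def resolved_path(ldd_output: str, soname: str) -> Optional[str]:
    """Return the path ldd reports for `soname`, or None if absent or unresolved."""
    for name, path in _LDD_RE.findall(ldd_output):
        if name == soname:
            return path
    return None


def run_ldd(library: str) -> str:
    """Run `ldd` on a shared object and return its stdout."""
    return subprocess.run(
        ["ldd", library], capture_output=True, text=True, check=True
    ).stdout
```

Usage: check that `resolved_path(run_ldd("build/linux/x86_64/release/libinfiniop.so"), "libinfiniops.so")` ends with `submodules/InfiniOps/build/src/libinfiniops.so`.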