feat: route infiniop gemm through InfiniOps #1197

Open
voltjia wants to merge 10 commits into main from refactor/infiniops-shared-lib

voltjia (Collaborator) commented Jun 4, 2026

Summary

  • Add `submodules/InfiniOps` as the InfiniOps submodule location.
  • Route the legacy infiniop GEMM C ABI shim through the InfiniOps C++ `Operator<infini::ops::Gemm>::Call`.
  • Link `libinfiniops.so` from the submodule build and restrict the trial InfiniOps wrapper compilation to GEMM only.
  • Fix NVIDIA/CUDA include discovery needed for full infiniop target builds on the remote NVIDIA host.

Validation

On nvidia:~/InfiniCore-infiniops:

cmake -S submodules/InfiniOps -B submodules/InfiniOps/build \
  -DPython_EXECUTABLE=/home/huangjiacheng/.venv/bin/python \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc \
  -DWITH_CPU=ON -DWITH_NVIDIA=ON -DWITH_TORCH=ON \
  -DGENERATE_CPP_OPERATOR_API=OFF -DGENERATE_PYTHON_BINDINGS=OFF
cmake --build submodules/InfiniOps/build --target infiniops --parallel 8
PATH=/usr/local/cuda-13.0/bin:/home/huangjiacheng/.local/bin:$PATH \
  ~/.local/bin/xmake f -y --require=no --nv-gpu=y --cpu=y \
  --cuda_arch=sm_80 --cu=/usr/local/cuda-13.0/bin/nvcc
PATH=/usr/local/cuda-13.0/bin:/home/huangjiacheng/.local/bin:$PATH \
  ~/.local/bin/xmake build -y infiniop

Result: xmake build -y infiniop completed successfully with build ok, spent 108.409s.

Also verified build/linux/x86_64/release/libinfiniop.so resolves libinfiniops.so from submodules/InfiniOps/build/src/libinfiniops.so.
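
A check along these lines can confirm the resolution. This is a minimal sketch, not necessarily the command used above: `resolved_path` is a hypothetical helper that reads `ldd`-style output on stdin and prints the path a given library resolves to.

```shell
# Hypothetical helper (not part of this PR): print the path that the named
# library resolves to, given `ldd`-style lines on stdin.
resolved_path() {
  # $1: library name to look up, e.g. libinfiniops.so
  awk -v lib="$1" '$1 == lib && $2 == "=>" { print $3 }'
}

# Intended use on the build host:
#   ldd build/linux/x86_64/release/libinfiniop.so | resolved_path libinfiniops.so
```

An empty result would mean the library is either not listed as a dependency or is reported as `not found`.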

voltjia added 10 commits April 10, 2026 09:19
….py`

InfiniOps NVIDIA device implementations include `.cuh` headers with CUDA
syntax, so the generated operator files must be compiled with `nvcc`.
Change the sync script to output `.cu` files and remove old `.cc` files
to prevent duplicate definitions.

- Add `infiniops` option for specifying the InfiniOps project root
- Add InfiniOps include path and source files to the `infiniop` target
- Compile `.cu` operator files with `nvcc` on NVIDIA builds, or as
  plain C++ on non-NVIDIA builds (CUDA includes are `#ifdef`-guarded)
- Suppress `-Wunused-but-set-variable` for NVIDIA target

Move the InfiniOps sync from `before_build` to `on_load` in the
`infiniop-nvidia` target so that generated `.cu` files exist before
xmake resolves file lists. The sync now stubs the original `operator.cc`
with a comment instead of deleting it, preventing duplicate symbols
while keeping the glob pattern `src/infiniop/ops/*/operator.cc` valid
for non-synced operators.
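
The stubbing step could look roughly like the sketch below. `stub_operator_cc` and the comment text are assumptions for illustration only; the actual sync script is not shown in this description.

```shell
# Hypothetical sketch: overwrite a synced operator's operator.cc with a
# comment-only stub. The file still exists, so a glob such as
# src/infiniop/ops/*/operator.cc keeps matching it, but it defines no symbols.
stub_operator_cc() {
  # $1: path to the operator.cc file to stub
  printf '// Stubbed: this operator is built from the generated .cu file.\n' > "$1"
}
```

Keeping an empty translation unit in place avoids special-casing synced operators in the file list.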
…ary targets

The `infiniop` target is `set_kind("shared")`, so xmake ignores
`add_ldflags` during linking. Switch to `add_shflags` with
`--no-as-needed` so the GNU linker keeps `libinfiniops.so` in the
`NEEDED` list even when no direct symbol references exist in
`infiniop`'s own object files.
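
One way to confirm the effect is to list the `NEEDED` entries of the built library. The sketch below assumes `readelf -d` output on stdin; `needed_libs` is a hypothetical helper, not something this PR adds.

```shell
# Hypothetical helper: extract library names from the (NEEDED) entries of
# `readelf -d` output supplied on stdin, one name per line.
needed_libs() {
  sed -n 's/.*(NEEDED).*\[\(.*\)]$/\1/p'
}

# Intended use on the build host:
#   readelf -d build/linux/x86_64/release/libinfiniop.so | needed_libs
```

If `libinfiniops.so` is missing from that list, the linker most likely dropped it as unreferenced.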
Change `infiniop-nvidia`, `infinirt-nvidia`, `infiniccl-nvidia` from
static to shared libraries so that `nvcc` performs proper CUDA device
linking within each `.so`. When these were static archives, `g++`
linked them into downstream shared libraries without device linking,
corrupting `.nv_fatbin` registration and causing segfaults in
`__cudaRegisterLinkedBinary` during `dlopen`.

Also replace no-op `on_install` with proper `set_installdir` for all
four NVIDIA targets (including `flash-attn-nvidia`).
@voltjia voltjia requested a review from a team June 4, 2026 03:08