[BUG] Fix failing/unstable NVIDIA tests for 2025 Autumn Operator Competition T1-1 operators #1207

**Version**
0.1.0

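**Problem description**
While validating the 2025 Autumn Operator Competition T1-1 operators on the NVIDIA backend, we found that some `test/infinicore` operators either fail their single-operator tests or are unstable in batch runs, which prevents the full operator test suite from passing.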

The affected operators are:

Single-operator test failures:

* `fmod`, T1-1-9
* `logdet`, T1-1-37
* `upsample_nearest`, T1-1-49
* `logical_and`, T1-1-50
* `logical_not`, T1-1-50

Unstable at least once in batch runs:

* `addbmm`, T1-1-5
* `gaussian_nll_loss`, T1-1-30
* `index_copy`, T1-1-40

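Preliminary findings:

* `index_copy`: the test's random indices may contain duplicates, so the same position is written more than once. The CUDA parallel write order does not match PyTorch's sequential behavior, so the result is unstable.
* `fmod`: the float16 divisor may be 0, in which case both PyTorch and InfiniCore return NaN, but the comparator does not enable `equal_nan`, so the comparison fails.
* `logdet`: the determinant of a random matrix may be negative, in which case both PyTorch and InfiniCore return NaN, but the comparator does not enable `equal_nan`, so the comparison fails.
* `upsample_nearest`: the NVIDIA 1D nearest path of `interpolate(mode="nearest")` does not go through the existing `upsample_nearest` implementation.
* `logical_and`: the CUDA branch calls `logical_and`, which does not exist in the current `ntops.torch`.
* `logical_not`: the CUDA branch calls `logical_not`, which does not exist in the current `ntops.torch`, and writing directly with `out=input` has an aliasing problem.
* `addbmm`: unstable at least once in batch runs, but not reproduced when the operator is tested repeatedly on its own.
* `gaussian_nll_loss`: unstable at least once in batch runs, but not reproduced when the operator is tested repeatedly on its own.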
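
The two test-side causes above (duplicate indices for `index_copy`, and NaN on both sides for `fmod`/`logdet`) can be illustrated in plain Python. This is only a sketch under assumptions: `index_copy`, `unique_indices`, and `allclose` below are hypothetical stand-ins written for this example, not the actual `test/infinicore` helpers or the InfiniCore kernels, and the tolerances are arbitrary.

```python
import math
import random


def index_copy(dest, index, source, reverse=False):
    """Copy source[i] into dest[index[i]]; `reverse` mimics a different write order."""
    out = list(dest)
    pairs = list(zip(index, source))
    if reverse:
        pairs.reverse()
    for pos, value in pairs:
        out[pos] = value
    return out


def unique_indices(dim_size, count, seed=0):
    """Draw `count` distinct indices in [0, dim_size) so no position is written twice."""
    return random.Random(seed).sample(range(dim_size), count)


def allclose(actual, expected, rtol=1e-3, atol=1e-3, equal_nan=False):
    """Element-wise closeness over two flat float sequences, optionally NaN-aware."""
    if len(actual) != len(expected):
        return False
    for a, e in zip(actual, expected):
        if math.isnan(a) or math.isnan(e):
            # NaN only matches NaN, and only when equal_nan is enabled.
            if not (equal_nan and math.isnan(a) and math.isnan(e)):
                return False
            continue
        if a == e:  # also covers matching infinities
            continue
        if math.isinf(a) or math.isinf(e):
            return False
        if abs(a - e) > atol + rtol * abs(e):
            return False
    return True


dest = [0.0, 0.0, 0.0, 0.0]
source = [10.0, 20.0, 30.0]

# Duplicate index 1: the final value at position 1 depends on the write order.
dup_index = [1, 3, 1]
forward = index_copy(dest, dup_index, source)
backward = index_copy(dest, dup_index, source, reverse=True)

# Distinct indices: every write order gives the same result.
idx = unique_indices(len(dest), len(source))
stable = index_copy(dest, idx, source) == index_copy(dest, idx, source, reverse=True)

# Both sides produce NaN at the same position: only the NaN-aware check passes.
nan = float("nan")
strict = allclose([0.5, nan, 1.25], [0.5, nan, 1.25])
nan_aware = allclose([0.5, nan, 1.25], [0.5, nan, 1.25], equal_nan=True)
```

Here `forward` and `backward` differ, `stable` is true, `strict` is false, and `nan_aware` is true. Whether to avoid the problematic inputs (nonzero divisors, positive determinants) or to compare with `equal_nan` enabled is a separate design choice that this sketch does not settle.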

**How to reproduce**

Command to reproduce a single operator:

```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=1 PYTHONFAULTHANDLER=1 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python test/infinicore/run.py --nvidia --ops <op>
```

For example:

```bash
python test/infinicore/run.py --nvidia --ops fmod
python test/infinicore/run.py --nvidia --ops logdet
python test/infinicore/run.py --nvidia --ops upsample_nearest
python test/infinicore/run.py --nvidia --ops logical_and
python test/infinicore/run.py --nvidia --ops logical_not
python test/infinicore/run.py --nvidia --ops index_copy
```

Combined verification command:

```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=1 PYTHONFAULTHANDLER=1 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python test/infinicore/run.py --nvidia --ops index_copy addbmm gaussian_nll_loss fmod logdet upsample_nearest logical_and logical_not
```
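
Since `addbmm` and `gaussian_nll_loss` were only unstable in batch runs, it may help to repeat the same command many times and count failures. The following is a minimal sketch that reuses the command and environment variables shown above; `build_command` and `run_repeatedly` are hypothetical helper names, and it assumes `run.py` exits with a non-zero status when any operator fails, which has not been verified here.

```python
import os
import subprocess
import sys

# Same debugging environment as the commands above.
DEBUG_ENV = {
    "CUDA_VISIBLE_DEVICES": "0",
    "CUDA_LAUNCH_BLOCKING": "1",
    "PYTHONFAULTHANDLER": "1",
    "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
}


def build_command(ops):
    """Build the run.py invocation for the given list of operator names."""
    return [sys.executable, "test/infinicore/run.py", "--nvidia", "--ops", *ops]


def run_repeatedly(ops, repeats=10):
    """Run the same test command `repeats` times and return how many runs failed."""
    env = {**os.environ, **DEBUG_ENV}
    failures = 0
    for attempt in range(1, repeats + 1):
        result = subprocess.run(build_command(ops), env=env)
        # Assumption: a non-zero exit status means at least one operator failed.
        if result.returncode != 0:
            failures += 1
            print(f"attempt {attempt}/{repeats} failed with exit code {result.returncode}")
    return failures
```

For example, calling `run_repeatedly(["addbmm", "gaussian_nll_loss"], repeats=20)` from the repository root would run the pair 20 times and return the number of failing runs.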



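**Expected results**
The T1-1 NVIDIA operators listed above should not fail in single-operator tests, and the full test run should not fail because of test data construction issues, Python wrapper wiring issues, or calls to `ntops.torch` APIs that do not exist.

Specifically:

* `fmod`, `logdet`, `upsample_nearest`, `logical_and`, `logical_not`, and `index_copy` pass their single-operator tests;
* `addbmm` and `gaussian_nll_loss` pass stably when tested repeatedly on their own;
* the combined test reports `Failed: 0`;
* subsequent full NVIDIA test runs are no longer blocked by these operators.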

**Screenshots**

<img width="1982" height="404" alt="Image" src="https://github.com/user-attachments/assets/c0c0d47e-d80a-4b58-8659-0a94b31f58c2" />
