@kshyatt I bumped into this recently, and this is probably one of my mistakes that we might try and fix:
Most of the functionality of the CUDA extension does not actually require cuTENSOR. The dependency was mostly there because TensorOperations only had a cuTENSOR backend at the time, whereas now Strided should just work (🤞).
We could try separating that out, and only require cuTENSOR specifically for accessing the faster kernels.