Summary
This bug seems to occur regardless of the length of the series. Interestingly, the error always occurs once the number of batch elements exceeds 65535, which matches the y- and z-grid dimension limit of every CUDA version, and the x-grid limit of older CUDA versions; see: https://en.wikipedia.org/wiki/CUDA#Technical_specifications
The error does not seem to occur with plain PyTorch when a comparable size is used.
Error
CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
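As the message itself suggests, rerunning with blocking launches makes the stack trace point at the actual failing kernel launch rather than a later API call. A minimal invocation, assuming the reproduction script is saved under a hypothetical name such as repro.py:

```shell
# repro.py is a placeholder name for the script below; CUDA_LAUNCH_BLOCKING=1
# forces synchronous kernel launches so the error surfaces at the real call site.
CUDA_LAUNCH_BLOCKING=1 python repro.py
```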
Reproducible Code
produces error
import numpy as np
import torch
import ptwt
import pywt
limit = 65535 + 1
for i in range(10):
    B, T = limit, 32
    wavelet = pywt.Wavelet('db4')
    level = 4
    series_batch = torch.randn(B, T, dtype=torch.float32, device="cuda")
    coeffs = ptwt.wavedec(series_batch, wavelet, level=level)
    reconstructed = ptwt.waverec(coeffs, wavelet)
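A plausible workaround, under the (untested) assumption that the failure comes from a single kernel launch whose batch dimension is mapped to a limited CUDA grid axis, is to process the batch in chunks of at most 65535 rows and concatenate the results. The `chunked_apply` helper below is a hypothetical sketch, not part of ptwt:

```python
import numpy as np

def chunked_apply(fn, batch, max_chunk=65535):
    """Apply fn to row-chunks of at most max_chunk rows and concatenate.

    Hypothetical workaround sketch: keeping each call's batch size at or
    below 65535 should stay within the CUDA grid-dimension limit.
    """
    n = batch.shape[0]
    parts = [fn(batch[i:i + max_chunk]) for i in range(0, n, max_chunk)]
    return np.concatenate(parts, axis=0)

# For torch tensors one would use torch.cat and Tensor.split instead, e.g.
# (assumption, not verified on GPU):
# reconstructed = torch.cat(
#     [ptwt.waverec(ptwt.wavedec(b, wavelet, level=level), wavelet)
#      for b in series_batch.split(65535)])
```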
works nicely with pytorch
import torch
import torch.nn as nn
limit = 65535 + 1
B, T = limit, 32
in_channels = T
out_channels = T
kernel_size = 5
padding = kernel_size // 2
conv = nn.Conv1d(in_channels=in_channels, out_channels=out_channels,
                 kernel_size=kernel_size, padding=padding).cuda()
for i in range(10):
    series_batch = torch.randn(B, T, dtype=torch.float32, device="cuda")
    series_batch = series_batch.transpose(0, 1).unsqueeze(0)
    convolved = conv(series_batch)
    reconstructed = convolved.squeeze(0).transpose(0, 1)
works nicely with pytorch with 4 convolutions
import torch
import torch.nn as nn
limit = 65535 + 1
B, T = limit, 32
in_channels = T
out_channels = T
kernel_size = 5
padding = kernel_size // 2
# Create four convolution layers to simulate four levels
convs = nn.ModuleList([
    nn.Conv1d(in_channels=in_channels, out_channels=out_channels,
              kernel_size=kernel_size, padding=padding).cuda()
    for _ in range(4)
])
for i in range(10):
    series_batch = torch.randn(B, T, dtype=torch.float32, device="cuda")
    series_batch = series_batch.transpose(0, 1).unsqueeze(0)
    for conv in convs:
        series_batch = conv(series_batch)
        series_batch = nn.functional.avg_pool1d(series_batch, kernel_size=2, stride=2, padding=0)
    reconstructed = nn.functional.interpolate(series_batch, size=B, mode='linear', align_corners=False)
    reconstructed = reconstructed.squeeze(0).transpose(0, 1)