This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Commit 36ed5e0

Authored by mk-61, DickJC123, and vcherepanov-nv

Port convolutions to cuDNN v8 API (#20635)

* Add failsafe flag to StorageManager Alloc()
* Clear sticky cudaErrorMemoryAllocation errors
* Make the Conv and Deconv cuDNN implementations use the v8 API. This copies changes previously implemented in the container; Dick Carter <dcarter@nvidia.com> made a number of improvements and fixes (memory use during auto-tuning, proper time calculation and time-limit cutoff in the auto-tuning sampler, etc.).
* Downgrade some C++17 code to C++14 to accommodate CUDA 10
* Relax the required cuDNN version to 8.0.2
* Use a newer cuDNN version in CI
* Don't verify the cmake.org certificate
* Disable the mobilenet inference test
* Re-format with the new clang-format config
* Fix cpplint after clang-format
* Disable fprop eng:5 to fix a test failure on M60
* Release conv autotune workspaces via DirectFree()
* Address review comments
* Appease clang-format
* Fix the default heuristics-mode logic and document the env var
* Add documentation for MXNET_CUDNN_ALGO_VERBOSE_LEVEL
* More review comments

Co-authored-by: Dick Carter <dcarter@nvidia.com>
Co-authored-by: Vladimir Cherepanov <vcherepanov@nvidia.com>
1 parent: 16fed6e

25 files changed: 1,925 additions and 1,909 deletions

ci/docker/Dockerfile.build.centos7 (1 addition, 1 deletion)

```diff
@@ -88,7 +88,7 @@ SHELL [ "/usr/bin/scl", "enable", "devtoolset-7", "rh-python38", "rh-maven35" ]

 # Install minimum required cmake version
 RUN cd /usr/local/src && \
-    wget -nv https://cmake.org/files/v3.13/cmake-3.13.5-Linux-x86_64.sh && \
+    wget -nv --no-check-certificate https://cmake.org/files/v3.13/cmake-3.13.5-Linux-x86_64.sh && \
     sh cmake-3.13.5-Linux-x86_64.sh --prefix=/usr/local --skip-license && \
     rm cmake-3.13.5-Linux-x86_64.sh
```

ci/docker/Dockerfile.build.ubuntu (1 addition, 0 deletions)

```diff
@@ -161,6 +161,7 @@ ARG BASE_IMAGE
 RUN export SHORT_CUDA_VERSION=${CUDA_VERSION%.*} && \
     export OS_RELEASE="$(cat /etc/os-release)" && \
     apt-get update && \
+    apt-get install -y --allow-change-held-packages libcudnn8 libcudnn8-dev && \
     if [[ ${OS_RELEASE} == *"Bionic"* ]]; then \
         if [ ${SHORT_CUDA_VERSION} = 11.0 ]; then \
             TRT_VERSION="7.2.0-1+cuda11.0"; \
```

docs/static_site/src/pages/api/faq/env_var.md (47 additions, 0 deletions)

````diff
@@ -295,16 +295,62 @@ If ctypes is used, it must be `mxnet._ctypes.ndarray.NDArrayBase`.
   - Value of 1 chooses the best algo in a limited workspace
   - Value of 2 chooses the fastest algo whose memory requirements may be larger than the default workspace threshold

+* MXNET_CUDNN_HEUR_MODE
+  - Values: 0 or 1 (available since cuDNN 8.1) ```(default=1 for cuDNN 8.1 and later, otherwise 0)```
+  - Chooses the cuDNN heuristics mode.
+  - If set to '0', uses a fast decision-tree-based method.
+  - If set to '1', uses a neural-network-based method, which generalizes better for unknown or uncommon models.
+
+* MXNET_CUDNN_ALGO_VERBOSE_LEVEL
+  - Values: 0, 1, or 2 ```(default=0)```
+  - The level of printed output describing the "convolution engine" configurations.
+  - Value of 0 produces no output.
+  - Value of 1 outputs, for the chosen config, the engine number ("algo"), additional parameters ("knobs"), and numerical notes.
+  - Value of 2 outputs the same info as with a '1' setting, but for all configs considered.
+    The output can be used to develop engine-config filtering strategies to modify model behaviors.
+    Numerical accuracy may be improved by filtering out configs shown with 'rp', 'w' or 'fft' (i.e. reduced precision, winograd, or fft).
+    The configs are output with their list index, as suggested by cuDNN, and the chosen config is flagged with a '*'.
+    If autotuning is enabled (MXNET_CUDNN_AUTOTUNE_DEFAULT != 0), the measured kernel times will be reported.

 * MXNET_CUDA_ALLOW_TENSOR_CORE
   - 0(false) or 1(true) ```(default=1)```
   - If set to '0', disallows Tensor Core use in CUDA ops.
   - If set to '1', allows Tensor Core use in CUDA ops.
   - This variable can only be set once in a session.
+  - Also controls filtering of cuDNN engines with CUDNN_NUMERICAL_NOTE_TENSOR_CORE.

 * MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION
   - 0(false) or 1(true) ```(default=0)```
   - If set to '0', disallows implicit type conversions to Float16 to use Tensor Cores.
   - If set to '1', allows CUDA ops like RNN and Convolution to use Tensor Cores even with Float32 input data by using implicit type casting to Float16. Only has an effect if `MXNET_CUDA_ALLOW_TENSOR_CORE` is `1`.
+  - Also controls filtering of cuDNN engines with CUDNN_NUMERICAL_NOTE_DOWN_CONVERT_INPUTS (such engines are disallowed if set to 0).
+
+* MXNET_CUDNN_ALLOW_REDUCED_PRECISION_REDUCTION
+  - 0(false) or 1(true) ```(default=1)```
+  - If set to '0', disallows cuDNN engines with CUDNN_NUMERICAL_NOTE_REDUCED_PRECISION_REDUCTION.
+  - If set to '1', allows cuDNN engines with CUDNN_NUMERICAL_NOTE_REDUCED_PRECISION_REDUCTION.
+
+* MXNET_CUDNN_ALLOW_FFT
+  - 0(false) or 1(true) ```(default=1)```
+  - If set to '0', disallows cuDNN engines with CUDNN_NUMERICAL_NOTE_FFT.
+  - If set to '1', allows cuDNN engines with CUDNN_NUMERICAL_NOTE_FFT.
+
+* MXNET_CUDNN_ALLOW_WINOGRAD
+  - 0(false) or 1(true) ```(default=1)```
+  - If set to '0', disallows cuDNN engines with CUDNN_NUMERICAL_NOTE_WINOGRAD.
+  - If set to '1', allows cuDNN engines with CUDNN_NUMERICAL_NOTE_WINOGRAD.
+
+* MXNET_CUDNN_DISABLED_CONV_FWD_ENGINES
+  - Comma-separated list of cuDNN convolution forward engine numbers to disable.
+  - Normally should be left alone, unless you know what you're doing.
+
+* MXNET_CUDNN_DISABLED_CONV_DGRAD_ENGINES
+  - Comma-separated list of cuDNN convolution dgrad engine numbers to disable.
+  - Normally should be left alone, unless you know what you're doing.
+
+* MXNET_CUDNN_DISABLED_CONV_WGRAD_ENGINES
+  - Comma-separated list of cuDNN convolution wgrad engine numbers to disable.
+  - Normally should be left alone, unless you know what you're doing.

 * MXNET_CUDA_LIB_CHECKING
   - 0(false) or 1(true) ```(default=1)```
@@ -342,6 +388,7 @@ If ctypes is used, it must be `mxnet._ctypes.ndarray.NDArrayBase`.
   - If set to true, MXNet will only use deterministic algorithms in forward and backward computation.
     If no such algorithm exists given other constraints, MXNet will error out. This variable affects the choice
     of CUDNN convolution algorithms. Please see [CUDNN developer guide](https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html) for more details.
+  - Also controls filtering of cuDNN engines with CUDNN_NUMERICAL_NOTE_NONDETERMINISTIC (such engines are disallowed if set to 1).

 * MXNET_CPU_PARALLEL_SIZE
   - Values: Int ```(default=200000)```
````
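The MXNET_CUDNN_DISABLED_CONV_*_ENGINES variables above are plain comma-separated lists of engine numbers. A minimal sketch of how such a variable could be parsed into a lookup set (the helper name `ParseDisabledEngines` is illustrative, not MXNet's actual implementation):

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <unordered_set>

// Parse a comma-separated list of engine numbers, e.g. "5,14,30",
// from an environment variable into a set of engine IDs to skip.
std::unordered_set<int> ParseDisabledEngines(const char* env_name) {
  std::unordered_set<int> disabled;
  const char* val = std::getenv(env_name);
  if (val == nullptr) return disabled;  // variable unset: nothing disabled
  std::stringstream ss(val);
  std::string tok;
  while (std::getline(ss, tok, ',')) {
    if (!tok.empty()) disabled.insert(std::stoi(tok));
  }
  return disabled;
}
```

A heuristics result list would then be filtered by checking `disabled.count(engine_id)` before accepting each candidate engine config.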

include/mxnet/storage.h (4 additions, 3 deletions)

```diff
@@ -86,20 +86,21 @@ class Storage {
    * \brief Allocate a new contiguous memory for a given size.
    * \param size Total size of memory in bytes.
    * \param ctx Context information about the device and ID.
+   * \param failsafe Return a handle with a null dptr if out of memory, rather than exit.
    * \return Handle struct.
    */
-  Handle Alloc(size_t size, Context ctx) {
+  Handle Alloc(size_t size, Context ctx, bool failsafe = false) {
     Handle hd;
     hd.size = size;
     hd.ctx = ctx;
-    this->Alloc(&hd);
+    this->Alloc(&hd, failsafe);
     return hd;
   }
   /*!
    * \brief Allocate a new contiguous memory for a given size.
    * \param handle handle initialized with size and ctx
    */
-  virtual void Alloc(Handle* handle) = 0;
+  virtual void Alloc(Handle* handle, bool failsafe = false) = 0;
   /*!
    * \brief Increase ref counter on shared memory.
    * \param handle handle to shared memory.
```
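The new `failsafe` flag turns an out-of-memory condition from a fatal error into a handle with a null `dptr`, so a caller (such as the auto-tuning workspace code) can probe for memory and fall back gracefully. A minimal self-contained sketch of the pattern, where `Handle`, `Alloc`, and `kCapacity` are simplified stand-ins rather than MXNet's real classes:

```cpp
#include <cstddef>
#include <new>

// Simplified stand-in for mxnet::Storage::Handle.
struct Handle {
  void*  dptr = nullptr;
  size_t size = 0;
};

// Pretend device capacity: requests above this fail to allocate.
constexpr size_t kCapacity = 1 << 20;

// With failsafe=true, an allocation failure yields a null dptr instead of
// a hard error, letting the caller retry with a smaller workspace.
Handle Alloc(size_t size, bool failsafe = false) {
  Handle hd;
  hd.size = size;
  if (size <= kCapacity) {
    hd.dptr = ::operator new(size);
  } else if (!failsafe) {
    throw std::bad_alloc();  // non-failsafe path: propagate the OOM
  }
  return hd;
}
```

Usage then becomes "try big, check `dptr`, shrink on failure", which matches how an auto-tuner can size its scratch workspace without crashing the process.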
