A simple example of offloading computations in a Stan model to the GPU. Uses the beta negative binomial distribution and its gradient as an example.
This repository is written for CmdStan version 2.35.0. For python users, here's install script. Installation scripts via other interfaces are similar to this one.
from cmdstanpy import cmdstan_path, set_cmdstan_path, install_cmdstan
install_cmdstan(version="2.35.0", verbose=True, progress=True, cores=4)
set_cmdstan_path(os.path.join(os.path.expanduser('~'), '.cmdstan', 'cmdstan-2.35.0'))
cmdstan_path()Ensure that NVIDIA GPU drivers are correctly installed.
nvidia-smiEnsure that CUDA Toolkit are correctly installed and is accessible via PATH
nvcc --version
echo $CUDA_HOMEClone this repository, rename it to cmdstan-cuda, and move it to the root directory of CmdStan.
In most cases, this will be located at: $HOME/.cmdstan/cmdstan-2.35.0.
git clone https://github.com/MLGlobalHealth/cmdstan-cuda-example.git cmdstan-cuda
CMDSTAN_HOME="$HOME/.cmdstan/cmdstan-2.35.0"
mv cmdstan-cuda "$CMDSTAN_HOME/"Execute main.py, and go to the output and model folders to have a look.
The output folder should contain a comparison histogram.
The model folder should contain the saved results, logs, and diagnostics.
To find the critical point on your current machine where CPU and GPU performance are equal, run:
python prof/prof.pyThen go to the output folder to check. It should contain CSV output of the runtime test and comparative plots.
To view the details and timeline of CUDA API calls by CmdStan model, run:
nsys profile --stats=true --force-overwrite=true -o report ./prof_cu_bnb.shCould use NVIDIA Nsight Systems provided to open the generated report.nsys-rep file.