
Review and add GPU-accelerated notebook samples #243

Open
bvonodiripsa wants to merge 7 commits into microsoft:main from bvonodiripsa:main

@bvonodiripsa

Summary

  • Adds docs-samples/data-science/gpu-accelerated-samples/ folder with 11 Jupyter notebooks demonstrating NVIDIA GPU acceleration using RAPIDS libraries
  • cuDF notebooks: pandas accelerator mode demos (general + stock analysis)
  • cuML notebook: scikit-learn accelerator for ML workflows (PCA, UMAP, KNN, HDBSCAN)
  • Benchmarks: GPU vs CPU compute comparison, RAPIDS DataFrame operations
  • Multi-GPU: Ray + RAPIDS embedding generation and KNN search across 4 GPUs
  • cuCIM medical imaging (subfolder): whole-slide pathology image reading, cache performance, Gabor texture classification, random walker segmentation, vesselness filtering
  • Includes a README with descriptions of all notebooks

Environment

Tested on an Azure VM with an NVIDIA Tesla T4 GPU, CUDA 12.x, Python 3.13, and RAPIDS 25.x (Conda environment).
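A possible first cell for the samples, sketched under the assumption that the RAPIDS packages are importable under their usual module names `cudf`, `cuml`, and `cucim`. It reports the Python version and which of those packages are installed, so a reader can compare their setup against the tested environment. `check_environment` is a hypothetical name, not part of any library.

```python
import importlib.util
import sys

# Module names of the RAPIDS libraries used by these samples.
RAPIDS_MODULES = ("cudf", "cuml", "cucim")


def check_environment(modules=RAPIDS_MODULES):
    """Hypothetical helper: report the Python version and which modules are installed.

    importlib.util.find_spec only locates a top-level module without importing it,
    so the check is fast and does not initialize the GPU.
    """
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for name in modules:
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```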

RAPIDS cuDF, cuML, and cuCIM notebooks demonstrating GPU vs CPU
acceleration for dataframes, ML, embeddings, KNN search, and
medical image processing.
@bvonodiripsa

Author

@microsoft-github-policy-service agree company="NVIDIA"

- Add title/description cells to cuCIM notebooks (gabor, random walker, vesselness)
- Add title to multi-gpu embedding notebook, fix step ordering
- Convert cuCIM image paths to relative for portability
- Update cuCIM folder references (cucim_notebooks → cucim_medical_imaging)

@thinkall thinkall left a comment

Contributor

Thank you so much for the great PR! @bvonodiripsa

I've left some comments and suggestions. We can discuss more details later.

"source": [
"from cucim import CuImage\n",
"\n",
"img = CuImage(\"input/image.tif\")\n",

Contributor

It would be better to download the data automatically from blob storage to a local path.
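A minimal sketch of what such a download cell could look like, using only the standard library. The URL is a placeholder, not a real blob location; the local path `input/image.tif` is the one used in the `CuImage` cell quoted above. `ensure_local_copy` is a hypothetical helper name.

```python
import urllib.request
from pathlib import Path

# Placeholder: replace with the real blob storage URL for the sample slide.
SAMPLE_IMAGE_URL = "https://<storage-account>.blob.core.windows.net/<container>/image.tif"
LOCAL_IMAGE_PATH = Path("input/image.tif")


def ensure_local_copy(url, local_path):
    """Download url to local_path unless the file already exists; return the path."""
    local_path = Path(local_path)
    if local_path.exists():
        return local_path
    local_path.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted download does not
    # leave a truncated file that a later run would mistake for a complete one.
    tmp_path = local_path.with_name(local_path.name + ".part")
    urllib.request.urlretrieve(url, tmp_path)
    tmp_path.replace(local_path)
    return local_path
```

In the notebook, the cuCIM cell would then open `CuImage(str(ensure_local_copy(SAMPLE_IMAGE_URL, LOCAL_IMAGE_PATH)))`, so re-running the notebook skips the download.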

"metadata": {},
"outputs": [],
"source": [
"%load_ext cudf.pandas"

Contributor

This won't work in Fabric, because we have a pre-run script that imports pandas before running user code. I'll check whether we can add this to the pre-run script. That might be an issue if loading this extension takes more than a few seconds.
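The conflict comes from ordering: as I understand the cuDF documentation, `cudf.pandas` has to be enabled (via `%load_ext cudf.pandas`, or programmatically with `cudf.pandas.install()`) before pandas is first imported. A minimal sketch of a guard cell that makes this visible instead of silently running on CPU; `try_enable_cudf_pandas` and its status strings are hypothetical.

```python
import importlib
import sys


def try_enable_cudf_pandas():
    """Hypothetical helper: enable cudf.pandas if the import order still allows it.

    Returns "enabled", "pandas-already-imported", or "cudf-unavailable".
    """
    if "pandas" in sys.modules:
        # pandas was imported first (for example by a platform pre-run script),
        # so the documented order (enable cudf.pandas, then import pandas) is broken.
        return "pandas-already-imported"
    try:
        cudf_pandas = importlib.import_module("cudf.pandas")
    except ImportError:
        # RAPIDS is not installed in this environment; plain pandas will be used.
        return "cudf-unavailable"
    cudf_pandas.install()  # programmatic counterpart of %load_ext cudf.pandas
    return "enabled"


print(try_enable_cudf_pandas())
```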

| [rapids-dataframe-gpu-vs-cpu.ipynb](rapids-dataframe-gpu-vs-cpu.ipynb) | RAPIDS cuDF DataFrame operations compared to pandas on larger datasets |
| [multi-gpu-embedding-and-knn-search.ipynb](multi-gpu-embedding-and-knn-search.ipynb) | Multi-GPU text embedding generation and KNN similarity search using Ray + RAPIDS |

## cuCIM Medical Imaging

Contributor

These notebooks seem a little heavy for sample notebooks. I'd suggest combining them into one notebook and keeping only the core components.

"id": "4zGUeWvcTbDs"
},
"source": [
"# Download the data\n",

Contributor

This notebook shows similar functionality to the stock analysis one. Maybe we can keep only one of them. There is also duplicated code inside the notebooks; I'd suggest simplifying the contents.
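One way to cut the duplicated CPU-versus-GPU timing code is a single shared helper that every comparison calls. This is a sketch under assumptions: `benchmark` is a hypothetical name, and for asynchronous GPU libraries such as CuPy the timed function should synchronize before returning, otherwise the measured time only covers the kernel launch.

```python
import time


def benchmark(label, fn, *args, repeats=3, **kwargs):
    """Hypothetical shared helper: run fn several times and report the best wall-clock time.

    Keeping the minimum over repeats reduces the influence of one-off costs,
    such as a first-call warm-up, on the reported number.
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    print(f"{label}: {best:.4f} s (best of {repeats})")
    return result, best
```

Each comparison then becomes two calls, for example `benchmark("pandas groupby", lambda: pdf.groupby("key").sum())` and the same call on the cuDF frame, instead of two copies of the timing loop.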

"\n",
"\n",
"# --- 4 GPUs with Ray ---\n",
"if ray.is_initialized():\n",

Contributor

Let's remove or hide the Ray part to avoid confusing customers.
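If the Ray path is hidden rather than removed, one option is an opt-in switch so that a default run stays single-GPU and never imports Ray. A minimal sketch, assuming the embedding step can be wrapped as a list-in, list-out function; the `USE_RAY` environment variable and `embed_all` are hypothetical names.

```python
import os

# Hypothetical opt-in switch: the multi-GPU Ray path only runs when USE_RAY=1 is set,
# so a default run of the notebook stays on one GPU and never imports Ray.
USE_RAY = os.environ.get("USE_RAY", "0") == "1"


def embed_all(texts, embed_batch, use_ray=USE_RAY):
    """Apply embed_batch (list in, list out) to texts, optionally fanning out with Ray."""
    if not use_ray or not texts:
        return embed_batch(texts)

    import ray  # imported lazily so Ray is only required on the opt-in path

    if not ray.is_initialized():
        ray.init()
    num_gpus = int(ray.cluster_resources().get("GPU", 0))
    if num_gpus == 0:
        # No GPUs registered with Ray: tasks requesting a GPU would stay pending.
        return embed_batch(texts)

    remote_embed = ray.remote(num_gpus=1)(embed_batch)
    chunk_size = -(-len(texts) // num_gpus)  # ceiling division: one chunk per GPU
    futures = [
        remote_embed.remote(texts[i:i + chunk_size])
        for i in range(0, len(texts), chunk_size)
    ]
    results = []
    for part in ray.get(futures):  # ray.get returns results in the order of futures
        results.extend(part)
    return results
```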

"from openai import AzureOpenAI\n",
"\n",
"client = AzureOpenAI(\n",
" api_key=\"YOUR_AZURE_OPENAI_API_KEY\",\n",

Contributor

The notebook should not raise errors when keys are not provided.
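A minimal sketch of a credential-optional client cell: instead of a hard-coded `"YOUR_AZURE_OPENAI_API_KEY"`, it reads environment variables and returns `None` when they are missing, so later cells can skip the Azure OpenAI steps. The variable names follow the ones the `openai` package itself uses, and the default API version is an assumption; `make_openai_client` is a hypothetical name.

```python
import os


def make_openai_client():
    """Return an AzureOpenAI client, or None when credentials or the package are missing."""
    api_key = os.environ.get("AZURE_OPENAI_API_KEY")
    endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT")
    if not api_key or not endpoint:
        print("Azure OpenAI credentials not set; skipping the Azure OpenAI steps.")
        return None
    try:
        from openai import AzureOpenAI
    except ImportError:
        print("The openai package is not installed; skipping the Azure OpenAI steps.")
        return None
    return AzureOpenAI(
        api_key=api_key,
        azure_endpoint=endpoint,
        # Assumed default; override with OPENAI_API_VERSION for your deployment.
        api_version=os.environ.get("OPENAI_API_VERSION", "2024-02-01"),
    )


client = make_openai_client()
```

Downstream cells then guard on `if client is not None:` rather than failing on an invalid placeholder key.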

@bvonodiripsa

bvonodiripsa commented Jun 6, 2026 via email

Author

- Remove cudf-pandas-accelerator-demo (duplicates stock analysis)
- Remove multi-gpu-embedding-and-knn-search (Ray/Azure OpenAI dependency)
- Remove separate gpu-vs-cpu-compute-benchmark and rapids-dataframe-gpu-vs-cpu
- Add rapids-gpu-accelerated-demo: single notebook combining DataFrame ops,
  string ops, KMeans, Random Forest, text embeddings, and KNN search
- Add auto-download cell in cuCIM 01 notebook for sample image
- Add Microsoft Fabric note about %load_ext cudf.pandas pre-run requirement
- Update README to reflect simplified notebook set
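For the KNN search step of the combined notebook, a common pattern is to use the GPU when RAPIDS is present and fall back to CPU otherwise. A minimal sketch, assuming cuML's `NearestNeighbors` (which mirrors scikit-learn's `fit`/`kneighbors` interface); the fallback here is a brute-force NumPy search rather than scikit-learn, to keep the sketch dependency-light, and `knn_search` is a hypothetical name.

```python
import numpy as np


def knn_search(index_vectors, query_vectors, k):
    """Return (distances, indices) of the k nearest index vectors for each query row.

    Uses cuML on the GPU when it is installed; otherwise runs an exact
    brute-force Euclidean search with NumPy on the CPU.
    """
    index_vectors = np.asarray(index_vectors)
    query_vectors = np.asarray(query_vectors)
    try:
        from cuml.neighbors import NearestNeighbors
    except ImportError:
        NearestNeighbors = None

    if NearestNeighbors is not None:
        # GPU path: with NumPy inputs, cuML returns NumPy outputs by default.
        nn = NearestNeighbors(n_neighbors=k)
        nn.fit(index_vectors)
        distances, indices = nn.kneighbors(query_vectors)
        return np.asarray(distances), np.asarray(indices)

    # CPU fallback: squared Euclidean distance from every query to every index row.
    diffs = query_vectors[:, None, :] - index_vectors[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    indices = np.argsort(sq_dists, axis=1, kind="stable")[:, :k]
    distances = np.sqrt(np.take_along_axis(sq_dists, indices, axis=1))
    return distances, indices
```

The NumPy fallback materializes a queries × index × dimensions array, so it only suits small sample datasets; that is the gap the GPU path is meant to show.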