Review and add GPU-accelerated notebook samples by bvonodiripsa · Pull Request #243 · microsoft/fabric-samples

bvonodiripsa · 2026-06-02T01:00:42Z

Summary

- Adds a `docs-samples/data-science/gpu-accelerated-samples/` folder with 11 Jupyter notebooks demonstrating NVIDIA GPU acceleration using RAPIDS libraries:
  - cuDF notebooks: pandas accelerator mode demos (general + stock analysis)
  - cuML notebook: scikit-learn accelerator for ML workflows (PCA, UMAP, KNN, HDBSCAN)
  - Benchmarks: GPU vs CPU compute comparison, RAPIDS DataFrame operations
  - Multi-GPU: Ray + RAPIDS embedding generation and KNN search across 4 GPUs
  - cuCIM medical imaging (subfolder): whole-slide pathology image reading, cache performance, Gabor texture classification, random walker segmentation, vesselness filtering
- Includes a README with descriptions of all notebooks

Environment

Tested on an Azure VM with an NVIDIA Tesla T4 GPU, CUDA 12.x, Python 3.13, and RAPIDS 25.x (Conda environment).

RAPIDS cuDF, cuML, and cuCIM notebooks demonstrating GPU vs CPU acceleration for dataframes, ML, embeddings, KNN search, and medical image processing.

bvonodiripsa · 2026-06-02T01:12:30Z

@microsoft-github-policy-service agree company="NVIDIA"

- Add title/description cells to cuCIM notebooks (gabor, random walker, vesselness)
- Add title to multi-gpu embedding notebook, fix step ordering
- Convert cuCIM image paths to relative for portability
- Update cuCIM folder references (cucim_notebooks → cucim_medical_imaging)

thinkall

Thank you so much for the great PR! @bvonodiripsa

I've left some comments and suggestions. We can discuss more details later.

thinkall · 2026-06-04T00:00:15Z

+   "source": [
+    "from cucim import CuImage\n",
+    "\n",
+    "img = CuImage(\"input/image.tif\")\n",


It would be better to download the data automatically from Blob Storage to a local path.
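One way to act on this is a download-once cell that runs before `CuImage(...)` is constructed. The sketch below uses only the standard library; `ensure_local_copy` and `SAMPLE_IMAGE_URL` are hypothetical names, and the blob URL is a placeholder rather than a real storage location for the sample image.

```python
import urllib.request
from pathlib import Path


def ensure_local_copy(url: str, local_path: str) -> Path:
    """Download `url` to `local_path` once; later runs reuse the cached file."""
    dest = Path(local_path)
    if dest.exists() and dest.stat().st_size > 0:
        return dest  # already downloaded on a previous run
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    # Download under a temporary name first so an interrupted transfer
    # never leaves a truncated image at the final path.
    urllib.request.urlretrieve(url, tmp)
    tmp.replace(dest)
    return dest


# Hypothetical placeholder: replace with the real blob URL of the sample image.
SAMPLE_IMAGE_URL = "https://<storage-account>.blob.core.windows.net/<container>/image.tif"
```

In the notebook, the read cell would then become `img = CuImage(str(ensure_local_copy(SAMPLE_IMAGE_URL, "input/image.tif")))`, keeping the relative `input/` path the notebook already uses.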

thinkall · 2026-06-04T00:14:36Z

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%load_ext cudf.pandas"


This won't work in Fabric because we have a pre-run script that imports pandas before any user code runs. I'll check whether we can add this to the pre-run script, though that could be a problem if loading the extension takes more than a few seconds.
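The underlying constraint is ordering: the cuDF documentation asks that `cudf.pandas` be enabled (via `%load_ext cudf.pandas` or its programmatic form, `cudf.pandas.install()`) before pandas is first imported. A minimal sketch of a guard the notebook could use, assuming it may run where pandas is already loaded or where RAPIDS is not installed; `enable_cudf_pandas` is a hypothetical helper, not part of cuDF.

```python
import importlib.util
import sys
import warnings


def enable_cudf_pandas() -> bool:
    """Try to turn on the cudf.pandas accelerator; return True only if it was installed."""
    if "pandas" in sys.modules:
        # cudf.pandas is meant to be enabled before pandas is first imported;
        # if a pre-run script already imported pandas, do not try to enable it now.
        warnings.warn("pandas is already imported; cudf.pandas was not enabled")
        return False
    if importlib.util.find_spec("cudf") is None:
        # RAPIDS is not available in this environment: keep plain CPU pandas.
        return False
    import cudf.pandas

    cudf.pandas.install()  # programmatic equivalent of %load_ext cudf.pandas
    return True
```

Calling this in the first cell, before `import pandas as pd`, lets the same notebook run with or without a GPU and reports clearly when a pre-run import has made the accelerator inapplicable.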

thinkall · 2026-06-04T00:20:13Z

+| [rapids-dataframe-gpu-vs-cpu.ipynb](rapids-dataframe-gpu-vs-cpu.ipynb) | RAPIDS cuDF DataFrame operations compared to pandas on larger datasets |
+| [multi-gpu-embedding-and-knn-search.ipynb](multi-gpu-embedding-and-knn-search.ipynb) | Multi-GPU text embedding generation and KNN similarity search using Ray + RAPIDS |
+
+## cuCIM Medical Imaging


These notebooks seem a bit heavy for sample notebooks. I'd suggest combining them into one notebook that keeps only the core components.

thinkall · 2026-06-04T00:23:14Z

+    "id": "4zGUeWvcTbDs"
+   },
+   "source": [
+    "# Download the data\n",


This notebook shows functionality similar to the stock analysis one, so maybe we can keep just one of them. There is also duplicated code inside the notebooks; I'd suggest simplifying the contents.

thinkall · 2026-06-04T00:25:34Z

+    "\n",
+    "\n",
+    "# --- 4 GPUs with Ray ---\n",
+    "if ray.is_initialized():\n",


Let's remove or hide the Ray part to avoid confusing customers.

thinkall · 2026-06-04T00:33:19Z

+    "from openai import AzureOpenAI\n",
+    "\n",
+    "client = AzureOpenAI(\n",
+    "    api_key=\"YOUR_AZURE_OPENAI_API_KEY\",\n",


The notebook should not raise errors when keys are not provided.
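A minimal sketch of how the Azure OpenAI step could degrade gracefully instead of failing on a hard-coded key. It assumes credentials come from the `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, and `OPENAI_API_VERSION` environment variables (the names the `openai` Python SDK reads for Azure), and the fallback API version string is an assumption; `make_azure_openai_client` is a hypothetical helper.

```python
import os


def make_azure_openai_client():
    """Return an AzureOpenAI client, or None when credentials are not configured."""
    api_key = os.environ.get("AZURE_OPENAI_API_KEY")
    endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT")
    if not api_key or not endpoint:
        print("Azure OpenAI credentials not set; skipping the Azure OpenAI step.")
        return None
    # Imported lazily so the rest of the notebook runs even without the openai package.
    from openai import AzureOpenAI

    return AzureOpenAI(
        api_key=api_key,
        azure_endpoint=endpoint,
        # Assumed default; set OPENAI_API_VERSION to match your deployment.
        api_version=os.environ.get("OPENAI_API_VERSION", "2024-06-01"),
    )
```

Later cells can then call `client = make_azure_openai_client()` and wrap the LLM-dependent code in `if client is not None:`, so a run without keys completes and simply skips that step.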

bvonodiripsa · 2026-06-06T19:50:42Z

Thanks Li, I agree. Let me fix it.


- Remove cudf-pandas-accelerator-demo (duplicates stock analysis)
- Remove multi-gpu-embedding-and-knn-search (Ray/Azure OpenAI dependency)
- Remove separate gpu-vs-cpu-compute-benchmark and rapids-dataframe-gpu-vs-cpu
- Add rapids-gpu-accelerated-demo: single notebook combining DataFrame ops, string ops, KMeans, Random Forest, text embeddings, and KNN search
- Add auto-download cell in cuCIM 01 notebook for sample image
- Add Microsoft Fabric note about %load_ext cudf.pandas pre-run requirement
- Update README to reflect simplified notebook set
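The KNN search over text embeddings in the combined notebook amounts to ranking documents by cosine similarity to each query. A minimal brute-force sketch, assuming the embeddings have already been computed (for example by the SentenceTransformer model used in the earlier notebook); `cosine_knn` is a hypothetical helper, not the notebook's code. Passing CuPy as `xp` should run the same calls on the GPU, while cuML's `NearestNeighbors` would be the RAPIDS route for larger corpora.

```python
import numpy as np


def cosine_knn(doc_emb, query_emb, k, xp=np):
    """Return (indices, scores) of the k most cosine-similar documents per query.

    `xp` is the array module: NumPy on CPU, or CuPy (largely NumPy-compatible) on GPU.
    """
    docs = xp.asarray(doc_emb, dtype=xp.float32)
    queries = xp.asarray(query_emb, dtype=xp.float32)
    # L2-normalize rows so a plain dot product equals cosine similarity.
    docs = docs / xp.linalg.norm(docs, axis=1, keepdims=True)
    queries = queries / xp.linalg.norm(queries, axis=1, keepdims=True)
    sims = queries @ docs.T  # shape: (n_queries, n_docs)
    # Brute force: sort each row by descending similarity and keep the top k.
    idx = xp.argsort(-sims, axis=1)[:, :k]
    scores = xp.take_along_axis(sims, idx, axis=1)
    return idx, scores
```

Brute force is exact and simple for sample-sized data; its cost grows with `n_queries × n_docs`, which is where a GPU (or an approximate index) starts to pay off.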

bvonodiripsa added 2 commits June 2, 2026 00:44

Add GPU-accelerated notebook samples

d281844

RAPIDS cuDF, cuML, and cuCIM notebooks demonstrating GPU vs CPU acceleration for dataframes, ML, embeddings, KNN search, and medical image processing.

Add README with notebook descriptions

8b0c943

bvonodiripsa added 3 commits June 2, 2026 01:22

Add NVIDIA RAPIDS attribution for all notebooks

125cef1

Add setup instructions and notes to README

35ee3a8

thinkall reviewed Jun 4, 2026


bvonodiripsa added 2 commits June 9, 2026 23:30

Add auto-download for HAR dataset in cuml notebook

9f50137


bvonodiripsa wants to merge 7 commits into microsoft:main from bvonodiripsa:main

2 participants
