Review and add GPU-accelerated notebook samples #243
RAPIDS cuDF, cuML, and cuCIM notebooks demonstrating GPU vs CPU acceleration for dataframes, ML, embeddings, KNN search, and medical image processing.
@microsoft-github-policy-service agree company="NVIDIA"
- Add title/description cells to cuCIM notebooks (gabor, random walker, vesselness)
- Add title to multi-gpu embedding notebook, fix step ordering
- Convert cuCIM image paths to relative for portability
- Update cuCIM folder references (cucim_notebooks → cucim_medical_imaging)
thinkall
left a comment
Thank you so much for the great PR! @bvonodiripsa
I've left some comments and suggestions. We can discuss more details later.
+ "source": [
+ "from cucim import CuImage\n",
+ "\n",
+ "img = CuImage(\"input/image.tif\")\n",
It would be better to download the data automatically from blob storage to a local path.
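One possible shape for such a download cell is sketched below. It is only an illustration: the helper name `ensure_local_file`, the `.part` temporary suffix, and the placeholder URL are assumptions and do not come from the PR. The sketch skips the download when the file already exists, so re-running the notebook does not fetch the image again.

```python
import os
import urllib.request


def ensure_local_file(url, local_path, fetch=urllib.request.urlretrieve):
    """Download `url` to `local_path` unless the file is already present.

    `fetch` is injectable so the helper can be exercised without network access.
    """
    if os.path.exists(local_path):
        return local_path  # already downloaded, nothing to do

    parent = os.path.dirname(local_path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    # Download to a temporary name first so an interrupted transfer
    # never leaves a truncated file at the final path.
    tmp_path = local_path + ".part"
    fetch(url, tmp_path)
    os.replace(tmp_path, local_path)
    return local_path


# Hypothetical usage; the URL below is a placeholder, not a real location.
# SAMPLE_URL = "https://<storage-account>.blob.core.windows.net/<container>/image.tif"
# ensure_local_file(SAMPLE_URL, "input/image.tif")
```

With a helper like this, the existing `CuImage("input/image.tif")` call could stay unchanged, since the file would be in place before the cell runs.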
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%load_ext cudf.pandas"
This won't work in Fabric because a pre-run script imports pandas before any user code runs. I'll check whether we can add this to the pre-run script. It might be an issue if loading the extension takes more than a few seconds.
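To make the ordering problem visible to a notebook user, a small guard cell could run before `%load_ext cudf.pandas`. The sketch below is an assumption about how such a check might look; the function name `accelerator_would_load_too_late` is hypothetical. It only inspects `sys.modules` and does not require cuDF to be installed.

```python
import sys


def accelerator_would_load_too_late(modules=None):
    """Return True if pandas is already imported but cudf.pandas is not.

    In that state, loading the cudf.pandas extension comes after the
    pandas import, which is the situation described for Fabric.
    """
    modules = sys.modules if modules is None else modules
    return "pandas" in modules and "cudf.pandas" not in modules


if accelerator_would_load_too_late():
    print(
        "pandas was imported before cudf.pandas was loaded; "
        "GPU acceleration may not apply to existing pandas objects."
    )
```

The check cannot tell which module was imported first when both are present, so it reports only the unambiguous case.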
+| [rapids-dataframe-gpu-vs-cpu.ipynb](rapids-dataframe-gpu-vs-cpu.ipynb) | RAPIDS cuDF DataFrame operations compared to pandas on larger datasets |
+| [multi-gpu-embedding-and-knn-search.ipynb](multi-gpu-embedding-and-knn-search.ipynb) | Multi-GPU text embedding generation and KNN similarity search using Ray + RAPIDS |
+
+## cuCIM Medical Imaging
These notebooks seem a little heavy for sample notebooks. I'd suggest combining them into one notebook and keeping only the core components.
+ "id": "4zGUeWvcTbDs"
+ },
+ "source": [
+ "# Download the data\n",
This notebook shows functionality similar to the stock analysis one. Maybe we can keep just one of them. There is also duplicated code inside the notebooks; I'd suggest simplifying the contents.
+ "\n",
+ "\n",
+ "# --- 4 GPUs with Ray ---\n",
+ "if ray.is_initialized():\n",
Let's remove or hide the Ray part to avoid confusing customers.
+ "from openai import AzureOpenAI\n",
+ "\n",
+ "client = AzureOpenAI(\n",
+ " api_key=\"YOUR_AZURE_OPENAI_API_KEY\",\n",
The notebook should not raise errors when keys are not provided.
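A minimal sketch of one way to handle a missing key is shown below. It assumes the key is read from an environment variable named `AZURE_OPENAI_API_KEY` and treats the placeholder string from the notebook as "not provided"; the helper name `get_api_key` is hypothetical. When no key is found, the cell prints a notice instead of constructing the client.

```python
import os

# Placeholder string used in the notebook cell under review.
PLACEHOLDER_KEY = "YOUR_AZURE_OPENAI_API_KEY"


def get_api_key(env=None, var_name="AZURE_OPENAI_API_KEY"):
    """Return the configured key, or None if it is missing or still a placeholder."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key or key == PLACEHOLDER_KEY:
        return None
    return key


api_key = get_api_key()
if api_key is None:
    # Skip gracefully instead of letting the client raise later on.
    print("No Azure OpenAI key found; skipping the cells that call the service.")
else:
    # The notebook's existing AzureOpenAI(...) construction would go here,
    # with api_key=api_key in place of the hard-coded placeholder.
    pass
```

Later cells could check `api_key is None` in the same way, so the rest of the notebook still runs end to end without credentials.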
Thanks Li,
I agree. Let me fix it.
From: Li Jiang ***@***.***>
Sent: Wednesday, June 3, 2026 5:37 PM
To: microsoft/fabric-samples ***@***.***>
Cc: Aleksandr Spiridonov (NVIDIA CORPORATION) ***@***.***>; Mention ***@***.***>
Subject: Re: [microsoft/fabric-samples] Review and add GPU-accelerated notebook samples (PR #243)
@thinkall commented on this pull request.
Thank you so much for the great PR! @bvonodiripsa
I've left some comments and suggestions. We can discuss more details later.
________________________________
In docs-samples/data-science/gpu-accelerated-samples/cucim_medical_imaging/01-whole-slide-image-reading.ipynb:
+ "id": "6378c4a3",
+ "metadata": {},
+ "source": [
+ "## Read image"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c02bf93e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from cucim import CuImage\n",
+ "\n",
+ "img = CuImage(\"input/image.tif\")\n",
It would be better to download the data automatically from blob storage to a local path.
________________________________
In docs-samples/data-science/gpu-accelerated-samples/cudf-pandas-stock-analysis.ipynb:
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Pq01z9FvJjxR"
+ },
+ "source": [
+ "# Analysis using Standard Pandas\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%load_ext cudf.pandas"
This won't work in Fabric because a pre-run script imports pandas before any user code runs. I'll check whether we can add this to the pre-run script. It might be an issue if loading the extension takes more than a few seconds.
________________________________
In docs-samples/data-science/gpu-accelerated-samples/README.md:
+- Python 3.12+
+- RAPIDS 25.x (cuDF, cuML, cuCIM)
+- Conda environment recommended
+
+## Notebooks
+
+| Notebook | Description |
+|----------|-------------|
+| [cudf-pandas-accelerator-demo.ipynb](cudf-pandas-accelerator-demo.ipynb) | Drop-in GPU acceleration for pandas with `cudf.pandas` - no code changes needed |
+| [cudf-pandas-stock-analysis.ipynb](cudf-pandas-stock-analysis.ipynb) | Stock market data analysis using GPU-accelerated pandas (read, merge, resample, plot) |
+| [cuml-scikit-learn-accelerator-demo.ipynb](cuml-scikit-learn-accelerator-demo.ipynb) | Drop-in GPU acceleration for scikit-learn (PCA, UMAP, KNN, HDBSCAN on activity recognition data) |
+| [gpu-vs-cpu-compute-benchmark.ipynb](gpu-vs-cpu-compute-benchmark.ipynb) | Side-by-side GPU vs CPU benchmark for common compute operations |
+| [rapids-dataframe-gpu-vs-cpu.ipynb](rapids-dataframe-gpu-vs-cpu.ipynb) | RAPIDS cuDF DataFrame operations compared to pandas on larger datasets |
+| [multi-gpu-embedding-and-knn-search.ipynb](multi-gpu-embedding-and-knn-search.ipynb) | Multi-GPU text embedding generation and KNN similarity search using Ray + RAPIDS |
+
+## cuCIM Medical Imaging
These notebooks seem a little heavy for sample notebooks. I'd suggest combining them into one notebook and keeping only the core components.
________________________________
In docs-samples/data-science/gpu-accelerated-samples/cudf-pandas-accelerator-demo.ipynb:
+ "id": "Y2vPCtXcCvUR",
+ "outputId": "bbc9fe46-ad25-4781-ff8d-5a7a5bbefb41",
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "!nvidia-smi # this should display information about available GPUs"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "4zGUeWvcTbDs"
+ },
+ "source": [
+ "# Download the data\n",
This notebook shows functionality similar to the stock analysis one. Maybe we can keep just one of them. There is also duplicated code inside the notebooks; I'd suggest simplifying the contents.
________________________________
In docs-samples/data-science/gpu-accelerated-samples/multi-gpu-embedding-and-knn-search.ipynb:
+ "logging.disable(logging.WARNING)\n",
+ "\n",
+ "import ray\n",
+ "from sentence_transformers import SentenceTransformer\n",
+ "\n",
+ "# --- 1 GPU (optional) ---\n",
+ "if RUN_1GPU:\n",
+ " model_gpu = SentenceTransformer('all-MiniLM-L6-v2', device='cuda:0')\n",
+ " start = time.perf_counter()\n",
+ " embeddings_gpu = model_gpu.encode(documents, batch_size=512, show_progress_bar=False, convert_to_numpy=True)\n",
+ " gpu1_time = time.perf_counter() - start\n",
+ " print(f\"1 GPU: {gpu1_time:.2f}s ({len(documents)/gpu1_time:.0f} docs/sec)\")\n",
+ "\n",
+ "\n",
+ "# --- 4 GPUs with Ray ---\n",
+ "if ray.is_initialized():\n",
Let's remove or hide the Ray part to avoid confusing customers.
________________________________
In docs-samples/data-science/gpu-accelerated-samples/multi-gpu-embedding-and-knn-search.ipynb:
+ "metadata": {},
+ "source": [
+ "## Step 1: Connect to Azure OpenAI (GPT-5.4)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from openai import AzureOpenAI\n",
+ "\n",
+ "client = AzureOpenAI(\n",
+ " api_key=\"YOUR_AZURE_OPENAI_API_KEY\",\n",
The notebook should not raise errors when keys are not provided.
- Remove cudf-pandas-accelerator-demo (duplicates stock analysis)
- Remove multi-gpu-embedding-and-knn-search (Ray/Azure OpenAI dependency)
- Remove separate gpu-vs-cpu-compute-benchmark and rapids-dataframe-gpu-vs-cpu
- Add rapids-gpu-accelerated-demo: single notebook combining DataFrame ops, string ops, KMeans, Random Forest, text embeddings, and KNN search
- Add auto-download cell in cuCIM 01 notebook for sample image
- Add Microsoft Fabric note about %load_ext cudf.pandas pre-run requirement
- Update README to reflect simplified notebook set
Summary
docs-samples/data-science/gpu-accelerated-samples/ folder with 11 Jupyter notebooks demonstrating NVIDIA GPU acceleration using RAPIDS libraries
Environment
Tested on Azure VM with NVIDIA Tesla T4 GPU, CUDA 12.x, Python 3.13, RAPIDS 25.x (Conda environment).