initial commit of DICOM store creation

fedorov · fedorov · commit ddea0c444a73 · 2023-09-14T10:16:15.000-04:00
diff --git a/notebooks/viewers_deployment/Creating_Google_Healthcare_DICOM_store.ipynb b/notebooks/viewers_deployment/Creating_Google_Healthcare_DICOM_store.ipynb
@@ -0,0 +1,344 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": [],
+      "private_outputs": true,
+      "toc_visible": true,
+      "include_colab_link": true
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "view-in-github",
+        "colab_type": "text"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/viewers_deployment/Creating_Google_Healthcare_DICOM_store.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Prerequisites\n",
+        "\n",
+        "This notebook complements the tutorial available in this document: https://tinyurl.com/idc-ohif-gcp\n",
+        "\n",
+        "You must complete the prerequisites discussed in that document before proceeding with this notebook!\n",
+        "\n",
+        "Please use the document above for providing feedback or asking questions!"
+      ],
+      "metadata": {
+        "id": "ScmfC87l5FAZ"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "2tuVb1sCCjUV"
+      },
+      "source": [
+        "## Preparation\n",
+        "\n",
+        "You will need to authenticate with Google to be able to complete this notebook."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "RVxG6QvteybL"
+      },
+      "source": [
+        "from google.colab import auth\n",
+        "auth.authenticate_user()"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Substitute `REPLACE_WITH_YOUR_PROJECT_ID` with the ID of the project you can access, which has billing configured, in the following cell and anywhere else you may see it."
+      ],
+      "metadata": {
+        "id": "DUFqJyhx45MC"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import os\n",
+        "os.environ[\"MY_PROJECT_ID\"] = \"REPLACE_WITH_YOUR_PROJECT_ID\""
+      ],
+      "metadata": {
+        "id": "IjqLpKMWXOKi"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!gcloud config set project ${MY_PROJECT_ID}"
+      ],
+      "metadata": {
+        "id": "PtwyS8d2c9ho"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "6is-ni21Ac1_"
+      },
+      "source": [
+        "## Define the list of instances\n",
+        "\n",
+        "We will use the query below to get the URLs for the files corresponding to the DICOM instances included in the sample DICOM study.\n",
+        "\n",
+        "Same as in the previous step, make sure to replace `REPLACE_WITH_YOUR_PROJECT_ID` in the cell below with the ID of your project, which should have billing enabled."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "9rsNuDaMB6wO"
+      },
+      "source": [
+        "%%bigquery --project=REPLACE_WITH_YOUR_PROJECT_ID cohort_df\n",
+        "\n",
+        "SELECT gcs_url\n",
+        "FROM `bigquery-public-data.idc_current.dicom_all`\n",
+        "WHERE StudyInstanceUID = \"1.3.6.1.4.1.14519.5.2.1.6279.6001.224985459390356936417021464571\""
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "vDSBOmiUDpxV"
+      },
+      "source": [
+        "## Retrieve DICOM instances\n",
+        "\n",
+        "First, we will save the Google Cloud Storage (GCS) URLs into a separate file."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "bife6XMuhWTD"
+      },
+      "source": [
+        "cohort_df = cohort_df.join(cohort_df[\"gcs_url\"].str.split('#', 1, expand=True).rename(columns={0:'gcs_url_no_revision', 1:'gcs_revision'}))\n",
+        "cohort_df[\"gcs_url_no_revision\"].to_csv(\"gcs_paths.txt\", header=False, index=False)"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "5kXtIUh4hzO0"
+      },
+      "source": [
+        "!head /content/gcs_paths.txt"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "oqP67MonHBam"
+      },
+      "source": [
+        "Next, we will download the files to the VM filesystem using the standard `gsutil` command, which is preinstalled on Colab instances.\n",
+        "\n",
+        "IDC-hosted data is available from free Google Storage buckets maintained under [Google Public Dataset Program](https://console.cloud.google.com/marketplace/product/gcp-public-data-idc/nci-idc-data), which sponsors free egress of the data either within or out of the Google Cloud."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "BCjgz_DTnlgX"
+      },
+      "source": [
+        "# https://cloud.google.com/storage/docs/gsutil/commands/cp\n",
+        "%%capture\n",
+        "!mkdir downloaded_cohort\n",
+        "!cat gcs_paths.txt | gsutil -m cp -I ./downloaded_cohort"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "L_lE0c1FIvha"
+      },
+      "source": [
+        "## Create and populate GCP Healthcare DICOM store\n",
+        "\n",
+        "Next we will create a temporary storage bucket that will contain the files that will be imported into a DICOM store. Note that we could import the files directly, one by one, from their original locations, but that operation is slower."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "os.environ[\"MY_TEMP_BUCKET\"] = \"gs://\"+os.environ[\"MY_PROJECT_ID\"]+\"_af_ohiftutorial_temp\"\n",
+        "print(\"DICOM data bucket location: \"+os.environ[\"MY_TEMP_BUCKET\"])"
+      ],
+      "metadata": {
+        "id": "IXo205RjZv5X"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "**WARNING**: storage bucket names must be globally unique! If you fail to create a bucket due to a conflict, choose a different name, or reuse an existing bucket!"
+      ],
+      "metadata": {
+        "id": "rqMQKvkSdO9G"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!gsutil mb -p $MY_PROJECT_ID ${MY_TEMP_BUCKET}"
+      ],
+      "metadata": {
+        "id": "O744yoSdW4AH"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "sADXT6tukTu2"
+      },
+      "source": [
+        "%%capture\n",
+        "!gsutil -m cp -r ./downloaded_cohort/* ${MY_TEMP_BUCKET}"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "In the following cells we are first creating a Healthcare dataset, and then a DICOM store within that dataset.\n",
+        "\n",
+        "**WARNING**:\n",
+        "* please make sure Google Cloud Healthcare Service Agent has the Storage Viewer role to access the storage bucket! If you are not sure how to assign that role, please see the main tutorial document [here](https://docs.google.com/document/d/1v4Syu_yOV6yH--QBLGzsL9fJ7-XyD1CnQu4iTIoPVD8/edit#heading=h.jg76iucxtly9).\n",
+        "* if the dataset or DICOM store with the given names already exist, you can skip the following steps, or choose different names for those resources."
+      ],
+      "metadata": {
+        "id": "eC_kXT9jZHbW"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# These uniquely identify your DICOM store:\n",
+        "# you will need those if you want to open content of this store in OHIF Viewer\n",
+        "os.environ[\"DATASET_ID\"] = \"af_ohiftutorial_temp_dataset\"\n",
+        "os.environ[\"STORE_ID\"] = \"af_ohiftutorial_temp_store\"\n",
+        "os.environ[\"LOCATION\"] = \"us-central1\""
+      ],
+      "metadata": {
+        "id": "USSz27O4aV-3"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!gcloud healthcare datasets create ${DATASET_ID} --project=${MY_PROJECT_ID} --location=${LOCATION}"
+      ],
+      "metadata": {
+        "id": "-utc1cmGclST"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!gcloud healthcare dicom-stores create ${STORE_ID} --project=${MY_PROJECT_ID} --dataset=${DATASET_ID} --location=${LOCATION}"
+      ],
+      "metadata": {
+        "id": "crCPOQK-aFlo"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Finally, now that we have a GCS storage buckets with the files we want to use to populate the DICOM store, we can import the content of the bucket into the DICOM store created in the previous step."
+      ],
+      "metadata": {
+        "id": "rfrHJUch8Yeg"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!gcloud healthcare dicom-stores import gcs ${STORE_ID} \\\n",
+        "  --dataset=${DATASET_ID} \\\n",
+        "  --location=${LOCATION} \\\n",
+        "  --gcs-uri=${MY_TEMP_BUCKET}/**.dcm \\\n",
+        "  --project=${MY_PROJECT_ID}"
+      ],
+      "metadata": {
+        "id": "8gE7zbsldQHp"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Viewing DICOM store content in OHIF Viewer\n",
+        "\n",
+        "If you completed deployment of the OHIF Viewer, as described in [the tutorial document](https://docs.google.com/document/d/1v4Syu_yOV6yH--QBLGzsL9fJ7-XyD1CnQu4iTIoPVD8/edit#heading=h.jg76iucxtly9), you should now be able to open the study we retrieved in the first steps of this notebook in the viewer.\n",
+        "\n",
+        "To do this, update the string below to form the URL:\n",
+        "\n",
+        "* MY_OHIF_VIEWER: Firebase hosting URL\n",
+        "* MY_PROJECT_ID: project under which you created the Healthcare dataset\n",
+        "* DATASET_ID: dataset ID you specified earlier while creating the dataset\n",
+        "* STORE_ID: DICOM store ID you specified earlier\n",
+        "* 1.3.6.1.4.1.14519.5.2.1.6279.6001.224985459390356936417021464571 is the DICOM StudyInstanceUID that we used in the beginning of this notebook to retrieve the study from IDC - you can replace this with the StudyInstanceUID corresponding to any study you have in the DICOM store.\n",
+        "\n",
+        "**MY_OHIF_VIEWER_URL/projects/MY_PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/dicomStores/STORE_ID/study/1.3.6.1.4.1.14519.5.2.1.6279.6001.224985459390356936417021464571**"
+      ],
+      "metadata": {
+        "id": "Ca1EksVyfe22"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Now is the good time to return back to the [tutorial document](https://docs.google.com/document/d/1v4Syu_yOV6yH--QBLGzsL9fJ7-XyD1CnQu4iTIoPVD8/edit#heading=h.jg76iucxtly9)!"
+      ],
+      "metadata": {
+        "id": "WtvtwQM1hPwd"
+      }
+    }
+  ]
+}