Commit 7e2058e

minor changes in text and code
1 parent 3592a38 commit 7e2058e

1 file changed

Lines changed: 116 additions & 122 deletions

notebooks/collections_demos/RMS-Mutation-Prediction-Expert-Annotations_exploration.ipynb

@@ -3,8 +3,8 @@
  {
  "cell_type": "markdown",
  "metadata": {
- "id": "view-in-github",
- "colab_type": "text"
+ "colab_type": "text",
+ "id": "view-in-github"
  },
  "source": [
  "<a href=\"https://colab.research.google.com/github/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/collections_demos/RMS-Mutation-Prediction-Expert-Annotations_exploration.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
@@ -32,7 +32,7 @@
  "\n",
  "`RMS-Mutation-Prediction-Expert-Annotations` is collection available in the [NCI Imaging Data Commons (IDC)](https://portal.imaging.datacommons.cancer.gov) that contains expert annotations of tissue types for 95 patients of the digital pathology slide images in the `RMS-Mutation-Prediction` collection released earlier. You can learn more about this collection in the following dataset record:\n",
  "\n",
- "> Bridge, C., Brown, G. T., Jung, H., Lisle, C., Clunie, D., Milewski, D., Liu, Y., Collins, J., Linardic, C. M., Hawkins, D. S., Venkatramani, R., Fedorov, A., & Khan, J. (2024). Expert annotations of the tissue types for the RMS-Mutation-Prediction microscopy images [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10462858\n",
+ "> Bridge, C., Brown, G. T., Jung, H., Lisle, C., Clunie, D., Milewski, D., Liu, Y., Collins, J., Linardic, C. M., Hawkins, D. S., Venkatramani, R., Fedorov, A., & Khan, J. (2024). Expert annotations of the tissue types for the `RMS-Mutation-Prediction` microscopy images [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10462858\n",
  "\n",
  "You can access this annotations collection in the IDC Portal using [this link](https://portal.imaging.datacommons.cancer.gov/explore/filters/?analysis_results_id=RMS-Mutation-Prediction-Expert-Annotations), or you can explore its content using this [custom Google Looker dashboard](https://tinyurl.com/idc-rms-annotations).\n",
  "\n",
@@ -94,15 +94,15 @@
  },
  {
  "cell_type": "code",
- "source": [
- "%%capture\n",
- "!sudo apt-get install dcmtk"
- ],
+ "execution_count": null,
  "metadata": {
  "id": "VNcBETkX1ePy"
  },
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "%%capture\n",
+ "!sudo apt-get install dcmtk"
+ ]
  },
  {
  "cell_type": "markdown",
@@ -199,8 +199,8 @@
  },
  "outputs": [
  {
- "output_type": "stream",
  "name": "stderr",
+ "output_type": "stream",
  "text": [
  "Downloading data: 100%|██████████| 4.81M/4.81M [00:01<00:00, 3.15MB/s]\n"
  ]
@@ -237,11 +237,7 @@
  },
  "outputs": [
  {
- "output_type": "execute_result",
  "data": {
- "text/plain": [
- "<IPython.lib.display.IFrame at 0x7f6f352e9690>"
- ],
  "text/html": [
  "\n",
  " <iframe\n",
@@ -253,10 +249,14 @@
  " \n",
  " ></iframe>\n",
  " "
+ ],
+ "text/plain": [
+ "<IPython.lib.display.IFrame at 0x7f6f352e9690>"
  ]
  },
+ "execution_count": 12,
  "metadata": {},
- "execution_count": 12
+ "output_type": "execute_result"
  }
  ],
  "source": [
@@ -292,11 +292,7 @@
  },
  "outputs": [
  {
- "output_type": "execute_result",
  "data": {
- "text/plain": [
- "<pandas.io.formats.style.Styler at 0x7f6eaa2c1f30>"
- ],
  "text/html": [
  "<style type=\"text/css\">\n",
  "</style>\n",
@@ -985,10 +981,14 @@
  " </tr>\n",
  " </tbody>\n",
  "</table>\n"
+ ],
+ "text/plain": [
+ "<pandas.io.formats.style.Styler at 0x7f6eaa2c1f30>"
  ]
  },
+ "execution_count": 15,
  "metadata": {},
- "execution_count": 15
+ "output_type": "execute_result"
  }
  ],
  "source": [
@@ -1091,8 +1091,8 @@
  },
  "outputs": [
  {
- "output_type": "stream",
  "name": "stdout",
+ "output_type": "stream",
  "text": [
  "This SR document contains 14 \"Planar ROI Measurements and Qualitative Evaluations\".\n",
  "An example measurement group of type \"Planar ROI Measurements and Qualitative Evaluations\": \n",
@@ -1176,30 +1176,27 @@
  },
  {
  "cell_type": "markdown",
- "source": [
- "An alternative way to look at the content of a DICOM SR is by using DCMTK [`dsrdump` command line utility](https://support.dcmtk.org/docs/dsrdump.html), that will have one line for each node of the SR content tree - a lot more condensed and perhaps easier to understand representation than what you see above!"
- ],
  "metadata": {
  "id": "nwEwwoGA9_9x"
- }
+ },
+ "source": [
+ "An alternative way to look at the content of a DICOM SR is by using DCMTK [`dsrdump` command line utility](https://support.dcmtk.org/docs/dsrdump.html), that will have one line for each node of the SR content tree - a lot more condensed and perhaps easier to understand representation than what you see above!"
+ ]
  },
  {
  "cell_type": "code",
- "source": [
- "!dsrdump $sr_path_example1"
- ],
+ "execution_count": null,
  "metadata": {
- "id": "iEauB0ii-Ap2",
- "outputId": "36ec7a4c-a4ad-491c-cae9-72d2794568f8",
  "colab": {
  "base_uri": "https://localhost:8080/"
- }
+ },
+ "id": "iEauB0ii-Ap2",
+ "outputId": "36ec7a4c-a4ad-491c-cae9-72d2794568f8"
  },
- "execution_count": null,
  "outputs": [
  {
- "output_type": "stream",
  "name": "stdout",
+ "output_type": "stream",
  "text": [
  "Comprehensive 3D SR Document\n",
  "\n",
@@ -1331,6 +1328,9 @@
  "\n"
  ]
  }
+ ],
+ "source": [
+ "!dsrdump $sr_path_example1"
  ]
  },
  {
@@ -3564,72 +3564,38 @@
  },
  {
  "cell_type": "markdown",
+ "metadata": {
+ "id": "OyzS7N7SWdYb"
+ },
  "source": [
  "# Advanced topic: Querying annotations using BigQuery\n",
  "\n",
- "In the exercises above we fetched all of the DICOM SR files before examining them locally using `highdicom`.\n",
- "\n",
- "All of the metadata available in DICOM files you will find in IDC is extracted and searchable in Google BigQuery tables. With BigQuery search, you do not need to download anything if all you need to do is examine the metadata.\n",
+ "In the exercises above we fetched all of the DICOM SR files before examining them locally using highdicom. \n",
+ "Yet, all of the metadata available in IDC's DICOM files are also extracted to and searchable in Google BigQuery tables. With BigQuery search, you do not need to download anything if all you need to do is examine the metadata.\n",
  "\n",
  "If you would like to use BigQuery, you will need to complete the advanced prerequisites in [part 1](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting_started/part1_prerequisites.ipynb) of the \"Getting started\" tutorial series before running the following cells. You can also check out [part 3](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting_started/part3_exploring_cohorts.ipynb) of that series to get started with the IDC BigQuery content.\n",
- "\n",
  "In the following cell we query DICOM metadata to get information about the ROI type for the annotations in the `RMS-Mutation-Prediction-Expert-Annotations` collection."
- ],
- "metadata": {
- "id": "OyzS7N7SWdYb"
- }
+ ]
  },
  {
  "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ZjehiSgxZEh0"
+ },
+ "outputs": [],
  "source": [
- "#@title Enter your Project ID\n",
- "# initialize this variable with your Google Cloud Project ID!\n",
- "my_ProjectID = \"idc-sandbox-000\" #@param {type:\"string\"}\n",
- "\n",
- "import os\n",
+ "#@title Enter your Google Cloud Project ID here\n",
+ "my_ProjectID = \"Please enter you project ID here\" #@param {type:\"string\"}\n",
  "os.environ[\"GCP_PROJECT_ID\"] = my_ProjectID\n",
  "\n",
  "from google.colab import auth\n",
  "auth.authenticate_user()"
- ],
- "metadata": {
- "id": "ZjehiSgxZEh0"
- },
- "execution_count": 1,
- "outputs": []
+ ]
  },
  {
  "cell_type": "code",
- "source": [
- "from google.cloud import bigquery\n",
- "\n",
- "# BigQuery client is initialized with the ID of the project\n",
- "# we specified in the beginning of the notebook!\n",
- "bq_client = bigquery.Client(my_ProjectID)\n",
- "\n",
- "selection_query = \"\"\"\n",
- "SELECT\n",
- " PatientID,\n",
- " StudyInstanceUID,\n",
- " contentSequenceUnnested3.ConceptCodeSequence[SAFE_OFFSET(0)].CodeMeaning\n",
- "FROM\n",
- " `bigquery-public-data.idc_current.dicom_all` AS dicom_all\n",
- "CROSS JOIN\n",
- " UNNEST(ContentSequence) AS contentSequenceUnnested\n",
- "CROSS JOIN\n",
- " UNNEST(contentSequenceUnnested.ContentSequence) AS contentSequenceUnnested2\n",
- "CROSS JOIN\n",
- " UNNEST(contentSequenceUnnested2.ContentSequence) AS contentSequenceUnnested3\n",
- "WHERE\n",
- " dicom_all.analysis_result_id = \"RMS-Mutation-Prediction-Expert-Annotations\"\n",
- " AND contentSequenceUnnested3.ConceptNameCodeSequence[SAFE_OFFSET(0)].CodeMeaning = \"Finding\"\n",
- "\"\"\"\n",
- "\n",
- "selection_result = bq_client.query(selection_query)\n",
- "selection_df = selection_result.result().to_dataframe()\n",
- "\n",
- "selection_df"
- ],
+ "execution_count": 3,
  "metadata": {
  "colab": {
  "base_uri": "https://localhost:8080/",
@@ -3638,40 +3604,14 @@
  "id": "FeMBDY1PY841",
  "outputId": "831da40d-2592-4429-8379-ed5a9bb552a9"
  },
- "execution_count": 3,
  "outputs": [
  {
- "output_type": "execute_result",
  "data": {
- "text/plain": [
- " PatientID StudyInstanceUID \\\n",
- "0 RMS2400 2.25.136698327400450893837131938791757812545 \n",
- "1 RMS2400 2.25.136698327400450893837131938791757812545 \n",
- "2 RMS2400 2.25.136698327400450893837131938791757812545 \n",
- "3 RMS2270 2.25.124251988371010685434513158523091145860 \n",
- "4 RMS2270 2.25.124251988371010685434513158523091145860 \n",
- ".. ... ... \n",
- "675 RMS2423 2.25.56148336459229922868266898022297146711 \n",
- "676 RMS2423 2.25.56148336459229922868266898022297146711 \n",
- "677 RMS2406 2.25.5360555849781855019773810600059868899 \n",
- "678 RMS2406 2.25.5360555849781855019773810600059868899 \n",
- "679 RMS2406 2.25.5360555849781855019773810600059868899 \n",
- "\n",
- " CodeMeaning \n",
- "0 Necrosis \n",
- "1 Connective tissue \n",
- "2 Embryonal rhabdomyosarcoma \n",
- "3 Connective tissue \n",
- "4 Connective tissue \n",
- ".. ... \n",
- "675 Connective tissue \n",
- "676 Connective tissue \n",
- "677 Alveolar rhabdomyosarcoma \n",
- "678 Alveolar rhabdomyosarcoma \n",
- "679 Alveolar rhabdomyosarcoma \n",
- "\n",
- "[680 rows x 3 columns]"
- ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "summary": "{\n \"name\": \"selection_df\",\n \"rows\": 680,\n \"fields\": [\n {\n \"column\": \"PatientID\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 95,\n \"samples\": [\n \"RMS2388\",\n \"RMS2447\",\n \"RMS2451\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"StudyInstanceUID\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 95,\n \"samples\": [\n \"2.25.149028740594095659141785347110368591911\",\n \"2.25.169909530366026741902094648458907485460\",\n \"2.25.276769950763730969207822143036911312833\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"CodeMeaning\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Connective tissue\",\n \"Alveolar rhabdomyosarcoma\",\n \"Necrosis\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
+ "type": "dataframe",
+ "variable_name": "selection_df"
+ },
  "text/html": [
  "\n",
  " <div id=\"df-e42b5fcf-e1f8-47be-a7cf-e467ae503632\" class=\"colab-df-container\">\n",
@@ -4034,15 +3974,69 @@
  " </div>\n",
  " </div>\n"
  ],
- "application/vnd.google.colaboratory.intrinsic+json": {
- "type": "dataframe",
- "variable_name": "selection_df",
- "summary": "{\n \"name\": \"selection_df\",\n \"rows\": 680,\n \"fields\": [\n {\n \"column\": \"PatientID\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 95,\n \"samples\": [\n \"RMS2388\",\n \"RMS2447\",\n \"RMS2451\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"StudyInstanceUID\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 95,\n \"samples\": [\n \"2.25.149028740594095659141785347110368591911\",\n \"2.25.169909530366026741902094648458907485460\",\n \"2.25.276769950763730969207822143036911312833\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"CodeMeaning\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Connective tissue\",\n \"Alveolar rhabdomyosarcoma\",\n \"Necrosis\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
- }
+ "text/plain": [
+ " PatientID StudyInstanceUID \\\n",
+ "0 RMS2400 2.25.136698327400450893837131938791757812545 \n",
+ "1 RMS2400 2.25.136698327400450893837131938791757812545 \n",
+ "2 RMS2400 2.25.136698327400450893837131938791757812545 \n",
+ "3 RMS2270 2.25.124251988371010685434513158523091145860 \n",
+ "4 RMS2270 2.25.124251988371010685434513158523091145860 \n",
+ ".. ... ... \n",
+ "675 RMS2423 2.25.56148336459229922868266898022297146711 \n",
+ "676 RMS2423 2.25.56148336459229922868266898022297146711 \n",
+ "677 RMS2406 2.25.5360555849781855019773810600059868899 \n",
+ "678 RMS2406 2.25.5360555849781855019773810600059868899 \n",
+ "679 RMS2406 2.25.5360555849781855019773810600059868899 \n",
+ "\n",
+ " CodeMeaning \n",
+ "0 Necrosis \n",
+ "1 Connective tissue \n",
+ "2 Embryonal rhabdomyosarcoma \n",
+ "3 Connective tissue \n",
+ "4 Connective tissue \n",
+ ".. ... \n",
+ "675 Connective tissue \n",
+ "676 Connective tissue \n",
+ "677 Alveolar rhabdomyosarcoma \n",
+ "678 Alveolar rhabdomyosarcoma \n",
+ "679 Alveolar rhabdomyosarcoma \n",
+ "\n",
+ "[680 rows x 3 columns]"
+ ]
  },
+ "execution_count": 3,
  "metadata": {},
- "execution_count": 3
+ "output_type": "execute_result"
  }
+ ],
+ "source": [
+ "from google.cloud import bigquery\n",
+ "\n",
+ "# BigQuery client is initialized with the ID of the project we specified in the cell above!\n",
+ "bq_client = bigquery.Client(my_ProjectID)\n",
+ "\n",
+ "selection_query = \"\"\"\n",
+ "SELECT\n",
+ " PatientID,\n",
+ " StudyInstanceUID,\n",
+ " contentSequenceUnnested3.ConceptCodeSequence[SAFE_OFFSET(0)].CodeMeaning\n",
+ "FROM\n",
+ " `bigquery-public-data.idc_current.dicom_all` AS dicom_all\n",
+ "CROSS JOIN\n",
+ " UNNEST(ContentSequence) AS contentSequenceUnnested\n",
+ "CROSS JOIN\n",
+ " UNNEST(contentSequenceUnnested.ContentSequence) AS contentSequenceUnnested2\n",
+ "CROSS JOIN\n",
+ " UNNEST(contentSequenceUnnested2.ContentSequence) AS contentSequenceUnnested3\n",
+ "WHERE\n",
+ " dicom_all.analysis_result_id = \"RMS-Mutation-Prediction-Expert-Annotations\"\n",
+ " AND contentSequenceUnnested3.ConceptNameCodeSequence[SAFE_OFFSET(0)].CodeMeaning = \"Finding\"\n",
+ "\"\"\"\n",
+ "\n",
+ "selection_result = bq_client.query(selection_query)\n",
+ "selection_df = selection_result.result().to_dataframe()\n",
+ "\n",
+ "display(selection_df)"
  ]
  },
  {
@@ -4066,9 +4060,9 @@
  ],
  "metadata": {
  "colab": {
+ "include_colab_link": true,
  "provenance": [],
- "toc_visible": true,
- "include_colab_link": true
+ "toc_visible": true
  },
  "kernelspec": {
  "display_name": "Python 3",
@@ -4080,4 +4074,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
- }
+ }
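
A note on the project ID cell changed in the `@@ -3564,72 +3564,38 @@` hunk: the new default for `my_ProjectID` is a placeholder sentence rather than a usable ID, and the `import os` line is no longer part of that cell. The sketch below is one possible guard and is not part of this commit. `check_project_id` is a hypothetical helper that rejects values that cannot be a Google Cloud project ID (6 to 30 characters; lowercase letters, digits and hyphens; starting with a letter and not ending with a hyphen) before they reach `os.environ` or `bigquery.Client`. The sample value `idc-sandbox-000` is the default that the commit removes, used here only for illustration.

```python
import os
import re

# Format rules for Google Cloud project IDs: 6-30 characters, lowercase
# letters, digits and hyphens, starting with a letter, not ending with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def check_project_id(project_id: str) -> str:
    """Hypothetical helper: return project_id unchanged if it is well formed.

    Raises ValueError for anything that cannot be a project ID, such as a
    form placeholder that was left unedited.
    """
    if not _PROJECT_ID_RE.fullmatch(project_id):
        raise ValueError(
            f"{project_id!r} does not look like a Google Cloud project ID; "
            "edit my_ProjectID before running the BigQuery cells."
        )
    return project_id


# Illustrative value only (the default removed by this commit); use your own ID.
my_ProjectID = "idc-sandbox-000"
os.environ["GCP_PROJECT_ID"] = check_project_id(my_ProjectID)
```

This only checks the format of the ID. Whether the project exists and has BigQuery enabled is still determined when the query runs.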