Commit 65cb4ca

Merge pull request #14 from ImagingDataCommons/reorg-intro
reorganization of the introduction tutorial series
2 parents a4ea7df + 8286ccb commit 65cb4ca

3 files changed

Lines changed: 2574 additions & 0 deletions

Lines changed: 187 additions & 0 deletions
@@ -0,0 +1,187 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {
6+
"colab_type": "text",
7+
"id": "view-in-github"
8+
},
9+
"source": [
10+
"<a href=\"https://colab.research.google.com/github/ImagingDataCommons/IDC-Examples/blob/master/notebooks/getting_started/part1_prerequisites.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
11+
]
12+
},
13+
{
14+
"cell_type": "markdown",
15+
"metadata": {
16+
"id": "KmXfYFZtja2F"
17+
},
18+
"source": [
19+
"# Getting started with IDC - Part 1: Setting up the prerequisites\n",
20+
"\n",
21+
"## Summary\n",
22+
"\n",
23+
"This notebook is part of [the \"Getting started with IDC\" series](https://github.com/ImagingDataCommons/IDC-Examples/tree/master/notebooks/getting_started) introducing NCI Imaging Data Commons to the users who want to interact with IDC programmatically.\n",
24+
"\n",
25+
"In this notebook you will learn how to set up your google account to be able to use Google Cloud Platform (GCP), which is the location of IDC data.\n",
26+
"\n",
27+
"Initial version: Nov 2022\n",
28+
"\n",
29+
"Updated: "
30+
]
31+
},
32+
{
33+
"cell_type": "markdown",
34+
"metadata": {},
35+
"source": [
36+
"## What is IDC?\n",
37+
"\n",
38+
"[NCI Imaging Data Commons (IDC)](https://datacommons.cancer.gov/repository/imaging-data-commons) is a cloud-based repository of publicly available cancer imaging data co-located with the analysis and exploration tools and resources. IDC is a node within the broader NCI Cancer Research Data Commons (CRDC) infrastructure that provides secure access to a large, comprehensive, and expanding collection of cancer research data."
39+
]
40+
},
41+
{
42+
"cell_type": "markdown",
43+
"metadata": {},
44+
"source": [
45+
"## Why Google Cloud Platform (GCP)?\n",
46+
"\n",
47+
"GCP is a cloud-based environment that provides access to a suite of tools and services that include compute, storage and database resources, to name a few.\n",
48+
"\n",
49+
"IDC is built upon the services provided by GCP, and stores its data within GCP. In order to fully leverage capabilities of IDC, you will need to activate GCP for your google account."
50+
]
51+
},
52+
{
53+
"cell_type": "markdown",
54+
"metadata": {},
55+
"source": [
56+
"## GCP is not a free service - do I need to pay to use IDC?\n",
57+
"\n",
58+
"**NO!**\n",
59+
"\n",
60+
"None of the activities in this tutorial series will require you to pay for use of any GCP services, to have cloud credits, or even to connect your credit card to your account.\n",
61+
"\n",
62+
"Egress of IDC data out of the cloud is free. While query of the data is not free, GCP [BigQuery free tier](https://cloud.google.com/bigquery/pricing#free-tier) includes 1 TB of query data per month, which will be sufficient to do a lot of queries of IDC data."
63+
]
64+
},
65+
{
66+
"cell_type": "markdown",
67+
"metadata": {},
68+
"source": [
69+
"## What do I need to get started?\n",
70+
"\n",
71+
"All you need is a Google account (identity) and a web browser. If you don't have a Google account, you can learn how to get one [here](https://accounts.google.com/signup/v2/webcreateaccount?dsh=308321458437252901&continue=https%3A%2F%2Faccounts.google.com%2FManageAccount&flowName=GlifWebSignIn&flowEntry=SignUp#FirstName=&LastName=). Note that you do NOT need a Gmail email account - [you can use your non-Gmail email address to create one instead](https://support.google.com/accounts/answer/27441?hl=en#existingemail).\n",
72+
"\n",
73+
"<font color='red'>**WARNING**</font>: if you have a Google account that was provided by your organization, it may not be suitable for this tutorial due to the restrictions imposed by your organization. "
74+
]
75+
},
76+
{
77+
"cell_type": "markdown",
78+
"metadata": {},
79+
"source": [
80+
"# Let's do it!\n",
81+
"\n",
82+
"1. Go to https://console.cloud.google.com/, and accept Terms and conditions.\n",
83+
"\n",
84+
"![agree](https://www.dropbox.com/s/d570wqaqt72zzaz/agreed.png?raw=1)\n",
85+
"\n",
86+
"2. In the upper left corner of the GCP console click \"Select a project\"\n",
87+
"\n",
88+
"![select](https://www.dropbox.com/s/hzty1pgfq6ll7hy/select.png?raw=1)\n",
89+
"\n",
90+
"3. In the project selector click \"Create new project\". If you already have a project, you may be able to reuse it for this tutorial.\n",
91+
"\n",
92+
"![new](https://www.dropbox.com/s/ybhdloqsjnffdb1/new.png?raw=1)\n",
93+
"\n",
94+
"4. Open the GCP console menu by clicking the ☰ menu icon in the upper left corner, and select \"Dashboard\". You will see information about your project, including your Project ID. Insert that project ID in the cell below in place of REPLACE_ME_WITH_YOUR_PROJECT_ID. The cell below will also prompt you to give Colab permissions to act on your behalf."
95+
]
96+
},
97+
{
98+
"cell_type": "code",
99+
"execution_count": null,
100+
"metadata": {
101+
"vscode": {
102+
"languageId": "plaintext"
103+
}
104+
},
105+
"outputs": [],
106+
"source": [
107+
"# initialize this variable with your Google Cloud Project ID!\n",
108+
"my_ProjectID = \"REPLACE_ME_WITH_YOUR_PROJECT_ID\"\n",
109+
"\n",
110+
"import os\n",
111+
"os.environ[\"GCP_PROJECT_ID\"] = my_ProjectID\n",
112+
"\n",
113+
"from google.colab import auth\n",
114+
"auth.authenticate_user()"
115+
]
116+
},
117+
{
118+
"cell_type": "markdown",
119+
"metadata": {},
120+
"source": [
121+
"Finally, let's run a query to confirm that the setup is working for your account."
122+
]
123+
},
124+
{
125+
"cell_type": "code",
126+
"execution_count": null,
127+
"metadata": {
128+
"vscode": {
129+
"languageId": "plaintext"
130+
}
131+
},
132+
"outputs": [],
133+
"source": [
134+
"%%bigquery --project=$my_ProjectID\n",
135+
"\n",
136+
"SELECT COUNT(DISTINCT(collection_id)) as collections_cnt\n",
137+
"FROM bigquery-public-data.idc_current.dicom_all"
138+
]
139+
},
140+
{
141+
"cell_type": "markdown",
142+
"metadata": {},
143+
"source": [
144+
"If the cell above completed without errors, you completed the prerequisites and can proceed to the next tutorial in the series, keeping the project ID handy - you will need it."
145+
]
146+
},
147+
{
148+
"cell_type": "markdown",
149+
"metadata": {},
150+
"source": [
151+
"## Support\n",
152+
"\n",
153+
"You can contact IDC support by sending email to support@canceridc.dev or posting your question on [IDC User forum](https://discourse.canceridc.dev)."
154+
]
155+
},
156+
{
157+
"cell_type": "markdown",
158+
"metadata": {},
159+
"source": [
160+
"## Acknowledgments\n",
161+
"\n",
162+
"Imaging Data Commons has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.\n",
163+
"\n",
164+
"If you use IDC in your research, please cite the following publication:\n",
165+
"\n",
166+
"> Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S., Aerts, H. J. W. L., Homeyer, A., Lewis, R., Akbarzadeh, A., Bontempi, D., Clifford, W., Herrmann, M. D., Höfener, H., Octaviano, I., Osborne, C., Paquette, S., Petts, J., Punzo, D., Reyes, M., Schacherer, D. P., Tian, M., White, G., Ziegler, E., Shmulevich, I., Pihl, T., Wagner, U., Farahani, K. & Kikinis, R. NCI Imaging Data Commons. Cancer Res. 81, 4188–4193 (2021). http://dx.doi.org/10.1158/0008-5472.CAN-21-0950"
167+
]
168+
}
169+
],
170+
"metadata": {
171+
"colab": {
172+
"include_colab_link": true,
173+
"provenance": []
174+
},
175+
"gpuClass": "standard",
176+
"kernelspec": {
177+
"display_name": "Python 3",
178+
"name": "python3"
179+
},
180+
"language_info": {
181+
"name": "python"
182+
},
183+
"orig_nbformat": 4
184+
},
185+
"nbformat": 4,
186+
"nbformat_minor": 0
187+
}
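
The notebook's setup cell asks the reader to replace `REPLACE_ME_WITH_YOUR_PROJECT_ID` by hand, and a forgotten replacement only surfaces later as a failed query. A minimal sketch of a guard that could run before authentication is shown below. The helper name `check_project_id` is hypothetical and not part of the notebook; the pattern assumes the standard GCP project ID rules (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen), so legacy domain-scoped IDs would be rejected by this sketch.

```python
import re

# Placeholder string used in the notebook's setup cell.
PLACEHOLDER = "REPLACE_ME_WITH_YOUR_PROJECT_ID"

# Assumed format: 6-30 characters, lowercase letters, digits and hyphens,
# starting with a letter and not ending with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def check_project_id(project_id: str) -> str:
    """Return project_id unchanged if it looks usable, else raise ValueError.

    Hypothetical helper for illustration only; it checks the format of the
    ID and does not contact GCP to confirm that the project exists.
    """
    if project_id == PLACEHOLDER:
        raise ValueError("Replace the placeholder with your own GCP project ID.")
    if not _PROJECT_ID_RE.fullmatch(project_id):
        raise ValueError(f"{project_id!r} does not look like a GCP project ID.")
    return project_id


# Possible usage in the setup cell, before calling auth.authenticate_user():
#   my_ProjectID = check_project_id(my_ProjectID)
```

Failing early with a clear message keeps the later `%%bigquery` cell from being the first place where a missing project ID is noticed.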
