|
148 | 148 | "In this query we work with the [`dicom_all` table](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=idc_current&t=dicom_all&page=table), which contains the DICOM metadata extracted from IDC images along with collection-level metadata that does not originate from DICOM." |
149 | 149 | ] |
150 | 150 | }, |
| 151 | + { |
| 152 | + "cell_type": "markdown", |
| 153 | + "metadata": {}, |
| 154 | + "source": [ |
| 155 | + "## Organization of IDC metadata in BigQuery tables\n", |
| 156 | + "\n", |
| 157 | + "Let's take a moment to look into the table used in the `FROM` clause of our query: [`bigquery-public-data.idc_current.dicom_all`](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=idc_current&t=dicom_all&page=table).\n", |
| 158 | + "\n", |
| 159 | + "This name is like an address that lets you locate a specific table in BigQuery. This \"address\" consists of three components: `<project_id>.<dataset_id>.<table_id>`.\n", |
| 160 | + "\n", |
| 161 | + "1. `bigquery-public-data` is a public GCP _project_ that is maintained by the Google Public Datasets Program. The IDC-curated BigQuery tables with metadata about IDC images are included in this project.\n", |
| 162 | + "2. `idc_current` is a _dataset_ within the `bigquery-public-data` project. Think of BigQuery datasets as containers that are used to organize and control access to the tables within the project.\n", |
| 163 | + "3. `dicom_all` is one of the tables within the `idc_current` dataset. As you spend more time learning about IDC, you will hopefully leverage other tables available in that dataset.\n", |
| 164 | + "\n", |
| 165 | + "If you now look back at the [BigQuery console](https://console.cloud.google.com/bigquery) and expand the list of datasets under the `bigquery-public-data` project, you will see that in addition to the `idc_current` dataset there are also datasets `idc_v12`, `idc_v11`, etc., all the way down to `idc_v1`. Those datasets correspond to the IDC data release versions, with `idc_current` being an alias for the latest (at the moment, v12) version of IDC data.\n", |
| 166 | + "\n", |
| 167 | + "We will not spend time discussing how IDC versioning works, but it is important to know that:\n", |
| 168 | + "\n", |
| 169 | + "1. IDC data is versioned;\n", |
| 170 | + "2. queries against the `idc_current` dataset are equivalent to queries against the latest version (currently, `idc_v12`) of IDC data;\n", |
| 171 | + "3. if you want your queries to keep returning the same results after new IDC releases, write them against a specific `idc_v*` dataset instead of `idc_current`." |
| 172 | + ] |
| 173 | + }, |
151 | 174 | { |
152 | 175 | "cell_type": "markdown", |
153 | 176 | "metadata": { |
|
345 | 368 | " bigquery-public-data.idc_current.dicom_all\n", |
346 | 369 | "WHERE\n", |
347 | 370 | " # write the selection criteria under this line!\n", |
| 371 | + " # Use the AND operator to combine filters on Modality and\n", |
| 372 | + " # tcia_tumorLocation, so that you select the collections that\n", |
| 373 | + " # include MR images for the Lung tumor location\n", |
348 | 374 | "\"\"\"\n", |
349 | 375 | "\n", |
350 | 376 | "selection_result = bq_client.query(selection_query)\n", |
|
615 | 641 | "* learned about BigQuery as the tool for searching IDC metadata\n", |
616 | 642 | "* are motivated to start experimenting with the SQL interface to select subsets of IDC data at different levels of data model (collection, patient, study, series)\n", |
617 | 643 | "\n", |
618 | | - "If you have any questions about this tutorial, or about searching IDC metadata, please send us an email to support@canceridc.dev or posting your question on [IDC User forum](https://discourse.cancer.dev)!" |
| 644 | + "If you have any questions about this tutorial, or about searching IDC metadata, please send us an email at support@canceridc.dev or post your question on the [IDC User forum](https://discourse.cancer.dev)!\n", |
| 645 | + "\n", |
| 646 | + "This tutorial barely scratches the surface of what you can do with BigQuery SQL. If you are interested in a comprehensive tutorial about BigQuery SQL, check out this [\"Intro to SQL\" course on Kaggle](https://www.kaggle.com/learn/intro-to-sql)!" |
619 | 647 | ] |
620 | 648 | }, |
621 | 649 | { |
|
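The new markdown cell describes the `<project_id>.<dataset_id>.<table_id>` address and recommends versioned `idc_v*` datasets for queries whose results should not change between releases. A minimal sketch of that idea in plain Python follows. The names `TableAddress`, `parse_table_address`, and `pin_version` are hypothetical helpers introduced here for illustration only; they are not part of the notebook or of any BigQuery client library. The sketch also assumes none of the three components contains a dot.

```python
from typing import NamedTuple


class TableAddress(NamedTuple):
    """Hypothetical container for the three parts of a BigQuery table name."""
    project_id: str
    dataset_id: str
    table_id: str


def parse_table_address(full_name: str) -> TableAddress:
    """Split '<project_id>.<dataset_id>.<table_id>' into its components.

    Assumption: none of the components contains a dot.
    """
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <project>.<dataset>.<table>, got {full_name!r}")
    return TableAddress(*parts)


def pin_version(full_name: str, version: int) -> str:
    """Replace the dataset component with a versioned 'idc_v<N>' dataset."""
    address = parse_table_address(full_name)
    pinned = address._replace(dataset_id=f"idc_v{version}")
    return ".".join(pinned)


address = parse_table_address("bigquery-public-data.idc_current.dicom_all")
pinned_name = pin_version("bigquery-public-data.idc_current.dicom_all", 12)
print(address)
print(pinned_name)
```

The pinned name could then be substituted into the `FROM` clause of a query string, so that the query keeps pointing at the v12 release even after `idc_current` moves to a newer one.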