|
| 1 | +# NCI Imaging Data Commons: Getting started tutorial series |
| 2 | + |
| 3 | +## Background |
| 4 | + |
| 5 | +**What is Imaging Data Commons (IDC)?** |
| 6 | + |
| 7 | + |
| 8 | + |
| 9 | +* [NCI Imaging Data Commons (IDC)](https://datacommons.cancer.gov/repository/imaging-data-commons) is a cloud-based repository of publicly available cancer imaging data co-located with the analysis and exploration tools and resources |
| 10 | +* IDC is a node within the broader NCI Cancer Research Data Commons (CRDC) infrastructure that provides secure access to a large, comprehensive, and expanding collection of cancer research data |
| 11 | +* IDC provides unmatched capabilities in search, visualization and subsetting of public cancer imaging, image-derived and image-related (i.e., clinical) data (45 TB and counting!): radiology and digital pathology, clinical and preclinical, images and segmentations |
| 12 | +* Our mission is to make it easier for you to discover relevant cancer imaging data and analyze it using state-of-the-art tools in the cloud |
| 13 | + |
| 14 | +**Programmatic access to IDC content** |
| 15 | + |
| 16 | +[IDC Portal](https://imaging.datacommons.cancer.gov/) is the interactive interface for exploring data available in IDC using a small subset of the metadata attributes that accompany the data, visualizing radiology and microscopy images and annotations, and saving cohorts (subsets of data) to a user account based on the available metadata filters. |
| 17 | + |
| 18 | +The real power of IDC, however, comes from the programmatic interfaces available for working with IDC data. Most of the capabilities offered by those interfaces and APIs are not available in the portal. A few examples: |
| 19 | +* download of the image files available in IDC (the portal lets you download a manifest, but downloading the files referenced in the manifest is currently not straightforward) |
| 20 | +* flexible definition of selection filters at the level of series, studies or patients (the portal exposes only a small set of metadata attributes and lacks the flexibility in defining filters that analysis tasks need) |
| 21 | +* access to the clinical metadata available for imaging collections (the portal does not expose it) |
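
To make the idea of flexible selection filters more concrete, here is a minimal sketch of how a cohort query at the patient, study or series level could be assembled in Python. It only builds the SQL text and its parameters; it does not contact any service. The table name `bigquery-public-data.idc_current.dicom_all`, the column names, and the helper `build_cohort_query` are assumptions made for illustration and are not taken from this tutorial, so check them against the notebooks listed below before relying on them.

```python
from typing import Dict, Optional, Tuple

# Assumption: name of the public BigQuery table holding IDC DICOM metadata.
IDC_TABLE = "bigquery-public-data.idc_current.dicom_all"

# Assumption: identifier column used at each level of the cohort definition.
LEVEL_COLUMNS = {
    "patient": "PatientID",
    "study": "StudyInstanceUID",
    "series": "SeriesInstanceUID",
}


def build_cohort_query(
    level: str,
    modality: Optional[str] = None,
    collection_id: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return (sql, params) selecting distinct identifiers at the given level.

    Filter values are passed as named parameters (@name) rather than being
    pasted into the SQL text.
    """
    if level not in LEVEL_COLUMNS:
        raise ValueError(f"level must be one of {sorted(LEVEL_COLUMNS)}")

    conditions = []
    params: Dict[str, str] = {}
    if modality is not None:
        conditions.append("Modality = @modality")
        params["modality"] = modality
    if collection_id is not None:
        conditions.append("collection_id = @collection_id")
        params["collection_id"] = collection_id

    sql = f"SELECT DISTINCT {LEVEL_COLUMNS[level]}\nFROM `{IDC_TABLE}`"
    if conditions:
        sql += "\nWHERE " + " AND ".join(conditions)
    return sql, params


if __name__ == "__main__":
    # "example_collection" is a placeholder, not a real collection name.
    sql, params = build_cohort_query(
        "series", modality="CT", collection_id="example_collection"
    )
    print(sql)
    print(params)
```

Switching `level` from `"series"` to `"patient"` changes the granularity of the cohort without touching the filters, which is the kind of flexibility the portal filters do not offer. Actually running the query requires a Google Cloud project and a BigQuery client, which the prerequisites notebook covers.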
| 22 | + |
| 23 | +## Learning objectives |
| 24 | + |
| 25 | +In this tutorial you will: |
| 26 | +* set up the (very minimal!) prerequisites to start your explorations of programmatic interfaces to IDC data |
| 27 | +* understand the basics of IDC metadata organization |
| 28 | +* learn how to use the SQL interface to query IDC data and define subsets relevant to specific tasks |
| 29 | +* practice how to visualize images from your cohort, download files and learn about licenses covering the data in your cohort |
| 30 | +* identify pointers to further learning materials that demonstrate how you can develop reproducible analysis workflows using Colab applied to IDC data |
| 31 | + |
| 32 | +## Components of the tutorial |
| 33 | + |
| 34 | +This tutorial will leverage learning materials available in the [IDC notebooks repository](https://github.com/ImagingDataCommons/IDC-Examples/blob/master/notebooks): |
| 35 | + |
| 36 | +1. [Setting up the prerequisites](https://github.com/ImagingDataCommons/IDC-Examples/blob/master/notebooks/getting_started/part1_prerequisites.ipynb) |
| 37 | +2. [Basics of searching IDC data](https://github.com/ImagingDataCommons/IDC-Examples/blob/master/notebooks/getting_started/part2_searching_basics.ipynb) |
| 38 | +3. [Working with the cohorts: visualization, download, licenses](https://github.com/ImagingDataCommons/IDC-Examples/blob/master/notebooks/getting_started/part3_exploring_cohorts.ipynb) |
| 39 | + |
| 40 | +## Notebooks to explore on your own |
| 41 | + |
| 42 | +* [IDC Segmentation primer: using nnU-Net for segmenting abdominal organs in chest CT](https://github.com/ImagingDataCommons/IDC-Examples/blob/master/notebooks/IDC_segmentation_primer.ipynb) |
| 43 | +* [Introduction to exploring clinical data in IDC](https://github.com/ImagingDataCommons/IDC-Examples/blob/master/notebooks/clinical_data_intro.ipynb) |
| 44 | + |
| 45 | +A growing number of AI models, provided as Colab notebooks with demonstrations of applying them to data in IDC, are available on the [ModelHub.AI](http://app.modelhub.ai/) platform. |
| 46 | + |
| 47 | +## Support and additional resources |
| 48 | + |
| 49 | +* You can contact IDC support by sending an email to support@canceridc.dev or by posting your question on the [IDC User forum](https://discourse.canceridc.dev). |
| 50 | +* Please drop by IDC Office Hours to ask any questions about IDC: every Tuesday 16:30 – 17:30 (New York) and Wednesday 10:30 – 11:30 (New York) via Google Meet at [https://meet.google.com/xyt-vody-tvb](https://imaging.datacommons.cancer.gov/). |
| 51 | +* Free cloud credits are available for those who want to explore features of Google Cloud not included in the free tier (e.g., Cloud Compute Engine, Vertex AI, using Healthcare API for your data): apply [here](https://docs.google.com/forms/d/e/1FAIpQLSfXvXqficGaVEalJI3ym6rKqarmW_YUUWG6A4U8pclvR8MmRQ/viewform) |
| 52 | + |
| 53 | +## Acknowledgments |
| 54 | + |
| 55 | +Imaging Data Commons has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l. |
| 56 | + |
| 57 | +If you use IDC in your research, please cite the following publication: |
| 58 | + |
| 59 | +> Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S., Aerts, H. J. W. L., Homeyer, A., Lewis, R., Akbarzadeh, A., Bontempi, D., Clifford, W., Herrmann, M. D., Höfener, H., Octaviano, I., Osborne, C., Paquette, S., Petts, J., Punzo, D., Reyes, M., Schacherer, D. P., Tian, M., White, G., Ziegler, E., Shmulevich, I., Pihl, T., Wagner, U., Farahani, K. & Kikinis, R. NCI Imaging Data Commons. Cancer Res. 81, 4188–4193 (2021). http://dx.doi.org/10.1158/0008-5472.CAN-21-0950 |
| 60 | +
|