Merge pull request #334 from ImagingDataCommons/docs/remote_filesytems

CPBridge · web-flow · commit cbfb74c203d0 · 2025-03-04T11:19:19.000-05:00
Add docs section on working with remote filesystems
diff --git a/docs/general.rst b/docs/general.rst
@@ -14,3 +14,4 @@ parts of the library.
    pixel_transforms
    volume
    coding
+   remote
diff --git a/docs/image.rst b/docs/image.rst
@@ -241,6 +241,8 @@ matrix. In these cases, the first spatial dimension will always have shape 1.
 
 See :doc:`volume` for an overview of the :class:`highdicom.Volume` class.
 
+.. _lazy:
+
 Lazy Frame Retrieval
 --------------------
 
@@ -267,3 +269,16 @@ tiled image:
     tpm = im.get_total_pixel_matrix(row_end=20)
 
 Whether this saves time depends on your usage patterns and hardware.
+Furthermore, in certain situations highdicom needs to parse the entire pixel
+data element in order to determine frame boundaries. This occurs when the
+frames are compressed using an encapsulated transfer syntax but there is no
+offset table giving the locations of frame boundaries within the file. An
+offset table can take the form of either a `basic offset table <BOT>`_ (BOT) at
+the start of the PixelData element or an `extended offset table <EOT>`_ (EOT)
+as a separate attribute in the metadata. These offset tables are not required,
+but often one of them is included in images. Without an offset table, the
+potential speed benefits of using lazy frame retrieval are usually eliminated,
+even if only a small number of frames are loaded.
+
+.. _BOT: https://dicom.nema.org/dicom/2013/output/chtml/part05/sect_A.4.html
+.. _EOT: http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.3.html#sect_C.7.6.3.1.8
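
Note on the offset table paragraph added above: the check it describes can be sketched in a few lines of standard-library Python. This is only an illustration of the encapsulated layout defined in the DICOM standard, not highdicom's actual implementation. The function name `read_basic_offset_table` and the toy byte strings are hypothetical, and the sketch assumes it is given the raw value of an encapsulated PixelData element (starting at its first item).

```python
import struct

# Item tag (FFFE,E000) as it appears on disk (little endian)
ITEM_TAG = b"\xfe\xff\x00\xe0"


def item(payload: bytes) -> bytes:
    """Wrap a payload in an item: tag, 4-byte length, then the payload."""
    return ITEM_TAG + struct.pack("<I", len(payload)) + payload


def read_basic_offset_table(pixel_data_value: bytes) -> list[int]:
    """Return the frame offsets stored in the basic offset table (BOT).

    An empty list means the BOT item is present but empty, so it gives
    no information about frame boundaries.
    """
    if len(pixel_data_value) < 8 or pixel_data_value[:4] != ITEM_TAG:
        raise ValueError("Expected an item tag (FFFE,E000) at the start")
    (length,) = struct.unpack("<I", pixel_data_value[4:8])
    if length % 4 != 0 or len(pixel_data_value) < 8 + length:
        raise ValueError("Malformed basic offset table")
    # The BOT is a sequence of unsigned 32-bit offsets, one per frame
    return list(struct.unpack(f"<{length // 4}I", pixel_data_value[8:8 + length]))


# Two toy "frames" (made-up bytes, not real compressed data)
frame_1 = item(b"\x01" * 6)
frame_2 = item(b"\x02" * 4)

# Offsets are measured from the first item after the BOT
with_bot = item(struct.pack("<2I", 0, len(frame_1))) + frame_1 + frame_2
without_bot = item(b"") + frame_1 + frame_2

print(read_basic_offset_table(with_bot))     # [0, 14]
print(read_basic_offset_table(without_bot))  # []
```

When the table is empty, as in the second case, and no extended offset table is present, a reader has to walk through every item to find where each frame starts, which is the situation the new paragraph warns about.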
diff --git a/docs/images/slide_screenshot.png b/docs/images/slide_screenshot.png
diff --git a/docs/remote.rst b/docs/remote.rst
@@ -0,0 +1,122 @@
+.. _remote:
+
+Reading from Remote Filesystems
+===============================
+
+Functions like ``dcmread`` from pydicom and :meth:`highdicom.imread`,
+:meth:`highdicom.seg.segread`, :meth:`highdicom.sr.srread`, and
+:meth:`highdicom.ann.annread` from highdicom can read from any object that
+exposes a "file-like" interface. Many alternative and remote filesystems have
+Python clients that expose such an interface, and therefore can be read from
+directly.
+
+One such example is blobs on Google Cloud Storage buckets when accessed using
+the official Python SDK (installed through the ``google-cloud-storage`` PyPI
+package). This is particularly relevant since this is the storage mechanism
+underlying the `Imaging Data Commons <IDC>`_ (IDC), a large repository of
+public DICOM images.
+
+Coupling this with the :ref:`"lazy" frame retrieval <lazy>` option is especially
+powerful, allowing frames to be retrieved from the remote filesystem only as
+and when they are needed. This is particularly useful for large multiframe
+files such as those found in slide microscopy or multi-segment binary
+or fractional segmentations.
+
+In this first example, we use lazy frame retrieval to load only a specific
+spatial patch from a large whole slide image from the IDC.
+
+.. code-block:: python
+
+  import numpy as np
+  import highdicom as hd
+
+  # Additional libraries (install these separately)
+  import matplotlib.pyplot as plt
+  from google.cloud import storage
+
+
+  # Create a storage client and use it to access the IDC's public data bucket
+  client = storage.Client()
+  bucket = client.bucket("idc-open-data")
+
+  # This is the path (within the above bucket) to a whole slide image from the
+  # IDC collection called "CCDI MCI"
+  blob = bucket.blob(
+      "763fe058-7d25-4ba7-9b29-fd3d6c41dc4b/210f0529-c767-4795-9acf-bad2f4877427.dcm"
+  )
+
+  # Read directly from the blob object using lazy frame retrieval
+  im = hd.imread(
+      blob.open(mode="rb"),
+      lazy_frame_retrieval=True
+  )
+
+  # Grab an arbitrary region of the total pixel matrix
+  region = im.get_total_pixel_matrix(
+      row_start=15000,
+      row_end=15512,
+      column_start=17000,
+      column_end=17512,
+      dtype=np.uint8
+  )
+
+  # Show the region
+  plt.imshow(region)
+  plt.show()
+
+.. figure:: images/slide_screenshot.png
+   :width: 512px
+   :alt: Image of retrieved slide region
+   :align: center
+
+   Figure produced by the above code snippet showing an arbitrary spatial
+   region of a slide loaded directly from a Google Cloud bucket.
+
+As a further example, we use lazy frame retrieval to load only a specific set
+of segments from a large multi-organ segmentation of a CT image in the IDC
+stored in binary format (meaning each segment is stored using a separate set of
+frames). See :ref:`seg` for more information on working with DICOM
+segmentations.
+
+.. code-block:: python
+
+  import highdicom as hd
+
+  # Additional libraries (install these separately)
+  from google.cloud import storage
+
+
+  # Create a storage client and use it to access the IDC's public data bucket
+  client = storage.Client()
+  bucket = client.bucket("idc-open-data")
+
+  # This is the path (within the above bucket) to a segmentation of a CT series
+  # from the IDC collection called "CCDI MCI", containing a large number of
+  # different organs
+  blob = bucket.blob(
+      "3f38511f-fd09-4e2f-89ba-bc0845fe0005/c8ea3be0-15d7-4a04-842d-00b183f53b56.dcm"
+  )
+
+  # Open the blob with "segread" using the "lazy frame retrieval" option
+  seg = hd.seg.segread(
+      blob.open(mode="rb"),
+      lazy_frame_retrieval=True
+  )
+
+  # Find the segment number corresponding to the liver segment
+  selected_segment_numbers = seg.get_segment_numbers(segment_label="Liver")
+
+  # Read in the selected segments lazily
+  volume = seg.get_volume(
+      segment_numbers=selected_segment_numbers,
+      combine_segments=True,
+  )
+
+This works because running the ``.open("rb")`` method on a Blob object returns
+a `BlobReader <blob_reader>`_ object, which has a "file-like" interface
+(specifically the ``seek``, ``read``, and ``tell`` methods). If you can provide
+examples for reading from storage provided by other cloud providers, please
+consider contributing them to this documentation.
+
+.. _IDC: https://portal.imaging.datacommons.cancer.gov/
+.. _blob_reader: https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.fileio.BlobReader
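
Note on the "file-like" interface mentioned at the end of the new page: the three methods it names (``seek``, ``read``, and ``tell``) are easy to wrap, which gives a simple way to observe how much data is actually pulled from a remote file. The sketch below is an assumption-laden illustration rather than part of highdicom: `CountingReader` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for the remote blob so that the example runs without network access or credentials.

```python
import io


class CountingReader:
    """Hypothetical wrapper recording how a file-like object is used."""

    def __init__(self, raw):
        self._raw = raw
        self.bytes_read = 0
        self.seek_calls = 0

    def read(self, size=-1):
        data = self._raw.read(size)
        self.bytes_read += len(data)  # count only bytes actually returned
        return data

    def seek(self, offset, whence=io.SEEK_SET):
        self.seek_calls += 1
        return self._raw.seek(offset, whence)

    def tell(self):
        return self._raw.tell()


# A 1024-byte in-memory buffer stands in for a remote file
reader = CountingReader(io.BytesIO(bytes(range(256)) * 4))

# Jump into the middle of the "file" and read a small chunk,
# as a lazy reader would do for a single frame
reader.seek(512)
chunk = reader.read(16)

print(reader.tell(), reader.bytes_read, reader.seek_calls)  # 528 16 1
```

In principle such a wrapper could be placed around the object returned by ``blob.open(mode="rb")`` before handing it to a reading function, to compare the bytes transferred with and without lazy frame retrieval. Whether the three methods shown are sufficient for every reading function is an assumption based on the description above, so treat this as a starting point rather than a tested recipe.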
diff --git a/docs/tid1500.rst b/docs/tid1500.rst
@@ -21,6 +21,7 @@ A diagram of the structure of TID1500 content is shown here:
 .. figure:: images/tid1500_overview.svg
    :scale: 100 %
    :alt: TID1500 diagram
+   :align: center
 
    Simplified diagram of the structure of the TID1500 template and major
    subtemplates. Note that this is intended to give a quick overview, please
diff --git a/src/highdicom/image.py b/src/highdicom/image.py
@@ -4813,6 +4813,8 @@ class Image(_Image):
     The class may not be instantiated directly, but should be created from an
     existing dataset.
 
+    See :doc:`image` for an introduction to using this class.
+
     """
 
     def __init__(self, *args, **kwargs):
diff --git a/src/highdicom/volume.py b/src/highdicom/volume.py
@@ -1807,10 +1807,13 @@ class VolumeGeometry(_VolumeBase):
 
     """Class encapsulating the geometry of a volume.
 
-    Unlike the similar :class:`highdicom.Volume`, items of this class do
-    not contain voxel data for the underlying volume, just a description of the
+    Unlike the similar :class:`highdicom.Volume`, items of this class do not
+    contain voxel data for the underlying volume, just a description of the
     geometry.
 
+    See :doc:`volume` for an introduction to using volumes and volume
+    geometries.
+
     """
 
     def __init__(
@@ -2253,6 +2256,9 @@ class Volume(_VolumeBase):
     further non-spatial dimensions, known as "channel" dimensions, whose
     meaning is explicitly specified.
 
+    See :doc:`volume` for an introduction to using volumes and volume
+    geometries.
+
     """
 
     def __init__(
