Commit cbfb74c

Merge pull request #334 from ImagingDataCommons/docs/remote_filesytems
Add docs section on working with remote filesystems
2 parents 40aa4f1 + d9ff3d8 commit cbfb74c

7 files changed

Lines changed: 149 additions & 2 deletions

docs/general.rst

Lines changed: 1 addition & 0 deletions
@@ -14,3 +14,4 @@ parts of the library.
    pixel_transforms
    volume
    coding
+   remote

docs/image.rst

Lines changed: 15 additions & 0 deletions
@@ -241,6 +241,8 @@ matrix. In these cases, the first spatial dimension will always have shape 1.

 See :doc:`volume` for an overview of the :class:`highdicom.Volume` class.

+.. _lazy:
+
 Lazy Frame Retrieval
 --------------------

@@ -267,3 +269,16 @@ tiled image:
     tpm = im.get_total_pixel_matrix(row_end=20)

 Whether this saves time depends on your usage patterns and hardware.
+Furthermore, in certain situations highdicom needs to parse the entire pixel
+data element in order to determine frame boundaries. This occurs when the
+frames are compressed using an encapsulated transfer syntax but there is no
+offset table giving the locations of frame boundaries within the file. An
+offset table can take the form of either a `basic offset table <BOT_>`_ (BOT) at
+the start of the PixelData element or an `extended offset table <EOT_>`_ (EOT)
+as a separate attribute in the metadata. These offset tables are not required,
+but often one of them is included in images. Without an offset table, the
+potential speed benefits of using lazy frame retrieval are usually eliminated,
+even if only a small number of frames are loaded.
+
+.. _BOT: https://dicom.nema.org/dicom/2013/output/chtml/part05/sect_A.4.html
+.. _EOT: http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.3.html#sect_C.7.6.3.1.8
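
Note on the offset-table paragraph above: the following is a minimal sketch, not highdicom's implementation, of why a basic offset table makes frame lookup cheap. It assumes little-endian encapsulated pixel data in which the first item holds the table as 32-bit offsets, each measured from the first item tag after the table, and it assumes exactly one fragment per frame. The helper names (`make_item`, `read_basic_offset_table`, `read_frame`) and the byte payloads are hypothetical and exist only for illustration.

```python
import struct
from typing import List

ITEM_TAG = b"\xfe\xff\x00\xe0"  # Item tag (FFFE,E000), little-endian byte order


def make_item(payload: bytes) -> bytes:
    """Wrap a payload as an item: tag, 4-byte length, then the payload."""
    return ITEM_TAG + struct.pack("<I", len(payload)) + payload


def read_basic_offset_table(pixel_data: bytes) -> List[int]:
    """Return the offsets stored in the first item (empty list if no table)."""
    if pixel_data[:4] != ITEM_TAG:
        raise ValueError("Encapsulated pixel data must start with an item tag")
    (length,) = struct.unpack("<I", pixel_data[4:8])
    return list(struct.unpack(f"<{length // 4}I", pixel_data[8:8 + length]))


def read_frame(pixel_data: bytes, offsets: List[int], index: int) -> bytes:
    """Jump straight to one frame (assumes one fragment per frame)."""
    (table_length,) = struct.unpack("<I", pixel_data[4:8])
    # Offsets are relative to the first item tag after the offset table
    start = 8 + table_length + offsets[index]
    if pixel_data[start:start + 4] != ITEM_TAG:
        raise ValueError("Offset does not point at an item tag")
    (length,) = struct.unpack("<I", pixel_data[start + 4:start + 8])
    return pixel_data[start + 8:start + 8 + length]


# Synthetic two-frame example (payloads are placeholders, not real codestreams)
frame_a = b"\x01" * 6
frame_b = b"\x02" * 10
table = struct.pack("<2I", 0, 8 + len(frame_a))
encapsulated = make_item(table) + make_item(frame_a) + make_item(frame_b)

offsets = read_basic_offset_table(encapsulated)
second_frame = read_frame(encapsulated, offsets, 1)
```

With the table present, locating the second frame takes one table lookup and one seek. Without it, a reader has to walk every item header from the start of the element to find the same boundary, which is the cost the paragraph describes.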

docs/images/slide_screenshot.png

1.16 MB

docs/remote.rst

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+.. _remote:
+
+Reading from Remote Filesystems
+===============================
+
+Functions like ``dcmread`` from pydicom and :meth:`highdicom.imread`,
+:meth:`highdicom.seg.segread`, :meth:`highdicom.sr.srread`, and
+:meth:`highdicom.ann.annread` from highdicom can read from any object that
+exposes a "file-like" interface. Many alternative and remote filesystems have
+Python clients that expose such an interface, and therefore can be read from
+directly.
+
+One such example is blobs on Google Cloud Storage buckets when accessed using
+the official Python SDK (installed through the ``google-cloud-storage`` PyPI
+package). This is particularly relevant since this is the storage mechanism
+underlying the `Imaging Data Commons <IDC_>`_ (IDC), a large repository of
+public DICOM images.
+
+Coupling this with the :ref:`"lazy" frame retrieval <lazy>` option is especially
+powerful, allowing frames to be retrieved from the remote filesystem only as
+and when they are needed. This is particularly useful for large multiframe
+files such as those found in slide microscopy or multi-segment binary
+or fractional segmentations.
+
+In this first example, we use lazy frame retrieval to load only a specific
+spatial patch from a large whole slide image from the IDC.
+
+.. code-block:: python
+
+    import numpy as np
+    import highdicom as hd
+
+    # Additional libraries (install these separately)
+    import matplotlib.pyplot as plt
+    from google.cloud import storage
+
+
+    # Create a storage client and use it to access the IDC's public data package
+    client = storage.Client()
+    bucket = client.bucket("idc-open-data")
+
+    # This is the path (within the above bucket) to a whole slide image from the
+    # IDC collection called "CCDI MCI"
+    blob = bucket.blob(
+        "763fe058-7d25-4ba7-9b29-fd3d6c41dc4b/210f0529-c767-4795-9acf-bad2f4877427.dcm"
+    )
+
+    # Read directly from the blob object using lazy frame retrieval
+    im = hd.imread(
+        blob.open(mode="rb"),
+        lazy_frame_retrieval=True
+    )
+
+    # Grab an arbitrary region of the total pixel matrix
+    region = im.get_total_pixel_matrix(
+        row_start=15000,
+        row_end=15512,
+        column_start=17000,
+        column_end=17512,
+        dtype=np.uint8
+    )
+
+    # Show the region
+    plt.imshow(region)
+    plt.show()
+
+.. figure:: images/slide_screenshot.png
+   :width: 512px
+   :alt: Image of retrieved slide region
+   :align: center
+
+   Figure produced by the above code snippet showing an arbitrary spatial
+   region of a slide loaded directly from a Google Cloud bucket.
+
+As a further example, we use lazy frame retrieval to load only a specific set
+of segments from a large multi-organ segmentation of a CT image in the IDC
+stored in binary format (meaning each segment is stored using a separate set of
+frames). See :ref:`seg` for more information on working with DICOM
+segmentations.
+
+.. code-block:: python
+
+    import highdicom as hd
+
+    # Additional libraries (install these separately)
+    from google.cloud import storage
+
+
+    # Create a storage client and use it to access the IDC's public data package
+    client = storage.Client()
+    bucket = client.bucket("idc-open-data")
+
+    # This is the path (within the above bucket) to a segmentation of a CT series
+    # from the IDC collection called "CCDI MCI", containing a large number of
+    # different organs
+    blob = bucket.blob(
+        "3f38511f-fd09-4e2f-89ba-bc0845fe0005/c8ea3be0-15d7-4a04-842d-00b183f53b56.dcm"
+    )
+
+    # Open the blob with "segread" using the "lazy frame retrieval" option
+    seg = hd.seg.segread(
+        blob.open(mode="rb"),
+        lazy_frame_retrieval=True
+    )
+
+    # Find the segment number corresponding to the liver segment
+    selected_segment_numbers = seg.get_segment_numbers(segment_label="Liver")
+
+    # Read in the selected segments lazily
+    volume = seg.get_volume(
+        segment_numbers=selected_segment_numbers,
+        combine_segments=True,
+    )
+
+This works because running the ``.open("rb")`` method on a Blob object returns
+a `BlobReader <blob_reader_>`_ object, which has a "file-like" interface
+(specifically the ``seek``, ``read``, and ``tell`` methods). If you can provide
+examples for reading from storage provided by other cloud providers, please
+consider contributing them to this documentation.
+
+.. _IDC: https://portal.imaging.datacommons.cancer.gov/
+.. _blob_reader: https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.fileio.BlobReader
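
Note on the "file-like" interface described above: the sketch below uses only the standard library to illustrate what a lazy reader needs from such an object, namely ``seek``, ``read``, and ``tell``. It is a toy stand-in and does not use highdicom or Google Cloud Storage. The `CountingReader` class, the `read_fixed_size_frame` helper, and the fixed-size layout (a 16-byte header followed by 100-byte frames) are all assumptions made for illustration and do not reflect the DICOM file format.

```python
import io


class CountingReader:
    """Minimal file-like object that records how many bytes were read."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)
        self.bytes_read = 0

    def read(self, size: int = -1) -> bytes:
        chunk = self._buffer.read(size)
        self.bytes_read += len(chunk)
        return chunk

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        return self._buffer.seek(offset, whence)

    def tell(self) -> int:
        return self._buffer.tell()


def read_fixed_size_frame(fp, index: int, frame_size: int, header_size: int) -> bytes:
    """Seek directly to one frame and read only that frame's bytes."""
    fp.seek(header_size + index * frame_size)
    return fp.read(frame_size)


# Toy "remote file": a 16-byte header followed by 50 frames of 100 bytes each
header = b"H" * 16
frames = [bytes([i]) * 100 for i in range(50)]
remote = CountingReader(header + b"".join(frames))

frame = read_fixed_size_frame(remote, index=42, frame_size=100, header_size=16)
```

Only the 100 bytes of the requested frame pass through ``read``; the other 4,916 bytes are never touched. A remote reader that supports seeking can give the same saving, with each avoided read being data that is not fetched over the network.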

docs/tid1500.rst

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ A diagram of the structure of TID1500 content is shown here:
 .. figure:: images/tid1500_overview.svg
    :scale: 100 %
    :alt: TID1500 diagram
+   :align: center

    Simplified diagram of the structure of the TID1500 template and major
    subtemplates. Note that this is intended to give a quick overview, please

src/highdicom/image.py

Lines changed: 2 additions & 0 deletions
@@ -4813,6 +4813,8 @@ class Image(_Image):
     The class may not be instantiated directly, but should be created from an
     existing dataset.

+    See :doc:`image` for an introduction to using this class.
+
     """

     def __init__(self, *args, **kwargs):

src/highdicom/volume.py

Lines changed: 8 additions & 2 deletions
@@ -1807,10 +1807,13 @@ class VolumeGeometry(_VolumeBase):

     """Class encapsulating the geometry of a volume.

-    Unlike the similar :class:`highdicom.Volume`, items of this class do
-    not contain voxel data for the underlying volume, just a description of the
+    Unlike the similar :class:`highdicom.Volume`, items of this class do not
+    contain voxel data for the underlying volume, just a description of the
     geometry.

+    See :doc:`volume` for an introduction to using volumes and volume
+    geometries.
+
     """

     def __init__(
@@ -2253,6 +2256,9 @@ class Volume(_VolumeBase):
     further non-spatial dimensions, known as "channel" dimensions, whose
     meaning is explicitly specified.

+    See :doc:`volume` for an introduction to using volumes and volume
+    geometries.
+
     """

     def __init__(
