Commit f39600e

Minor doc fixes
1 parent 6dc83a0 commit f39600e

3 files changed

Lines changed: 64 additions & 58 deletions

docs/remote.rst

Lines changed: 27 additions & 20 deletions
Original file line number | Diff line number | Diff line change
@@ -126,42 +126,38 @@ consider contributing them to this documentation.
126126
Amazon Web Services S3
127127
----------------------
128128

129-
The `smart_open`_ package wraps an S3 client to expose a "file-like"
129+
The `s3fs`_ package wraps an S3 client to expose a "file-like"
130130
interface for accessing blobs. It can be installed with ``pip install
131-
'smart_open[s3]'``.
131+
s3fs``.
132132

133133
In order to be able to access open IDC data without providing AWS credentials,
134134
it is necessary to configure your own client object such that it does not
135135
require signing. This is demonstrated in the following example, which repeats
136136
the GCS example from above using the counterpart of the same blob on AWS S3 (each DICOM
137137
file in the IDC is stored in two places, one on GCS and the other on S3). If
138138
you are accessing private files on S3, these steps will be different (consult
139-
the ``smart_open`` documentation for details).
139+
the ``s3fs`` documentation for details).
140140

141141
.. code-block:: python
142142
143-
import boto3
144-
from botocore import UNSIGNED
145-
from botocore.config import Config
146-
import smart_open
147-
148143
import numpy as np
149144
import highdicom as hd
150145
import matplotlib.pyplot as plt
146+
import s3fs
151147
152148
153149
# Configure a client to avoid the need for AWS credentials
154-
s3_client = boto3.client('s3', config=Config(signature_version=UNSIGNED))
150+
s3_client = s3fs.S3FileSystem(
151+
    anon=True,  # no credentials needed to access public data
152+
    default_block_size=500_000,  # see note below
153+
    use_ssl=False,  # disable encryption for a further speed boost
154+
)
155155
156156
# URL to a whole slide image from the IDC "CCDS MCI" collection on AWS S3
157157
url = 's3://idc-open-data/763fe058-7d25-4ba7-9b29-fd3d6c41dc4b/210f0529-c767-4795-9acf-bad2f4877427.dcm'
158158
159159
# Read the image directly from the blob
160-
with smart_open.open(
161-
url,
162-
mode="rb",
163-
transport_params=dict(client=s3_client),
164-
) as reader:
160+
with s3_client.open(url, mode="rb") as reader:
165161
    im = hd.imread(reader, lazy_frame_retrieval=True)
166162
167163
# Grab an arbitrary region of the full pixel matrix
@@ -177,13 +173,24 @@ the ``smart_open`` documentation for details).
177173
plt.imshow(region)
178174
plt.show()
179175
180-
The ``smart_open`` package can also wrap many other filesystems in this way,
181-
including Microsoft Azure, Hadoop distributed filesystem (HDFS), gzipped local
182-
files, files over ssh/scp/sftp, and more. In all cases, be aware that the
183-
mechanics of the underlying retrieval, as well as configuration such as
184-
buffering and chunk size, can have a significant impact on the performance of
185-
lazy frame retrieval.
176+
It is important to tune the ``default_block_size`` parameter to optimize performance. Ideally, this value (in bytes) should be just large enough to cover the raw (probably compressed) data of an individual frame of the image, so that each frame can be retrieved in a single request. If it is any larger, unnecessary data will be retrieved, reducing efficiency. The default block size is around 50 MB, which is orders of magnitude too large for most images. Above we set it to approximately 500 kB, which is probably a reasonable choice for many types of DICOM image.
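
The reasoning about ``default_block_size`` can be sketched as a small calculation. This is only an illustration under stated assumptions: ``suggest_block_size`` is a hypothetical helper (it is not part of ``s3fs`` or `highdicom`), the frame sizes are made-up values, and the 10% headroom and 64 KiB rounding are arbitrary choices.

```python
import math


def suggest_block_size(frame_sizes, headroom=1.1, granularity=64 * 1024):
    """Suggest a block size (in bytes) that covers the largest frame.

    Hypothetical helper for illustration only. ``frame_sizes`` is a list of
    the stored (probably compressed) sizes of individual frames, in bytes.
    """
    if not frame_sizes:
        raise ValueError("frame_sizes must not be empty")
    # Leave a little headroom above the largest frame (assumed factor)
    target = max(frame_sizes) * headroom
    # Round up to a whole multiple of `granularity` to get a tidy value
    return math.ceil(target / granularity) * granularity


# Illustrative (made-up) compressed frame sizes in bytes
frame_sizes = [180_000, 240_000, 410_000, 395_000]
block_size = suggest_block_size(frame_sizes)
print(block_size)
```

The resulting value could then be passed as ``default_block_size`` when constructing the ``s3fs.S3FileSystem``. In practice, timing a few reads with different values is likely to be a better guide than any formula.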
177+
178+
The ``s3fs`` package is based on `fsspec`_, which provides abstractions over
179+
various file systems. There are a large number of other filesystems covered by
180+
either the `built-in`_ or `third-party`_ implementations (such as Azure,
181+
Hadoop, SFTP, HTTP, etc.). The `smart_open`_ package also provides many similar
182+
wrappers for various filesystems, but is generally optimized for streaming use
183+
cases, not the random-access use cases needed for this application.
184+
185+
In all cases, be aware that the mechanics of the underlying retrieval, as well
186+
as configuration such as buffering and chunk size, can have a significant
187+
impact on the performance of lazy frame retrieval.
188+
186189

187190
.. _IDC: https://portal.imaging.datacommons.cancer.gov/
188191
.. _BlobReader: https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.fileio.BlobReader
189192
.. _smart_open: https://github.com/piskvorky/smart_open
193+
.. _s3fs: https://s3fs.readthedocs.io/en/latest/
194+
.. _fsspec: https://filesystem-spec.readthedocs.io/en/latest/
195+
.. _built-in: https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations
196+
.. _third-party: https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations

docs/seg.rst

Lines changed: 36 additions & 37 deletions
Original file line number | Diff line number | Diff line change
@@ -569,7 +569,7 @@ Segmentations from Volumes
569569
In the simple cases we have seen so far, the geometry of the segmentation
570570
``pixel_array`` has matched that of the source images, i.e. there is a spatial
571571
correspondence between a given pixel in the ``pixel_array`` and the
572-
corresponding pixel in the relevant source frame. While this covers most use
572+
corresponding pixel in the relevant source frame. While this covers many use
573573
cases, DICOM SEGs actually allow for more general segmentations in which there
574574
is a more complicated geometrical relationship between the source frames and
575575
the segmentation masks. This could arise when a source image is resampled or
@@ -892,7 +892,7 @@ There are three possibilities here:
892892

893893
If you require the "Derivation Image Sequence" be populated and you are using a
894894
:class:`highdicom.Volume` as input to the constructor, follow the method in the
895-
previous section to match the geometry before passing to the constructor.
895+
`seg-from-volume`_ section to match the geometry before passing to the constructor.
896896

897897
Constructing SEG Images from a Total Pixel Matrix
898898
-------------------------------------------------
@@ -922,10 +922,9 @@ tile size of the source image will be used (regardless of whether the
922922
segmentation is represented at the same resolution as the source image).
923923

924924
If you need to specify the plane positions of the image explicitly, you should
925-
pass a single item to the ``plane_positions`` argument giving the location of
926-
the top left corner of the full total pixel matrix, or alternatively (and more
927-
conveniently) pass a :meth:`highdicom.Volume`. Otherwise, all the usual options
928-
are available to you.
925+
pass a :class:`highdicom.Volume` or, alternatively, pass a single item to the
926+
``plane_positions`` argument giving the location of the top left corner of the
927+
full total pixel matrix. Otherwise, all the usual options are available to you.
929928

930929
.. code-block:: python
931930
@@ -1318,7 +1317,7 @@ attribute (5200, 9230) with the matching frame number, it is possible to
13181317
determine the meaning of a certain segmentation frame. We will not describe the
13191318
full details of this mechanism here.
13201319

1321-
Instead, `highdicom` provides a family of methods to help users reconstruct
1320+
Instead, `highdicom` provides a family of methods to help users access
13221321
segmentation masks from SEG objects in a predictable and more intuitive way. We
13231322
recommend using these methods over the basic ``.pixel_array`` in nearly all
13241323
circumstances.
@@ -1461,23 +1460,24 @@ whose descriptions meet certain criteria. For example:
14611460
14621461
.. _seg-get-pixels:
14631462

1464-
Reconstructing Segmentation Masks By Source Frame or Source Instance
1465-
--------------------------------------------------------------------
1466-
1467-
`Highdicom` provides the
1468-
:meth:`highdicom.seg.Segmentation.get_pixels_by_source_instance()` and
1469-
:meth:`highdicom.seg.Segmentation.get_pixels_by_source_frame()` methods to
1470-
handle reconstruction of segmentation masks from SEG objects in which each
1471-
frame in the SEG object is derived from one or more known source images or
1472-
image frames, as described within the "Derivation Image Sequence" (see
1473-
:ref:`derivation-sequence`). The only difference between the two methods is
1474-
that the :meth:`highdicom.seg.Segmentation.get_pixels_by_source_instance()` is
1475-
used when the segmentation is derived from a source series consisting of
1476-
multiple single-frame instances, while
1463+
Accessing Segmentation Masks By Source Frame or Source Instance
1464+
---------------------------------------------------------------
1465+
1466+
`Highdicom` provides various methods for accessing the pixels of a segmentation
1467+
image. The choice of method depends on how you would like the resulting array
1468+
to be arranged.
1469+
1470+
The first two methods are used to access frames of the segmentation according
1471+
to the frames/instances of the source image from which they are derived, as
1472+
described within the "Derivation Image Sequence" (see
1473+
:ref:`derivation-sequence`). The
1474+
:meth:`highdicom.seg.Segmentation.get_pixels_by_source_instance()` method is used when
1475+
the segmentation is derived from a source series consisting of multiple
1476+
single-frame instances, while
14771477
:meth:`highdicom.seg.Segmentation.get_pixels_by_source_frame()` is used when
14781478
the segmentation is derived from a single multiframe source instance.
14791479

1480-
When reconstructing a segmentation mask using
1480+
When accessing a segmentation mask using
14811481
:meth:`highdicom.seg.Segmentation.get_pixels_by_source_instance()`, the user
14821482
must provide a list of SOP Instance UIDs of the source images for which the
14831483
segmentation mask should be constructed. Whatever order is chosen here will be
@@ -1524,7 +1524,7 @@ on GitHub.
15241524
assert pixels.shape == (2, 16, 16, 1)
15251525
assert np.unique(pixels).tolist() == [0, 1]
15261526
1527-
This second example demonstrates reconstructing segmentation masks from a
1527+
This second example demonstrates accessing segmentation masks from a
15281528
segmentation derived from a multiframe image, in this case a whole slide
15291529
microscopy image, and also demonstrates an example with multiple, in
15301530
this case 20, segments:
@@ -1555,18 +1555,17 @@ this case 20, segments:
15551555
# Each segment is still binary
15561556
assert np.unique(pixels).tolist() == [0, 1]
15571557
1558-
Note that these two methods may only be used when the segmentation's metadata
1559-
indicates that each segmentation frame is derived from exactly one source
1560-
instance or frame of a source instance. If this is not the case, a
1561-
``RuntimeError`` is raised.
1558+
Note that these two methods may only be used when the provided frame numbers or
1559+
instance UIDs match exactly one segmentation frame per segment in the
1560+
segmentation metadata. If this is not the case, a ``RuntimeError`` is raised.
15621561

15631562
In the general case, the
15641563
:meth:`highdicom.seg.Segmentation.get_pixels_by_dimension_index_values()` method
15651564
is available to query directly by the underlying dimension index values. We
15661565
will not cover this advanced topic.
15671566

1568-
Reconstructing Specific Segments
1569-
--------------------------------
1567+
Accessing Specific Segments
1568+
---------------------------
15701569

15711570
A further optional parameter, ``segment_numbers``, allows the user to request
15721571
only a subset of the segments available within the SEG object by providing a
@@ -1603,8 +1602,8 @@ After this, the array ``pixels[:, :, :, 0]`` contains the pixels for segment
16031602
number 10, ``pixels[:, :, :, 1]`` contains the pixels for segment number 9, and
16041603
``pixels[:, :, :, 2]`` contains the pixels for segment number 8.
16051604

1606-
Reconstructing Segmentation Masks as "Label Maps"
1607-
-------------------------------------------------
1605+
Accessing Segmentation Masks as "Label Maps"
1606+
--------------------------------------------
16081607

16091608
If the segments do not overlap, it is possible to combine the multiple segments
16101609
into a simple "label map" style mask, as described above. This can be achieved
@@ -1696,11 +1695,11 @@ the ``relabel`` parameter.
16961695
# Now the output segments have been relabelled to 1, 2, 3
16971696
assert np.unique(pixels).tolist() == [0, 1, 2, 3]
16981697
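
The idea behind combining segments and relabelling can be sketched in plain Python. This is a simplified illustration, not `highdicom`'s implementation: ``combine_to_label_map`` is a hypothetical function, the masks are tiny made-up 2D lists, and it assumes that relabelling assigns each segment its 1-based position in the requested list of segment numbers.

```python
def combine_to_label_map(segment_masks, segment_numbers, relabel=False):
    """Combine non-overlapping binary masks (2D lists) into one label map.

    Hypothetical sketch for illustration only.
    """
    rows, cols = len(segment_masks[0]), len(segment_masks[0][0])
    label_map = [[0] * cols for _ in range(rows)]
    pairs = zip(segment_masks, segment_numbers)
    for position, (mask, number) in enumerate(pairs, start=1):
        # With relabel, use the position in the requested list (1, 2, 3, ...)
        # instead of the original segment number
        label = position if relabel else number
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    if label_map[r][c] != 0:
                        raise ValueError("Segments overlap")
                    label_map[r][c] = label
    return label_map


# Three made-up, non-overlapping binary masks for segments 4, 9 and 12
masks = [
    [[1, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [1, 0]],
]
original = combine_to_label_map(masks, [4, 9, 12])
relabelled = combine_to_label_map(masks, [4, 9, 12], relabel=True)
```

Here ``original`` keeps the segment numbers as pixel values, while ``relabelled`` uses consecutive labels, which is often more convenient for downstream processing.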
1699-
Reconstructing Fractional Segmentations
1700-
---------------------------------------
1698+
Accessing Fractional Segmentations
1699+
----------------------------------
17011700

17021701
For ``"FRACTIONAL"`` SEG objects, `highdicom` will rescale the pixel values in
1703-
the segmentation masks from the integer values as which they are stored back
1702+
the segmentation masks from the integer values used to store them back
17041703
down to the range `0.0` to `1.0` as floating point values by scaling by the
17051704
"MaximumFractionalValue" attribute. If desired, this behavior can be disabled
17061705
by specifying ``rescale_fractional=False``, in which case the raw integer array
@@ -1730,8 +1729,8 @@ as stored in the SEG will be returned.
17301729
print(np.unique(pixels))
17311730
# [0. 0.2509804 0.5019608]
17321731
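
The rescaling itself is simple arithmetic, sketched here in plain Python. The stored values and the "MaximumFractionalValue" of 255 are assumptions chosen for illustration, and ``rescale_fractional`` is a hypothetical function, not part of `highdicom`.

```python
def rescale_fractional(stored_values, max_fractional_value=255):
    """Map stored integer values to floats in the range 0.0 to 1.0.

    Hypothetical sketch: divide each stored value by the (assumed)
    "MaximumFractionalValue" of the segmentation.
    """
    return [value / max_fractional_value for value in stored_values]


# Assumed raw integer values, as would be returned with
# rescale_fractional=False
stored = [0, 64, 128]
rescaled = rescale_fractional(stored)
print(rescaled)
```

With these assumed inputs, the stored values 64 and 128 map to approximately 0.251 and 0.502.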
1733-
Reconstructing Volumes
1734-
----------------------
1732+
Accessing Volumes
1733+
-----------------
17351734

17361735
If the segmentation is defined on a regularly-sampled 3D grid (possibly with
17371736
omitted frames, tiled frames, and/or multiple segments), the
@@ -1759,8 +1758,8 @@ are also available here.
17591758
# [ 0. 0. 0. 1. ]]
17601759
17611760
1762-
Reconstructing Total Pixel Matrices from Tiled Segmentations
1763-
------------------------------------------------------------
1761+
Accessing Total Pixel Matrices from Tiled Segmentations
1762+
-------------------------------------------------------
17641763

17651764
For segmentations of digital pathology images that are stored as tiled images,
17661765
the :meth:`highdicom.seg.Segmentation.get_pixels_by_source_frame()` method will

docs/volume.rst

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -721,7 +721,7 @@ Writing a volume to a NIfTI file:
721721
import highdicom as hd
722722
723723
724-
vol = Volume(...)
724+
vol = hd.Volume(...)
725725
726726
nifti = nib.Nifti1Image(
727727
vol.array,
