Skip to content

[3/3] cp_measure features + coordinate alignment for calculate_image_features#1216

Open
timtreis wants to merge 12 commits into
scverse:mainfrom
timtreis:feature/image-features-alignment-cpmeasure
Open

[3/3] cp_measure features + coordinate alignment for calculate_image_features#1216
timtreis wants to merge 12 commits into
scverse:mainfrom
timtreis:feature/image-features-alignment-cpmeasure

Conversation

@timtreis

@timtreis timtreis commented Jun 15, 2026

Copy link
Copy Markdown
Member

Final piece of #982 (split from #1189, #1190). Adds cp_measure features and image/labels coordinate alignment.

New deps (hard)

cp-measure>=0.1.19,<0.2, centrosome>=1.2.3.

Usage

sq.experimental.im.calculate_image_features(
    sdata, image_key="img", labels_key="cells",
    features=["cpmeasure:sizeshape", "skimage:morphology", "squidpy:summary"],
)
  • features=None -> all features, all backends (needs an image).
  • align_mode="strict" (default) raises on non-pixel-aligned image/labels; "rasterize" resamples onto the image grid.

Notes

  • Boundary cells outside the image/labels overlap are dropped + reported.
  • Multiscale labels: alignment works for identity/integer-translation; rasterize raises (pre-align instead).
  • cp_measure columns keep raw CellProfiler names.

@timtreis timtreis force-pushed the feature/image-features-alignment-cpmeasure branch 3 times, most recently from 7bc0f28 to ef48286 Compare June 15, 2026 19:14
@codecov

codecov Bot commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 91.79811% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.84%. Comparing base (55572c4) to head (8a8b8a3).

Files with missing lines Patch % Lines
...uidpy/experimental/im/_calculate_image_features.py 91.07% 10 Missing and 9 partials ⚠️
src/squidpy/experimental/im/_tiling.py 94.05% 2 Missing and 4 partials ⚠️
src/squidpy/gr/_ppatterns.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1216      +/-   ##
==========================================
+ Coverage   76.55%   76.84%   +0.29%     
==========================================
  Files          63       63              
  Lines        9067     9186     +119     
  Branches     1521     1532      +11     
==========================================
+ Hits         6941     7059     +118     
+ Misses       1541     1540       -1     
- Partials      585      587       +2     
Files with missing lines Coverage Δ
src/squidpy/experimental/tl/_tiling_qc.py 71.28% <100.00%> (-0.06%) ⬇️
src/squidpy/gr/_ppatterns.py 80.43% <0.00%> (ø)
src/squidpy/experimental/im/_tiling.py 88.58% <94.05%> (+2.52%) ⬆️
...uidpy/experimental/im/_calculate_image_features.py 89.88% <91.07%> (+1.28%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@timtreis timtreis force-pushed the feature/image-features-alignment-cpmeasure branch 9 times, most recently from 25df4eb to 0bc0c1f Compare June 15, 2026 20:13
Final piece of scverse#982 (split from scverse#1189, scverse#1190): wires cp_measure into
calculate_image_features and adds image/labels coordinate-system alignment
(align_mode strict/rasterize), with boundary-cell drop accounting.

- deps: cp-measure>=0.1.19,<0.2, centrosome>=1.2.3 (hard)
- features=None now runs all features across all backends
- _tiling: bbox_y0/bbox_x0 on CellInfo; _Accum accumulator; shared yx_size
- key_added (was adata_key_added); keyword-only params
- skimage backend: single regionprops_table pass for morphology+intensity
- dedupe skimage morphology vs cp_measure:sizeshape when both requested
  (cp wins, computed identically; skimage-only props kept)
- purge non-ASCII em/en-dashes from text files repo-wide (ASCII-only policy)
@timtreis timtreis force-pushed the feature/image-features-alignment-cpmeasure branch from 0bc0c1f to 8bed3e7 Compare June 15, 2026 20:35
timtreis added 10 commits June 16, 2026 11:38
cp_measure and skimage featurize per cell on tiny arrays, where NumPy's
default multi-threaded OpenBLAS/OpenMP pools (sized to all cores) add pure
thread-dispatch overhead with nothing to amortize it, and oversubscribe
under parallelism. Wrapping the per-tile featurize body in
threadpool_limits(1) cuts the serial cp:texture path ~2.6x end-to-end on
an 8-core machine and removes the n_jobs>1 oversubscription slowdown.

The clamp lives inside the per-tile function so it also applies in worker
processes (a driver-side limit would not reach them), which the upcoming
distributed execution engine relies on. Output is bit-identical.
The tiled featurization parallelism uses distributed.LocalCluster (the
local "processes" scheduler does not fork for this graph) and clamps
worker BLAS/OpenMP via threadpoolctl. Both were only present transitively;
declare them. The dask[distributed] extra co-pins distributed to the
resolved dask version (they release lockstep).
Extract the per-tile dispatch shared by the tiled featurizers into
_run_tiled(specs, process_fn, *, n_jobs, kind, scatter): active-Client-first
dispatch, n_jobs->worker resolution, serial fast path (no scheduler when one
worker or one tile), streaming collection via as_completed with progress, and
safe LocalCluster teardown.

The execution kind is parametrized because the callers differ: tiling QC's
per-tile work is numba `nogil` (threads scale), while image featurization is
GIL-bound (needs processes). Migrate calculate_tiling_qc onto the engine with
kind="threads" - behavior-preserving (its tests are unchanged). The duplicated
_has_distributed_client probe is lifted into the shared module.
Replace the joblib threads loop in calculate_image_features with _run_tiled
(kind="processes"). cp_measure is GIL-bound, so real parallelism needs worker
processes via a distributed LocalCluster (an active Client is used if present);
the local multiprocessing scheduler does not fork for this graph. n_jobs=1 and
single-tile inputs stay serial in-process with zero cluster overhead.

The image and labels arrays are passed as scattered arguments (broadcast once)
rather than captured, so the backing graph is not re-embedded in every task.
Per-tile crops are now materialized with an explicit synchronous scheduler
(_materialize): a bare .values inside a distributed worker routes the inner
compute back to the scheduler and deadlocks. Drop accounting is reduced from the
returned results. Results are bit-identical to serial (verified on zarr-backed
skimage and cp_measure runs). Remove the dead joblib/tqdm/time imports.
The n_jobs equivalence tests used the default tile_size, which yields a single
tile and never leaves the serial path. Use a small tile_size so n_jobs>1 forks a
real LocalCluster, and add TestParallelEngine: an active-Client run over
zarr-backed data (scatter of a zarr graph + cp_measure in worker processes) and
the align_mode="rasterize" path under a LocalCluster, each asserted bit-identical
to serial. These lock the picklability of the scattered closures on the real
zarr and rasterize paths, not just synthetic in-memory data.
Expand the n_jobs docstring (active Client else LocalCluster; n_jobs=1 serial;
the macOS/Windows __main__ guard requirement and the reuse-your-own-Client tip
for scale-out) and add a changelog entry noting the new distributed/threadpoolctl
dependencies.
- Don't warn "n_jobs ignored" on the default path: the active-Client branch
  warned for any n_jobs not in (None, -1), but featurization defaults to 1, so
  every default call under a user's Client emitted a spurious warning. Exempt 1
  (and -1, None) so only an explicit worker count triggers it.
- Release each tile future as its result is gathered in _run_on_client, so
  worker-side results are freed incrementally instead of pinned for the whole
  batch (the docstring already claimed this; now it is true).
- Fix a stale calculate_tiling_qc docstring that referenced dask.compute (the
  active-Client path now submits via as_completed).
Surfaced by an A-Z run on a real MIBI-TOF FOV:

- Warn when per-channel features are requested on an image whose channels are
  positional integers (the default for unlabeled data), pointing users to set
  c_coords; per-channel columns would otherwise be index-named.
- Join the channel onto skimage/squidpy feature names with a double underscore
  (intensity_mean__dsDNA, summary_mean__dsDNA, texture_contrast__dsDNA) so the
  marker is unambiguously separable, matching cp_measure's __ convention.
  cp_measure names are left unchanged.
- Rename squidpy:color_hist -> squidpy:histogram (the classic sq.im name);
  columns become histogram_bin{n}__{channel}.
- Show a dask ProgressBar on the serial path too (run via the synchronous
  scheduler), not just the threaded path.
- obs_names are now the label-image cell IDs (str) instead of cell_<id>.

Feature values are unchanged; this only affects column/obs names, a new warning,
and progress output.
- _run_tiled local path: bind the scatter arrays into the task closure instead
  of passing them as dask.delayed positional args. A dask collection given to
  delayed is materialized in full before the call, so the previous form read the
  whole image for every tile (verified: 100-block image x N tiles) - defeating
  the lazy per-tile crop. Closing over keeps each task reading only its crop.
- Unnamed-channel warning now fires only for the exact positional default
  ([0, 1, ...]) instead of any all-numeric names, so real numeric panels
  (e.g. wavelengths "405","488") no longer warn spuriously.
- obs_names are now derived from label_id on both the labels and shapes paths
  (and in calculate_tiling_qc), so obs_names == label_id consistently across
  the sibling featurizers.
Behaviour-preserving cleanups from review:
- Derive _CP_PER_CHANNEL from _CP_CORRELATION_KEYS instead of re-listing the
  four correlation names (removes a drift hazard).
- Inline the single-use _CLIENT_OVERRIDES_NJOBS string.
- Trim over-long docstrings/comments (_run_tiled, _materialize, _process_one,
  the unnamed-channel note) down to intent.
@selmanozleyen selmanozleyen self-requested a review June 16, 2026 14:19
`distributed` is not in the intersphinx mapping, so the :class:/{class}
`dask.distributed.LocalCluster` cross-references failed under nitpicky + -W.
Render the dask/distributed class names as inline literals (matching how the
Client is already written), and point the calculate_tiling_qc reference at its
documented API entry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant