Fable review#54
Merged
Merged
Conversation
B2: validate node ids at the UnionFind binding boundary. UnionFind::find/ merge/merge_to index their parent vectors without a bounds check (intentional for the hot path), but the scalar and array bindings passed user-supplied ids straight through, so an out-of-range id was undefined behavior / a segfault from Python. Add check_node() and route the scalar find/merge/merge_to binds and the merge_edges/find_nodes helpers through it, throwing std::invalid_argument before releasing the GIL. B1: route the mutex-watershed and semantic-mutex-watershed neighbor computation through detail::valid_offset_target instead of an unchecked flat-index shift. This bounds-checks the neighbor per axis, so the kernel is memory-safe and rejects row/plane wrap-around independently of the caller's valid_edges mask. Drops the now-unused offset_strides precompute. Add regression tests for the new UnionFind validation and for the existing binding-layer point-coordinate validation in non_maximum_distance_suppression. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
B3: argsort_by_first broke ties on the primary key nondeterministically (std::sort is unstable). Add a secondary tie-break by original index so the ordering is a total order. This makes the label-multiset downsample argmax and restrict_set truncation deterministic on count ties (smaller id wins). B4: GreedyAdditiveMulticutProposalGenerator derived its per-call seed as seed_ + call_count_, which collided with the fusion-move driver's per-slot seed + slot offset (different (slot, iteration) cells computed identical proposals). Mix the packed (seed, counter) pair through a SplitMix64 finalizer to remove the linearity. B7: WatershedProposalGenerator::reset re-seeded the RNG but left the cached normal-distribution deviate, so a reset generator was not identical to a fresh one. Add noise_.reset(). flow: round_to_flat_index used std::nearbyint (FP-rounding-mode dependent, half-to-even); switch to floor(x + 0.5) to match the half-up convention used in affine.hxx and watershed.hxx. Add a determinism regression test for the downsample count-tie case. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
`_bp_##__LINE__` does not expand __LINE__ (token-paste suppresses operand expansion), so every BIOIMAGE_PROFILE_SCOPE declared the same identifier `_bp___LINE__`. It compiled only because each call site happened to wrap its scope in its own block; two scopes in one block (the documented per-phase pattern) would be a redefinition error in BIOIMAGE_PROFILE=ON builds. Add the standard two-level concat indirection so __LINE__ is expanded first. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ulation I1: GreedyAdditiveWorkspace::reset called heap.reset_capacity, and so did detail::initialize_dynamic_graph which every caller runs immediately after. The second call re-wiped the heap locator vector (O(E)) on every solve — hot under fusion-move (per fuse x slot x iteration). Drop it from reset() in both the multicut and lifted-multicut workspaces; initialize_dynamic_graph owns it. I3: BfsWorkspace::reset cleared the visited/distance buffers (O(N) over the whole graph) on every call, making the "k-hop neighborhood from every node" pattern O(N^2) of memset. Replace the boolean visited buffer with a uint32 generation stamp: reset just increments the generation (O(1)), clearing only on the rare wraparound. distance_ is sized but written before read. I4: accumulate_labels used an n_threads x n_nodes map-of-maps plus a per-node merge map. Switch to one combined-key (node, other) histogram per thread, merged once, with a single-pass per-node argmax. Far fewer container allocations; the argmax/tie-break/empty-node semantics are unchanged (covered by the existing accumulate_labels tests, incl. parallel-vs-single and nifty cross-checks). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
R1: three byte-identical number_of_pixels helpers (label_accumulation, feature_accumulation, node_label_projection) — the feature_accumulation copy was dead. Add detail::number_of_elements to detail/grid.hxx and route the two live call sites through it; drop the dead copy. R9: drop the redundant nb::cast wrappers inside nb::make_tuple in relabel_sequential_t (make_tuple already casts its arguments). Docs/naming: - AGENTS.md: correct the union-find path (util/union_find.hxx, namespace bioimage_cpp::util, not detail/) and list number_of_elements under grid.hxx. - agglomeration/detail.hxx: the rekey comment described the opposite of the code; reword to explain that a kRejectEdge'd edge is deliberately left out of the heap. - filters: the public kwarg was window_ratio while the Python API documents window_size; align the nb::arg keyword and the validation message to window_size (calls are positional, so behavior is unchanged). - feature_accumulation: note that float feature values are bit-reproducible only for a fixed thread count. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A/B benchmarks (pre-review build vs the review branch) showed two of the review's changes were net-negative, so they are reverted: I4 (label accumulation): the combined-key (node, other) histogram was 3-15x SLOWER than the per-node map-of-maps and scaled negatively with threads (a giant hashmap probed once per pixel, plus a serial argmax). Restore the per-node map-of-maps with the parallel merge. The detail::number_of_elements dedup (R1) is kept. Tests confirm identical output. B1 (mutex watershed): routing the neighbor through detail::valid_offset_target (per-axis bounds check) cost ~17% on 3D in the hot loop. The public API already guarantees valid_edges excludes out-of-bounds / wrapping edges, so restore the single precomputed offset-stride add and document that precondition instead. B2 (UnionFind bindings): keep the safety bounds check but replace the per-element validation loop with a single std::max_element scan, cutting bulk find overhead from ~1.35x to ~1.1x. The residual cost is the unavoidable extra read of the id array; it only affects the non-hot Python convenience API. BFS (I3) stays 4.5x faster; greedy-additive (I1) unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…, mutex-ws Standalone (no nifty/affogato) benchmarks for the areas touched by the code review that lacked timing coverage, used for the A/B perf validation: - utils/benchmark_union_find.py: bulk merge/find (B2 bounds-check cost) - graph/benchmark_accumulate_labels.py: accumulate_labels across 1/4/8 threads - graph/benchmark_bfs.py: lifted_edges_from_node_labels (workspace-reused BFS, I3) - segmentation/benchmark_mutex_watershed.py: mutex_watershed grid kernel (B1) Each follows the warmup + median/min pattern and exposes run() for a driver. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.