Add vessel-graph connected-component labeling and node annotation rule #169

Draft
Copilot wants to merge 3 commits into main from copilot/add-connected-components-rule

Copilot AI commented May 18, 2026

Contributor

Vessel nodes/edges are now generated, but the nodes lack graph connectivity context. This change adds a workflow step that computes connected components from the vessel graph and annotates each node with a component label, ranked from largest to smallest component.

  • Workflow graph updates

    • Split node generation into two stages:
      • vessel_graph_to_nodes_edges now emits nodes_raw.parquet + edges.parquet
      • new vessel_graph_connected_components consumes nodes_raw.parquet + edges.parquet and writes final nodes.parquet
    • Preserves existing final artifact naming (nodes.parquet) while introducing an explicit intermediate node table.
  • Connected-components annotation

    • Added spimquant/workflow/scripts/annotate_vessel_graph_connected_components.py.
    • Computes connected components from src_node_id/dst_node_id and adds component_label to nodes.
    • Labels are assigned by descending component size (1 = largest), with deterministic tie-break by minimum node_id.
  • Compatibility + robustness

    • Updated convert_vessel_graph_to_nodes_edges.py to support both output contracts (nodes_raw_parquet preferred, nodes_parquet fallback) for backward compatibility.
    • Added guards for malformed edge tables (missing endpoint columns / unknown node references).
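
The guards for malformed edge tables could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this PR: the helper name `validate_edges` and the plain-dict row representation are hypothetical, and only the column names `src_node_id`, `dst_node_id`, and `node_id` come from the description above.

```python
def validate_edges(node_ids, edge_rows):
    """Raise ValueError if the edge table is malformed (hypothetical helper).

    node_ids:  iterable of known node_id values
    edge_rows: iterable of dict-like rows, one per edge
    """
    required = ("src_node_id", "dst_node_id")
    known = set(node_ids)
    for i, row in enumerate(edge_rows):
        # Guard 1: both endpoint columns must be present.
        missing = [col for col in required if col not in row]
        if missing:
            raise ValueError(f"edge row {i} is missing endpoint columns: {missing}")
        # Guard 2: both endpoints must refer to nodes in the node table.
        unknown = [row[col] for col in required if row[col] not in known]
        if unknown:
            raise ValueError(f"edge row {i} references unknown node ids: {unknown}")
```

Failing fast here keeps a bad edge table from silently producing components that omit or mislabel nodes.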

```python
# Component ranking policy in annotate_vessel_graph_connected_components.py
ranked_components = sorted(
    components_by_root.values(),
    key=lambda comp_node_ids: (-len(comp_node_ids), min(comp_node_ids)),
)
for label, component_node_ids in enumerate(ranked_components, start=1):
    for node_id in component_node_ids:
        component_label_by_node_id[node_id] = label
```
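
For context, a mapping like `components_by_root` could be built with a union-find pass over the edge endpoints before the ranking step. The sketch below is a minimal, self-contained version under that assumption. The description does not state which algorithm the script actually uses, and the names `label_components` and `find` are hypothetical.

```python
def label_components(node_ids, edges):
    """Return {node_id: component_label}, with label 1 for the largest component.

    node_ids: list of node ids
    edges:    iterable of (src_node_id, dst_node_id) pairs
    """
    parent = {n: n for n in node_ids}

    def find(n):
        # Walk up to the root, halving the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Merge the two components joined by each edge.
    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src

    # Group nodes by their root; isolated nodes form singleton components.
    components_by_root = {}
    for n in node_ids:
        components_by_root.setdefault(find(n), []).append(n)

    # Rank by descending size, breaking ties by the minimum node id.
    ranked = sorted(components_by_root.values(), key=lambda c: (-len(c), min(c)))
    return {n: label for label, comp in enumerate(ranked, start=1) for n in comp}
```

For example, `label_components([1, 2, 3, 4, 5, 6], [(4, 5), (5, 6), (1, 2)])` assigns label 1 to nodes 4, 5, and 6, label 2 to nodes 1 and 2, and label 3 to the isolated node 3.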

akhanf commented May 21, 2026

Member

This ends up being too memory-inefficient. Ideally we should move to cugraph and make use of a GPU. However, dask-cuda currently pins an older version of dask than zarrnii does (zarrnii is pinned to a newer version to support sharding), so we may need to 1) implement the vessel graph processing in a distinct conda env for now, 2) fork dask-cuda and remove the pin, or 3) wait for the next dask-cuda release, where the pin will move.
