Skip to content

fix: include port id in materialization reader actor id to fix self-join deadlock#4985

Open
Ma77Ball wants to merge 5 commits intoapache:mainfrom
Ma77Ball:fix/differenceOperator
Open

fix: include port id in materialization reader actor id to fix self-join deadlock#4985
Ma77Ball wants to merge 5 commits intoapache:mainfrom
Ma77Ball:fix/differenceOperator

Conversation

@Ma77Ball
Copy link
Copy Markdown
Contributor

@Ma77Ball Ma77Ball commented May 8, 2026

What changes were proposed in this PR?

Fixes the Difference operator hanging when one upstream operator (e.g., a single CSV) is wired to both of its input ports.

The bug

  • Scheduler materializes the upstream to break the self-join cycle.
  • Each downstream worker spawns one InputPortMaterializationReaderThread per upstream URI.
  • Each thread's virtual "from" actor ID was built from (uri, workerActorId) only.
  • With a self-join, both threads share the same URI and worker → same ChannelIdentity → FIFO sequence numbers and end-of-channel markers cross-routed → one port never drains → Difference hangs.

The fix

  • Mix the destination PortIdentity into the actor name: MATERIALIZATION_READER_<uri>_port<n>[i]_<workerActorId>.
  • Thread toPortId through the three callers of getFromActorIdForInputPortStorage:
    • ResourceAllocatorglobalPortId.portId.
    • AssignPortHandlermsg.portId.
    • InputManagerInputPortMaterializationReaderThread (new ctor field)

Any related issues, documentation, or discussions?

Closes: #2588

How was this PR tested?

  • VirtualIdentityUtilsSpec — checks the new ID format and asserts distinct IDs for the same (uri, worker) but different port IDs.
  • ExpansionGreedyScheduleGeneratorSpec — builds csv → difference with both inputs from the same csv; asserts levelSets.size > 1, proving materialization happens and the schedule isn't a deadlocked single-level region.
  • Manual: ran a workflow with one CSV connected to both Difference ports — previously hung, now completes.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with: Claude Opus 4.7 in compliance with ASF

@Ma77Ball Ma77Ball changed the title Fix/difference operator fix: include port id in materialization reader actor id to fix self-join deadlock May 8, 2026
@Ma77Ball
Copy link
Copy Markdown
Contributor Author

Ma77Ball commented May 8, 2026

@Yicong-Huang please review

@Ma77Ball Ma77Ball changed the title fix: include port id in materialization reader actor id to fix self-join deadlock fix: include port id in materialization reader actor id to fix self-join deadlock May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

self-diff cannot finish execution

1 participant