[runtime] Fix prune state corner cases around checkpoints #649

Merged: xintongsong merged 1 commit into apache:release-0.2 from joeyutong:codex/pr603-release-0.2 on May 12, 2026
@joeyutong (Contributor) commented May 8, 2026

Purpose of change

This PR backports #603 to release-0.2.

ActionExecutionOperator had two prune-state boundary issues around checkpoints.

  • Prune bookkeeping used messageSequenceNumber, which tracks the latest started sequence rather than the latest completed one. As a result, a sequence that was still in flight could be treated as prune-safe.

  • The operator called pruneState as soon as a run completed, before any completed checkpoint covered that completion.

Both issues are operator-side prune-boundary problems. In the current Kafka backend, pruneState only evicts the in-memory cache, so the correctness impact is mostly masked by replay/rebuild from the durable log. The operator should not rely on that backend detail, though: the state-store prune boundary should include only completed sequences that are covered by a completed checkpoint.

On release-0.2, the backport also needs a small test-file adjustment so the cherry-picked regression helpers compile against the older branch state.

Fix

This change makes pruning checkpoint-safe again.

  • track a separate lastCompletedSequenceNumber keyed state
  • snapshot prune metadata from lastCompletedSequenceNumber instead of messageSequenceNumber
  • remove the eager prune path on run completion
  • prune action state only after notifyCheckpointComplete
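
The four points above can be sketched as a small, dependency-free model. This is only an illustration of the intended ordering, not the actual ActionExecutionOperator code: the class `CheckpointSafePruner`, its methods `startRun`, `completeRun`, `snapshotState`, and `prunedUpTo`, and the maps it uses are hypothetical stand-ins for Flink keyed state and the state store. It assumes sequence numbers start at 1 and that runs for a given key complete in order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of checkpoint-safe pruning (hypothetical; not the real operator). */
class CheckpointSafePruner {
    // Latest STARTED sequence per key; may still be in flight.
    private final Map<String, Long> messageSequenceNumber = new HashMap<>();
    // Latest COMPLETED sequence per key; the only safe prune boundary.
    private final Map<String, Long> lastCompletedSequenceNumber = new HashMap<>();
    // Prune metadata captured at snapshot time: checkpointId -> (key -> completed seq).
    private final TreeMap<Long, Map<String, Long>> pendingPrunes = new TreeMap<>();
    // Stand-in for the state store: highest sequence pruned so far per key.
    private final Map<String, Long> prunedUpTo = new HashMap<>();

    /** Starts a run and returns its sequence number (1, 2, 3, ... per key). */
    long startRun(String key) {
        return messageSequenceNumber.merge(key, 1L, Long::sum);
    }

    /** Records a completion. Deliberately does NOT prune (no eager prune path). */
    void completeRun(String key, long seq) {
        lastCompletedSequenceNumber.merge(key, seq, Math::max);
    }

    /** Snapshots prune metadata from the completed sequences, not the started ones. */
    void snapshotState(long checkpointId) {
        pendingPrunes.put(checkpointId, new HashMap<>(lastCompletedSequenceNumber));
    }

    /** Prunes only what the now-completed checkpoint covers. */
    void notifyCheckpointComplete(long checkpointId) {
        Map<String, Long> boundary = pendingPrunes.get(checkpointId);
        if (boundary == null) {
            return; // unknown or already handled checkpoint
        }
        boundary.forEach((key, seq) -> prunedUpTo.merge(key, seq, Math::max));
        // Metadata for this and older checkpoints is no longer needed.
        pendingPrunes.headMap(checkpointId, true).clear();
    }

    long prunedUpTo(String key) {
        return prunedUpTo.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        CheckpointSafePruner pruner = new CheckpointSafePruner();
        long first = pruner.startRun("key-a");
        pruner.completeRun("key-a", first);
        pruner.startRun("key-a"); // second run stays in flight
        pruner.snapshotState(1L);
        pruner.notifyCheckpointComplete(1L);
        System.out.println("pruned up to: " + pruner.prunedUpTo("key-a"));
    }
}
```

In this sketch the in-flight second run never enters the prune boundary, because the snapshot copies `lastCompletedSequenceNumber` rather than `messageSequenceNumber`, and nothing is pruned until `notifyCheckpointComplete` fires for that snapshot.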

Tests

The updated tests cover:

  • no pruning before checkpoint completion
  • no pruning of sequences that are still in flight
  • replay from an earlier checkpoint keeps durable state available
  • action state cleanup happens after checkpoint completion
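
As a rough illustration of the replay case, the sketch below models a backend whose pruneState only evicts an in-memory cache while a durable log keeps every entry, as the description says of the current Kafka backend. `ToyActionStateStore`, its `put`, `get`, and `isCached` methods, and the `pruneState(key, upToSeq)` signature are assumptions made for this example and do not reflect the real backend API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy state store: durable log plus in-memory cache (hypothetical model). */
class ToyActionStateStore {
    private final Map<String, String> durableLog = new HashMap<>();
    private final Map<String, String> cache = new HashMap<>();

    private static String id(String key, long seq) {
        return key + "#" + seq;
    }

    /** Writes the state to both the durable log and the cache. */
    void put(String key, long seq, String state) {
        durableLog.put(id(key, seq), state);
        cache.put(id(key, seq), state);
    }

    /** Evicts cached entries with sequence <= upToSeq; the durable log is untouched. */
    void pruneState(String key, long upToSeq) {
        for (long seq = 1; seq <= upToSeq; seq++) {
            cache.remove(id(key, seq));
        }
    }

    boolean isCached(String key, long seq) {
        return cache.containsKey(id(key, seq));
    }

    /** Reads from the cache, falling back to the durable log and rebuilding the cache. */
    Optional<String> get(String key, long seq) {
        String entryId = id(key, seq);
        String state = cache.get(entryId);
        if (state == null) {
            state = durableLog.get(entryId);
            if (state != null) {
                cache.put(entryId, state); // rebuild on replay
            }
        }
        return Optional.ofNullable(state);
    }

    public static void main(String[] args) {
        ToyActionStateStore store = new ToyActionStateStore();
        store.put("key-a", 1L, "result-1");
        store.pruneState("key-a", 1L);
        // A replay from an earlier checkpoint can still read the pruned sequence.
        System.out.println(store.get("key-a", 1L).isPresent());
    }
}
```

A backend that deleted durable entries in pruneState would not give this guarantee, which is why the operator-side boundary matters regardless of how the current backend behaves.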

API

No

Documentation

  • [ ] doc-needed
  • [x] doc-not-needed
  • [ ] doc-included

@github-actions bot added the doc-label-missing, fixVersion/0.2.2, and priority/major labels on May 8, 2026
@joeyutong changed the title from "[release-0.2][runtime] Backport prune state corner cases around checkpoints" to "[runtime] Fix prune state corner cases around checkpoints" on May 8, 2026
@github-actions bot added the doc-not-needed label and removed the doc-label-missing label on May 8, 2026
@joeyutong force-pushed the codex/pr603-release-0.2 branch from 907b979 to 4372bad on May 8, 2026 01:36
@joeyutong marked this pull request as ready for review on May 11, 2026 01:34
@xintongsong (Contributor) left a comment

LGTM

@xintongsong merged commit e0a6d51 into apache:release-0.2 on May 12, 2026
41 of 42 checks passed
Labels

  • doc-not-needed: Your PR changes do not impact docs
  • fixVersion/0.2.2
  • priority/major: Default priority of the PR or issue.