[codex] Reproduce out-of-order event replay divergence #2228
Draft
pranaygp wants to merge 1 commit into
Add a red fixture for late earlier-sorted hook insertion during reused sleep replay.
🧪 E2E Test Results
❌ Some tests failed
Summary
❌ Failed Tests
🌍 Community Worlds (92 failed)
mongodb (14 failed):
redis (9 failed):
turso-dev (1 failed):
turso (68 failed):
Details by Category
✅ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
✅ 📋 Other
❌ Some E2E test jobs failed:
Check the workflow run for details.

Summary
… `ReplayDivergenceError`.
… `wait_completed` already used by a prior replay to create the next wait.

Context
Customer run `wrun_01KT4A9M0Q2W39YKWQG4J8G48J` failed on `workflow` 4.3.1 after all replay-divergence recovery attempts:

The Dynamo event log was structurally balanced before the terminal pending wait. Production `workflow-server` was already using strongly consistent event reads, so this failure is not explained by the earlier eventual-consistency read issue.

What This Proves
The ordinary append-only control remains deterministic: adding a `hook_received` after the prefix observed by a replay does not overturn the earlier reused-sleep winner.

The new red fixture models a stronger ordering violation:
… `wait_completed` without a second hook and legitimately queues the next `wait_created`.
… `hook_received` before that already-observed `wait_completed`, while retaining the queued wait.

That fixture fails in both synchronous and asynchronous deserialization modes with the same `wait_created` divergence shape as the production run.

This points at a failure to preserve append-only event ordering across cursor reads, rather than a general nondeterminism bug that appears whenever a hook arrives during a live replay. In `workflow-server`, event IDs are allocated before the Dynamo insert and are also used for event ordering and cursors, so a delayed insert carrying an earlier ID is a candidate backend reproduction to test next.

Validation
`corepack pnpm exec biome check packages/core/src/hook-sleep-interaction.test.ts` passes.
`corepack pnpm --filter @workflow/core exec vitest run src/hook-sleep-interaction.test.ts --reporter=verbose` intentionally fails only in the new fixture, once per deserialization mode:
This PR is intentionally draft/red while it serves as the failing reproduction for the remaining event-ordering investigation.
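
The candidate backend reproduction named under What This Proves can be illustrated with a small in-memory model. This is a hypothetical sketch, not the `workflow-server` or `@workflow/core` implementation: `IdOrderedLog`, `allocate`, `commit`, `readAfter`, and `prefixPreserved` are illustrative names. It assumes only that an event's ID is reserved before its insert becomes visible and that reads are ordered by ID. Under those assumptions, a delayed `hook_received` with an earlier ID is skipped by the incremental cursor read and then sorts ahead of the already-observed `wait_completed` on a full re-read, whereas the append-only control leaves the observed prefix unchanged.

```typescript
// Hypothetical model of an ID-ordered event log whose IDs are allocated
// before the insert is committed. Names are illustrative, not a real API.
type EventType = "wait_created" | "wait_completed" | "hook_received";

interface LogEvent {
  id: number;
  type: EventType;
}

interface Scenario {
  observed: LogEvent[]; // what the earlier replay consumed
  reread: LogEvent[]; // what a full re-read from the start returns later
  missedByCursor: boolean; // whether the incremental cursor read skipped the hook
}

class IdOrderedLog {
  private nextId = 1;
  private committed: LogEvent[] = [];

  // Reserve an ID (and therefore a sort position) before the write lands.
  allocate(type: EventType): LogEvent {
    return { id: this.nextId++, type };
  }

  // Make the event visible to readers, possibly much later than allocation.
  commit(event: LogEvent): void {
    this.committed.push(event);
  }

  // Cursor read: every visible event with an ID greater than the cursor, in ID order.
  readAfter(cursor: number): LogEvent[] {
    return this.committed.filter((e) => e.id > cursor).sort((a, b) => a.id - b.id);
  }
}

// True when a re-read still begins with exactly the events a replay already observed.
function prefixPreserved(observed: LogEvent[], reread: LogEvent[]): boolean {
  return observed.every((e, i) => reread[i]?.id === e.id);
}

// The hook gets an earlier ID but is committed after the replay consumed wait_completed.
function lateEarlierIdInsert(): Scenario {
  const log = new IdOrderedLog();
  log.commit(log.allocate("wait_created")); // id 1
  const hook = log.allocate("hook_received"); // id 2, insert delayed
  log.commit(log.allocate("wait_completed")); // id 3
  const observed = log.readAfter(0); // replay sees ids [1, 3]
  const cursor = Math.max(...observed.map((e) => e.id));
  log.commit(log.allocate("wait_created")); // id 4: replay queues the next wait
  log.commit(hook); // id 2 finally becomes visible
  const missedByCursor = !log.readAfter(cursor).some((e) => e.id === hook.id);
  return { observed, reread: log.readAfter(0), missedByCursor };
}

// Control: the hook is allocated and committed after the observed prefix.
function appendOnlyControl(): Scenario {
  const log = new IdOrderedLog();
  log.commit(log.allocate("wait_created")); // id 1
  log.commit(log.allocate("wait_completed")); // id 2
  const observed = log.readAfter(0); // replay sees ids [1, 2]
  const cursor = Math.max(...observed.map((e) => e.id));
  log.commit(log.allocate("wait_created")); // id 3: replay queues the next wait
  const hook = log.allocate("hook_received"); // id 4, sorts after the prefix
  log.commit(hook);
  const missedByCursor = !log.readAfter(cursor).some((e) => e.id === hook.id);
  return { observed, reread: log.readAfter(0), missedByCursor };
}

const late = lateEarlierIdInsert();
const control = appendOnlyControl();
console.log(late.reread.map((e) => e.type).join(" -> "));
console.log("late insert keeps prefix:", prefixPreserved(late.observed, late.reread));
console.log("control keeps prefix:", prefixPreserved(control.observed, control.reread));
console.log("cursor read missed late event:", late.missedByCursor);
```

Running it prints the re-read order `wait_created -> hook_received -> wait_completed -> wait_created`, then `false` for the late insert, `true` for the control, and `true` for the cursor miss. Allocation and commit are modeled as separate steps so the window between them, which is where the suspected reordering would happen, is explicit.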