[codex] Reproduce out-of-order event replay divergence #2228

Draft
pranaygp wants to merge 1 commit into stable from pranaygp/codex/repro-event-order-replay-divergence

@pranaygp pranaygp commented Jun 2, 2026

Summary

  • Surface unconsumed replay events from the hook/sleep fixture as ReplayDivergenceError.
  • Add controls for the customer terminal suffix and for a normally appended hook during a reused-sleep replay.
  • Add a red (intentionally failing) fixture in which a late hook_received is inserted before a wait_completed that a prior replay had already consumed to create the next wait.

Context

Customer run wrun_01KT4A9M0Q2W39YKWQG4J8G48J failed on workflow 4.3.1 after all replay-divergence recovery attempts:

Replay could not consume event: eventType=wait_created,
correlationId=wait_01KT4A9M74G69AMBW0TQMC10MR,
eventId=evnt_01KT4AD6ECN7X90VR534PCKCH9.

The Dynamo event log was structurally balanced before the terminal pending wait. Production workflow-server was already using strongly consistent event reads, so the earlier eventually consistent read issue does not explain this failure.

What This Proves

The ordinary append-only control remains deterministic: adding a hook_received after the prefix observed by a replay does not overturn the earlier reused-sleep winner.

The new red fixture models a stronger ordering violation:

  1. Replay observes wait_completed without a second hook and legitimately queues the next wait_created.
  2. The durable history later presents a second hook_received before that already-observed wait_completed, while retaining the queued wait.
  3. Fresh replay takes the hook branch and fails to consume the recorded next wait.

That fixture fails in both synchronous and asynchronous deserialization modes with the same wait_created divergence shape as the production run.
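The three steps above can be sketched with a toy replay loop. This is only an illustration under stated assumptions, not the fixture in hook-sleep-interaction.test.ts: the `replay` function, the `ReplayEvent` shape, the event and correlation IDs, and the toy workflow rule (a second hook on the same sleep makes the workflow return, a sleep win queues the next wait) are all hypothetical. Only the event type names and the error message shape come from this PR.

```typescript
// Toy model (hypothetical): replays a history against a fixed workflow shape.
type EventType = "wait_created" | "wait_completed" | "hook_received";

interface ReplayEvent {
  eventId: string;
  eventType: EventType;
  correlationId: string;
}

class ReplayDivergenceError extends Error {
  constructor(ev: ReplayEvent) {
    super(
      `Replay could not consume event: eventType=${ev.eventType}, ` +
        `correlationId=${ev.correlationId}, eventId=${ev.eventId}.`,
    );
    this.name = "ReplayDivergenceError";
  }
}

interface ReplayResult {
  branches: Array<"hook" | "sleep">;
  queuedWait: string | null; // wait the workflow code wants to create next
}

function replay(history: ReplayEvent[]): ReplayResult {
  const branches: Array<"hook" | "sleep"> = [];
  const createdWaits = new Set<string>();
  let queuedWait: string | null = "wait_1"; // the toy workflow starts by sleeping
  let waitCount = 1;
  let hooksOnCurrentSleep = 0;
  let returned = false;

  for (const ev of history) {
    if (ev.eventType === "wait_created") {
      // Only consumable if the workflow code asked for exactly this wait.
      if (queuedWait !== ev.correlationId) throw new ReplayDivergenceError(ev);
      createdWaits.add(ev.correlationId);
      queuedWait = null;
    } else if (ev.eventType === "hook_received") {
      if (returned) throw new ReplayDivergenceError(ev);
      branches.push("hook");
      hooksOnCurrentSleep += 1;
      // Assumption: a second hook against the same sleep ends the toy workflow.
      if (hooksOnCurrentSleep === 2) returned = true;
    } else {
      // wait_completed: consumable for any wait this replay created.
      if (!createdWaits.has(ev.correlationId)) throw new ReplayDivergenceError(ev);
      if (!returned) {
        branches.push("sleep");
        hooksOnCurrentSleep = 0;
        waitCount += 1;
        queuedWait = `wait_${waitCount}`; // sleep won, so queue the next wait
      }
    }
  }
  return { branches, queuedWait };
}

function ev(eventId: string, eventType: EventType, correlationId: string): ReplayEvent {
  return { eventId, eventType, correlationId };
}

// Step 1: the prefix a first replay observed (no second hook yet).
const prefix = [
  ev("e1", "wait_created", "wait_1"),
  ev("e2", "hook_received", "hook_1"),
  ev("e4", "wait_completed", "wait_1"),
];
// Append-only control: the second hook lands after everything already observed.
const control = [...prefix, ev("e5", "wait_created", "wait_2"), ev("e6", "hook_received", "hook_1")];
// Red case: the second hook sorts before the already-observed wait_completed.
const red = [
  ev("e1", "wait_created", "wait_1"),
  ev("e2", "hook_received", "hook_1"),
  ev("e3", "hook_received", "hook_1"),
  ev("e4", "wait_completed", "wait_1"),
  ev("e5", "wait_created", "wait_2"),
];

console.log(replay(prefix));  // sleep wins the second race and queues wait_2
console.log(replay(control)); // stays on the same path, then consumes the late hook
try {
  replay(red);
} catch (e) {
  console.log(e instanceof Error ? e.message : String(e));
}
```

In this model the control history replays cleanly because the late hook only extends the path already taken, while the red history sends the second race down the hook branch and leaves the recorded wait_created for wait_2 with no code path to claim it.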

This points at preserving append-only event ordering across cursor reads, rather than a general nondeterminism bug whenever a hook arrives during a live replay. In workflow-server, event IDs are allocated before the Dynamo insert and are also used for event ordering/cursors, so a delayed earlier-ID insert is a candidate backend reproduction to test next.
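A minimal sketch of that candidate, assuming an in-memory log in which IDs come from a counter and cursor reads return rows with an ID greater than the cursor. `ToyEventLog`, `allocateId`, `insert`, and `readAfter` are hypothetical names for illustration and do not describe the workflow-server or Dynamo code.

```typescript
// Toy model (hypothetical): IDs are allocated before the insert and also
// define read order, so a delayed insert can land "in the past".
interface StoredEvent {
  id: number;
  eventType: string;
}

class ToyEventLog {
  private nextId = 1;
  private rows: StoredEvent[] = [];

  // The ID is handed out before the row is written.
  allocateId(): number {
    return this.nextId++;
  }

  // The write may happen arbitrarily later than the allocation.
  insert(id: number, eventType: string): void {
    this.rows.push({ id, eventType });
    this.rows.sort((a, b) => a.id - b.id); // storage order follows the ID
  }

  // Cursor read: everything with an ID strictly greater than the cursor.
  readAfter(cursor: number): StoredEvent[] {
    return this.rows.filter((row) => row.id > cursor);
  }
}

const log = new ToyEventLog();

const hookId = log.allocateId(); // allocated first, insert is delayed
const waitId = log.allocateId(); // allocated second, inserted immediately
log.insert(waitId, "wait_completed");

// A replay reads now and sees wait_completed without the second hook.
const firstRead = log.readAfter(0);
const cursor = firstRead.reduce((max, row) => Math.max(max, row.id), 0);

// The delayed insert finally lands with the smaller ID.
log.insert(hookId, "hook_received");

const incremental = log.readAfter(cursor); // the late hook is behind the cursor
const fresh = log.readAfter(0);            // a fresh replay sees the hook first

console.log(incremental.map((row) => row.eventType));
console.log(fresh.map((row) => row.eventType));
```

Under these assumptions the incremental read never returns the late hook, while a fresh read places it before the wait_completed that the earlier replay already acted on, which is the ordering the red fixture encodes.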

Validation

corepack pnpm exec biome check packages/core/src/hook-sleep-interaction.test.ts

Passes.

corepack pnpm --filter @workflow/core exec vitest run src/hook-sleep-interaction.test.ts --reporter=verbose

Intentionally fails only in the new fixture, once per deserialization mode:

ReplayDivergenceError: Replay could not consume event: eventType=wait_created, ... eventId=evnt_7.

This PR is intentionally draft/red while it serves as the failing reproduction for the remaining event-ordering investigation.

Add a red fixture for late earlier-sorted hook insertion during reused sleep replay.

vercel Bot commented Jun 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
example-nextjs-workflow-turbopack Ready Preview, Comment Jun 2, 2026 8:49pm
example-nextjs-workflow-webpack Ready Preview, Comment Jun 2, 2026 8:49pm
example-workflow Ready Preview, Comment Jun 2, 2026 8:49pm
workbench-astro-workflow Ready Preview, Comment Jun 2, 2026 8:49pm
workbench-express-workflow Ready Preview, Comment Jun 2, 2026 8:49pm
workbench-fastify-workflow Ready Preview, Comment Jun 2, 2026 8:49pm
workbench-hono-workflow Ready Preview, Comment Jun 2, 2026 8:49pm
workbench-nitro-workflow Ready Preview, Comment Jun 2, 2026 8:49pm
workbench-nuxt-workflow Ready Preview, Comment Jun 2, 2026 8:49pm
workbench-sveltekit-workflow Ready Preview, Comment Jun 2, 2026 8:49pm
workbench-tanstack-start-workflow Ready Preview, Comment Jun 2, 2026 8:49pm
workbench-vite-workflow Ready Preview, Comment Jun 2, 2026 8:49pm
workflow-swc-playground Ready Preview, Comment Jun 2, 2026 8:49pm
workflow-tarballs Ready Preview, Comment Jun 2, 2026 8:49pm
workflow-web Ready Preview, Comment Jun 2, 2026 8:49pm

changeset-bot Bot commented Jun 2, 2026

⚠️ No Changeset found

Latest commit: 480479b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


github-actions Bot commented Jun 2, 2026

🧪 E2E Test Results

Some tests failed

Summary

Category Passed Failed Skipped Total
✅ ▲ Vercel Production 923 0 67 990
✅ 💻 Local Development 994 0 86 1080
✅ 📦 Local Production 994 0 86 1080
✅ 🐘 Local Postgres 910 0 80 990
✅ 🪟 Windows 90 0 0 90
❌ 🌍 Community Worlds 136 92 0 228
✅ 📋 Other 504 0 36 540
Total 4551 92 355 4998

❌ Failed Tests

🌍 Community Worlds (92 failed)

mongodb (14 failed):

  • hookWorkflow is not resumable via public webhook endpoint | wrun_01KT51F55QMHD54WVXQNHHJ582
  • webhookWorkflow | wrun_01KT51FAFWNS0DJ6BMRXRYFH9Q
  • sleepingWorkflow | wrun_01KT51FHAK4185VBHWGPHPP7FG
  • outputStreamWorkflow no startIndex (reads all chunks)
  • outputStreamWorkflow negative startIndex (reads from end)
  • outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns correct index after stream completes
  • outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns -1 before any chunks are written
  • outputStreamWorkflow - getTailIndex and getStreamChunks getStreamChunks returns same content as reading the stream
  • outputStreamInsideStepWorkflow - getWritable() called inside step functions | wrun_01KT51JRBDR5J2NTD2GGR8NMDZ
  • writableForwardedFromWorkflowWorkflow | wrun_01KT51K7CB5Z3XMTDCDMNF1M43
  • writableForwardedFromStepWorkflow | wrun_01KT51KCNV247NNR9MB1EE0ESK
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KT51QEWTJWT0H9T4DVW4696A
  • pages router sleepingWorkflow via pages router
  • resilient start: addTenWorkflow completes when run_created returns 500 | wrun_01KT51Y5QDSHGTNA0J6J8PE01T

redis (9 failed):

  • hookWorkflow is not resumable via public webhook endpoint | wrun_01KT51F55QMHD54WVXQNHHJ582
  • sleepingWorkflow | wrun_01KT51FHAK4185VBHWGPHPP7FG
  • outputStreamWorkflow negative startIndex (reads from end)
  • outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns correct index after stream completes
  • outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns -1 before any chunks are written
  • outputStreamWorkflow - getTailIndex and getStreamChunks getStreamChunks returns same content as reading the stream
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KT51QEWTJWT0H9T4DVW4696A
  • pages router sleepingWorkflow via pages router
  • resilient start: addTenWorkflow completes when run_created returns 500 | wrun_01KT51Y5QDSHGTNA0J6J8PE01T

turso-dev (1 failed):

  • dev e2e should rebuild on imported step dependency change

turso (68 failed):

  • addTenWorkflow | wrun_01KT51DW1PJXJ3TE45BBG4GNZY
  • addTenWorkflow | wrun_01KT51DW1PJXJ3TE45BBG4GNZY
  • wellKnownAgentWorkflow (.well-known/agent) | wrun_01KT51FJJNZX60Q4A85ZKHY9Y5
  • should work with react rendering in step
  • promiseAllWorkflow | wrun_01KT51E4F4AT5RXPDMZ4K6F8YR
  • promiseRaceWorkflow | wrun_01KT51E953RC01W41Q8M2QSABN
  • promiseAnyWorkflow | wrun_01KT51ECTD40FXQAEM8PB08HFJ
  • importedStepOnlyWorkflow | wrun_01KT51FTRP0CPFBPVEM2HPA7ZG
  • readableStreamWorkflow | wrun_01KT51EG6CR74CEPSPYJ46VKX3
  • hookWorkflow | wrun_01KT51EX398JSTYSCFA0E4MTHY
  • hookWorkflow is not resumable via public webhook endpoint | wrun_01KT51F55QMHD54WVXQNHHJ582
  • webhookWorkflow | wrun_01KT51FAFWNS0DJ6BMRXRYFH9Q
  • sleepingWorkflow | wrun_01KT51FHAK4185VBHWGPHPP7FG
  • parallelSleepWorkflow | wrun_01KT51G1479P7GCGT9GFSM963M
  • nullByteWorkflow | wrun_01KT51G5X1Q43W18WKT984Z7HH
  • workflowAndStepMetadataWorkflow | wrun_01KT51G8FSBJA3HHP7M8J2FXFQ
  • outputStreamWorkflow no startIndex (reads all chunks)
  • outputStreamWorkflow positive startIndex (skips first chunk)
  • outputStreamWorkflow negative startIndex (reads from end)
  • outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns correct index after stream completes
  • outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns -1 before any chunks are written
  • outputStreamWorkflow - getTailIndex and getStreamChunks getStreamChunks returns same content as reading the stream
  • outputStreamInsideStepWorkflow - getWritable() called inside step functions | wrun_01KT51JRBDR5J2NTD2GGR8NMDZ
  • writableForwardedFromWorkflowWorkflow | wrun_01KT51K7CB5Z3XMTDCDMNF1M43
  • writableForwardedFromStepWorkflow | wrun_01KT51KCNV247NNR9MB1EE0ESK
  • fetchWorkflow | wrun_01KT51KGNSP7ECXRNNW7TZ7ADE
  • promiseRaceStressTestWorkflow | wrun_01KT51KKJCM0WZR1QN3733S5KH
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • error handling not registered WorkflowNotRegisteredError fails the run when workflow does not exist
  • error handling not registered StepNotRegisteredError fails the step but workflow can catch it
  • error handling not registered StepNotRegisteredError fails the run when not caught in workflow
  • hookCleanupTestWorkflow - hook token reuse after workflow completion | wrun_01KT51Q2PBXMC9E8B05GTBAJ98
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KT51QEWTJWT0H9T4DVW4696A
  • hookDisposeTestWorkflow - hook token reuse after explicit disposal while workflow still running | wrun_01KT51QYCZEH78X9409VCH4XKW
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars) | wrun_01KT51RFX5RHT80039TNVGQ2CR
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument | wrun_01KT51RSPZ96D4JSBDFX13T4W8
  • closureVariableWorkflow - nested step functions with closure variables | wrun_01KT51S0X9TPASDJVDYFAH2MNQ
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step | wrun_01KT51S348Q6H3PDPAPXRV5518
  • health check (queue-based) - workflow and step endpoints respond to health check messages
  • health check (CLI) - workflow health command reports healthy endpoints
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly | wrun_01KT51SKFGR2QTCTST9TPVX23Y
  • Calculator.calculate - static workflow method using static step methods from another class | wrun_01KT51SSB5TQ4DPZ3R1XRTEVDZ
  • AllInOneService.processNumber - static workflow method using sibling static step methods | wrun_01KT51T0DQDGYG45JNJP3NZ6WT
  • ChainableService.processWithThis - static step methods using this to reference the class | wrun_01KT51T7RJ9XYC8RC4S4PS1H1W
  • thisSerializationWorkflow - step function invoked with .call() and .apply() | wrun_01KT51TEWFGCESKAY42TE2WTF4
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE | wrun_01KT51TNWC7J9Y2N59F86JG27W
  • instanceMethodStepWorkflow - instance methods with "use step" directive | wrun_01KT51TWYM9VQ1KF2YY8SR3W0W
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context | wrun_01KT51VAWQQX91NJPE1FZ0WXYC
  • stepFunctionAsStartArgWorkflow - step function reference passed as start() argument | wrun_01KT51VM7QGN83RB04N6DRDEWB
  • cancelRun - cancelling a running workflow | wrun_01KT51VVB4PAC6EMD2881JSWZR
  • cancelRun via CLI - cancelling a running workflow | wrun_01KT51W4VGDTECCB953X61CVVE
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router
  • hookWithSleepWorkflow - hook payloads delivered correctly with concurrent sleep | wrun_01KT51WH91CDG6RZYC79NJK21H
  • sleepInLoopWorkflow - sleep inside loop with steps actually delays each iteration | wrun_01KT51X1ETE4DS3NFJKYZ8RX81
  • sleepWithSequentialStepsWorkflow - sequential steps work with concurrent sleep (control) | wrun_01KT51XR44VF1NPMNETFBQ7MP2
  • importMetaUrlWorkflow - import.meta.url is available in step bundles | wrun_01KT51Y19901WZRR45BGR7XR3J
  • metadataFromHelperWorkflow - getWorkflowMetadata/getStepMetadata work from module-level helper (#1577) | wrun_01KT51Y3H0F9DGMDXAJBNWCC9T
  • resilient start: addTenWorkflow completes when run_created returns 500 | wrun_01KT51Y5QDSHGTNA0J6J8PE01T

Details by Category

✅ ▲ Vercel Production
App Passed Failed Skipped
✅ astro 83 0 7
✅ example 83 0 7
✅ express 83 0 7
✅ fastify 83 0 7
✅ hono 83 0 7
✅ nextjs-turbopack 88 0 2
✅ nextjs-webpack 88 0 2
✅ nitro 83 0 7
✅ nuxt 83 0 7
✅ sveltekit 83 0 7
✅ vite 83 0 7
✅ 💻 Local Development
App Passed Failed Skipped
✅ astro-stable 84 0 6
✅ express-stable 84 0 6
✅ fastify-stable 84 0 6
✅ hono-stable 84 0 6
✅ nextjs-turbopack-canary 71 0 19
✅ nextjs-turbopack-stable 90 0 0
✅ nextjs-webpack-canary 71 0 19
✅ nextjs-webpack-stable 90 0 0
✅ nitro-stable 84 0 6
✅ nuxt-stable 84 0 6
✅ sveltekit-stable 84 0 6
✅ vite-stable 84 0 6
✅ 📦 Local Production
App Passed Failed Skipped
✅ astro-stable 84 0 6
✅ express-stable 84 0 6
✅ fastify-stable 84 0 6
✅ hono-stable 84 0 6
✅ nextjs-turbopack-canary 71 0 19
✅ nextjs-turbopack-stable 90 0 0
✅ nextjs-webpack-canary 71 0 19
✅ nextjs-webpack-stable 90 0 0
✅ nitro-stable 84 0 6
✅ nuxt-stable 84 0 6
✅ sveltekit-stable 84 0 6
✅ vite-stable 84 0 6
✅ 🐘 Local Postgres
App Passed Failed Skipped
✅ astro-stable 84 0 6
✅ express-stable 84 0 6
✅ hono-stable 84 0 6
✅ nextjs-turbopack-canary 71 0 19
✅ nextjs-turbopack-stable 90 0 0
✅ nextjs-webpack-canary 71 0 19
✅ nextjs-webpack-stable 90 0 0
✅ nitro-stable 84 0 6
✅ nuxt-stable 84 0 6
✅ sveltekit-stable 84 0 6
✅ vite-stable 84 0 6
✅ 🪟 Windows
App Passed Failed Skipped
✅ nextjs-turbopack 90 0 0
❌ 🌍 Community Worlds
App Passed Failed Skipped
✅ mongodb-dev 5 0 0
❌ mongodb 57 14 0
✅ redis-dev 5 0 0
❌ redis 62 9 0
❌ turso-dev 4 1 0
❌ turso 3 68 0
✅ 📋 Other
App Passed Failed Skipped
✅ e2e-local-dev-nest-stable 84 0 6
✅ e2e-local-dev-tanstack-start-stable 84 0 6
✅ e2e-local-postgres-nest-stable 84 0 6
✅ e2e-local-postgres-tanstack-start-stable 84 0 6
✅ e2e-local-prod-nest-stable 84 0 6
✅ e2e-local-prod-tanstack-start-stable 84 0 6

📋 View full workflow run


Some E2E test jobs failed:

  • Vercel Prod: success
  • Local Dev: success
  • Local Prod: success
  • Local Postgres: failure
  • Windows: success

Check the workflow run for details.

⚠️ Community world tests failed (non-blocking):

  • Community Worlds: failure

Check the workflow run for details.
