[e2e] event-log-race-repro: actionable summary, slow≠stuck, fetch retries by VaguelySerious · Pull Request #2195 · vercel/workflow

VaguelySerious · 2026-06-01T10:13:30Z

Standalone CI-only changes for the event-log-race-repro job, extracted so they can be merged independently of any core fix. All on top of the classification work already on stable (#2194).

Actionable result summary

- 🚨 Event-Log Regressions table lists every gating run in full (never truncated), each with duration, a synthesised detail line, and a direct dashboard link.
- Infra (non-gating) section groups harness noise by error code with a plain-language explanation and example run links, instead of flooding one table with thousands of rows.
- Headline names the regression count and digests the infra noise.
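
The grouping and headline can be sketched roughly as below. This is an illustration only, not the renderer from this PR: the `RunResult` shape, the `gating` flag, the function names, and the exact headline wording are assumptions; only the error-code strings in the usage note come from the PR.

```typescript
// Hypothetical result shape; the real harness types are not shown in this PR.
interface RunResult {
  runId: string;
  gating: boolean; // true = event-log regression, false = infra noise
  errorCode: string; // e.g. "HOOK_RESUME_FAILED"
}

interface InfraGroup {
  errorCode: string;
  count: number;
  exampleRunIds: string[];
}

// Group non-gating results by error code, keeping a few example runs per group
// so the summary can link to them without listing every row.
function groupInfra(results: RunResult[], maxExamples = 3): InfraGroup[] {
  const groups = new Map<string, InfraGroup>();
  for (const r of results) {
    if (r.gating) continue; // regressions are listed in full elsewhere
    let g = groups.get(r.errorCode);
    if (!g) {
      g = { errorCode: r.errorCode, count: 0, exampleRunIds: [] };
      groups.set(r.errorCode, g);
    }
    g.count += 1;
    if (g.exampleRunIds.length < maxExamples) g.exampleRunIds.push(r.runId);
  }
  // Largest groups first.
  return Array.from(groups.values()).sort((a, b) => b.count - a.count);
}

// Headline: regression count plus a one-line digest of the infra noise.
function headline(results: RunResult[]): string {
  const regressions = results.filter((r) => r.gating).length;
  const noun = regressions === 1 ? "regression" : "regressions";
  const digest = groupInfra(results)
    .map((g) => `${g.count} ${g.errorCode}`)
    .join(", ");
  return digest ? `${regressions} ${noun}; infra: ${digest}` : `${regressions} ${noun}`;
}
```

With 904 `HOOK_RESUME_FAILED` and 61 `NO_WAKE_BRANCH` non-gating results, the digest part of this sketch would read "904 HOOK_RESUME_FAILED, 61 NO_WAKE_BRANCH".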

Slow ≠ stuck

A run flagged at the poll budget can simply be slow on a loaded preview deployment (observed: a run flagged as stuck that actually completed shortly after the harness gave up). Added a generous post-budget grace window: a run that reaches a terminal state during grace is classified by its real outcome: completed → non-gating SLOW_COMPLETION, failed → its error class. Only a run still non-terminal after budget + grace is a genuine wedge (gating stuck). Stuck runs also record where they wedged (latest event/step).
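
A minimal sketch of the budget + grace decision, assuming a hypothetical `PollObservation` shape. The `"OK"` and `"UNKNOWN_FAILURE"` labels are placeholders invented for the example, and the grace length is left as a parameter because this PR does not state it.

```typescript
type TerminalStatus = "completed" | "failed" | "cancelled";

// Hypothetical summary of what the poller saw for one run.
interface PollObservation {
  // Terminal status eventually observed, or null if none was ever seen.
  terminal: TerminalStatus | null;
  // Elapsed milliseconds when the terminal status was first observed.
  terminalAtMs: number | null;
  // Error class of a failed run (assumed field).
  errorClass?: string;
}

function classifyRun(obs: PollObservation, budgetMs: number, graceMs: number): string {
  const deadlineMs = budgetMs + graceMs;
  // Still non-terminal after budget + grace: a genuine wedge (gating).
  if (obs.terminal === null || obs.terminalAtMs === null || obs.terminalAtMs > deadlineMs) {
    return "stuck";
  }
  // A failed run is reported by its own error class, however late it failed.
  if (obs.terminal === "failed") return obs.errorClass ?? "UNKNOWN_FAILURE";
  if (obs.terminal === "cancelled") return "CANCELLED";
  // Completed: inside the budget is a normal pass, inside grace is slow but not stuck.
  return obs.terminalAtMs > budgetMs ? "SLOW_COMPLETION" : "OK";
}
```

For example, with a 150000 ms budget and an illustrative 60000 ms grace window, a run that completes at 160000 ms is classified as `SLOW_COMPLETION` rather than `stuck`.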

Retry transient fetch failures

Investigating two HARNESS_ERRORs (fetch failed, Hook not found): both came from harness-side network calls to the deployment, not the SDK. Added withRetry (linear backoff, transient-network detection) around the harness network calls (getWorkflowMetadata, start, resumeHook, run-status poll); the poll no longer aborts on a flaky GET. On final failure the error is prefixed with the call site (e.g. start: fetch failed) so the infra breakdown says where it happened.
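
A sketch of what a `withRetry` helper of this kind might look like. The signature, option names, defaults, and the message patterns used for transient detection are assumptions made for illustration; the actual helper in the harness may differ.

```typescript
interface RetryOptions {
  attempts?: number; // total tries, including the first
  baseDelayMs?: number; // linear backoff: base, 2 * base, 3 * base, ...
  sleep?: (ms: number) => Promise<void>; // injectable so tests need no real timers
}

// Heuristic transient-network check; the patterns are assumptions.
function isTransientNetworkError(err: unknown): boolean {
  const msg = err instanceof Error ? err.message : String(err);
  return /fetch failed|ECONNRESET|ETIMEDOUT|socket hang up/i.test(msg);
}

async function withRetry<T>(
  callSite: string,
  fn: () => Promise<T>,
  opts: RetryOptions = {}
): Promise<T> {
  const attempts = opts.attempts ?? 3;
  const baseDelayMs = opts.baseDelayMs ?? 500;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  let lastErr: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Give up on the last attempt, or at once if the error is not transient.
      if (attempt === attempts || !isTransientNetworkError(err)) break;
      await sleep(baseDelayMs * attempt);
    }
  }
  // Prefix the call site so the summary can say where the failure happened.
  const msg = lastErr instanceof Error ? lastErr.message : String(lastErr);
  throw new Error(`${callSite}: ${msg}`);
}
```

A call such as `withRetry("start", () => fetch(url))` would then surface a persistent failure as `start: fetch failed`, which matches the prefixed form described above.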

Renderer is unit-tested (node:test, run by the CI Scripts Tests job).

🤖 Generated with Claude Code

…ist + infra breakdown

Rework the PR-comment renderer so a human can immediately see what gates the job and inspect every failing run:

- 🚨 Event-Log Regressions table lists *every* gating run in full (never truncated), each with its duration, a synthesised detail line, and a direct dashboard link. Stuck runs render "no terminal state after <ms>".
- Infra (non-gating) section groups harness noise by error code with a plain-language explanation and example run links, instead of flooding one table with thousands of rows.
- Headline names the regression count and digests the infra noise (e.g. "904 HOOK_RESUME_FAILED, 61 NO_WAKE_BRANCH").

Adds unit coverage for the breakdown, message synthesis and the never-truncate-regressions guarantee.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 8f41186)

On run-poll timeout, fetch the run's event log and record the latest event (type, step name, elapsed) as the stuck run's errorMessage. The summary's regression table then shows "stuck after N events; latest step_started (foo) at +12.3s" with a dashboard link, instead of only a duration, so a human can see where the run wedged without opening every link. Best-effort; falls back to the duration-only note if the event fetch fails.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit dee4370)
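
The detail-line synthesis with its fallback could look roughly like this. The `RunEvent` field names and the function name are assumptions, the events are assumed to be in chronological order, and the two output formats follow the strings quoted in the commit messages.

```typescript
// Hypothetical event shape; the real event-log fields are not shown in this PR.
interface RunEvent {
  eventType: string; // e.g. "step_started"
  stepName?: string;
  elapsedMs: number; // milliseconds since the run started
}

// Build the stuck run's detail line. `events` is null when the event fetch failed.
function stuckDetail(events: RunEvent[] | null, durationMs: number): string {
  if (!events || events.length === 0) {
    // Best-effort fallback: duration-only note.
    return `no terminal state after ${durationMs}ms`;
  }
  const latest = events[events.length - 1];
  const step = latest.stepName ? ` (${latest.stepName})` : "";
  const at = (latest.elapsedMs / 1000).toFixed(1);
  return `stuck after ${events.length} events; latest ${latest.eventType}${step} at +${at}s`;
}
```

Passing `null` for the events models the case where the event fetch itself fails, so the summary still gets a usable line.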

A run flagged at the 150s poll budget can simply be slow on a loaded preview deployment: wrun_…EFDZ9 was observed to complete shortly after the harness gave up, and was wrongly gated as `stuck`. Add a generous post-budget grace window: a run that reaches a terminal state during grace is classified by its real outcome (completed → non-gating `SLOW_COMPLETION` infra, surfaced for visibility; failed → its error class). Only a run still non-terminal after budget + grace is a genuine wedge (gating `stuck`). Renderer gains notes for SLOW_COMPLETION/CANCELLED and singular/plural agreement fixes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 31d5b99)

…e they occur

Investigating HARNESS_ERRORs on a repro run: a `fetch failed` and a `Hook not found`. Both came from harness-side network calls to the deployment, not the SDK. A single dropped connection should never abort tracking an otherwise healthy run.

- Add `withRetry` (linear backoff, transient-network detection) and apply it to the harness network calls: getWorkflowMetadata, start, resumeHook, and the run-status poll. On final failure the error is prefixed with the call site (e.g. "start: fetch failed", "poll runs.get: fetch failed"), so the infra breakdown says *where* it happened.
- pollTerminalRun no longer aborts on a flaky GET: a transient error just retries/continues until the deadline.
- waitForHook labels its surfaced error ("waitForHook: Hook not found") so the hook-propagation timeout is identifiable in the summary.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit a9b68c0)
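
The "no longer aborts on a flaky GET" behaviour of the poll can be illustrated with a loop like the one below. It is a sketch under assumptions, not the harness's pollTerminalRun: the `PollDeps` shape and the injectable clock and sleep are invented for the example, and for brevity it treats every thrown error as transient.

```typescript
type Status = "running" | "completed" | "failed" | "cancelled";

// Hypothetical dependencies, injected so the loop can be tested without a network.
interface PollDeps {
  getStatus: () => Promise<Status>; // one status GET for the run
  now: () => number; // clock in milliseconds
  sleep: (ms: number) => Promise<void>;
}

// Returns the terminal status, or null if the deadline passed first.
async function pollUntilTerminal(
  deps: PollDeps,
  deadlineMs: number,
  intervalMs: number
): Promise<Status | null> {
  const start = deps.now();
  while (deps.now() - start < deadlineMs) {
    try {
      const status = await deps.getStatus();
      if (status !== "running") return status;
    } catch {
      // A failed GET says nothing about the run itself: keep polling.
    }
    await deps.sleep(intervalMs);
  }
  return null;
}
```

The design point is that only the deadline ends the loop early; a dropped connection costs one poll interval instead of the whole run.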

vercel · 2026-06-01T10:13:35Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
example-nextjs-workflow-turbopack	Ready	Preview, Comment	Jun 1, 2026 10:17am
example-nextjs-workflow-webpack	Ready	Preview, Comment	Jun 1, 2026 10:17am
example-workflow	Ready	Preview, Comment	Jun 1, 2026 10:17am
workbench-astro-workflow	Ready	Preview, Comment	Jun 1, 2026 10:17am
workbench-express-workflow	Ready	Preview, Comment	Jun 1, 2026 10:17am
workbench-fastify-workflow	Ready	Preview, Comment	Jun 1, 2026 10:17am
workbench-hono-workflow	Ready	Preview, Comment	Jun 1, 2026 10:17am
workbench-nitro-workflow	Ready	Preview, Comment	Jun 1, 2026 10:17am
workbench-nuxt-workflow	Ready	Preview, Comment	Jun 1, 2026 10:17am
workbench-sveltekit-workflow	Ready	Preview, Comment	Jun 1, 2026 10:17am
workbench-tanstack-start-workflow	Ready	Preview, Comment	Jun 1, 2026 10:17am
workbench-vite-workflow	Ready	Preview, Comment	Jun 1, 2026 10:17am
workflow-docs	Ready	Preview, Comment, Open in v0	Jun 1, 2026 10:17am
workflow-swc-playground	Ready	Preview, Comment	Jun 1, 2026 10:17am
workflow-tarballs	Ready	Preview, Comment	Jun 1, 2026 10:17am
workflow-web	Ready	Preview, Comment	Jun 1, 2026 10:17am

changeset-bot · 2026-06-01T10:13:35Z

🦋 Changeset detected

Latest commit: b726540

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 0 packages

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

github-actions · 2026-06-01T10:13:42Z

🧪 E2E Test Results

❌ Some tests failed

Summary

Category	Passed	Failed	Skipped	Total
✅ ▲ Vercel Production	901	0	67	968
✅ 💻 Local Development	970	0	86	1056
✅ 📦 Local Production	970	0	86	1056
✅ 🐘 Local Postgres	901	0	67	968
✅ 🪟 Windows	88	0	0	88
❌ 🌍 Community Worlds	134	88	0	222
✅ 📋 Other	492	0	36	528
Total	4456	88	342	4886

❌ Failed Tests

🌍 Community Worlds (88 failed)

mongodb (12 failed):

hookWorkflow is not resumable via public webhook endpoint | wrun_01KT1AXZWZYGRHV86RHXK1K527
webhookWorkflow | wrun_01KT1AY72F9A1W4AYGSMR6657C
sleepingWorkflow | wrun_01KT1AYDGJBH8V0RSK9W75Z07R
outputStreamWorkflow no startIndex (reads all chunks)
outputStreamWorkflow negative startIndex (reads from end)
outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns correct index after stream completes
outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns -1 before any chunks are written
outputStreamWorkflow - getTailIndex and getStreamChunks getStreamChunks returns same content as reading the stream
outputStreamInsideStepWorkflow - getWritable() called inside step functions | wrun_01KT1B1QFSNCQBBE7GN3H4EKZC
concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KT1B67Y90J0DV6YPWGEKNW78
pages router sleepingWorkflow via pages router
resilient start: addTenWorkflow completes when run_created returns 500 | wrun_01KT1BCF4ZB5BPQ6P6M8WGGYMY

redis (9 failed):

hookWorkflow is not resumable via public webhook endpoint | wrun_01KT1AXZWZYGRHV86RHXK1K527
sleepingWorkflow | wrun_01KT1AYDGJBH8V0RSK9W75Z07R
outputStreamWorkflow negative startIndex (reads from end)
outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns correct index after stream completes
outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns -1 before any chunks are written
outputStreamWorkflow - getTailIndex and getStreamChunks getStreamChunks returns same content as reading the stream
concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KT1B67Y90J0DV6YPWGEKNW78
pages router sleepingWorkflow via pages router
resilient start: addTenWorkflow completes when run_created returns 500 | wrun_01KT1BCF4ZB5BPQ6P6M8WGGYMY

turso-dev (1 failed):

dev e2e should rebuild on imported step dependency change

turso (66 failed):

addTenWorkflow | wrun_01KT1AWT51YNJGFZN56XX8B36Z
addTenWorkflow | wrun_01KT1AWT51YNJGFZN56XX8B36Z
wellKnownAgentWorkflow (.well-known/agent) | wrun_01KT1AZ04ST8YGJ9XGJYW52Q1F
should work with react rendering in step
promiseAllWorkflow | wrun_01KT1AX29BCSSES15F4Z7R0A7Q
promiseRaceWorkflow | wrun_01KT1AX5MC849XVT57H8BPRWYQ
promiseAnyWorkflow | wrun_01KT1AX7TDPZG9T68P6D12W12P
importedStepOnlyWorkflow | wrun_01KT1AZFNG12191X94VF6AGA8B
readableStreamWorkflow | wrun_01KT1AXA0FJRX591CGQGEA7JWY
hookWorkflow | wrun_01KT1AXQ5K6BY9QW56JA5M8XMS
hookWorkflow is not resumable via public webhook endpoint | wrun_01KT1AXZWZYGRHV86RHXK1K527
webhookWorkflow | wrun_01KT1AY72F9A1W4AYGSMR6657C
sleepingWorkflow | wrun_01KT1AYDGJBH8V0RSK9W75Z07R
parallelSleepWorkflow | wrun_01KT1AYY79G4DZ1DFD1NA15PPS
nullByteWorkflow | wrun_01KT1AZ3YNZNBYS8422EJXBVKZ
workflowAndStepMetadataWorkflow | wrun_01KT1AZ71KTSX3YS3CRD9A5Z7C
outputStreamWorkflow no startIndex (reads all chunks)
outputStreamWorkflow positive startIndex (skips first chunk)
outputStreamWorkflow negative startIndex (reads from end)
outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns correct index after stream completes
outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns -1 before any chunks are written
outputStreamWorkflow - getTailIndex and getStreamChunks getStreamChunks returns same content as reading the stream
outputStreamInsideStepWorkflow - getWritable() called inside step functions | wrun_01KT1B1QFSNCQBBE7GN3H4EKZC
fetchWorkflow | wrun_01KT1B27HX1ADK51QT0KEHHG3S
promiseRaceStressTestWorkflow | wrun_01KT1B2B3HZWV4FNM3S5SVZ5RH
error handling error propagation workflow errors nested function calls preserve message and stack trace
error handling error propagation workflow errors cross-file imports preserve message and stack trace
error handling error propagation step errors basic step error preserves message and stack trace
error handling error propagation step errors cross-file step error preserves message and function names in stack
error handling retry behavior regular Error retries until success
error handling retry behavior FatalError fails immediately without retries
error handling retry behavior RetryableError respects custom retryAfter delay
error handling retry behavior maxRetries=0 disables retries
error handling catchability FatalError can be caught and detected with FatalError.is()
error handling not registered WorkflowNotRegisteredError fails the run when workflow does not exist
error handling not registered StepNotRegisteredError fails the step but workflow can catch it
error handling not registered StepNotRegisteredError fails the run when not caught in workflow
hookCleanupTestWorkflow - hook token reuse after workflow completion | wrun_01KT1B5V6V77H5NJE832HY1NB2
concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KT1B67Y90J0DV6YPWGEKNW78
hookDisposeTestWorkflow - hook token reuse after explicit disposal while workflow still running | wrun_01KT1B6PWR05ZAV9NMGQ7EYNXW
stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars) | wrun_01KT1B76HJMYN3XQ1X6T9EKMGG
stepFunctionWithClosureWorkflow - step function with closure variables passed as argument | wrun_01KT1B7GG8HVQZZ9YA97QBHSTQ
closureVariableWorkflow - nested step functions with closure variables | wrun_01KT1B7PGVGXWSAD6JREW2WPSK
spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step | wrun_01KT1B7RST4B3S774KY3N2D30K
health check (queue-based) - workflow and step endpoints respond to health check messages
health check (CLI) - workflow health command reports healthy endpoints
pathsAliasWorkflow - TypeScript path aliases resolve correctly | wrun_01KT1B87WEW8F0C8YRCFY4Q76R
Calculator.calculate - static workflow method using static step methods from another class | wrun_01KT1B8DQAN9NY288MQEX8TNDC
AllInOneService.processNumber - static workflow method using sibling static step methods | wrun_01KT1B8MM9NQ34X8VCFHKAN7VA
ChainableService.processWithThis - static step methods using this to reference the class | wrun_01KT1B8VF7HCMC4ACQS06TN1KK
thisSerializationWorkflow - step function invoked with .call() and .apply() | wrun_01KT1B92Q380VFNTQDJ7CV7SQT
customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE | wrun_01KT1B9AB7SKYGPWP47Y9B3MVJ
instanceMethodStepWorkflow - instance methods with "use step" directive | wrun_01KT1B9HF30Y76H26DVDDWV6PP
crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context | wrun_01KT1B9YGDRABKCMGWT48TS5ZB
stepFunctionAsStartArgWorkflow - step function reference passed as start() argument | wrun_01KT1BA7TMXK18A968S1Y2NY09
cancelRun - cancelling a running workflow | wrun_01KT1BAF0WJTS36QE9E7R5RHH9
cancelRun via CLI - cancelling a running workflow | wrun_01KT1BATFDPMSXDVEAF8TPYSJG
pages router addTenWorkflow via pages router
pages router promiseAllWorkflow via pages router
pages router sleepingWorkflow via pages router
hookWithSleepWorkflow - hook payloads delivered correctly with concurrent sleep | wrun_01KT1BB73CKGRPQWQFH307BKHT
sleepInLoopWorkflow - sleep inside loop with steps actually delays each iteration | wrun_01KT1BBR12C7E60RPPN6EBWXPY
sleepWithSequentialStepsWorkflow - sequential steps work with concurrent sleep (control) | wrun_01KT1BC3PRNEVTNXDJ92DY54HK
importMetaUrlWorkflow - import.meta.url is available in step bundles | wrun_01KT1BCAJNRBGQJGT92EAWSQWC
metadataFromHelperWorkflow - getWorkflowMetadata/getStepMetadata work from module-level helper (#1577) | wrun_01KT1BCCRGF28ESF56K9HR6SEH
resilient start: addTenWorkflow completes when run_created returns 500 | wrun_01KT1BCF4ZB5BPQ6P6M8WGGYMY

Details by Category

✅ ▲ Vercel Production

App	Passed	Skipped
✅ astro	81	7
✅ example	81	7
✅ express	81	7
✅ fastify	81	7
✅ hono	81	7
✅ nextjs-turbopack	86	2
✅ nextjs-webpack	86	2
✅ nitro	81	7
✅ nuxt	81	7
✅ sveltekit	81	7
✅ vite	81	7

✅ 💻 Local Development

App	Passed	Skipped
✅ astro-stable	82	6
✅ express-stable	82	6
✅ fastify-stable	82	6
✅ hono-stable	82	6
✅ nextjs-turbopack-canary	69	19
✅ nextjs-turbopack-stable	88	0
✅ nextjs-webpack-canary	69	19
✅ nextjs-webpack-stable	88	0
✅ nitro-stable	82	6
✅ nuxt-stable	82	6
✅ sveltekit-stable	82	6
✅ vite-stable	82	6

✅ 📦 Local Production

App	Passed	Skipped
✅ astro-stable	82	6
✅ express-stable	82	6
✅ fastify-stable	82	6
✅ hono-stable	82	6
✅ nextjs-turbopack-canary	69	19
✅ nextjs-turbopack-stable	88	0
✅ nextjs-webpack-canary	69	19
✅ nextjs-webpack-stable	88	0
✅ nitro-stable	82	6
✅ nuxt-stable	82	6
✅ sveltekit-stable	82	6
✅ vite-stable	82	6

✅ 🐘 Local Postgres

App	Passed	Skipped
✅ astro-stable	82	6
✅ express-stable	82	6
✅ fastify-stable	82	6
✅ hono-stable	82	6
✅ nextjs-turbopack-stable	88	0
✅ nextjs-webpack-canary	69	19
✅ nextjs-webpack-stable	88	0
✅ nitro-stable	82	6
✅ nuxt-stable	82	6
✅ sveltekit-stable	82	6
✅ vite-stable	82	6

✅ 🪟 Windows

App	Passed	Failed	Skipped
✅ nextjs-turbopack	88	0	0

❌ 🌍 Community Worlds

App	Passed	Failed
✅ mongodb-dev	5	0
❌ mongodb	57	12
✅ redis-dev	5	0
❌ redis	60	9
❌ turso-dev	4	1
❌ turso	3	66

✅ 📋 Other

App	Passed	Skipped
✅ e2e-local-dev-nest-stable	82	6
✅ e2e-local-dev-tanstack-start-stable	82	6
✅ e2e-local-postgres-nest-stable	82	6
✅ e2e-local-postgres-tanstack-start-stable	82	6
✅ e2e-local-prod-nest-stable	82	6
✅ e2e-local-prod-tanstack-start-stable	82	6

📋 View full workflow run

❌ Some E2E test jobs failed:

Vercel Prod: success
Local Dev: success
Local Prod: success
Local Postgres: failure
Windows: success

Check the workflow run for details.

⚠️ Community world tests failed (non-blocking):

Community Worlds: failure

Check the workflow run for details.

VaguelySerious and others added 5 commits June 1, 2026 12:12

chore: changeset for event-log-race-repro CI summary improvements

b726540

vercel Bot deployed to Preview – workflow-docs June 1, 2026 10:13 View deployment

VaguelySerious mentioned this pull request Jun 1, 2026

[world-vercel] [builders] Add WORKFLOW_SEQUENTIAL_REPLAYS option to limit flow route concurrency to one #2193

Open

vercel Bot deployed to Preview – workflow-web June 1, 2026 10:14 View deployment

vercel Bot deployed to Preview – workflow-tarballs June 1, 2026 10:14 View deployment

vercel Bot deployed to Preview – workbench-vite-workflow June 1, 2026 10:15 View deployment

vercel Bot deployed to Preview – workbench-hono-workflow June 1, 2026 10:15 View deployment

vercel Bot deployed to Preview – workbench-express-workflow June 1, 2026 10:15 View deployment

vercel Bot deployed to Preview – workbench-nitro-workflow June 1, 2026 10:15 View deployment

vercel Bot deployed to Preview – example-workflow June 1, 2026 10:15 View deployment

vercel Bot deployed to Preview – workbench-sveltekit-workflow June 1, 2026 10:15 View deployment

vercel Bot deployed to Preview – workbench-fastify-workflow June 1, 2026 10:15 View deployment

vercel Bot deployed to Preview – workbench-tanstack-start-workflow June 1, 2026 10:15 View deployment

vercel Bot deployed to Preview – workbench-nuxt-workflow June 1, 2026 10:15 View deployment

vercel Bot deployed to Preview – workbench-astro-workflow June 1, 2026 10:15 View deployment

vercel Bot deployed to Preview – example-nextjs-workflow-turbopack June 1, 2026 10:15 View deployment

vercel Bot deployed to Preview – example-nextjs-workflow-webpack June 1, 2026 10:16 View deployment

vercel Bot deployed to Preview – workflow-swc-playground June 1, 2026 10:17 View deployment

VaguelySerious marked this pull request as ready for review June 1, 2026 10:28

VaguelySerious requested a review from a team as a code owner June 1, 2026 10:28

VaguelySerious merged commit bdca8fc into stable Jun 1, 2026
92 of 97 checks passed

VaguelySerious deleted the peter/repro-ci-summary-improvements branch June 1, 2026 10:28

VaguelySerious mentioned this pull request Jun 1, 2026

[e2e] event-log-race-repro: classify polled run failures from structured error #2196

Draft
