[DB-1951] Treat writes as idempotent if all events were previously written, even with check-only streams#5547
Conversation
3faeb90 to
10a2369
Compare
a13f6d4 to
a32b0cc
Compare
a32b0cc to
66de44f
Compare
An idempotent multi-stream write retry should succeed even if a check-only stream's expected version no longer matches. The events are already written and the check-only stream's state may have changed since, ultimately we resend the same response to the client as they should have received the first time they sent the request. Previously, a check-only stream (failing or not) caused the entire retry to fail with WrongExpectedVersion.
66de44f to
a5b911e
Compare
Review Summary by QodoTreat idempotent writes as successful with failing check-only streams
WalkthroughsDescription• Add EventCount field to CommitCheckResult to distinguish check-only streams • Treat idempotent writes as successful even if check-only streams fail version checks • Update verification logic to only require streams with events to be idempotent • Remove HasEventsToWrite from WritePreparesStreamInfo struct Diagramflowchart LR
A["Multi-stream write retry"] --> B["Check all streams"]
B --> C["Streams with events"]
B --> D["Check-only streams"]
C --> E["All idempotent?"]
D --> F["Version mismatch?"]
E -->|Yes| G["Success: AlreadyCommitted"]
E -->|No| H["Failure: WrongExpectedVersion"]
F -->|Yes| G
F -->|No| H
File Changes1. src/KurrentDB.Core/Services/Storage/ReaderIndex/CommitCheckResult.cs
|
Code Review by Qodo🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)
Great, no issues found!Qodo reviewed your code and found no material issues that require reviewⓘ The new review experience is currently in Beta. Learn more |
There was a problem hiding this comment.
Pull request overview
This PR updates multi-stream write idempotency handling so that a retry can still succeed (returning an idempotent/AlreadyCommitted response) even when one or more check-only streams fail their expected-version checks, as long as all streams that actually had events to write were previously written.
Changes:
- Add
EventCount(and derivedHasEventsToWrite) toCommitCheckResultso “check-only” vs “write” streams are carried through commit checking. - Update
StorageWriterService.VerifyCommitChecksto treat retries as idempotent when all event-writing streams are idempotent, ignoring failures on check-only streams in that scenario. - Update multi-stream write tests to expect success for idempotent retries that include check-only streams.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/KurrentDB.Core/Services/Storage/StorageWriterService.cs | Adjusts commit-check verification and response shaping to ignore check-only failures when all write-streams are idempotent. |
| src/KurrentDB.Core/Services/Storage/ReaderIndex/IndexWriter.cs | Threads event-count through commit-check results so callers can identify check-only streams. |
| src/KurrentDB.Core/Services/Storage/ReaderIndex/CommitCheckResult.cs | Extends commit-check result with EventCount/HasEventsToWrite. |
| src/KurrentDB.Core.XUnit.Tests/TransactionLog/MultiStreamWrites/MultiStreamWritesTests.cs | Updates expectations so idempotent retries with check-only streams succeed. |
| src/KurrentDB.Core.Tests/Services/Storage/FakeIndexWriter.cs | Updates test fake to match the new CommitCheckResult constructor signature. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
@timothycoleman 👉 Created pull request targeting release/v26.0: #5577
* [DB-1964]: Added migration engine for secondary index tables (#5561) * Initial version of the index DB migration engine * Fixed regression * Row must by passed by ref because its instance method mutates the struct fields * Create schema from script only when fresh setup * Removed version setup from the script file * Fixed regression * Removed redundant migration action * Fixed occasional NRE * Align guide with the source code * Fixed typo * Review feedback * Fixed indentation * Fixed identation * [DB-1951] Add MSA idempotency tests that correspond to the existing behaviour (#5565) * Add MSA idempotency tests that correspond to the existing behaviour * chore: restructure CLAUDE.md with progressive disclosure and add architect review agent (#5572) Restructure AI documentation for better context efficiency: - CLAUDE.md reduced from 812 to 150 lines, keeping only rules that affect every coding decision (threading model, C# conventions, log levels, parameter design, naming) - Extracted reference docs to .claude/docs/ with clear "when to fetch" pointers: architecture.md, api-v2-patterns.md, testing.md, protocol-v2.md, patterns-and-conventions.md - New content from Tim Coleman's projections-v2 review (PR #5562): - TcsEnvelope<> vs CallbackEnvelope threading rule (was showing anti-pattern as correct) - Non-optional parameters / no-fallbacks convention - ISystemClient preference over raw IPublisher - ClaimsPrincipal threading-through requirement - Projections V2 engine architecture and threading model - Add architect-review agent (.claude/agents/architect-review.md) encoding Tim Coleman's review methodology for future agentic code reviews Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Treat writes as idempotent if all events were previously written (#5547) An idempotent multi-stream write retry should succeed even if a check-only stream's expected version no longer matches. The events are already written and the check-only stream's state may have changed since, ultimately we resend the same response to the client as they should have received the first time they sent the request. Previously, a check-only stream (failing or not) caused the entire retry to fail with WrongExpectedVersion. * Fix incorrect result stream name for root partition in V2 projections The double-dash `$projections-{name}--result` was produced when the partition key was empty. Now uses `ProjectionNamesBuilder.MakeResultStreamName` — consistent with V1's `MakePartitionResultStreamName`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [DB-1912] Persistent subscriptions: Fix wrong checkpoint when using pinned strategy (#5497) * Add failing test Co-Authored-By: Timothy Coleman <timothy.coleman@gmail.com> * Fixed: Wrong persistent subscription checkpoints can be produced when the consumer strategy skips events * When a consumer strategy skips an event, don't keep it in the buffer but add it to the retry list instead. The checkpointing mechanism considers the lowest message in `_retry` - not the lowest message in `_buffer`. Thus leaving it in the buffer may result in a wrong checkpoint being produced. * Always update the last known event when a new sequence number has been assigned A new sequence number being assigned means that we are seeing an event that was not processed before. In other words, it's the last event we've read from the source stream, so we update the last known sequence number / last known event position accordingly * fix: only update the _lastKnown vars when event leaves the buffer the newSequenceNumberAssigned check isn't enough, because in the NoMoreCapacity case we discard the message that the _lastKnown vars are based on. then next time we call OutstandingMessage.ForPushedEvent the _lastKnownMessage we pass in will be wrong, potentially leading to an incorrect checkpoint (see test) * safer and more efficient - safer because Scan() says we shouldn't add to each linked list while scanning it. before this commit we could add to _retry while scanning through it. now we only add to _retry while scanning _buffer - more efficient because before this commit adding to _retry involved searching through it, now we only add to _retry when skipping an event in _buffer in which case we know the event needs to go to the back of retry. --------- Co-authored-by: Timothy Coleman <timothy.coleman@gmail.com> * Use $ProjectionState event type for V2 state checkpoints instead of Result because what we are doing here is checkpointing not outputting results. "Result" event type can be used when we implement outputState() which would emit the state after every event processed (at least, as long as the state changed) * [DB-1872] Improve pinned persistent subscription performance under burst load (#5576) * Improve persistent subscription performance under burst load When events arrive in a burst, each NotifyLiveSubscriptionMessage call was scanning the entire buffer to push events to clients. Client acks are queued behind the event flood in the same FIFO queue, so clients appear full and every scan is fruitless. With N events, this results in O(N²) total work. Instead of pushing synchronously on event arrival, we now schedule a push message on the PS queue. This lets events accumulate in the buffer cheaply, gives acks a chance to interleave, and batches the buffer scan. All PushToClients call sites use the deferred path for consistency, since the buffer scan cost applies equally to acks, nacks, timeouts, and client changes. * update package ref for cve CVE-2026-26171 * [DB-2006]: Migration to major .NEXT version (#5581) * Migration to major .NEXT version * Fixed dependency * Fixed regression * Fixed regression * Removed item index * review: fix regression * restore previous order --------- Co-authored-by: Timothy Coleman <timothy.coleman@gmail.com> * Write V2 state checkpoints to -state streams instead of -result streams Result streams are for user-visible output (outputState()), not internal state persistence. V2 was writing $ProjectionState events to -result streams, conflating checkpointing with output. State now goes to dedicated -state streams: - $projections-{name}-state (root partition) - $projections-{name}-{partition}-state (per partition) This keeps -result streams clean for future outputState() support and avoids collisions with the -checkpoint stream used for position. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Rename V2 state event type to $ProjectionState.V2 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Rename V2 checkpoint event type to $ProjectionCheckpoint.V2 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Remove V1-to-V2 migration test that was silently passing V2 can't parse V1's checkpoint events (V1 stores state in data, position in metadata; V2 expects position in data), so it falls back to starting from the beginning. The test passed because V2 reprocessed everything from scratch, not because migration worked. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Rename ProjectionsStateStreamSuffix to ProjectionsResultStreamSuffix The constant was confusingly named — it holds "-result", not "-state". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * better name... --------- Co-authored-by: Roman Sakno <roman.sakno@kurrent.io> Co-authored-by: Alexey Zimarev <alex@zimarev.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Shaan Nobee <sniper111@gmail.com>
An idempotent multi-stream write retry should succeed even if a
check-only stream's expected version no longer matches. The events are
already written and the check-only stream's state may have changed
since, ultimately we resend the same response to the client as they
should have received the first time they sent the request.
Previously, a check-only stream (failing or not) caused the entire retry
to fail with WrongExpectedVersion.