fix: key session store by sync_id to prevent concurrent sync clobbering#976
fix: key session store by sync_id to prevent concurrent sync clobbering#976c1-dev-bot[bot] wants to merge 1 commit into
Conversation
GRPCSessionServer previously held a single mutable store pointer shared across all concurrent sync tasks. When a second sync_full task started mid-flight, SetSessionStore overwrote the pointer, causing the first sync's reads to silently return empty results from the wrong c1z file. This led to dropped grants reconciled as revocations by the platform. Replace the single store pointer with a sync.RWMutex-guarded map[syncID]SessionStore. Each RPC resolves its backing store via the request's sync_id (already threaded through the wire protocol). Registration is deferred from loadStore to Sync() after the sync_id is known; cleanup happens in Close() via RemoveSessionStore. Fixes: CE-822
| type SetSessionStore interface { | ||
| SetSessionStore(ctx context.Context, store SessionStore) | ||
| SetSessionStore(ctx context.Context, syncID string, store SessionStore) | ||
| RemoveSessionStore(ctx context.Context, syncID string) | ||
| } |
There was a problem hiding this comment.
🟡 Suggestion: This changes the exported SetSessionStore interface signature (adds syncID) and adds a new RemoveSessionStore method, which breaks any external implementer of this interface. In practice the only implementers/consumers are SDK-internal (connectorClient, GRPCSessionServer, the syncer), so impact is likely low, but per the repo's compatibility criteria a breaking SDK contract change should be signaled with a 0.x minor bump in pkg/sdk/version.go (still v0.15.5) and noted in the PR. (confidence: medium)
General PR Review: fix: key session store by sync_id to prevent concurrent sync clobberingBlocking Issues: 0 | Suggestions: 1 | Threads Resolved: 0 Review SummaryScanned the full PR diff for security and correctness. The change replaces the single shared Security IssuesNone found. Correctness IssuesNone found. Suggestions
Prompt for AI agents |
|
Checking lint failure - will push a fix if needed. |
kans
left a comment
There was a problem hiding this comment.
This is the wrong solution to the problem. The backend tries really hard to not schedule concurrent syncs of any type because we dont' support it in the sdk. If the SDK gets parallel sync tasks, it should error one of them, or reject it, or hold on to it until it finishes another one (or something). This possibly requires some changes in c1.
Summary
GRPCSessionServerpreviously held a single mutablestorepointer shared across all concurrent sync tasks. In service mode withtask-concurrency > 1, when a secondsync_fulltask (full or targeted) started mid-flight,SetSessionStoreoverwrote the pointer. The first sync's reads were then silently served from the wrong c1z file, returning(nil, false, nil)— indistinguishable from "never written." This caused dropped grants that the platform reconciled as revocations.Changes
pkg/types/sessions/sessions.go: ExtendedSetSessionStoreinterface to acceptsyncIDand addedRemoveSessionStorefor cleanup.pkg/session/session_server.go: Replaced singlestorepointer with async.RWMutex-guardedmap[string]SessionStorekeyed bysyncID. Each RPC resolves its backing store viareq.GetSyncId()(already threaded through the wire protocol). AddedRemoveSessionStorefor teardown.internal/connector/connector.go: UpdatedconnectorClientto forwardsyncIDthroughSetSessionStore/RemoveSessionStore.pkg/sync/syncer.go: Moved session store registration fromloadStore(where syncID is not yet known) toSync()afterstartOrResumeSyncreturns the syncID. AddedRemoveSessionStorecall inClose()to prevent memory leaks.pkg/session/session_server_test.go: Added regression tests verifying concurrent syncs each read their own data, removal prevents stale access, and unregistered syncIDs fail loudly instead of silently returning empty.Trigger sequence (before this fix)
After this fix
Each sync registers its store keyed by
syncID. RPCs resolve the correct backing store via the request'ssync_id. Concurrent syncs never interfere with each other.Fixes: CE-822
Automated PR Notice
This PR was automatically created by c1-dev-bot as a potential implementation.
This code requires: