feat(payload): positional-cursor resume for chunk vtab (prototype) #52
Open
andinux wants to merge 1 commit into
cloudsync_payload_chunks could only resume a window from since>db_version, so a chunk boundary that lands inside a single committed db_version (or inside a fragmented oversized value) was not addressable, which is the whole reason the server stages the stream into a cloudsync_payload_spool table to page it out. This makes the vtab resumable by position instead.

Add an optional positional cursor: hidden inputs resume_db_version, resume_seq, resume_frag_offset start the scan at (db_version, seq) inclusive and re-enter a mid-value fragment at a byte offset; new outputs next_db_version, next_seq, next_frag_offset and is_final report where the emitted chunk stopped. A stateless /check can then page the whole window with an O(1) seek per call (vs O(N^2) replay-from-since), no spool table, no server-side state.

Legacy since>db_version callers (send path, spool fill) are unchanged: columns are appended and the positional branch only activates when resume_db_version is bound.

Tiling is exact, not idempotent-overlap: (db_version, seq) is a unique total order (changes rowid = (db_version<<30)|seq), next_* names the row that did not fit or the exact byte already emitted, and the fragment plan is a deterministic function of the row, so a resumed fragment tiles identically. The drain-start cursor must seek exclusive-after the last applied change (see docs/internal design note) so the protocol never relies on changes-level idempotency to absorb a re-sent row.

New unit test Payload Chunks Positional Resume drives a window mixing a db_version split across chunks with a value larger than the chunk budget, pages it one chunk per call via the cursor, and asserts byte-identity with a full-window scan (including a mid-fragment resume).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Make the `cloudsync_payload_chunks` vtab resumable by position so a window can be paged one chunk at a time from an arbitrary point, including boundaries that fall inside a single committed `db_version` or inside a fragmented oversized value.

- Hidden inputs `resume_db_version`, `resume_seq`, `resume_frag_offset` start the scan at `(db_version, seq)` inclusive and re-enter a mid-value fragment at a byte offset.
- New outputs `next_db_version`, `next_seq`, `next_frag_offset`, `is_final` report exactly where the emitted chunk stopped, so the next call resumes from there.
- … `payload_chunks_begin_fragment` so a streamed and a resumed fragment build identically.
- New unit test Payload Chunks Positional Resume pages a mixed window (a `db_version` split across chunks plus a value larger than the chunk budget) one chunk per call via the cursor and asserts byte-for-byte identity with a full-window scan, including a mid-fragment resume.
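The paging contract described above can be sketched with a toy in-memory model. This is not the vtab's implementation: `Cursor`, `Chunk`, `scan`, `page`, and the packing rule (rows are concatenated until the next one does not fit; an oversized value travels alone in `budget`-sized fragments) are all assumptions made for illustration. Only the shape of the cursor (`db_version`, `seq`, `frag_offset`, with a final marker) comes from the description.

```python
from bisect import bisect_left
from typing import Iterator, List, NamedTuple, Optional, Tuple

Row = Tuple[int, int, bytes]  # (db_version, seq, encoded value), sorted by key


class Cursor(NamedTuple):
    db_version: int
    seq: int
    frag_offset: int = 0


class Chunk(NamedTuple):
    data: bytes
    next_cursor: Optional[Cursor]  # None stands in for is_final = 1


def scan(rows: List[Row], budget: int, start: Cursor) -> Iterator[Chunk]:
    """Yield chunks from `start` (inclusive) to the end of the window."""
    keys = [(v, s) for v, s, _ in rows]
    i = bisect_left(keys, (start.db_version, start.seq))  # seek, no replay
    off, buf = start.frag_offset, b""
    while i < len(rows):
        v, s, value = rows[i]
        if off or len(value) > budget:
            # Oversized value: flush what is pending, then emit one fragment.
            if buf:
                yield Chunk(buf, Cursor(v, s, 0))
                buf = b""
            piece = value[off:off + budget]  # boundaries depend only on the row
            off += len(piece)
            if off >= len(value):
                i, off = i + 1, 0
                nxt = Cursor(rows[i][0], rows[i][1], 0) if i < len(rows) else None
            else:
                nxt = Cursor(v, s, off)
            yield Chunk(piece, nxt)
        elif len(buf) + len(value) > budget:
            # The cursor names the row that did not fit.
            yield Chunk(buf, Cursor(v, s, 0))
            buf = b""
        else:
            buf += value
            i += 1
    if buf:
        yield Chunk(buf, None)


def page(rows: List[Row], budget: int) -> List[bytes]:
    """Stateless paging: every call seeks to the cursor and takes one chunk."""
    out: List[bytes] = []
    cur: Optional[Cursor] = Cursor(0, 0, 0)
    while cur is not None:
        chunk = next(scan(rows, budget, cur), None)
        if chunk is None:
            break
        out.append(chunk.data)
        cur = chunk.next_cursor
    return out


# db_version 1 is split across chunks; the db_version 2 value exceeds the budget.
WINDOW: List[Row] = [(1, 0, b"aa"), (1, 1, b"bbb"), (1, 2, b"cc"),
                     (2, 0, b"X" * 10), (3, 0, b"d")]
```

Because the fragment boundaries depend only on the row and the budget, `page(WINDOW, 4)` and a single `scan(WINDOW, 4, Cursor(0, 0, 0))` produce the same sequence of chunks in this model, which is the property the new unit test checks against the real vtab.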
Why
Today the server stages a whole window into the `cloudsync_payload_spool` table so the `/check` job can page it out one chunk per node round-trip. That staging step issues `CREATE TABLE` under the tenant's `readwrite` apikey, which is not authorized to create tables, surfacing as `failures.check: database_permission_denied` and an endless `/check` poll (the original bench failure on this branch).

`cloudsync_payload_chunks` could only resume from `since > db_version`, so a chunk boundary landing inside a single `db_version` (or inside a fragmented value) was not addressable, which is the whole reason the spool table exists. A positional cursor makes those boundaries addressable, so the job can page the node directly:
- O(1) seek per call (vs `O(N^2)` replay-from-since), O(N) per window, matching the spool's compute-once cost.
- No spool table, so no `CREATE` privilege is needed → the permission bug is gone at the root (no provisioning, no migration, no dbadmin fallback, no SQLiteCloud-core change).
- Exact tiling, no idempotency reliance. `(db_version, seq)` is a unique total order (changes rowid = `(db_version << 30) | seq`); `next_*` names the row that did not fit or the exact byte already emitted, and the fragment plan is a deterministic function of the row. So each change is emitted in exactly one chunk, verified byte-for-byte by the new test. The only idempotency in play stays at the sqlite-sync changes level.
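The total-order claim follows directly from the rowid formula quoted above. A minimal sketch, assuming `seq` always fits in the low 30 bits and `db_version` is non-negative (the helper names `pack_rowid` and `unpack_rowid` are hypothetical, not functions from the codebase):

```python
SEQ_BITS = 30
SEQ_MASK = (1 << SEQ_BITS) - 1


def pack_rowid(db_version: int, seq: int) -> int:
    """changes rowid = (db_version << 30) | seq, as stated in the description."""
    if db_version < 0 or not 0 <= seq <= SEQ_MASK:
        raise ValueError("db_version must be >= 0 and seq must fit in 30 bits")
    return (db_version << SEQ_BITS) | seq


def unpack_rowid(rowid: int) -> tuple:
    """Inverse of pack_rowid: recover (db_version, seq)."""
    return rowid >> SEQ_BITS, rowid & SEQ_MASK


# Sorting by rowid and sorting by (db_version, seq) give the same order,
# so a seek to a rowid is a seek to a position in the change stream.
keys = [(2, 0), (1, 5), (1, 0), (3, 7), (2, SEQ_MASK)]
by_rowid = sorted(keys, key=lambda k: pack_rowid(*k))
```

Within that bit budget, distinct `(db_version, seq)` pairs map to distinct rowids and comparison of rowids matches lexicographic comparison of the pairs, which is what makes a cursor position unambiguous.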
Scope / compatibility
- Legacy callers are unchanged: the new columns are appended (positions 0..10 untouched); the positional branch only activates when `resume_db_version` is bound. `cloudsync_network_send_changes` (binds only `since_db_version`) and `cloudsync_payload_spool_fill` take the byte-identical legacy path. All existing payload/chunk/spool unit tests pass.
- The client `/check` wire protocol does not change. The positional cursor is a server-internal (job ↔ node, SQL-level) mechanism that replaces the spool's chunk addressing. The job loops once over the window, fetching one chunk per round-trip (node memory stays at one chunk) and staging each chunk into the object store as the legacy path already does; the client downloads the staged chunks at its own pace with its existing cursor. See `docs/internal/chunked-check-spool-vs-positional-cursor.md` for the full two-layer analysis and the drain-start "exclusive-after" rule.
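Since the resume inputs seek inclusively, the drain-start "exclusive-after" rule amounts to starting at the successor of the last applied change. The sketch below is one possible reading of that rule, not the text of the design note: `drain_start` and `seek_inclusive` are hypothetical helpers, and the 30-bit limit on `seq` is taken from the rowid formula.

```python
from bisect import bisect_left
from typing import List, Tuple

SEQ_MAX = (1 << 30) - 1  # assumption: seq occupies the low 30 bits of the rowid


def drain_start(last_db_version: int, last_seq: int) -> Tuple[int, int]:
    """Inclusive resume position that excludes the last applied change."""
    if last_seq < SEQ_MAX:
        return (last_db_version, last_seq + 1)
    return (last_db_version + 1, 0)


def seek_inclusive(keys: List[Tuple[int, int]], position: Tuple[int, int]) -> int:
    """Index of the first key >= position, mimicking an inclusive seek."""
    return bisect_left(keys, position)


keys = [(7, 0), (7, 1), (7, 2), (8, 0)]
resend = seek_inclusive(keys, (7, 1))             # lands on (7, 1) again
fresh = seek_inclusive(keys, drain_start(7, 1))   # lands on (7, 2)
```

Seeking at the last applied position itself would re-send that row and lean on changes-level idempotency; seeking at its successor avoids that, and the successor need not exist because the inclusive seek simply lands on the next stored row.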
This is a prototype: the vtab primitive + its test. Wiring it into the live `/check` job is tracked below.

TODO (follow-ups)
- `/check` job (Go server): replace `spool_fill` + spool paging with a positional `SELECT ... LIMIT 1` staging loop into the object store (job-internal layer-2 cursor + pinned `until` watermark; apply the drain-start exclusive-after rule for the first node fetch). Client wire protocol unchanged.
- `cloudsync_payload_chunks` SRF (`cloudsync_postgresql.c`/`cloudsync*.sql`): this prototype is SQLite-only.
- `cloudsync_payload_spool`: drop `spool_fill`/`spool_drop`/the table and its SQL once the positional path is wired end-to-end and verified.
- … removing the spool.
- … `cloudsync_init_table`, existence-guard in `spool_fill`, core migrate-on-open) becomes unnecessary.
🤖 Generated with Claude Code