Skip to content

Thread ctx through v3 envelope, add CtxCopy helper, and detach pebble finalize#893

Open
arreyder wants to merge 3 commits into
mainfrom
ctx-aware-envelope
Open

Thread ctx through v3 envelope, add CtxCopy helper, and detach pebble finalize#893
arreyder wants to merge 3 commits into
mainfrom
ctx-aware-envelope

Conversation

@arreyder

Copy link
Copy Markdown
Contributor

Summary

Stacked on #891. Merge that one first.

WriteEnvelope previously held no context: a caller cancellation during the payload tar/zstd walk had to wait for filepath.Walk to drain the entire Pebble checkpoint before the function returned. Big-tenant checkpoints are GBs; that wait is the gap during which the io.Pipe upload pattern can end up with clean-EOF semantics instead of CloseWithError.

WriteEnvelope now takes ctx as its first parameter. The walk callbacks check ctx.Err() per entry, and each regular-file copy is routed through a new iox.CtxCopy: chunked io.CopyBuffer (4 MiB poll granularity, 32 KiB working buffer held across iterations — one allocation per call, not one per chunk).

registeredStore.save (the Pebble equivalent of C1File.finalize) now runs the Checkpoint + WriteEnvelope + rename sequence on a context detached from the caller via dotc1z.FinalizeTimeout(), inside a uotel.StartWithLink new-root span — same pattern as PR #891's C1File.finalize. This protects the Pebble write path even when the caller is a direct C1ZStore consumer that didn't already detach.

Changes

  • pkg/dotc1z/iox/copy.go (new, new package) — CtxCopy helper. Lives in a leaf subpackage of dotc1z so sibling subpackages (format/v3, engine/pebble, future manager/s3 / synccompactor callers) can import it without circular risk and without pulling in the rest of dotc1z.
  • pkg/dotc1z/format/v3/envelope.goWriteEnvelope(ctx, w, m, payloadDir) signature. Walk callbacks check ctx.Err() per entry and wrap with the in-flight tar entry name (c1z v3: tar_zstd walk canceled at "entry-X": ...) so operators triaging "context canceled" in logs can see exactly which Pebble checkpoint file was being visited. Godoc tightened to say magic+length+manifest bytes are written before the first ctx check, so io.Pipe callers must propagate ctx.Err() via CloseWithError on cancellation.
  • pkg/dotc1z/engine/pebble/register.goregisteredStore.save adopts the detach + new-root pattern. Honors the propagated parent ctx-err label from PR Detach ctx for c1z finalize so caller cancel cannot truncate the artifact #891 so the inner finalize span reports the real caller cancel state.
  • pkg/dotc1z/engine/pebble/adapter_clone_sync.go — passes ctx through to WriteEnvelope.
  • pkg/dotc1z/iox/copy_test.go (new) — 7 unit tests: happy, empty source, cancel-before-start, cancel-mid-stream (with read counter to prove the source is not over-consumed), source-error, dst-error, short-write.
  • pkg/dotc1z/format/v3/envelope_test.go — adds TestWriteEnvelopeCanceledMidWalk (tee-writer fires cancel after the first header bytes hit the writer; asserts wrapped error names the in-flight entry) and TestWriteEnvelopeEmptyPayloadDir.
  • pkg/dotc1z/engine/pebble/registered_store_save_cancel_test.go (new) — pebble symmetry with PR Detach ctx for c1z finalize so caller cancel cannot truncate the artifact #891's TestC1FileCloseSurvivesCanceledCtx. A regression that replaces finalizeCtx with the caller ctx in save() would fail this test.

Deferred to follow-up

writeZstdTar still allocates a fresh zstd.Encoder per save instead of using pkg/dotc1z/pool.go's existing encoder pool. Using the pool from format/v3 requires either moving the pool to a shared subpackage or threading an encoder via a new option — out of scope for this PR.

Test plan

🤖 Generated with Claude Code

@arreyder arreyder force-pushed the ctx-detach-finalize branch from 81d1b03 to 73a98cf Compare May 28, 2026 16:52
@arreyder arreyder force-pushed the ctx-aware-envelope branch from ad6a638 to 1567f72 Compare May 28, 2026 16:58
@linear-code

linear-code Bot commented May 28, 2026

Copy link
Copy Markdown

FDE-244

Base automatically changed from ctx-detach-finalize to main May 29, 2026 00:35
@arreyder arreyder requested review from a team and btipling May 29, 2026 00:35
@arreyder arreyder force-pushed the ctx-aware-envelope branch from 1567f72 to 3bcd4cd Compare May 29, 2026 14:05
Comment thread pkg/dotc1z/iox/copy.go
Comment on lines +48 to +50
if errors.Is(err, io.EOF) {
return total, nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: io.CopyBuffer never returns io.EOF from a source-side EOF — it swallows it and returns (n, nil). The only way this check triggers is if dst.Write() returns io.EOF as a write error, in which case this would incorrectly treat it as successful completion. The check is effectively dead code today (the n < ctxCopyChunkSize branch at line 59 handles normal source exhaustion), but if you want to keep a defensive guard, consider checking err == io.EOF (identity, not wrapped) to avoid swallowing a wrapped writer error.

@github-actions

github-actions Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

General PR Review: Thread ctx through v3 envelope, add CtxCopy helper, and detach pebble finalize

Blocking Issues: 0 | Suggestions: 2 | Threads Resolved: 0
Review mode: full
View review run

Review Summary

This PR threads context.Context through WriteEnvelope and the tar walk/copy paths, adds a new iox.CtxCopy helper for context-aware chunked copying, and detaches the Pebble save() from caller cancellation using context.WithoutCancel bounded by FinalizeTimeout(). The implementation is clean and well-tested — the detach pattern, span linking, and per-entry/per-chunk ctx checks all look correct. Two minor suggestions below: an error-wrapping inconsistency in writeTar vs writeZstdTar, and a note on the exported WriteEnvelope signature change for downstream callers. The previously flagged errors.Is(err, io.EOF) dead-code concern in CtxCopy remains unchanged.

Security Issues

None found.

Correctness Issues

None found.

Suggestions

  • pkg/dotc1z/format/v3/envelope.go:316: writeTar copy errors lack entry-name wrapping, unlike the parallel writeZstdTar path — makes TAR-path failures harder to diagnose.
  • pkg/dotc1z/format/v3/envelope.go:80: WriteEnvelope is an exported function whose signature changed (added ctx context.Context). Downstream callers outside this repo will need a coordinated update.
Prompt for AI agents
Verify each finding against the current code and only fix it if needed.

## Suggestions

In `pkg/dotc1z/format/v3/envelope.go`:
- Around line 320: The `writeTar` function returns bare copy errors (`return err`) while the parallel `writeZstdTar` wraps them with the entry name (`fmt.Errorf("c1z v3: tar_zstd copy %q: %w", rel, err)`). Wrap the error on line 320 as `return fmt.Errorf("c1z v3: tar copy %q: %w", rel, err)` for diagnostic parity.

- Around line 80: `WriteEnvelope` is exported and its signature changed from `(io.Writer, *c1zv3.C1ZManifestV3, string) error` to `(context.Context, io.Writer, *c1zv3.C1ZManifestV3, string) error`. Ensure all downstream consumers outside this repo are updated. No code change needed here if downstream coordination is planned.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

arreyder added a commit that referenced this pull request May 30, 2026
…ally bounds the loop (#899)

* synccompactor: pass runCtx into the doOneCompaction loop

Compact() built runCtx = context.WithTimeout(ctx, c.runDuration) but the
loop body called doOneCompaction(ctx, ...). The deadline only gated the
pre-flight select; individual compactions ran under the unbounded parent
ctx and could blow past c.runDuration.

This is the sibling of the bug fixed in #893 (WriteEnvelope walking under
an unbounded context): a deadline-bearing context exists but the inner
work uses a different one, so the deadline dangles instead of bounding
the work.

Pass runCtx into doOneCompaction so c.runDuration actually limits the
loop, and when runCtx fires due to its deadline (rather than parent
cancellation), surface the same clean "compaction run duration has
expired" message the pre-flight check already uses — instead of letting
a bare context.DeadlineExceeded bubble out of inner SQLite operations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(synccompactor): cover runDuration-expired and parent-cancel disambiguation

TestCompactorRunDurationExpired locks in the user-visible contract that
when c.runDuration expires (with parent ctx still alive), Compact returns
the clean "compaction run duration has expired" error wrapping
context.DeadlineExceeded.

TestCompactorParentCtxCancelDoesNotMaskAsRunDuration locks in the
disambiguation between runDuration expiry and parent ctx cancellation —
a parent-cancelled runCtx must surface as the raw cause (context.Canceled)
and must NOT be misreported as runDuration expiry. Without the disambiguation
either branch could silently regress and mislabel cancellation as a deadline.

Both tests reuse the existing makeEmptySync helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
arreyder and others added 3 commits June 5, 2026 08:26
… dangle it

WriteEnvelope previously held no context: a caller cancellation during
the payload tar/zstd walk had to wait for filepath.Walk to drain the
entire Pebble checkpoint before the function returned. Big-tenant
checkpoints are GBs; that wait is the gap during which the io.Pipe
upload pattern can end up with clean-EOF semantics instead of
CloseWithError.

WriteEnvelope now takes ctx as its first parameter. The walk callbacks
check ctx.Err() per entry, and each regular-file copy is routed through
new CtxCopy: chunked io.CopyN (4 MiB) with a ctx.Done() check between
chunks. Bounded worst-case cancel latency stays well under a second
even on slow disks while keeping the per-chunk select overhead
invisible against typical c1z throughput.

registeredStore.save (the Pebble equivalent of C1File.finalize) now
runs the Checkpoint + WriteEnvelope + rename sequence on a context
detached from the caller via dotc1z.FinalizeTimeout(), inside a
uotel.StartWithLink new-root span. This protects the Pebble write path
even when the caller is a direct C1ZStore consumer that didn't already
detach (the syncer.Close path from PR 1 still detaches once at the
outer layer; double-detach is harmless and gives defense in depth).

Tests cover CtxCopy's four behaviors (happy path, cancel-before-start,
cancel-mid-stream after one chunk, source-error propagation) plus
WriteEnvelope refusing to walk on a cancelled context. The existing
envelope round-trip tests are updated to pass t.Context() through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…k cancel test, single-alloc copy, error context

Multi-angle review surfaced 10 fix-now items. Nine applied here; one
deferred (encoder-pool reuse for writeZstdTar) because pkg/dotc1z's
pool.go can't be imported from pkg/dotc1z/format/v3 without a
restructure — filed as follow-up.

1. CtxCopy relocated. pkg/dotc1z/format/v3 was the wrong home for a
   generic ctx-aware copy helper; sibling subpackages
   (synccompactor, manager/s3, c1's c1age) would have ended up
   importing format/v3 for an unrelated dep. New home:
   pkg/dotc1z/iox — a leaf subpackage of dotc1z importable by
   anything in the tree without circular risk.

2. Single allocation per CtxCopy call. Old impl called io.CopyN
   per chunk, which means io.Copy allocated a fresh 32 KiB buffer
   every iteration. New impl holds one buffer and uses
   io.CopyBuffer + io.LimitReader. At Lilly's 72 GiB checkpoint
   that's the difference between ~18k 32 KiB allocations vs one,
   per file walked.

3. Error context on cancellation. CtxCopy now wraps ctx.Err()
   with "iox.CtxCopy: copied %d bytes: %w". writeZstdTar/writeTar
   wrap their per-entry ctx.Err with "c1z v3: tar(_zstd) walk
   canceled at %q: %w" so operators triaging "context canceled"
   in logs can see exactly which Pebble checkpoint file the walk
   was visiting.

4. ctxCopyChunkSize unexported. Nothing external needed it and
   exporting locked it into the public API surface.

5. CtxCopy doc fixed: "checks ctx.Done()" → "checks ctx.Err()"
   (matches the implementation).

6. WriteEnvelope godoc tightened. Now explicit that magic, length,
   and manifest bytes are written BEFORE the first ctx check
   inside the walk, so callers using WriteEnvelope with an io.Pipe
   must propagate ctx.Err() via CloseWithError to make downstream
   consumers (e.g. s3manager) abort instead of finalizing a
   truncated upload.

7. TestRegisteredStoreSaveSurvivesCanceledCtx — Pebble symmetry
   with PR 1's TestC1FileCloseSurvivesCanceledCtx. Verifies a
   pre-cancelled or pre-deadlined ctx through Close still produces
   a durable, readable v3 envelope on disk.

8. TestWriteEnvelopeCanceledMidWalk — covers the mid-walk cancel
   case the prior test couldn't (it only handled pre-call cancel).
   Uses a teeWriter + cancelOnWriteOnce to fire the cancel after
   the very first header bytes hit the writer, then asserts the
   wrapped error names the in-flight tar entry.

9. CtxCopy edge cases — TestCtxCopyEmptySource (degenerate
   zero-byte file in payloadDir), TestCtxCopyPropagatesDstWriteError,
   TestCtxCopyPropagatesDstShortWrite. Plus
   TestWriteEnvelopeEmptyPayloadDir for the "Pebble checkpoint
   with no SSTs" round-trip.

10. Two pre-existing unused //nolint:gosec directives in the tar
    walkers were now flagged by nolintlint because the lines moved;
    removed them.

Deferred follow-up:
- writeZstdTar still allocates a fresh zstd.Encoder per save.
  pkg/dotc1z/pool.go has getEncoder/putEncoder for exactly this
  case; using it from format/v3 needs either a shared subpackage
  for the pool or an injection-based API. Out of scope for this
  review pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…filenames

PR 893 CI failures, fixed:
- gosec G122 fires under CI's pinned golangci-lint v2.12.2 on the
  os.Open(path) inside both writeTar and writeZstdTar walk callbacks.
  Local v2.8.0 doesn't have G122 yet, so nolintlint there reports
  the directive as unused. CI is authoritative.
- TestWriteEnvelopeCanceledMidWalk built filenames with
  string(rune('a'+i)) which generates Windows-invalid characters
  ('|', '{', '}', '~') at i >= 27. Switch to fmt.Sprintf("entry-%02d",
  i) so go-test on windows-latest can create the inputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@arreyder arreyder force-pushed the ctx-aware-envelope branch from 3bcd4cd to a7c0026 Compare June 5, 2026 13:27
// side is closed with CloseWithError(ctxErr) on cancellation, so a
// downstream consumer (e.g. s3manager) aborts instead of finalizing
// a truncated upload.
func WriteEnvelope(ctx context.Context, w io.Writer, m *c1zv3.C1ZManifestV3, payloadDir string) error {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Adding ctx context.Context as the first parameter changes the exported signature of WriteEnvelope. Any downstream caller outside this repo (e.g. connectors or tools importing pkg/dotc1z/format/v3) will fail to compile until updated. If external callers exist, consider coordinating the rollout or bumping the module version.

return err
}
_, err = io.Copy(tw, f)
_, err = iox.CtxCopy(ctx, tw, f)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: In writeZstdTar, copy errors are wrapped with the entry name (fmt.Errorf("c1z v3: tar_zstd copy %q: %w", rel, err)), but the equivalent error path here in writeTar (line 320) returns the bare error. Consider wrapping for parity so operators can identify which file failed during a TAR-encoded write.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant