chore(deps): update rust crate distributed_cli to v1.8.1#61
Open
renovate[bot] wants to merge 1 commit into
Open
chore(deps): update rust crate distributed_cli to v1.8.1#61renovate[bot] wants to merge 1 commit into
renovate[bot] wants to merge 1 commit into
Conversation
b854593 to
bea2b4f
Compare
bea2b4f to
3966304
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR contains the following updates:
1.6.3→1.8.1Release Notes
patrickleet/distributed (distributed_cli)
v1.8.1Compare Source
What's changed in v1.8.1
perf: reduce commit/hydrate hot-path round trips and clones (#87) (by @patrickleet)
hydrate() previously deep-cloned the entire event history via
events().to_vec() on every replay even with no upcasters, purely to
satisfy the borrow checker. Take the events out of the owned entity,
replay from the local Vec, and restore via load_from_history. The
no-upcaster path now replays with zero clones; the upcaster path keeps
its single bounded clone (the durable history is restored verbatim).
The snapshot tail path (hydrate_prepared_snapshot) did a similar
.cloned() to collect post-snapshot events. Replay the common
(no-upcaster) path directly from a filtered borrow so the tail is no
longer cloned, while still restoring the full history so new_events()
slicing on the next commit is unchanged. PR #81's snapshot+tail
fetch behavior is preserved.
Adds Entity::take_events / Entity::restore_history for the
borrow-out-then-restore pattern.
Hydrated state is identical: same replay order, same final
version/committed_version, same upcaster semantics.
commit_batch issued one INSERT per event and one INSERT per outbox
message inside the open transaction, holding row/index locks across
many round trips. Batch each into a single multi-row INSERT built with
QueryBuilder::push_values (postgres) — and the same, chunked under
SQLite's bound-parameter limit (sqlite). The per-stream version
pre-check is kept: it enforces sequence contiguity, which the unique
PK alone does not catch for expected<actual gaps.
Conflict detection is unchanged. An event unique violation still maps
to ConcurrentWrite (postgres re-reads the conflicting stream's version
over the pool because a failed statement aborts the tx; sqlite re-reads
in-tx since SQLite does not abort on a constraint error). An outbox
unique violation still maps to DuplicateOutboxMessageInBatch.
get_streams looped awaiting a query per identity (N+1). Replace with one
WHERE aggregate_type = $1 AND aggregate_id = ANY($2) ORDER BY
aggregate_id, sequence query per aggregate type (sqlite uses a bound IN
list, having no array type), splitting the flat result into entities
client-side. Callers of get_all already accept storage-order results.
Ordering and hydrated state are unchanged.
publish() owns the Message and drops it immediately after, but cloned
message.payload before handing it to JetStream. Move the buffer out with
std::mem::take and convert via Bytes::from(Vec) (zero-copy) instead.
Behavior is unchanged: the same bytes are published.
Implements [[tasks/commit-hydrate-hot-path-efficiency]]
See full diff: v1.8.0...v1.8.1
v1.8.0Compare Source
What's changed in v1.8.0
feat: give RepositoryError and LockError a retryability signal (#83) (by @patrickleet)
repository_storage_error mapped every sqlx failure — connection refused,
pool timeout, SQLITE_BUSY, and constraint violations alike — to
RepositoryError::Model(String). RepositoryError had no storage shape and no
source(), and From/From collapsed to
strings. Downstream, HandlerError::transport_error_kind classified every
Repository(_) as Retryable, so a deterministic model error redelivered
forever while a transient DB outage was indistinguishable from it.
Add a Storage { operation, retryable, source } variant to RepositoryError
plus a kind() -> RetryClass method, mirroring TransportError. RetryClass
lives in the lock module (the lowest layer both errors reach) so repository
does not depend on the bus. repository_storage_error now classifies sqlx
errors via is_sqlx_transient (connection/pool/timeout and SQLITE_BUSY ->
retryable; constraint/decode -> permanent) and preserves the source. The
From impls map to Storage, keeping the source and the transient/permanent
distinction. ConcurrentWrite and NotFound stay retryable (race resolved by
a later attempt); Model/Replay/invalid-identity/InvalidState are permanent.
HandlerError::transport_error_kind now defers to RepositoryError::kind()
instead of bucketing all Repository(_) as retryable.
LockError gains a Busy variant and the same kind()/is_retryable() signal:
Poisoned is permanent (broken invariant); contention, lease loss, and busy
are retryable. The existing AcquireFailed/ReleaseFailed lease mappings were
already retryable under this classification.
Storage carries a boxed source, so RepositoryError drops Clone/PartialEq/Eq
(they were test-only). The handful of assert_eq! comparisons became
matches! on the variant. Both enums are now #[non_exhaustive] so future
variants are non-breaking.
BREAKING CHANGE: RepositoryError no longer derives Clone/PartialEq/Eq, both
RepositoryError and LockError are #[non_exhaustive], and From
/From now produce RepositoryError::Storage instead of
::Model.
Implements [[tasks/repository-error-taxonomy]]
The
retryable_failure_returns_503ingress test handler returnedRepositoryError::Model("transient"). After the error-taxonomy rework,Modelis — correctly — a deterministic fault classified Permanent, sothe ingress mapped it to 422 (do-not-retry), failing the assertion.
This was a stale expectation asserting the old redeliver-forever bug, not
a misclassification: a
Modelerror is deterministic and re-running theidentical message cannot change its outcome, so it MUST be permanent (the
deterministic_repository_errors_are_permanentunit test pins this).The fix is in the test, not the classification logic. The handler now
returns
RepositoryError::retryable_storage("load stream", ...)— atransient storage outage (connection refused) that genuinely may succeed
on redelivery — which is what a "temporarily failed" handler should
signal. The 503 redeliver semantics the test names are now honestly
exercised, and the deterministic-permanent path is covered by
order.rejected-> 422.Refines [[tasks/repository-error-taxonomy]]
A read-model CHECK-constraint violation is a deterministic, non-retryable
fault. With the reworked error taxonomy it now surfaces as the new
RepositoryError::Storage { retryable: false, .. }variant rather than theold
RepositoryError::Model(String), so re-running the identical commit isnever retried forever.
PR #85 added
read_model_failure_mid_plan_rolls_back_events_and_outboxtoboth the sqlite and postgres repository suites asserting the old
Modelerror. Update both assertions to expect the permanent
Storagevariant(and
!err.is_retryable()), matching the corrected classification and theidiom already used by this PR's own backend-storage tests. The rollback
invariant the test actually guards — events, outbox row, and the first
read-model mutation all absent after the failed commit — is unchanged.
Refines [[tasks/repository-error-taxonomy]]
40001 (serialization_failure) and 40P01 (deadlock_detected) are transient
write-race outcomes that should be retried, not handed to the failure policy.
is_sqlx_transient previously only matched pool/timeout/IO and SQLite busy, so
these fell through as permanent. Mirrors the existing 23505 unique-violation
detection (no feature gate: SQLite never carries these SQLSTATEs).
Addresses CodeRabbit review on PR #83.
Refines [[tasks/repository-error-taxonomy]]
See full diff: v1.7.6...v1.8.0
v1.7.6Compare Source
What's changed in v1.7.6
fix: harden transport and HTTP ingress against hostile input (S3-S6) (#84) (by @patrickleet)
Security hardening batch for the Knative/HTTP ingress, bus message-name
routing, consumer inbox growth, generated DDL defaults, and request body
limits. Excludes the gRPC error-masking item (owned by a separate PR).
S3 — Knative ingress error leak: the CloudEvents ingress returned
err.to_string()verbatim, which can carry SQL/driver/path detail.Internal faults are now masked to "Internal server error" (and logged
server-side) by reusing a shared
HandlerError::redacted_message()helper, which the HTTP ingress now also uses instead of its private
masking fn — single source of truth.
S4 — Message name validation:
Message.nameflows unmodified into theNATS subject, Kafka topic, and RabbitMQ routing/binding key but had no
rules. Added
validate_message_name(mirrorsvalidate_stable_message_id):rejects empty, over-long (>256B), control-character-bearing, and
wildcard-bearing (
*/#/>) names;.stays allowed since dotted typenames are the convention. Enforced inbound on the attacker-controlled
ce-type(binary + structured) and outbound on the RabbitMQsend/publish/bind paths (a
#/*in a binding key is a subscriptionwildcard).
.routing semantics documented onMessage::name.S5 — Unbounded inbox growth: added
InboxStore::purge_inbox_older_than(Postgres/SQLite issue a DB-clock-relative bounded DELETE; HashMap is a
documented no-op). HashMapRepository gains
clear_inboxas the in-memoryequivalent and its inbox is now marked dev-only in rustdoc. Retention is
documented as the operator's responsibility.
S6 — Raw DEFAULT in generated DDL: the table SQL generator spliced
column.defaultunquoted — the one unescaped hole in an otherwise fullyquoted generator. Now validated against an allowlist (numeric/boolean/
NULL/CURRENT_TIMESTAMP keywords or a properly-escaped single-quoted
literal); anything else fails generation loudly. Atlas consumes the
validated output unchanged.
Body limits: pinned axum's implicit 2MiB default to an explicit 1MiB
DefaultBodyLimiton both the command router and the CloudEventsingress (both buffer the whole body), via one shared
MAX_HTTP_BODY_BYTES.Doc caveat:
Message::payload_bitcodenow warns bitcode is not hardenedagainst hostile input — decode only from trusted producers.
Implements [[tasks/transport-ingress-security-hardening]]
Co-authored-by: Claude Fable 5 noreply@anthropic.com
See full diff: v1.7.5...v1.7.6
v1.7.5Compare Source
What's changed in v1.7.5
chore: add red-team concurrency and failure-injection conformance tests (#85) (by @patrickleet)
Pin guarantees the publish-on-commit + snapshot work introduced that were
previously only covered by unit tests against fakes. All tests exercise
EXISTING behavior; none required a production change, and all pass against
current code (no #[ignore]d tests).
Tests added:
N tasks load the same aggregate at v0 and commit behind a barrier; asserts
exactly 1 success, N-1 ConcurrentWrite, and a gap/duplicate-free final
stream — the unique-PK backstop behind the version-check TOCTOU window under
real concurrency.
backends): worker A claims a short lease and "crashes"; after expiry worker B
reclaims and completes; A's late complete/release are fenced (InvalidState).
Exercises the SQL claimed_until expiry predicate end-to-end.
all 3 backends): dispatcher with a publisher that fails twice then succeeds;
row goes Pending after each failure (attempts incrementing), then Published,
with exactly one bus delivery.
(sqlite_repository + postgres_repository): a commit with events + outbox + a
two-mutation read-model plan whose second mutation hits a real DB constraint
(SQLite trigger / Postgres CHECK); asserts events, outbox row, and the first
mutation are all absent (whole-tx rollback).
repo-path variant (upcasting): a stream with an unregistered event_name fails
hydration/get with a RepositoryError::Replay naming the event.
for one aggregate; asserts final version == N+1 and event order equals the
lock-grant order.
arbitrary command sequences — reload reproduces in-memory state; snapshot
hydration at any frequency equals full replay.
Adds proptest as a dev-dependency. Postgres-backed variants env-gate on
DATABASE_URL and skip locally; SQLite/HashMap variants run without docker.
Bugs discovered: none. Every test passes against current code.
Implements [[tasks/red-team-conformance-tests]]
Co-authored-by: Claude Fable 5 noreply@anthropic.com
fix: route corrupt PostgresBus rows through failure policy (#80) (by @patrickleet)
A
bus_queue/bus_logrow whose required columns (name,kind,payload) failed to decode was silently lost.message_from_rowusedtry_get(...).unwrap_or_default(), so a corrupt row became aMessagewith an empty
name. The runner classifies an empty/unrouted name as"no handler", which takes the ack-and-ignore path:
QueueReceived::ackdeletes the row and
LogReceived::ackadvances the consumer offset pastit. The corrupt row vanished with no trace, bypassing the otherwise
careful never-silently-drop failure policy.
Silent-data-loss path:
recv -> message_from_row (unwrap_or_default -> name = "")
-> runner: router.handles(kind, "") == false
-> received.ack() (queue: DELETE row; log: advance offset)
-> message gone, no dead-letter, no log
Fix:
message_from_rowreturnsResult<Message, TransportError>, raising aPERMANENT error when a required column is missing or fails to decode.
message_id/content_type/metadatakeep tolerant defaults (optionalor schema-defaulted; they don't make a message unhandleable).
recvclaims the row/offset before decoding, so the claimmust still be settled when decoding fails.
QueueReceived/LogReceivedcarry the decode error and surface it via a new
ReceivedMessage::decode_error()(defaults toNone, so the other 8adapters are unchanged).
decode_error()first and routes the claimed rowthrough the configured
FailurePolicy— dead-letter by default — exactlylike a permanent dispatch failure. Queue corrupt rows are dead-lettered
(deleted, logged), log corrupt rows advance the offset past the poison
entry (don't get stuck), and the failure is visible instead of silent.
Tests:
(empty) name is dead-lettered under the default policy (not ack-and-
ignored), parked under Park, and stops with a permanent error under Stop.
corrupt
bus_queuerow leaves the queue while the valid row beside it ishandled; a corrupt
bus_logrow advances the consumer offset past itwhile the valid following event is handled.
Implements [[tasks/postgres-bus-corrupt-row-handling]]
The prior commit routed undecodable rows through the failure policy, but
that path was unreachable for the most common corruption: a NULL
name.Both sources select work by name — the queue claim filters
WHERE name = ANY($2)and the log read filtersWHERE name = ANY($1).A row whose
namecolumn is NULL matches neither, so it is never claimed(queue) or never read (log). The decode-error handling never fired because
the corrupt row was never even fetched. The queue row therefore sat
undelivered forever (the failing test: queue still had 1 row), and a
corrupt log entry was silently skipped when the offset jumped to a later
healthy entry — dropped with no trace, the exact ack-and-ignore behavior
the hardening set out to prevent.
The missing piece was not settlement identity (the claim already captures
seqindependently of payload decode); it was visibility: an un-routablerow must still be selectable so the runner can settle it.
Fix: both sources also select rows with a NULL
name(un-routablepoison). Such a row belongs to no consumer, so claiming/reading it to
dead-letter it is correct —
FOR UPDATE SKIP LOCKEDkeeps the queue claimsafe under competing listeners, and each log group advances its own offset
past it. The runner then surfaces the decode error and routes it through
the failure policy (dead-letter by default: the queue row is deleted, the
log offset advances past the entry).
Also strengthen the log test: it now places a corrupt entry as the highest
seq, so a consumer that silently skips by name would leave the offset short
of max_seq. Reaching max_seq proves the corrupt entries were settled
through the policy, not skipped. Verified against a real Postgres 18
(docker compose): the improved test fails without the LogSource fix and
passes with it; full postgres_transport (9) and lib (253) suites green.
Refines [[tasks/postgres-bus-corrupt-row-handling]]
See full diff: v1.7.4...v1.7.5
v1.7.4Compare Source
What's changed in v1.7.4
refactor: harden distributed_macros expansion and diagnostics (#82) (by @patrickleet)
Restructure the four proc-macros in distributed_macros/src/lib.rs
(sourced, aggregate!, digest, enqueue) to the testable
expand_* -> syn::Result<TokenStream2>shape already used bysnapshot.rs and read_model.rs. The thin
#[proc_macro*]entry pointsnow convert errors via
.unwrap_or_else(|e| e.to_compile_error()).Generated output is byte-identical — only the structure and error
plumbing changed (verified: all 250+ macro-using tests in the main
crate still pass).
Diagnostics: unknown keyword args in parse_digest_args,
parse_sourced_args, parse_event_args, and parse_enqueue_args now
produce pointed, spanned syn::Error messages that name the bad key and
list the valid ones, instead of being silently left unconsumed and
surfacing as a bare "unexpected token". Also added up-front checks for
duplicate #[event] names and #[event] methods missing a self receiver,
and a clearer "missing entity field" message for bare #[sourced()].
enqueue fix: #[enqueue] now accepts
entity = fieldso a renamedentity field produces a correct
is_replaying()guard instead of aconfusing "no field
entity" error pointing at the user's method.ReplayError: kept as
String. Replay errors are flattened fromheterogeneous sources (per-event decode errors, user method errors of
arbitrary E, unknown-event messages) via
e.to_string(); a typederror would have to be generic over each method's error type or erase
them anyway. Rationale documented inline.
Tests: added unit tests for the new expand_/parse_ functions and a
trybuild compile-fail suite (tests/compile_fail/*.rs + harness)
covering an unsupported #[event] signature, duplicate event names,
unknown attribute keys, #[sourced] missing the entity field, and the
renamed-entity #[enqueue] footgun.
Implements [[tasks/macro-crate-hardening]]
See full diff: v1.7.3...v1.7.4
v1.7.3Compare Source
What's changed in v1.7.3
fix: skip snapshotted I/O on load + gate decode on schema version (#81) (by @patrickleet)
Snapshot hydration previously fetched the ENTIRE event stream and only then
filtered out events already covered by the snapshot, so a fresh snapshot over a
long stream still paid full I/O and decode cost — snapshots saved replay CPU but
not the dominant I/O. And the schema-version field was written but never checked:
bitcode is positional, so a layout-compatible change to a Snapshot struct would
decode SUCCESSFULLY into wrong state and then commit new events atop corruption.
Task 1 — skip already-snapshotted events (perf):
get_stream; Postgres/SQLite override with
WHERE sequence > $version; Queuedforwards under its lock). The single-aggregate
gethot path now reads thesnapshot FIRST, then fetches only the tail.
prefix_version(events folded into a snapshot andnot held in memory) so version/committed_version/new_events and event
sequencing stay correct when only a tail is loaded. Add
Entity::load_tail_from_history.
stream version in both load shapes.
Task 2 — schema-version gate (fix):
the derive (rejects zero / non-integer with helpful errors; emits the const
only when set so existing impls are unaffected).
-> full replay, never a decode of incompatible bytes.
(cache miss -> replay) like the internal path, instead of turning cache misses
into hard Replay errors.
gated on (type_name is unstable across compilers; gating would cause spurious
misses). Schema compatibility is enforced solely by SNAPSHOT_VERSION.
Tests: sqlite-backed proofs that a snapshot-hydrated load reads only the tail
(pre-snapshot rows deleted, load still correct), that a stale-schema snapshot
falls back to replay with correct state, and that snapshot-hydrated state equals
full-replay state; plus entity tail-sequencing unit tests and a derive
version-attribute test.
Implements [[tasks/snapshot-hydration-skip-replayed-events]] and [[tasks/snapshot-schema-version-gating]]
Co-Authored-By: Claude Fable 5 noreply@anthropic.com
snapshot_type was a write-only
type_namefield: written into everySnapshotRecord and persisted, but never read to make a decision. Since
type_nameis explicitly unstable across compiler versions, it couldnever have been a safe compatibility gate anyway.
SNAPSHOT_VERSION (Snapshottable::SNAPSHOT_VERSION, written into each
record and compared on load) is the sole compatibility gate for cached
snapshots: a version mismatch is treated as a cache miss and the
aggregate is rebuilt by full replay. With that in place, snapshot_type
is pure vestigial scaffolding.
This removes it end to end: the SnapshotRecord field, the constructor
parameter, the empty-string validation, the now-unused
DEFAULT_SNAPSHOT_VERSION const, the snapshot_type_name helper, the
sqlite/postgres SELECT/INSERT/upsert/bind/row-read paths, the migration
columns and their CHECK constraints, and all test call sites and
assertions. The SNAPSHOT_VERSION doc comments are reworded to state the
bump-on-layout-change intent positively rather than framing the default
as backwards-compat for existing implementations.
Pre-release cleanup: there is no released version or persisted data to
stay compatible with.
Refines [[tasks/snapshot-schema-version-gating]]
Co-authored-by: Claude Fable 5 noreply@anthropic.com
See full diff: v1.7.2...v1.7.3
v1.7.2Compare Source
What's changed in v1.7.2
fix: trust gRPC metadata over payload session vars; mask internal errors (#79) (by @patrickleet)
The gRPC transport let the request payload
session_variablesoverridetransport metadata when building the
Session. Behind a trusted gatewaythat injects authenticated identity headers as gRPC metadata, a client
could spoof identity by putting
x-hasura-user-id/x-hasura-roleinthe request body (e.g. claim role
adminor impersonate another user) —the payload silently won. This is an identity-spoofing hole.
Trust model (now documented loudly on
Session, the HTTP/gRPC entrypoints, and the README "Security / Trust Boundary" section): the
framework does NOT authenticate. A trusted proxy must strip
client-supplied
x-hasura-*headers and inject only authenticated ones.Transport metadata/headers are trusted; the request payload is not.
Changes:
build_session: apply payload vars first, then let trustedmetadata overwrite colliding keys. Metadata now wins. Payload-only
keys still pass through (preserves the Hasura-action path where
verified claims arrive in the payload with no metadata injected).
HandlerError::client_facing_message(), masking internal (5xx)detail to "Internal server error" and logging the original
server-side. Previously gRPC returned raw
e.to_string(), whichcould leak SQL/driver detail — HTTP already masked. The masking
policy now lives in one place (error.rs) and both transports reuse
it (no duplicated logic).
Session,session_from_headers, andbuild_session; README HTTP/gRPCtransport notes plus a dedicated "Security / Trust Boundary" section.
payload-applies-when-metadata-absent; HTTP client-supplied identity
header trusted verbatim (documents why the proxy is required).
Implements [[tasks/grpc-session-metadata-precedence]]
Also covers the gRPC error-masking item from
[[tasks/transport-ingress-security-hardening]]
Co-authored-by: Claude Fable 5 noreply@anthropic.com
See full diff: v1.7.1...v1.7.2
v1.7.1Compare Source
What's changed in v1.7.1
perf: make Kafka ack/nack non-blocking on the async runtime (#78) (by @patrickleet)
Per-message offset commits used CommitMode::Sync, which blocks the calling
OS thread on a broker round trip; called from the async
ack(anddead_letter/park) for every message, combined with the strictly sequential
consume loop in bus/runner.rs, this capped throughput and stalled the tokio
worker. nack additionally called a blocking seek with a 5s timeout in async
context.
Switch offset commits to CommitMode::Async: the offset is handed to
librdkafka's background thread and returns immediately. This is at-least-once
(unchanged from the current contract): the runner already acks only after
handler effects complete, so a crash before an async commit lands simply
redelivers the record — a duplicate the consumer already tolerates.
seek has no async variant, so run it on tokio's blocking pool via
spawn_blocking (the Arc-shared consumer clones cleanly into the task).
Implements [[tasks/kafka-nonblocking-ack]]
See full diff: v1.7.0...v1.7.1
v1.7.0Compare Source
What's changed in v1.7.0
chore(deps): update codecov/codecov-action action to v7 (#76) (by @renovate[bot])
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
feat: infer and harden bus consumer topology (#77) (by @patrickleet)
Adds Service::named-derived consumer groups, awaitable bus constructors, shared topology validation, and inferred-group transport coverage.
Implements [[tasks/infer-bus-topology-from-service-name]] and [[tasks/harden-inferred-bus-topology]].
Addresses CodeRabbit review on [[tasks/address-coderabbit-inferred-bus-topology]].
See full diff: v1.6.3...v1.7.0
Configuration
📅 Schedule: (UTC)
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.