Commit 144bf72
perf: Hoist table_metadata at remaining repeat-access (#3301)
## Summary
Follow-up to #2674. `Transaction.table_metadata` replays all staged
updates via `model_copy(deep=True)` on every access, so reading it (or
`spec()`/`schema()` derived from it) repeatedly within a single
snapshot-producer method is redundant deep-copy work.

#2674 hoisted the property access in `_summary()`; this PR extends the
same pattern to three more call sites in
`pyiceberg/table/update/snapshot.py` that still read the property more
than once per invocation.
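
A minimal sketch of the pattern, assuming a hypothetical `ToyTransaction` stand-in rather than the real pyiceberg classes. Its `table_metadata` property deep-copies and replays staged updates on every read, so reading it once before the loop replaces one copy per file with a single copy. The dictionary-based metadata and the `summarize_*` helpers are illustrative only.

```python
import copy


class ToyTransaction:
    """Hypothetical stand-in: replays staged updates on every property read."""

    def __init__(self, base, updates):
        self._base = base
        self._updates = list(updates)
        self.reads = 0  # counts how often the property is evaluated

    @property
    def table_metadata(self):
        self.reads += 1
        metadata = copy.deepcopy(self._base)  # stands in for model_copy(deep=True)
        for update in self._updates:
            metadata.update(update)
        return metadata


def summarize_unhoisted(txn, files):
    # Reads the property once per file: one deep copy per element.
    return [(f, txn.table_metadata["schema-id"]) for f in files]


def summarize_hoisted(txn, files):
    # Reads the property once at entry; the value is invariant across files.
    schema_id = txn.table_metadata["schema-id"]
    return [(f, schema_id) for f in files]


files = [f"file-{i}.parquet" for i in range(100)]
slow = ToyTransaction({"schema-id": 0}, [{"schema-id": 1}])
fast = ToyTransaction({"schema-id": 0}, [{"schema-id": 1}])

# Both variants produce the same result; only the number of reads differs.
assert summarize_unhoisted(slow, files) == summarize_hoisted(fast, files)
print(slow.reads, fast.reads)  # prints "100 1"
```
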
## Changes
- `_SnapshotProducer._summary`: hoist `spec()`/`schema()` out of the
per-data-file loop (they are invariant across files; still called 2× per
file before this change)
- `_DeleteFiles._compute_deletes`: hoist `table_metadata`/`schema` once
at method entry (was 3 accesses — two via `self.schema()` for the
metrics evaluators and one direct for `snapshot_by_id`)
- `_MergeAppendFiles.__init__`: 3 consecutive
`self._transaction.table_metadata.properties` accesses → 1

All hoists are at method entry. Nothing inside these methods stages a
transaction update (the `AddSnapshotUpdate` is staged by the caller
after `_commit()` returns), so `table_metadata` is invariant for the
duration of each method.
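
The invariance argument matters because a hoisted value is a snapshot: it stops tracking the transaction as soon as another update is staged. The sketch below illustrates this with a hypothetical `ToyTransaction` (its `stage()` method and dictionary metadata are assumptions for illustration, not pyiceberg API). It shows that the hoisted value matches a fresh read until an update is staged, and goes stale afterwards.

```python
import copy


class ToyTransaction:
    """Hypothetical stand-in whose property replays staged updates on each read."""

    def __init__(self, base):
        self._base = base
        self._updates = []

    def stage(self, update):
        # Illustrative only: records an update to be replayed on later reads.
        self._updates.append(update)

    @property
    def table_metadata(self):
        metadata = copy.deepcopy(self._base)
        for update in self._updates:
            metadata.update(update)
        return metadata


txn = ToyTransaction({"current-snapshot-id": None})

hoisted = txn.table_metadata           # captured at "method entry"
assert hoisted == txn.table_metadata   # nothing staged yet, so a fresh read agrees

txn.stage({"current-snapshot-id": 42})  # staged afterwards, e.g. by the caller

stale = hoisted["current-snapshot-id"]            # still the old value
fresh = txn.table_metadata["current-snapshot-id"]  # reflects the staged update
```

Hoisting is therefore safe only within a scope where no update is staged, which is the property the paragraph above claims for these three methods.
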
Not touched here: the `new_manifest_writer(self.spec(id))` calls inside
per-manifest loops in `_write_delete_manifest` / `_compute_deletes` /
`_OverwriteFiles._existing_manifests` also trigger 2–3 property accesses
per iteration via the `schema()`/`spec()`/`new_manifest_writer()`
helpers. Those loops are O(partition-groups or rewritten-manifests)
rather than O(files), and fixing them cleanly would mean changing the
helper signatures — happy to do that in a follow-up if there's interest.
## Testing
New `test_snapshot_producer_bounded_metadata_access` wraps
`Transaction.table_metadata` with a call counter and asserts:
- `_summary()` access count is identical for 10 vs 100 appended files
(independent of N), and ≤ 2
- `_MergeAppendFiles.__init__` makes exactly 1 more access than
`_FastAppendFiles.__init__` (was 3 before this change — verified the
test fails with the production diff reverted)

The test constructs `_FastAppendFiles` / `_MergeAppendFiles` directly
rather than going through the public append path, since the public path
writes manifest avro files; the property-access count it measures is the
behaviour under test and doesn't require I/O.
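
One way such a call counter could be built is sketched below. This is not the test added in this PR: `ToyTransaction`, `count_property_reads`, and `producer_step` are hypothetical names, and the real test may well use a different mechanism. The idea is to swap the class-level property for a counting wrapper temporarily and compare read counts across input sizes.

```python
from contextlib import contextmanager


class ToyTransaction:
    """Hypothetical stand-in for a class exposing an expensive property."""

    @property
    def table_metadata(self):
        return {"properties": {}}


@contextmanager
def count_property_reads(cls, name):
    original = cls.__dict__[name]  # the property object defined on the class
    counter = {"reads": 0}

    def counting_getter(self):
        counter["reads"] += 1
        return original.fget(self)

    setattr(cls, name, property(counting_getter))
    try:
        yield counter
    finally:
        setattr(cls, name, original)  # always restore the real property


def producer_step(txn, files):
    # Code under test: reads the property once, then loops over the files.
    metadata = txn.table_metadata
    return [(f, metadata["properties"]) for f in files]


def reads_for(n_files):
    with count_property_reads(ToyTransaction, "table_metadata") as counter:
        producer_step(ToyTransaction(), [f"f{i}" for i in range(n_files)])
    return counter["reads"]


# The read count is independent of the number of files.
assert reads_for(10) == reads_for(100) == 1
```
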

The existing tests in `tests/table/test_snapshots.py` pass.
## Motivation
For appends/deletes/overwrites touching large numbers of files or
manifests, the per-iteration property access dominates wall-clock (each
access replays the staged-updates list through pydantic `model_copy`).
This keeps the cost constant per method call.
---------
Co-authored-by: Ruiyang Wang <rynewang@users.noreply.github.com>

#3301 · 1 parent 842d01c · commit 144bf72
2 files changed
Lines changed: 58 additions & 11 deletions