Generic Plans and Initial Pruning: Fewer Locks for Partitioned Tables

zhjwpku · zhjwpku · commit f31f02a72a72 · 2026-03-08T09:49:58.000+08:00
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -3,6 +3,8 @@
 # 🇬🇧 English
 
 - [2026](./en/2026/README.md)
+  - [Week 10](./en/2026/10/README.md)
+    - [Generic Plans and Initial Pruning: Fewer Locks for Partitioned Tables](./en/2026/10/generic-plans-initial-pruning.md)
   - [Week 09](./en/2026/09/README.md)
     - [More Speedups for Tuple Deformation: Precalculating attcacheoff](./en/2026/09/tuple-deformation-speedups.md)
   - [Week 08](./en/2026/08/README.md)
@@ -20,12 +22,13 @@
   - [Week 03](./en/2026/03/README.md)
     - [Extended Statistics Import/Export Functions](./en/2026/03/extended-statistics-import-functions.md)
     - [pg_plan_advice: Query Plan Control](./en/2026/03/pg-plan-advice.md)
-
 ---
 
 # 🇨🇳 中文
 
 - [2026](./cn/2026/README.md)
+  - [第 10 周](./cn/2026/10/README.md)
+    - [通用计划与初始裁剪：为分区表减少锁竞争](./cn/2026/10/generic-plans-initial-pruning.md)
   - [第 09 周](./cn/2026/09/README.md)
     - [元组解构的进一步加速：预计算 attcacheoff](./cn/2026/09/tuple-deformation-speedups.md)
   - [第 08 周](./cn/2026/08/README.md)
diff --git a/src/cn/2026/10/README.md b/src/cn/2026/10/README.md
@@ -0,0 +1,9 @@
+# 第 10 周（2026）
+
+2026 年第 10 周的 PostgreSQL 邮件列表讨论。
+
+🇬🇧 [English Version](../../../en/2026/10/index.html)
+
+## 文章
+
+- [通用计划与初始裁剪：为分区表减少锁竞争](./generic-plans-initial-pruning.md)
diff --git a/src/cn/2026/10/generic-plans-initial-pruning.md b/src/cn/2026/10/generic-plans-initial-pruning.md
@@ -0,0 +1,116 @@
+# 通用计划与初始裁剪：为分区表减少锁竞争
+
+## 引言
+
+2021 年 12 月，Amit Langote 提出了一组补丁，用于加速在**分区表**上执行**通用计划**（generic plan）时的表现。使用带参数的预处理语句时，若采用通用计划，则无法在规划阶段做分区裁剪，计划中会包含所有分区的节点。这样一来，`AcquireExecutorLocks()` 成为主要瓶颈：它会对计划中涉及到的每个关系加锁，而分区数量可能非常大。本文概述该思路、基准测试结果，以及 pgsql-hackers 上关于安全性与设计的讨论。
+
+## 为什么这很重要
+
+当使用如下预处理语句时：
+
+```sql
+PREPARE q AS SELECT * FROM partitioned_table WHERE key = $1;
+EXECUTE q(123);
+```
+
+在 `plan_cache_mode = force_generic_plan`（或优化器已选择通用计划）的情况下，计划在多次执行间共享，且无法在规划时做分区裁剪，因此计划树会包含**所有**分区。每次执行前，`CheckCachedPlan()` 都要确认计划仍然有效，其中绝大部分开销来自 `AcquireExecutorLocks()`——它会对计划中的每个关系加锁。分区数量增加到数百或数千时，加锁成本占主导，吞吐会明显下降。
+
+David Rowley 曾提出将加锁推迟到执行器里完成「初始」裁剪之后再做。该方案因存在竞态而被否决：部分分区不加锁时，并发会话可能在计划被判定有效之后、实际执行之前修改分区，导致计划部分失效。
+
+Amit 的方案则仍在计划检查阶段加锁，但通过复用执行器将使用的「初始」裁剪逻辑，**缩小需要加锁的关系集合**：只对在初始裁剪后仍保留的分区加锁。这样既保证计划一致性，又让加锁数量随实际参与执行的分区数量增长，而不是随总分区数增长。
+
+## 技术分析
+
+### 思路：先裁剪再加锁
+
+分区表的 Append、MergeAppend 节点上带有**分区裁剪信息**：哪些子计划会被「初始」（执行前）步骤裁掉，哪些会被「执行时」步骤裁掉。初始步骤只依赖执行前就已知的值（例如绑定参数），不依赖行级数据。补丁让 `AcquireExecutorLocks()` 在收集要加锁的关系时：
+
+1. 像现在一样遍历计划树；
+2. 对带有**初始裁剪步骤**（`contains_init_steps`）的 Append/MergeAppend 节点，执行与执行器相同的初始裁剪逻辑，得到保留的子计划集合；
+3. 仅将这些保留子计划对应的关系加入待加锁集合。
+
+因此加锁集合与真正会执行到的关系一致：被初始裁剪掉的分区不会加锁，会用到的分区不会漏锁。
+
+### 裁剪的重复执行
+
+这样一来，通用计划下「初始」裁剪会被执行**两次**：一次在 `AcquireExecutorLocks()` 里用于决定加锁对象，一次在 `ExecInit[Merge]Append()` 里用于决定要初始化哪些分区子节点。Amit 表示没有找到在不调整加锁时机（例如把加锁挪到执行器启动阶段）的前提下消除这种重复的简洁做法，而后者属于更大的改动。
+
+### 基准测试
+
+在 pgbench 分区库上使用 `plan_cache_mode = force_generic_plan`：
+
+- **HEAD**：分区数增加时吞吐明显下降（例如 32 分区约 2.05 万 tps，2048 分区约 1.3k tps）。
+- **打补丁后**：吞吐维持较高（例如 32 分区约 2.75 万 tps，2048 分区约 1.63 万 tps）。
+
+说明在通用计划、多分区场景下，补丁显著消除了加锁带来的扩展性瓶颈。
+
+## 社区讨论
+
+### 适用场景
+
+Ashutosh Bapat 问在哪些情况下会存在可用来减少加锁的「执行前」裁剪指令。Amit 说明：
+
+- 主要场景是使用**通用计划**的**预处理语句**，例如 `PREPARE q AS SELECT * FROM partitioned_table WHERE key = $1;` 配合 `EXECUTE q(...)`。
+- 其他瓶颈（例如遍历完整 range table 的执行器启动/关闭逻辑）本补丁未改动。
+
+### 代码审查（Amul Sul）
+
+Amul 对 v1 提出多处风格与结构建议：
+
+- 将变量声明移入 `if (pruneinfo && pruneinfo->contains_init_steps)` 分支内部。
+- 在该条件为假时补充简短注释：`plan_tree_walker()` 会继续遍历子节点，因此加锁行为仍然正确。
+- 优先使用已有的 `GetLockableRelations_worker()` 等，避免新增 `get_plan_scanrelids()`。
+- 对 CustomScan 使用 `plan_walk_members()`，与其他节点类型一致。
+- 在锁收集路径中用于裁剪的临时 `EState` 应由**调用方**创建和释放，而不是在收集加锁关系的辅助函数内部创建/释放。
+- 在相关循环中使用 `foreach_current_index()` 提高可读性。
+
+### 安全性（Robert Haas）
+
+Robert 提出两点重要顾虑。
+
+**1. 计划「部分有效」**
+目前我们只执行完全有效的计划：对所有关系加锁，从而会接受失效消息并发现可能使计划失效的 DDL。若跳过对部分关系的加锁，就可能永远收不到这些关系的失效消息。例如：
+
+- 某个分区有额外索引，计划中使用了该索引的 Index Scan；
+- 该分区被初始裁剪掉，因此我们不对其加锁；
+- 另一会话删除了该索引；
+- 我们仍认为计划有效。虽然不会执行被裁掉的部分，但遍历整棵计划树的代码（如 EXPLAIN、auto_explain）可能访问该节点并出错（例如查找索引名）。
+
+也就是说会引入「计划部分有效」的情况，而目前代码没有这种假设。Robert 虽未断言核心代码里一定存在由此触发的具体 bug，但认为这是一类新风险。
+
+Amit 回复说，他检查了在执行器初始化之前访问计划树的路径，未发现会触及被裁掉部分；EXPLAIN 在 `ExecutorStart()` 之后运行，此时已构建 PlanState 树，只包含未裁掉的部分。他也同意不能据此断言绝对安全。
+
+**2. 加锁集合与初始化集合必须一致**
+在两处分别做初始裁剪意味着两次独立计算。若结果不一致（例如函数误标为 IMMUTABLE 实为 VOLATILE），可能出现加锁一组分区、初始化另一组分区的情况。Robert 认为应通过设计保证两者不可能不一致，而不是依赖两处结果永远相同。
+
+Amit 同意补丁的前提是初始裁剪是确定性的（裁剪表达式中无 VOLATILE）。若 IMMUTABLE 标错，可能导致 Assert 失败，或在非 Assert 构建下使用未加锁分区，后果严重。
+
+## 技术细节
+
+### 适用范围
+
+- **仅通用计划**：自定义计划可在规划时裁剪分区，不会像通用计划那样在计划中保留全部分区。
+- **仅初始裁剪**：只有被**初始**（执行前）裁剪掉的分区才会从加锁集合中排除。执行时裁剪（例如另一次执行中不同的参数值）所涉及的分区仍在计划中，仍会被加锁；补丁不改变这一点。
+
+### 边界情况
+
+- EXPLAIN / auto_explain：担心它们会访问我们未加锁的关系对应的计划节点。Amit 的分析是它们在执行器初始化之后运行，看到的只是初始化后的（裁剪后的）计划。
+- VOLATILE/IMMUTABLE 标错：可能导致加锁集合与初始化集合不一致；补丁未针对此增加额外防护。
+
+## 当前状态
+
+该线程中只出现了一个补丁版本（v1），未显示后续提交。讨论形成了以下共识与待办：
+
+- 通用计划在大量分区下的基准测试收益明显。
+- 需要落实代码风格与重构建议（EState 生命周期、walker 用法、注释等）。
+- 设计上仍有待明确：如何保证加锁集合与初始化集合不会分歧，以及「部分有效」计划对遍历整棵计划树的代码路径是否可接受。
+
+## 小结
+
+Amit 的补丁通过只对在初始裁剪后仍保留的分区加锁，降低了通用计划在分区表上的 `AcquireExecutorLocks()` 开销，同时避免了此前「推迟加锁」方案带来的竞态。基准测试表明在分区数较多时吞吐提升显著。讨论厘清了适用场景（使用通用计划的预处理语句）、提出了安全性与一致性方面的合理顾虑（部分有效计划、重复裁剪），并给出了具体的代码审查意见。若能在设计上保证「加锁」与「执行器初始化」使用同一份裁剪结果，将进一步提高可靠性，可能需要对加锁时机或执行器启动流程做一定重构。
+
+## 参考
+
+- [讨论串：generic plans and "initial" pruning](https://www.postgresql.org/message-id/flat/CA%2BHiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg%40mail.gmail.com)（Amit Langote，2021 年 12 月）
+- [1] David Rowley 的早期提案： [message-id](https://www.postgresql.org/message-id/CAKJS1f_kfRQ3ZpjQyHC7=PK9vrhxiHBQFZ+hc0JCwwnRKkF3hg@mail.gmail.com)
+- [2] 推迟加锁的竞态说明: [message-id](https://www.postgresql.org/message-id/CAKJS1f99JNe+sw5E3qWmS+HeLMFaAhehKO67J1Ym3pXv0XBsxw@mail.gmail.com)
diff --git a/src/cn/2026/README.md b/src/cn/2026/README.md
@@ -4,6 +4,8 @@
 
 ## 各周
 
+- [第 10 周](/cn/2026/10/index.html)
+  - [通用计划与初始裁剪：为分区表减少锁竞争](/cn/2026/10/generic-plans-initial-pruning.html)
 - [第 09 周](/cn/2026/09/index.html)
   - [元组解构的进一步加速：预计算 attcacheoff](/cn/2026/09/tuple-deformation-speedups.html)
 - [第 08 周](/cn/2026/08/index.html)
diff --git a/src/en/2026/10/README.md b/src/en/2026/10/README.md
@@ -0,0 +1,9 @@
+# Week 10 (2026)
+
+PostgreSQL mailing list discussions for Week 10, 2026.
+
+🇨🇳 [中文版本](../../../cn/2026/10/index.html)
+
+## Articles
+
+- [Generic Plans and Initial Pruning: Fewer Locks for Partitioned Tables](./generic-plans-initial-pruning.md)
diff --git a/src/en/2026/10/generic-plans-initial-pruning.md b/src/en/2026/10/generic-plans-initial-pruning.md
@@ -0,0 +1,116 @@
+# Generic Plans and Initial Pruning: Fewer Locks for Partitioned Tables
+
+## Introduction
+
+In December 2021, Amit Langote proposed a patch to speed up execution of **generic plans** over partitioned tables. Generic plans (e.g. from prepared statements with parameters) cannot prune partitions at plan time, so they contain nodes for every partition. That makes `AcquireExecutorLocks()` a major bottleneck: it locks every relation in the plan, and the number of partitions can be very large. This post summarizes the idea, the benchmark gains, and the safety and design discussion that followed on pgsql-hackers.
+
+## Why This Matters
+
+When you use a prepared statement like:
+
+```sql
+PREPARE q AS SELECT * FROM partitioned_table WHERE key = $1;
+EXECUTE q(123);
+```
+
+with `plan_cache_mode = force_generic_plan` (or after the planner has chosen a generic plan), the plan is shared across executions and has no plan-time pruning. So the plan tree includes **all** partitions. Before each execution, `CheckCachedPlan()` must ensure the plan is still valid; most of that cost is in `AcquireExecutorLocks()`, which locks every relation in the plan. As partition count grows (hundreds or thousands), lock acquisition dominates and throughput drops sharply.
+
+An earlier proposal by David Rowley delayed locking until after "initial" (pre-execution) pruning in the executor. It was rejected because leaving some partitions unlocked opened race conditions: a concurrent session could alter a partition after the plan was deemed valid but before execution, invalidating part of the plan.
+
+Amit's approach keeps locking at plan-check time but **reduces the set of relations to lock** by reusing the same "initial" pruning logic that the executor will later use. Only partitions that survive initial pruning are locked, so the plan stays safe while lock count scales with the number of partitions that are actually used.
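+
+The effect of initial pruning on a generic plan can be observed directly. The following is a minimal sketch, assuming a small hash-partitioned table; the table layout and partition names are illustrative and not taken from the thread.
+
+```sql
+-- Hypothetical four-way hash-partitioned table; all names are illustrative.
+CREATE TABLE partitioned_table (key int, payload text) PARTITION BY HASH (key);
+CREATE TABLE partitioned_table_p0 PARTITION OF partitioned_table
+    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
+CREATE TABLE partitioned_table_p1 PARTITION OF partitioned_table
+    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
+CREATE TABLE partitioned_table_p2 PARTITION OF partitioned_table
+    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
+CREATE TABLE partitioned_table_p3 PARTITION OF partitioned_table
+    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
+
+SET plan_cache_mode = force_generic_plan;
+PREPARE q AS SELECT * FROM partitioned_table WHERE key = $1;
+
+-- The cached generic plan holds a scan node for every partition. Initial
+-- pruning at executor startup discards the ones that cannot match, which
+-- EXPLAIN typically reports as a "Subplans Removed" line under the Append node.
+EXPLAIN (COSTS OFF) EXECUTE q(123);
+```
+
+The pruned subplans never run, but on an unpatched server their relations are still locked before execution starts, which is the cost the patch targets.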
+
+## Technical Analysis
+
+### The Idea: Prune Before Locking
+
+Append and MergeAppend nodes for partitioned tables carry **partition pruning information**: which subplans are discarded by "initial" (pre-execution) steps versus "execution-time" steps. Initial steps depend only on values available before execution (e.g. bound parameters), not on per-row values. The patch teaches `AcquireExecutorLocks()` to:
+
+1. Walk the plan tree as it does today to collect relations to lock.
+2. For Append/MergeAppend nodes that have **initial pruning steps** (`contains_init_steps`), run those steps (using the same logic as the executor) to get the set of subplans that survive.
+3. Only add relations from those surviving subplans to the lock set.
+
+So the lock set is exactly the set of relations that will be used when the plan runs. No partition that is pruned away by initial steps is locked, and no partition that is used is left unlocked.
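+
+One way to check which partitions a cached generic plan locks is to inspect `pg_locks` inside a transaction. This is a sketch only, assuming a single-level partitioned table named `partitioned_table` and a prepared statement `q` as in the example above, with `plan_cache_mode = force_generic_plan` already set in the session.
+
+```sql
+-- Assumes partitioned_table and the prepared statement q already exist.
+EXECUTE q(123);   -- first call builds and caches the generic plan
+
+BEGIN;
+EXECUTE q(123);   -- reuses the cached plan, so locks are taken while validating it
+
+-- List the partitions of partitioned_table that this backend currently locks.
+SELECT l.relation::regclass AS locked_partition, l.mode
+FROM pg_locks AS l
+WHERE l.pid = pg_backend_pid()
+  AND l.locktype = 'relation'
+  AND l.relation IN (SELECT inhrelid
+                     FROM pg_inherits
+                     WHERE inhparent = 'partitioned_table'::regclass);
+COMMIT;
+```
+
+Without the patch, one would expect a row for every partition; under the approach described here, only the partitions that survive initial pruning should appear.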
+
+### Duplication of Pruning
+
+Initial pruning is therefore performed **twice** for generic plans: once in `AcquireExecutorLocks()` to decide what to lock, and again in `ExecInit[Merge]Append()` to decide which partition subnodes to create. Amit noted he couldn't find a clean way to avoid this duplication without restructuring where locking happens (e.g. moving it into executor startup), which would be a larger change.
+
+### Benchmark
+
+Using pgbench with a partitioned database and `plan_cache_mode = force_generic_plan`:
+
+- **HEAD**: throughput falls as partition count increases (e.g. 32 partitions ≈ 20.5k tps, 2048 partitions ≈ 1.3k tps).
+- **Patched**: throughput stays much higher (e.g. 32 partitions ≈ 27.5k tps, 2048 partitions ≈ 16.3k tps).
+
+So the patch removes most of the scaling cost from lock acquisition when generic plans are used with many partitions: in these numbers, throughput at 2048 partitions is roughly 12 times higher than on HEAD.
+
+## Community Insights
+
+### When Does This Apply?
+
+Ashutosh Bapat asked when "pre-execution" pruning instructions exist so that this approach helps. Amit clarified:
+
+- The main use case is **prepared statements** that use a **generic plan**, e.g. `PREPARE q AS SELECT * FROM partitioned_table WHERE key = $1;` with `EXECUTE q(...)`.
+- Other bottlenecks (e.g. executor startup/shutdown code that walks the full range table) are unchanged by this patch.
+
+### Code Review (Amul Sul)
+
+Amul suggested several cleanups for the v1 patch:
+
+- Move declarations inside the `if (pruneinfo && pruneinfo->contains_init_steps)` block.
+- Add a short comment that when the condition is false, `plan_tree_walker()` continues to child nodes, so locking behavior remains correct.
+- Prefer `GetLockableRelations_worker()` (or equivalent) over adding a new `get_plan_scanrelids()`.
+- Use `plan_walk_members()` for CustomScan like other node types.
+- Let the **caller** create/free the temporary `EState` used for pruning in the lock path, instead of doing it inside the lock-collection helper.
+- Use `foreach_current_index()` in the relevant loops for clarity.
+
+### Safety (Robert Haas)
+
+Robert raised two important points.
+
+**1. Partly valid plans**
+Today we only run plans that are fully valid: we lock every relation, so we accept invalidation messages and detect DDL that might invalidate the plan. If we skip locking some relations, we might never see invalidations for them. For example:
+
+- A partition has an extra index and the plan uses an Index Scan on it.
+- That partition is pruned by initial steps, so we don't lock it.
+- Another session drops the index.
+- We still consider the plan valid. We don't execute the pruned part, but code that walks the whole plan (e.g. EXPLAIN, auto_explain) might touch that node and break (e.g. looking up the index name).
+
+So we'd be in a situation where the plan is "partly valid," which we don't have today. Robert wasn't sure there is a concrete bug in core from that, but it's a new class of risk.
+
+Amit replied that he'd looked for places that inspect the plan tree before executor init and hadn't found one that would touch pruned-off parts; EXPLAIN runs after `ExecutorStart()`, which builds the PlanState tree and thus only the non-pruned portion. He agreed it's not something to assert with certainty.
+
+**2. Lock set vs. init set must match**
+Doing initial pruning in two places means two separate computations. If they ever disagree (e.g. due to a function misdeclared as IMMUTABLE but actually VOLATILE), we could lock one set of partitions and then initialize a different set. Robert argued we should ensure that cannot happen rather than rely on the two being always identical.
+
+Amit agreed the patch assumes initial pruning is deterministic (no VOLATILE functions in the pruning expressions). A function misdeclared as IMMUTABLE could lead to Assert failures or, in non-assert builds, to the executor using a partition that was never locked, which is unsafe.
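+
+The kind of misdeclaration at issue can be sketched as follows. The function and statement names are hypothetical, and the example assumes `partitioned_table` is partitioned on the integer column `key`; it illustrates the hazard and is not a test case from the thread.
+
+```sql
+-- Hypothetical function: declared IMMUTABLE although its result changes
+-- from call to call, which breaks the determinism the patch relies on.
+CREATE FUNCTION unstable_key(base int) RETURNS int
+LANGUAGE sql IMMUTABLE
+AS $$ SELECT base + (random() * 3)::int $$;
+
+-- Assumes partitioned_table is partitioned on the integer column key.
+PREPARE q_bad AS
+    SELECT * FROM partitioned_table WHERE key = unstable_key($1);
+
+-- Under a generic plan the comparison value is only known at initial pruning.
+-- With the patch it would be computed once to choose the lock set and again
+-- in ExecInit[Merge]Append(), so the two passes could pick different partitions.
+EXECUTE q_bad(0);
+```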
+
+## Technical Details
+
+### Applicability
+
+- **Generic plans only**: Custom plans can prune at plan time, so they don't have the "all partitions in the plan" problem to the same degree.
+- **Initial pruning only**: Only partitions eliminated by **initial** (pre-execution) pruning are skipped for locking. Partitions that can only be pruned at execution time (e.g. by values that become known while the plan runs, such as nested-loop parameters) are still in the plan and would still be locked; the patch doesn't change that.
+
+### Edge Cases
+
+- EXPLAIN / auto_explain: The concern is that they might walk plan nodes for relations we didn't lock. Amit's analysis is that they run after executor init, so they only see the initialized (post-pruning) plan.
+- Incorrect VOLATILE/IMMUTABLE: Could make lock set and init set differ; the patch doesn't add extra guards for that.
+
+## Current Status
+
+The thread carried a single patch version (v1) and did not show a follow-up commit. The discussion highlighted:
+
+- Strong benchmark gains for generic plans on many partitions.
+- Need to address code-style and refactor suggestions (EState lifecycle, walker usage, comments).
+- Open design points: ensuring lock set and init set cannot diverge, and whether "partly valid" plans are acceptable for code paths that walk the full plan tree.
+
+## Conclusion
+
+Amit's patch reduces the cost of `AcquireExecutorLocks()` for generic plans over partitioned tables by locking only partitions that survive initial pruning, avoiding the race conditions of the earlier "delay locking" approach. Benchmarks show large throughput improvements when many partitions are present. The discussion clarified the intended use case (prepared statements with generic plans), raised valid safety and consistency concerns (partly valid plans, duplicate pruning), and produced concrete code-review suggestions. A design that guarantees a single pruning result for both locking and executor init would strengthen the approach, though it may require a somewhat larger refactor of where and how locks are acquired.
+
+## References
+
+- [Thread: generic plans and "initial" pruning](https://www.postgresql.org/message-id/flat/CA%2BHiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg%40mail.gmail.com) (Amit Langote, Dec 2021)
+- [1] David Rowley's earlier proposal: [message-id](https://www.postgresql.org/message-id/CAKJS1f_kfRQ3ZpjQyHC7=PK9vrhxiHBQFZ+hc0JCwwnRKkF3hg@mail.gmail.com)
+- [2] Race condition with delayed locking: [message-id](https://www.postgresql.org/message-id/CAKJS1f99JNe+sw5E3qWmS+HeLMFaAhehKO67J1Ym3pXv0XBsxw@mail.gmail.com)
diff --git a/src/en/2026/README.md b/src/en/2026/README.md