Eliminating SPI from RI Triggers: A Fast Path for Foreign Key Checks

zhjwpku · zhjwpku · commit 6b859001c684 · 2026-02-22T19:01:28.000+08:00
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -3,6 +3,8 @@
 # 🇬🇧 English
 
 - [2026](./en/2026/README.md)
+  - [Week 08](./en/2026/08/README.md)
+    - [Eliminating SPI from RI Triggers: A Fast Path for Foreign Key Checks](./en/2026/08/ri-fast-path-foreign-key-checks.md)
   - [Week 07](./en/2026/07/README.md)
     - [SQL Property Graph Queries (SQL/PGQ): Bringing Graph Queries to PostgreSQL](./en/2026/07/sql-property-graph-queries-pgq.md)
     - [Reducing LEFT JOIN to ANTI JOIN: A Planner Optimization for "WHERE col IS NULL"](./en/2026/07/anti-join-left-join-optimization.md)
@@ -22,6 +24,8 @@
 # 🇨🇳 中文
 
 - [2026](./cn/2026/README.md)
+  - [第 08 周](./cn/2026/08/README.md)
+    - [消除 RI 触发器中的 SPI：外键检查的快速路径](./cn/2026/08/ri-fast-path-foreign-key-checks.md)
   - [第 07 周](./cn/2026/07/README.md)
     - [SQL 属性图查询（SQL/PGQ）：为 PostgreSQL 引入图查询能力](./cn/2026/07/sql-property-graph-queries-pgq.md)
     - [将 LEFT JOIN 归约为 ANTI JOIN：针对 "WHERE col IS NULL" 的优化器优化](./cn/2026/07/anti-join-left-join-optimization.md)
diff --git a/src/cn/2026/08/README.md b/src/cn/2026/08/README.md
@@ -0,0 +1,9 @@
+# 第 08 周（2026）
+
+2026 年第 08 周 PostgreSQL 邮件列表讨论。
+
+🇬🇧 [English Version](../../../en/2026/08/index.html)
+
+## 文章
+
+- [消除 RI 触发器中的 SPI：外键检查的快速路径](./ri-fast-path-foreign-key-checks.md)
diff --git a/src/cn/2026/08/ri-fast-path-foreign-key-checks.md b/src/cn/2026/08/ri-fast-path-foreign-key-checks.md
@@ -0,0 +1,131 @@
+# 消除 RI 触发器中的 SPI：外键检查的快速路径
+
+## 引言
+
+PostgreSQL 中的引用完整性（Referential Integrity, RI）触发器传统上通过 **SPI**（Server Programming Interface）执行 SQL 查询，以验证引用表（Referencing Table）中新插入或更新的行是否在被引用表（Referenced Table, 主键表）中存在匹配行。对于批量操作（大批量 `INSERT` 或 `UPDATE`），这意味着**每一行**都会启动和销毁一次完整的执行计划，`ExecutorStart()` 和 `ExecutorEnd()` 带来的开销相当可观。
+
+Amit Langote 一直在致力于消除这一开销，通过用**直接索引探测**替代 SQL 计划来完成 RI 检查。这项工作的最新迭代“Eliminating SPI / SQL from some RI triggers - take 3”通过绕过 SPI 执行器、在约束语义允许时直接调用索引访问方法，将批量外键检查的耗时最多缩短 **57%**。
+
+补丁集历经多版演进，Junwang Zhao 于 2025 年底加入开发。当前方向为**混合快速路径 + 回退**：在简单场景下使用直接索引探测，在正确性依赖执行器复杂行为时回退到现有 SPI 路径。
+
+## 为什么重要
+
+外键约束无处不在。每次向引用表执行 `INSERT` 或 `UPDATE` 都会触发 RI 检查，验证每一行是否在被引用表的主键中存在匹配。传统做法下：
+
+```sql
+CREATE TABLE pk (a int PRIMARY KEY);
+CREATE TABLE fk (a int REFERENCES pk);
+
+INSERT INTO pk SELECT generate_series(1, 1000000);
+INSERT INTO fk SELECT generate_series(1, 1000000);  -- 100 万次 RI 检查
+```
+
+每一次插入都会触发 RI 检查，执行：
+
+1. 获取用于扫描主键索引的 SPI 计划（首次使用时准备，之后按约束缓存）
+2. 调用 `ExecutorStart()` 和 `ExecutorEnd()`
+3. 执行计划查找（或确认不存在）匹配行
+
+每行都要经历一次执行计划的建立与销毁，主导了总耗时。在 Amit 的 v3 补丁下，同样的批量插入从**约 1000 ms** 降至**约 432 ms**（快 57%） —— 通过直接探测主键索引，而不经过执行器。
+
+## 技术背景
+
+### 传统 RI 路径
+
+`ri_triggers.c` 中的 RI 触发器函数（如 `RI_FKey_check`）通过 SPI 执行检查，其流程为：
+
+1. 首次使用时，构建形如 `SELECT 1 FROM pk WHERE pk.a = $1` 的 SQL 字符串，用 `SPI_prepare` 准备并缓存计划
+2. 对每一行，`ri_PerformCheck()` 通过 `SPI_execute_snapshot()` 执行缓存的计划
+3. 执行器在主键上执行索引扫描，若被引用值存在则返回一行
+
+这种方式在所有场景下都正确 —— 分区表、时态外键、并发更新 —— 但每行都承担完整的计划执行成本。
+
+### 快速路径思路
+
+对于简单外键（被引用表非分区、不涉及时态语义），检查本质上是：“用该值探测主键索引；若找到且能加锁，则检查通过”。可通过以下方式实现：
+
+1. 打开主键关系和其唯一索引
+2. 根据外键列值构建扫描键
+3. 调用 `index_getnext_slot()`（或等效接口）查找元组
+4. 在当前快照下用 `LockTupleKeyShare` 加锁
+
+无需 SQL、计划或执行器，只需直接索引探测和元组加锁。
+
+## 补丁演进
+
+### v1：原始方案（2024 年 12 月）
+
+初版补丁集（3 个补丁）引入：
+
+- **0001**：重构 `PartitionDesc` 接口，显式传递 `omit_detached` 可见性（已分离挂起分区）所需的快照。解决了一个 bug：在 `REPEATABLE READ` 下，因 RI 查找会操作 `ActiveSnapshot`，而 `find_inheritance_children()` 对已分离挂起分区的可见性依赖该快照，导致主键查找可能返回错误结果。
+- **0002**：在 RI 触发器函数中避免使用 SPI，引入直接索引探测路径。
+- **0003**：对部分 RI 检查避免使用 SQL 查询——主要性能优化。
+
+Amit 指出 temporal foreign key 查询仍保留在 SPI 路径，因其计划涉及范围重叠和聚合，无法用简单索引探测处理。他还为快速路径增加了与 `EvalPlanQual()` 等价的逻辑，在 `READ COMMITTED` 下正确处理并发更新。
+
+### v2：Junwang 的混合快速路径（2025 年 12 月）
+
+Junwang Zhao 在此基础上继续推进，采用混合设计：
+
+- **0001**：为外键约束检查添加快速路径。适用条件：被引用表非分区，且约束不涉及 temporal semantics。
+- **0002**：缓存快速路径元数据（操作符哈希条目、操作符 OID、策略号、子类型）。当时该元数据缓存尚未带来性能提升。
+
+基准测试（100 万行，`numeric` 主键 / `bigint` 外键）：
+
+- 主线：INSERT 13.5s，UPDATE 15s
+- 补丁版：INSERT 8.2s，UPDATE 10.1s
+
+### v3：Amit 的重构与按语句缓存（2026 年 2 月）
+
+Amit 将 Junwang 的补丁重构成两个补丁：
+
+- **0001**：功能完整的快速路径。包含并发处理、`REPEATABLE READ` 交叉检查、跨类型操作符、安全上下文（RLS/ACL）及元数据缓存。主要逻辑集中在 `ri_FastPathCheck()`；`RI_FKey_check` 仅负责分支判断并在需要时回退到 SPI。
+
+- **0002**：按语句的资源缓存。不共享 `trigger.c` 与 `ri_triggers.c` 的 `EState`，而是引入新的 **AfterTriggerBatchCallback** 机制，在每次触发器执行周期结束时调用。借此，可在单一周期内缓存主键关系、索引、扫描描述符和快照，从而在多次 FK 触发器调用之间复用，而不是每行都打开和关闭。
+
+Amit 的基准测试：
+
+| 场景 | 主线 | 0001 | 0001+0002 |
+|------|------|------|-----------|
+| 100 万行，numeric/bigint | 2444 ms | 1382 ms（快 43%） | 1202 ms（快 51%） |
+| 100 万行，int/int | 1000 ms | 520 ms（快 48%） | 432 ms（快 57%） |
+
+0002 的额外收益（约 13–17%）来自消除每行的关系打开/关闭、扫描开始/结束、槽分配/释放，并将每行的 `GetSnapshotData()` 替换为缓存中的快照副本。
+
+## 设计：何时走快速路径，何时走 SPI
+
+快速路径适用条件：
+
+- 被引用表**非分区**
+- 约束**不**涉及 temporal semantics（范围重叠、`range_agg()` 等）
+- 多列键、跨类型相等（通过索引操作符族）、排序规则匹配、RLS/ACL 均在快速路径内处理
+
+在以下情况回退到 SPI：
+
+1. **并发更新或删除**：若 `table_tuple_lock()` 报告目标元组已被更新或删除，则委托给 SPI，由 `EvalPlanQual` 和可见性规则按现有逻辑处理。
+2. **分区被引用表**：需要通过 `PartitionDirectory` 将探测路由到正确分区，可后续单独补丁支持。
+3. **Temporal foreign keys**：使用范围重叠和包含语义，本质上涉及聚合，保留在 SPI 路径。
+
+安全行为与现有 SPI 路径一致：快速路径在探测时临时切换到被引用表（主键表）的所有者，使用 `SECURITY_LOCAL_USERID_CHANGE | SECURITY_NOFORCE_RLS`，与 `ri_PerformCheck()` 保持一致。
+
+## 后续方向
+
+**David Rowley** 在私下交流中建议，将多个 FK 值批量为单次索引探测可进一步提升性能，利用 PostgreSQL 17 的 `ScalarArrayOp` 对 btree 的改进。思路：在按约束的缓存中跨触发器调用缓冲 FK 值，构建 `SK_SEARCHARRAY` 扫描键，让 btree AM 在一次有序遍历中扫描匹配的叶页，而不是每行一次树下降。加锁和重检查仍按元组进行。可作为独立补丁在现有系列之上探索。
+
+## 当前状态
+
+- 补丁系列位于 PG19-Drafts。Amit 于 2025 年 10 月移入；Junwang Zhao 正在继续推进。
+- Amit 的 v3 补丁（2026 年 2 月）已基本成型，等待审查。欢迎反馈，尤其是关于 `ri_LockPKTuple()` 中的并发处理及 0002 中快照生命周期的意见。
+- Pavel Stehule 表示愿意协助测试和审查。
+
+## 结论
+
+对简单外键检查消除 SPI 调用，可为批量操作带来可观的性能提升。混合快速路径 + 回退设计回应了审查者对正确性的关切：在正确性依赖执行器复杂行为时回退到 SPI。v3 中的按语句资源缓存进一步优化，将关系/索引的建立成本分摊到单一触发器执行周期内的多行上。
+
+对于具有大量外键的批量插入或更新场景——常见于 ETL、暂存加载、数据迁移 —— 该工作有望显著缩短运行时间。当前限制（分区主键、时态外键）使这些场景仍走现有路径，在保证正确性的同时优化大多数 FK 工作负载。
+
+## 参考资料
+
+- [讨论串：Eliminating SPI / SQL from some RI triggers - take 3](https://www.postgresql.org/message-id/flat/CA%2BHiwqF4C0ws3cO%2Bz5cLkPuvwnAwkSp7sfvgGj3yQ%3DLi6KNMqA%40mail.gmail.com)
+- [1] Simplifying foreign key/RI checks（早期讨论串）
+- [2] Eliminating SPI from RI triggers - take 2（早期讨论串）
diff --git a/src/cn/2026/README.md b/src/cn/2026/README.md
@@ -4,6 +4,8 @@
 
 ## 各周
 
+- [第 08 周](./08/index.html)
+  - [消除 RI 触发器中的 SPI：外键检查的快速路径](./08/ri-fast-path-foreign-key-checks.md)
 - [第 07 周](./07/index.html)
   - [SQL 属性图查询（SQL/PGQ）：为 PostgreSQL 引入图查询能力](./07/sql-property-graph-queries-pgq.md)
   - [将 LEFT JOIN 归约为 ANTI JOIN：针对 "WHERE col IS NULL" 的优化器优化](./07/anti-join-left-join-optimization.md)
diff --git a/src/en/2026/08/README.md b/src/en/2026/08/README.md
@@ -0,0 +1,9 @@
+# Week 08 (2026)
+
+PostgreSQL mailing list discussions for Week 08, 2026.
+
+🇨🇳 [中文版本](../../../cn/2026/08/index.html)
+
+## Articles
+
+- [Eliminating SPI from RI Triggers: A Fast Path for Foreign Key Checks](./ri-fast-path-foreign-key-checks.md)
diff --git a/src/en/2026/08/ri-fast-path-foreign-key-checks.md b/src/en/2026/08/ri-fast-path-foreign-key-checks.md
@@ -0,0 +1,130 @@
+# Eliminating SPI from RI Triggers: A Fast Path for Foreign Key Checks
+
+## Introduction
+
+Referential Integrity (RI) triggers in PostgreSQL traditionally execute SQL queries via **SPI** (Server Programming Interface) to verify that inserted or updated rows in a referencing table have matching rows in the referenced (primary key) table. For bulk operations—large `INSERT` or `UPDATE` statements—this means starting and tearing down a full executor plan for **each row**, with significant overhead from `ExecutorStart()` and `ExecutorEnd()`.
+
+Amit Langote has been working on eliminating this overhead by performing RI checks as **direct index probes** instead of SQL plans. The latest iteration of this work, "Eliminating SPI / SQL from some RI triggers - take 3," cuts the runtime of bulk foreign key checks by up to **57%** by bypassing the SPI executor and calling the index access method directly when the constraint semantics allow it.
+
+The patch set has evolved through several versions, with Junwang Zhao joining the effort in late 2025. The current direction is a **hybrid fast-path + fallback** design: use a direct index probe for straightforward cases, and fall back to the existing SPI path when correctness requires executor behavior that would be difficult or risky to replicate.
+
+## Why This Matters
+
+Foreign key constraints are ubiquitous. Every `INSERT` or `UPDATE` into a referencing table triggers RI checks that must verify each new or modified row against the referenced table's primary key. With the traditional approach:
+
+```sql
+CREATE TABLE pk (a int PRIMARY KEY);
+CREATE TABLE fk (a int REFERENCES pk);
+
+INSERT INTO pk SELECT generate_series(1, 1000000);
+INSERT INTO fk SELECT generate_series(1, 1000000);  -- 1M RI checks
+```
+
+Each of the 1 million inserts triggers an RI check that:
+
+1. Fetches the SPI plan that scans the PK index (prepared on first use, then cached per constraint).
+2. Runs `ExecutorStart()` and `ExecutorEnd()`.
+3. Executes the plan to find (or not find) the matching row.
+
+This per-row plan setup/teardown dominates the cost. With Amit's v3 patches, the same bulk insert drops from **~1000 ms** to **~432 ms** (57% faster) on his benchmark machine—by probing the PK index directly without going through the executor.
+
+## Technical Background
+
+### The Traditional RI Path
+
+RI trigger functions in `ri_triggers.c` (e.g. `RI_FKey_check`) run the check through SPI:
+
+1. On first use, the trigger builds an SQL string for a query like `SELECT 1 FROM pk WHERE pk.a = $1`, prepares it with `SPI_prepare`, and caches the plan.
+2. For every row, `ri_PerformCheck()` executes the cached plan with `SPI_execute_snapshot()`.
+3. The executor performs an index scan on the PK, returning a row if the referenced value exists.
+
+This works correctly for all cases—partitioned tables, temporal foreign keys, concurrent updates—but pays the full plan-execution cost per row.
+
+### The Fast-Path Idea
+
+For simple foreign keys (non-partitioned referenced table, non-temporal semantics), the check is conceptually: "probe the PK index for this value; if found and lockable, the check passes." That can be done by:
+
+1. Opening the PK relation and its unique index.
+2. Building a scan key from the FK column values.
+3. Calling `index_getnext_slot()` (or equivalent) to find the tuple.
+4. Locking it with `LockTupleKeyShare` under the current snapshot.
+
+No SQL, no plan, no executor. Just a direct index probe and tuple lock.
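+
+The four steps above can be pictured with a toy model. This is only a sketch under loud assumptions: a sorted Python list stands in for the unique PK index, a set stands in for tuple locks, and `fast_path_check` is a hypothetical name. None of it is PostgreSQL code or taken from the patch.
+
+```python
+import bisect
+
+# Toy stand-ins, not PostgreSQL structures: a sorted list plays the unique
+# PK index, and a set records which keys hold a key-share lock.
+pk_index = [1, 2, 3, 5, 8]
+key_share_locks = set()
+
+def fast_path_check(fk_value):
+    """Return True if fk_value exists in pk_index, 'locking' it on success."""
+    pos = bisect.bisect_left(pk_index, fk_value)   # one descent into the index
+    found = pos < len(pk_index) and pk_index[pos] == fk_value
+    if found:
+        key_share_locks.add(fk_value)              # stands in for LockTupleKeyShare
+    return found
+```
+
+A missing key simply returns `False`; in the real trigger that outcome becomes a foreign key violation error.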
+
+## Patch Evolution
+
+### v1: The Original Approach (December 2024)
+
+The first patch set (3 patches) introduced:
+
+- **0001**: Refactoring of the `PartitionDesc` interface to explicitly pass the snapshot needed for `omit_detached` visibility (detach-pending partitions). This addressed a bug where PK lookups could return incorrect results under `REPEATABLE READ` because `find_inheritance_children()`'s visibility of detach-pending partitions depended on `ActiveSnapshot`, which RI lookups were manipulating.
+- **0002**: Avoid using SPI in RI trigger functions by introducing a direct index probe path.
+- **0003**: Avoid using an SQL query for some RI checks—the main performance optimization.
+
+Amit noted that temporal foreign key queries would remain on the SPI path, as their plans involve range overlap and aggregation and are not amenable to a simple index probe. He also added an equivalent of `EvalPlanQual()` for the new path to handle concurrent updates correctly under `READ COMMITTED`.
+
+### v2: Junwang's Hybrid Fast Path (December 2025)
+
+Junwang Zhao took the work forward with a hybrid design:
+
+- **0001**: Add fast path for foreign key constraint checks. Applies when the referenced table is not partitioned and the constraint does not involve temporal semantics.
+- **0002**: Cache fast-path metadata (operator hash entries, operator OIDs, strategy numbers, subtypes). At that stage, the metadata cache did not yet improve performance.
+
+Benchmarks (1M rows, `numeric` PK / `bigint` FK):
+
+- Head: INSERT 13.5 s, UPDATE 15 s
+- Patched: INSERT 8.2 s, UPDATE 10.1 s
+
+### v3: Amit's Rework with Per-Statement Caching (February 2026)
+
+Amit reworked Junwang's patches into two patches:
+
+- **0001**: Functionally complete fast path. Includes concurrency handling, `REPEATABLE READ` crosscheck, cross-type operators, security context (RLS/ACL), and metadata caching. Most logic lives in `ri_FastPathCheck()`; `RI_FKey_check` just gates the call and falls back to SPI when needed.
+- **0002**: Per-statement resource caching. Instead of sharing `EState` between `trigger.c` and `ri_triggers.c`, a new **AfterTriggerBatchCallback** mechanism fires at the end of each trigger-firing cycle. It allows caching the PK relation, index, scan descriptor, and snapshot across all FK trigger invocations within a single cycle, rather than opening and closing them per row.
+
+Benchmarks on Amit's machine:
+
+| Scenario | Master | 0001 | 0001+0002 |
+|----------|--------|------|-----------|
+| 1M rows, numeric/bigint | 2444 ms | 1382 ms (43% faster) | 1202 ms (51% faster) |
+| 1M rows, int/int | 1000 ms | 520 ms (48% faster) | 432 ms (57% faster) |
+
+The incremental gain from 0002 (roughly 13–17% less time than 0001 alone) comes from eliminating per-row relation open/close, scan begin/end, and slot allocation/free, and from replacing per-row `GetSnapshotData()` with a snapshot copy in the cache.
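+
+The percentages in the table can be re-derived from the raw timings. The short script below is just that arithmetic; the helper name `pct_faster` is made up for this illustration, and "faster" is taken to mean the relative reduction in elapsed time.
+
+```python
+def pct_faster(before_ms, after_ms):
+    """Percentage reduction in elapsed time going from before_ms to after_ms."""
+    return 100.0 * (before_ms - after_ms) / before_ms
+
+# (master, 0001, 0001+0002) timings in ms, copied from the table above
+timings = {
+    "numeric/bigint": (2444, 1382, 1202),
+    "int/int": (1000, 520, 432),
+}
+
+for name, (master, p1, p12) in timings.items():
+    print(name,
+          round(pct_faster(master, p1)),    # 0001 vs master
+          round(pct_faster(master, p12)),   # 0001+0002 vs master
+          round(pct_faster(p1, p12)))       # 0002 on top of 0001
+```
+
+It prints `numeric/bigint 43 51 13` and `int/int 48 57 17`; the last column is the gain of 0002 measured against 0001 alone.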
+
+## Design: When to Use Fast Path vs. SPI
+
+The fast path applies when:
+
+- The referenced table is **not partitioned**.
+- The constraint does **not** involve temporal semantics (range overlap, `range_agg()`, etc.).
+- Within those limits, multi-column keys, cross-type equality (via the index opfamily), collation matching, and RLS/ACL are all handled directly in the fast path.
+
+The code falls back to SPI when:
+
+1. **Concurrent updates or deletes**: If `table_tuple_lock()` reports that the target tuple was updated or deleted, the code delegates to SPI so that `EvalPlanQual` and visibility rules apply as today.
+2. **Partitioned referenced tables**: These require routing the probe to the correct partition via `PartitionDirectory`; support can be added later as a separate patch.
+3. **Temporal foreign keys**: Use range overlap and containment semantics that inherently involve aggregation; they stay on the SPI path.
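+
+The two lists above amount to a small decision rule, sketched here in Python. The `FkConstraint` class, its field names, and `choose_path` are hypothetical and exist only for this illustration; the patch makes the decision in C inside `RI_FKey_check` and `ri_FastPathCheck()`.
+
+```python
+from dataclasses import dataclass
+
+@dataclass
+class FkConstraint:
+    """Hypothetical descriptor; the field names are illustrative only."""
+    pk_is_partitioned: bool
+    is_temporal: bool
+
+def choose_path(fk: FkConstraint, tuple_changed_concurrently: bool = False) -> str:
+    """Return 'fast' or 'spi' following the rules listed above."""
+    if fk.pk_is_partitioned or fk.is_temporal:
+        return "spi"    # never eligible for the fast path
+    if tuple_changed_concurrently:
+        return "spi"    # lock attempt saw an update/delete: defer to SPI
+    return "fast"
+```
+
+Note that the first test depends only on the constraint, while the second can only be known per row, after the lock attempt.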
+
+Security behavior mirrors the existing SPI path: the fast path temporarily switches to the owner of the referenced (PK) table with `SECURITY_LOCAL_USERID_CHANGE | SECURITY_NOFORCE_RLS` around the probe, matching `ri_PerformCheck()`.
+
+## Future Directions
+
+**David Rowley** suggested off-list that batching multiple FK values into a single index probe could further improve performance, leveraging the `ScalarArrayOp` btree improvements from PostgreSQL 17. The idea: buffer FK values across trigger invocations in the per-constraint cache, build a `SK_SEARCHARRAY` scan key, and let the btree AM traverse matching leaf pages in one sorted pass instead of one tree descent per row. Locking and recheck would remain per-tuple. This could be explored as a separate patch on top of the current series.
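+
+To see why batching could help, compare one index descent per value with a single sorted pass. This is a toy model under stated assumptions: a sorted list stands in for the btree, and both function names are hypothetical. It does not use `SK_SEARCHARRAY` or any PostgreSQL API.
+
+```python
+import bisect
+
+def probe_each(pk_index, fk_values):
+    """One index descent per FK value, modelled with a binary search."""
+    missing = []
+    for v in fk_values:
+        pos = bisect.bisect_left(pk_index, v)
+        if pos == len(pk_index) or pk_index[pos] != v:
+            missing.append(v)
+    return missing
+
+def probe_batched(pk_index, fk_values):
+    """Sort and deduplicate the buffered values, then walk the index once."""
+    missing, pos = [], 0
+    for v in sorted(set(fk_values)):
+        while pos < len(pk_index) and pk_index[pos] < v:
+            pos += 1                      # the cursor only ever moves forward
+        if pos == len(pk_index) or pk_index[pos] != v:
+            missing.append(v)
+    return missing
+```
+
+Both functions report the same missing keys; the batched version replaces repeated descents with one forward walk, which is the effect the proposal hopes to get from the btree AM.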
+
+## Current Status
+
+- The series is registered in the PG19-Drafts commitfest. Amit moved it there in October 2025; Junwang Zhao is continuing the work.
+- Amit's v3 patches (February 2026) are in reasonable shape and ready for review. He welcomes feedback, especially on concurrency handling in `ri_LockPKTuple()` and the snapshot lifecycle in 0002.
+- Pavel Stehule has offered to help with testing and review.
+
+## Conclusion
+
+Eliminating SPI from RI triggers for simple foreign key checks yields substantial performance gains for bulk operations. The hybrid fast-path + fallback design addresses reviewer concerns about correctness by deferring to SPI whenever executor behavior is non-trivial to replicate. The per-statement resource caching in v3 adds a second layer of optimization by amortizing relation/index setup across many rows within a single trigger-firing cycle.
+
+For workloads with large bulk inserts or updates on tables with foreign keys—common in ETL, staging loads, and data migrations—this work could significantly reduce runtimes. The current limitations (partitioned PKs, temporal FKs) leave those cases on the existing path, preserving correctness while optimizing the majority of FK workloads.
+
+## References
+
+- [Thread: Eliminating SPI / SQL from some RI triggers - take 3](https://www.postgresql.org/message-id/flat/CA%2BHiwqF4C0ws3cO%2Bz5cLkPuvwnAwkSp7sfvgGj3yQ%3DLi6KNMqA%40mail.gmail.com)
+- Simplifying foreign key/RI checks (earlier thread)
+- Eliminating SPI from RI triggers - take 2 (earlier thread)
diff --git a/src/en/2026/README.md b/src/en/2026/README.md