Add blog: PostgreSQL planner COUNT(*) optimization (Week 04)

zhjwpku · zhjwpku · commit e1e7fc2f69ea · 2026-01-25T09:04:43.000+08:00
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -3,6 +3,8 @@
 # 🇬🇧 English
 
 - [2026](./en/2026/README.md)
+  - [Week 04](./en/2026/04/README.md)
+    - [PostgreSQL Planner Optimization: Automatic COUNT(*) Conversion](./en/2026/04/planner-count-optimization.md)
   - [Week 03](./en/2026/03/README.md)
     - [Extended Statistics Import/Export Functions](./en/2026/03/extended-statistics-import-functions.md)
     - [pg_plan_advice: Query Plan Control](./en/2026/03/pg-plan-advice.md)
@@ -12,6 +14,8 @@
 # 🇨🇳 中文
 
 - [2026](./cn/2026/README.md)
+  - [第 04 周](./cn/2026/04/README.md)
+    - [PostgreSQL 查询规划器优化：自动 COUNT(*) 转换](./cn/2026/04/planner-count-optimization.md)
   - [第 03 周](./cn/2026/03/README.md)
     - [扩展统计信息导入/导出功能](./cn/2026/03/extended-statistics-import-functions.md)
     - [pg_plan_advice：查询计划控制](./cn/2026/03/pg-plan-advice.md)
diff --git a/src/cn/2026/04/README.md b/src/cn/2026/04/README.md
@@ -0,0 +1,9 @@
+# 第 04 周（2026）
+
+2026 年第 04 周的 PostgreSQL 邮件列表讨论。
+
+🇬🇧 [English Version](../../../en/2026/04/README.md)
+
+## 文章
+
+- [PostgreSQL 查询规划器优化：自动 COUNT(*) 转换](./planner-count-optimization.md)
diff --git a/src/cn/2026/04/planner-count-optimization.md b/src/cn/2026/04/planner-count-optimization.md
@@ -0,0 +1,171 @@
+# PostgreSQL 查询规划器优化：自动 COUNT(*) 转换
+
+## 引言
+
+2025 年 10 月，PostgreSQL 提交者 David Rowley 提出了一个重要的查询规划器优化，能够自动将 `COUNT(1)` 和 `COUNT(not_null_col)` 表达式转换为 `COUNT(*)`。这个优化解决了一个常见的性能反模式：`COUNT(1)` 与 `COUNT(*)` 返回相同的结果，因此开发者常常认为两者可以互换，但实际上 `COUNT(*)` 更高效。该补丁于 2025 年 11 月提交，并引入了用于聚合函数简化的新基础设施。
+
+## 为什么这很重要
+
+`COUNT(*)` 和 `COUNT(expr)` 之间的性能差异可能非常显著，特别是对于大表。当对某个表达式（例如某一列）计数时，PostgreSQL 必须对每一行：
+
+1. **解构元组**以提取列值（`1` 这样的常量参数无需此步骤，但仍会被求值）
+2. **检查 NULL 值**，因为 `COUNT(expr)` 只计算非 NULL 值（即使对于 NOT NULL 列，检查仍然会发生）
+3. **将值传递给**聚合函数的转换函数进行处理
+
+相比之下，`COUNT(*)` 可以在不访问单个列值的情况下计算行数，从而获得显著更好的性能。David Rowley 的基准测试显示，在包含 100 万行的表上，使用 `COUNT(*)` 而不是 `COUNT(not_null_col)` 可以获得约 **37% 的性能提升**。
+
+## 技术分析
+
+### 基础设施：SupportRequestSimplifyAggref
+
+该补丁引入了一个名为 `SupportRequestSimplifyAggref` 的新基础设施，类似于现有的用于常规函数表达式（`FuncExpr`）的 `SupportRequestSimplify`。由于聚合使用 `Aggref` 节点，因此需要一个单独的机制。
+
+关键组件包括：
+
+1. **新的支持节点类型**：`supportnodes.h` 中的 `SupportRequestSimplifyAggref`
+2. **简化函数**：`clauses.c` 中的 `simplify_aggref()`，在常量折叠期间调用聚合的支持函数
+3. **增强的可空性检查**：扩展 `expr_is_nonnullable()` 以处理 `Const` 节点，而不仅仅是 `Var` 节点
+
+### 实现细节
+
+优化在查询规划的常量折叠阶段执行，具体在 `eval_const_expressions_mutator()` 中。当遇到 `Aggref` 节点时，规划器会：
+
+1. 检查聚合函数是否通过 `pg_proc.prosupport` 注册了支持函数
+2. 使用 `SupportRequestSimplifyAggref` 请求调用支持函数
+3. 如果支持函数返回简化的节点，则替换原始的 `Aggref`
+
+对于 `COUNT` 聚合，支持函数（`int8_agg_support_simplify()`）会检查：
+
+- 参数是否不可为空（使用 `expr_is_nonnullable()`）
+- 聚合中是否没有 `ORDER BY` 或 `DISTINCT` 子句
+- 如果两个条件都满足，则将 `COUNT(ANY)` 转换为 `COUNT(*)`
+
+### 代码示例
+
+`int8.c` 中核心简化逻辑的简化示意（依据讨论内容整理；`INT8_AGG_COUNT_OID`、`make_count_star_aggref()` 等标识符仅为示意，并非取自实际提交的源码）：
+
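+```c
+/*
+ * 简化示意，并非实际提交的源码：INT8_AGG_COUNT_OID、
+ * make_count_star_aggref() 和 int8_agg_support() 均为示意名称。
+ */
+static Node *
+int8_agg_support_simplify(SupportRequestSimplifyAggref *req)
+{
+    Aggref     *aggref = req->aggref;
+    Expr       *arg;
+
+    /* 只处理 COUNT(ANY) */
+    if (aggref->aggfnoid != INT8_AGG_COUNT_OID)
+        return NULL;
+
+    /* 必须恰好有一个参数 */
+    if (list_length(aggref->args) != 1)
+        return NULL;
+
+    /* 没有 ORDER BY 或 DISTINCT */
+    if (aggref->aggorder != NIL || aggref->aggdistinct != NIL)
+        return NULL;
+
+    /* Aggref->args 是 TargetEntry 节点列表，需要取出其中的表达式 */
+    arg = ((TargetEntry *) linitial(aggref->args))->expr;
+
+    /* 参数必须可证明不可为空 */
+    if (!expr_is_nonnullable(req->root, arg, true))
+        return NULL;
+
+    /* 转换为 COUNT(*) */
+    return (Node *) make_count_star_aggref(aggref);
+}
+
+/* 注册到 pg_proc.prosupport 中的 fmgr 入口函数（示意） */
+Datum
+int8_agg_support(PG_FUNCTION_ARGS)
+{
+    Node       *rawreq = (Node *) PG_GETARG_POINTER(0);
+    Node       *ret = NULL;
+
+    if (IsA(rawreq, SupportRequestSimplifyAggref))
+        ret = int8_agg_support_simplify((SupportRequestSimplifyAggref *) rawreq);
+
+    PG_RETURN_POINTER(ret);
+}
+```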
+
+## 补丁演进
+
+该补丁经历了四次迭代，每次都在改进实现：
+
+### 版本 1（初始提案）
+- 引入基本基础设施
+- 使用 `SysCache` 获取 `pg_proc` 元组
+
+### 版本 2（代码清理）
+- 用 `get_func_support()` 函数替换 `SysCache` 查找
+- 代码更简洁；`get_func_support()` 封装了相同的系统缓存查找
+
+### 版本 3（移除实验性代码）
+- 移除了处理 `COUNT(NULL)` 优化的 `#ifdef NOT_USED` 块
+- 清理了未使用的包含文件
+- 改进了注释
+
+### 版本 4（最终版本）
+- 在提交 `b140c8d7a` 后重新基于
+- 修复了支持函数总是返回 `Aggref` 的假设
+- 允许支持函数返回其他节点类型（例如常量），以实现更激进的优化
+- 这种灵活性使得未来的优化成为可能，例如将 `COUNT(NULL)` 转换为 `'0'::bigint`
+
+## 社区见解
+
+### 审查者反馈
+
+**Corey Huinker** 提供了积极的反馈：
+- +1 支持自动查询改进
+- 指出我们无法教育所有人 `COUNT(1)` 是反模式，所以让它不再是反模式是正确的做法
+- 确认补丁可以干净地应用且测试通过
+
+**Matheus Alcantara** 也进行了审查和测试：
+- 确认基准测试中约 30% 的性能提升
+- 验证了代码放置与现有的 `SupportRequestSimplify` 基础设施一致
+- +1 支持这个想法
+
+### 设计决策
+
+**优化的时机**：优化在常量折叠期间发生，这是规划过程的早期阶段。David 考虑过是否应该在稍后（在 `add_base_clause_to_rel()` 之后）进行，以捕获如下情况：
+
+```sql
+SELECT count(nullable_col) FROM t WHERE nullable_col IS NOT NULL;
+```
+
+但是，它必须在 `preprocess_aggref()` 之前发生，该函数将具有相同转换函数的聚合分组。当前的位置与常规函数的 `SupportRequestSimplify` 一致。
+
+**支持函数返回类型**：基础设施允许支持函数返回 `Aggref` 以外的节点。这个设计决策使得未来的优化成为可能，例如：
+- 将 `COUNT(NULL)` 转换为 `'0'::bigint`
+- 对聚合进行更激进的常量折叠
+
+## 性能考虑
+
+该优化提供了显著的性能优势：
+
+1. **减少元组解构**：`COUNT(*)` 不需要从元组中提取列值
+2. **更少的 NULL 检查**：不需要检查单个列值
+3. **更好的缓存利用率**：更少的数据移动意味着更好的 CPU 缓存使用
+
+对于宽表，性能提升可能更加显著：提取某一列时，PostgreSQL 通常需要先解构该列之前的所有列，因此对位于宽行靠后位置的列执行 `COUNT(column)` 代价更高。
+
+## 边界情况和限制
+
+优化仅在以下情况下应用：
+
+1. 参数被证明不可为空（带有 NOT NULL 约束的列，或 `1` 这样的非 NULL 常量）
+2. 聚合中没有 `ORDER BY` 子句
+3. 聚合中没有 `DISTINCT` 子句
+
+**尚未**优化的情况：
+
+- `COUNT(nullable_col)`，其中列可能为 NULL（即使在同一查询中通过 `WHERE nullable_col IS NOT NULL` 过滤）
+- `COUNT(col ORDER BY col)` - ORDER BY 阻止优化
+- `COUNT(DISTINCT col)` - DISTINCT 阻止优化
+
+`WHERE` 子句的限制是由于优化的时机（在常量折叠期间，在关系信息完全可用之前）。
+
+## 当前状态
+
+该补丁由 David Rowley 于 2025 年 11 月 26 日**提交**。它可在 PostgreSQL master 分支中使用，预计将包含在 PostgreSQL 19 中（PostgreSQL 18 已于 2025 年 9 月发布，早于此次提交）。
+
+## 结论
+
+这个优化代表了 PostgreSQL 查询规划器的重大改进，自动修复了常见的性能反模式，而无需更改应用程序。新的 `SupportRequestSimplifyAggref` 基础设施也为未来的聚合优化打开了大门。
+
+对于开发者和 DBA：
+- **无需操作**：优化会自动发生
+- **性能优势**：升级到包含该变更的版本后，使用 `COUNT(1)` 或 `COUNT(not_null_col)` 的现有查询无需修改即可变得更快
+- **最佳实践**：虽然规划器现在会优化这些情况，但 `COUNT(*)` 仍然是计算行数最清晰、最符合习惯的方式
+
+这一变化体现了 PostgreSQL 对自动改进查询性能的承诺，减少了开发者了解每个优化细节的负担，同时仍然允许专家在需要时编写最优查询。
+
+## 参考资料
+
+- [讨论线程](https://www.postgresql.org/message-id/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com)
+- 相关：用于常规函数表达式的 `SupportRequestSimplify`
diff --git a/src/en/2026/04/README.md b/src/en/2026/04/README.md
@@ -0,0 +1,9 @@
+# Week 04 (2026)
+
+PostgreSQL mailing list discussions for Week 04, 2026.
+
+🇨🇳 [中文版本](../../../cn/2026/04/README.md)
+
+## Articles
+
+- [PostgreSQL Planner Optimization: Automatic COUNT(*) Conversion](./planner-count-optimization.md)
diff --git a/src/en/2026/04/planner-count-optimization.md b/src/en/2026/04/planner-count-optimization.md
@@ -0,0 +1,171 @@
+# PostgreSQL Planner Optimization: Automatic COUNT(*) Conversion
+
+## Introduction
+
+In October 2025, PostgreSQL committer David Rowley proposed a query planner optimization that automatically converts `COUNT(1)` and `COUNT(not_null_col)` expressions to `COUNT(*)`. It addresses a common performance anti-pattern: `COUNT(1)` returns the same result as `COUNT(*)`, so developers often treat the two as interchangeable, yet `COUNT(*)` is the more efficient form. The patch was committed in November 2025 and introduces new infrastructure for aggregate function simplification.
+
+## Why This Matters
+
+The performance difference between `COUNT(*)` and `COUNT(expr)` can be substantial, especially for large tables. When counting an expression such as a column, PostgreSQL must, for every row:
+
+1. **Deform the tuple** to extract the column value (a constant argument such as `1` skips this step but is still evaluated)
+2. **Check the value for NULL**, because `COUNT(expr)` counts only non-NULL values (the check runs even for NOT NULL columns)
+3. **Pass the value** to the aggregate's transition function
+
+In contrast, `COUNT(*)` can count rows without accessing individual column values, resulting in significantly better performance. David Rowley's benchmarks showed approximately **37% performance improvement** when using `COUNT(*)` instead of `COUNT(not_null_col)` on a table with 1 million rows.
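+
+To make the semantic difference concrete, here is a toy C model; it is not PostgreSQL code, and the `ToyRow` layout is an assumption made for illustration. `count_star()` never reads a column, while `count_col()` fetches one column per row and skips NULLs. The two agree exactly when the column holds no NULLs, which is why the planner may only rewrite `COUNT(col)` when it can prove the column is non-nullable.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+
+/* Toy model: each row has a fixed number of nullable integer columns.
+ * This layout is an illustrative assumption, not PostgreSQL's tuple format. */
+#define TOY_NCOLS 3
+
+typedef struct ToyRow
+{
+    long        values[TOY_NCOLS];
+    bool        isnull[TOY_NCOLS];
+} ToyRow;
+
+/* COUNT(*): one increment per row; no column value is read. */
+long
+count_star(const ToyRow *rows, size_t nrows)
+{
+    long        count = 0;
+
+    (void) rows;                /* row contents are never inspected */
+    for (size_t i = 0; i < nrows; i++)
+        count++;
+    return count;
+}
+
+/* COUNT(col): fetch the column from every row and count non-NULL values. */
+long
+count_col(const ToyRow *rows, size_t nrows, int col)
+{
+    long        count = 0;
+
+    assert(col >= 0 && col < TOY_NCOLS);
+    for (size_t i = 0; i < nrows; i++)
+    {
+        if (!rows[i].isnull[col])   /* the per-row NULL check */
+            count++;
+    }
+    return count;
+}
+```
+
+With the sample rows `{1, 2, 3}`, `{4, NULL, 6}` and `{7, 8, 9}`, `count_col()` on column 0 returns 3, matching `count_star()`, while column 1 returns 2; rewriting `COUNT` on column 1 would therefore change the result.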
+
+## Technical Analysis
+
+### The Infrastructure: SupportRequestSimplifyAggref
+
+The patch introduces a new support request type, `SupportRequestSimplifyAggref`, analogous to the existing `SupportRequestSimplify` used for regular function calls (`FuncExpr`). Because aggregate calls are represented by `Aggref` nodes rather than `FuncExpr` nodes, a separate request type was needed.
+
+The key components include:
+
+1. **New support node type**: `SupportRequestSimplifyAggref` in `supportnodes.h`
+2. **Simplification function**: `simplify_aggref()` in `clauses.c` that calls the aggregate's support function during constant folding
+3. **Enhanced nullability checking**: Extended `expr_is_nonnullable()` to handle `Const` nodes, not just `Var` nodes
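+
+The extended nullability check can be pictured with a toy expression type. In this sketch, `ToyExpr` and `toy_expr_is_nonnullable()` are hypothetical stand-ins, not PostgreSQL's `Node` hierarchy or the real `expr_is_nonnullable()`: a column reference counts as non-nullable when its column has a NOT NULL constraint, a constant when it is not the NULL constant, and every other expression is conservatively treated as possibly NULL.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+
+/* Hypothetical, simplified stand-ins for PostgreSQL's Var and Const nodes. */
+typedef enum ToyExprKind
+{
+    TOY_VAR,                    /* column reference */
+    TOY_CONST,                  /* literal constant */
+    TOY_OTHER                   /* any other expression */
+} ToyExprKind;
+
+typedef struct ToyExpr
+{
+    ToyExprKind kind;
+    bool        attnotnull;     /* TOY_VAR: column has a NOT NULL constraint */
+    bool        constisnull;    /* TOY_CONST: the constant is NULL */
+} ToyExpr;
+
+/*
+ * Returns true only when the expression is proven never to be NULL.
+ * false means "not proven", so callers must skip the optimization.
+ */
+bool
+toy_expr_is_nonnullable(const ToyExpr *expr)
+{
+    assert(expr != 0);
+
+    switch (expr->kind)
+    {
+        case TOY_VAR:
+            return expr->attnotnull;
+        case TOY_CONST:
+            return !expr->constisnull;
+        default:
+            return false;
+    }
+}
+```
+
+Returning `false` means "not proven" rather than "nullable"; erring on that side is what keeps the rewrite safe.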
+
+### Implementation Details
+
+The optimization is performed during the constant folding phase of query planning, specifically in `eval_const_expressions_mutator()`. When an `Aggref` node is encountered, the planner:
+
+1. Checks if the aggregate function has a support function registered via `pg_proc.prosupport`
+2. Calls the support function with a `SupportRequestSimplifyAggref` request
+3. If the support function returns a simplified node, replaces the original `Aggref`
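+
+The three steps above can be modelled in a few lines of C. All names here are hypothetical: `ToyProc.support` plays the role of `pg_proc.prosupport`, `toy_count_support()` the role of the COUNT support function, and `toy_simplify_aggref()` the role of `simplify_aggref()`. The point it illustrates is the calling convention: a NULL result means "no simplification", and any non-NULL result replaces the original node.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+
+/* Hypothetical simplified Aggref: the aggregate plus the facts the check needs. */
+typedef struct ToyAggref
+{
+    const char *aggname;        /* "count(any)" or "count(*)" */
+    bool        arg_nonnull;    /* result of the nullability proof */
+    bool        has_order;      /* aggregate has ORDER BY */
+    bool        has_distinct;   /* aggregate has DISTINCT */
+} ToyAggref;
+
+/* Stand-in for a prosupport function: return a replacement node, or NULL. */
+typedef ToyAggref *(*ToySupportFn) (ToyAggref *aggref);
+
+/* Stand-in for an aggregate's pg_proc entry. */
+typedef struct ToyProc
+{
+    ToySupportFn support;       /* NULL when no support function is registered */
+} ToyProc;
+
+static ToyAggref count_star_node = {"count(*)", true, false, false};
+
+/* Toy support function for count(any): rewrite to count(*) when safe. */
+ToyAggref *
+toy_count_support(ToyAggref *aggref)
+{
+    if (aggref->arg_nonnull && !aggref->has_order && !aggref->has_distinct)
+        return &count_star_node;
+    return NULL;
+}
+
+/* Toy version of the three planner steps. */
+ToyAggref *
+toy_simplify_aggref(const ToyProc *proc, ToyAggref *aggref)
+{
+    ToyAggref  *result;
+
+    assert(proc != NULL && aggref != NULL);
+
+    /* Step 1: no support function registered, keep the node */
+    if (proc->support == NULL)
+        return aggref;
+
+    /* Step 2: ask the support function for a simplified node */
+    result = proc->support(aggref);
+
+    /* Step 3: replace the node only if one was returned */
+    return result != NULL ? result : aggref;
+}
+```
+
+Returning a pointer to a shared static node keeps the toy short; a real planner would build a fresh node (preserving details such as a `FILTER` clause) instead.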
+
+For the `COUNT` aggregate specifically, the support function (`int8_agg_support_simplify()`) checks:
+
+- Whether the argument is non-nullable (using `expr_is_nonnullable()`)
+- Whether the aggregate has no `ORDER BY` or `DISTINCT` clause
+
+If both conditions hold, it converts `COUNT(ANY)` to `COUNT(*)`.
+
+### Code Example
+
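+A simplified sketch of the core logic in `int8.c`. It follows the checks described in the discussion, but identifiers such as `INT8_AGG_COUNT_OID` and `make_count_star_aggref()` are illustrative rather than copied from the committed source: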
+
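+```c
+/*
+ * Simplified sketch, not the committed source: INT8_AGG_COUNT_OID,
+ * make_count_star_aggref() and int8_agg_support() are illustrative names.
+ */
+static Node *
+int8_agg_support_simplify(SupportRequestSimplifyAggref *req)
+{
+    Aggref     *aggref = req->aggref;
+    Expr       *arg;
+
+    /* Only handle COUNT(ANY) */
+    if (aggref->aggfnoid != INT8_AGG_COUNT_OID)
+        return NULL;
+
+    /* Must have exactly one argument */
+    if (list_length(aggref->args) != 1)
+        return NULL;
+
+    /* No ORDER BY or DISTINCT */
+    if (aggref->aggorder != NIL || aggref->aggdistinct != NIL)
+        return NULL;
+
+    /* Aggref->args is a list of TargetEntry nodes; unwrap the expression */
+    arg = ((TargetEntry *) linitial(aggref->args))->expr;
+
+    /* The argument must be provably non-nullable */
+    if (!expr_is_nonnullable(req->root, arg, true))
+        return NULL;
+
+    /* Convert to COUNT(*) */
+    return (Node *) make_count_star_aggref(aggref);
+}
+
+/* fmgr-callable entry point that would be registered in pg_proc.prosupport */
+Datum
+int8_agg_support(PG_FUNCTION_ARGS)
+{
+    Node       *rawreq = (Node *) PG_GETARG_POINTER(0);
+    Node       *ret = NULL;
+
+    if (IsA(rawreq, SupportRequestSimplifyAggref))
+        ret = int8_agg_support_simplify((SupportRequestSimplifyAggref *) rawreq);
+
+    PG_RETURN_POINTER(ret);
+}
+```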
+
+## Patch Evolution
+
+The patch went through four iterations, each refining the implementation:
+
+### Version 1 (Initial Proposal)
+- Introduced the basic infrastructure
+- Used `SysCache` to fetch `pg_proc` tuples
+
+### Version 2 (Code Cleanup)
+- Replaced `SysCache` lookup with `get_func_support()` function
+- Simpler code: `get_func_support()` wraps the same syscache lookup
+
+### Version 3 (Removed Experimental Code)
+- Removed `#ifdef NOT_USED` block that handled `COUNT(NULL)` optimization
+- Cleaned up unused includes
+- Improved comments
+
+### Version 4 (Final Version)
+- Rebased after commit `b140c8d7a`
+- Fixed assumption that support function always returns an `Aggref`
+- Allows support functions to return other node types (e.g., constants) for more aggressive optimizations
+- This flexibility enables future optimizations like converting `COUNT(NULL)` to `'0'::bigint`
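+
+The version 4 change matters because a support function's result need not be an `Aggref`. The toy below (hypothetical `ToyNode` type and functions, not committed code) folds `COUNT(NULL)`, which can never count anything, into the constant 0; the caller inspects the returned node's tag rather than assuming it got an aggregate back. The source describes this folding only as a possible future optimization.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+
+/* Hypothetical tagged node, loosely mirroring PostgreSQL's NodeTag idea. */
+typedef enum ToyNodeTag
+{
+    TOY_T_AGGREF,
+    TOY_T_CONST
+} ToyNodeTag;
+
+typedef struct ToyNode
+{
+    ToyNodeTag  tag;
+    bool        arg_is_null_const;  /* TOY_T_AGGREF: argument is literal NULL */
+    long long   constvalue;         /* TOY_T_CONST: the folded value */
+} ToyNode;
+
+/*
+ * Hypothetical count(any) support function: COUNT(NULL) never counts a row,
+ * so it can be replaced by the constant 0. Returns NULL if nothing applies.
+ */
+ToyNode *
+toy_count_support(const ToyNode *aggref, ToyNode *out)
+{
+    assert(aggref->tag == TOY_T_AGGREF);
+
+    if (aggref->arg_is_null_const)
+    {
+        out->tag = TOY_T_CONST;
+        out->arg_is_null_const = false;
+        out->constvalue = 0;
+        return out;
+    }
+    return NULL;
+}
+
+/* Caller: check the returned node's tag instead of assuming an Aggref. */
+bool
+toy_folded_to_const(const ToyNode *aggref, long long *value)
+{
+    ToyNode     buf;
+    ToyNode    *result = toy_count_support(aggref, &buf);
+
+    if (result != NULL && result->tag == TOY_T_CONST)
+    {
+        *value = result->constvalue;
+        return true;
+    }
+    return false;
+}
+```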
+
+## Community Insights
+
+### Reviewer Feedback
+
+**Corey Huinker** provided positive feedback:
+- +1 for the automatic query improvement
+- Noted that we can't educate everyone that `COUNT(1)` is an anti-pattern, so making it not an anti-pattern is the right approach
+- Confirmed the patch applies cleanly and tests pass
+
+**Matheus Alcantara** also reviewed and tested:
+- Confirmed ~30% performance improvement in benchmarks
+- Validated that the code placement is consistent with existing `SupportRequestSimplify` infrastructure
+- +1 for the idea
+
+### Design Decisions
+
+**Timing of Optimization**: The optimization happens during constant folding, which is early in the planning process. David considered whether it should happen later (after `add_base_clause_to_rel()`) to catch cases like:
+
+```sql
+SELECT count(nullable_col) FROM t WHERE nullable_col IS NOT NULL;
+```
+
+However, it must happen before `preprocess_aggref()`, which groups aggregates with the same transition function. The current placement is consistent with `SupportRequestSimplify` for regular functions.
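+
+One likely benefit of running the rewrite before `preprocess_aggref()` can be sketched with a toy: if `count(a)` and `count(b)` on two NOT NULL columns both become `count(*)` first, a later pass that merges identical aggregate calls only has to compute one of them. `ToyAgg`, `toy_simplify_all()` and `toy_count_unique()` are hypothetical, and the toy ignores `DISTINCT`, `ORDER BY` and `FILTER`.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <string.h>
+
+/* Hypothetical aggregate call: function name plus argument text. */
+typedef struct ToyAgg
+{
+    const char *fn;             /* e.g. "count" or "sum" */
+    const char *arg;            /* argument text; "*" for count(*) */
+    bool        arg_nonnull;    /* argument proven NOT NULL */
+} ToyAgg;
+
+/* Apply the COUNT(expr) -> COUNT(*) rewrite to every call (toy version). */
+void
+toy_simplify_all(ToyAgg *aggs, size_t n)
+{
+    for (size_t i = 0; i < n; i++)
+    {
+        if (strcmp(aggs[i].fn, "count") == 0 && aggs[i].arg_nonnull)
+            aggs[i].arg = "*";
+    }
+}
+
+/* Number of distinct (fn, arg) pairs, i.e. aggregates that must be computed. */
+size_t
+toy_count_unique(const ToyAgg *aggs, size_t n)
+{
+    size_t      unique = 0;
+
+    assert(aggs != NULL || n == 0);
+    for (size_t i = 0; i < n; i++)
+    {
+        bool        seen = false;
+
+        for (size_t j = 0; j < i && !seen; j++)
+            seen = strcmp(aggs[i].fn, aggs[j].fn) == 0 &&
+                strcmp(aggs[i].arg, aggs[j].arg) == 0;
+        if (!seen)
+            unique++;
+    }
+    return unique;
+}
+```
+
+For `count(a), count(b), sum(a)` with both columns NOT NULL, `toy_count_unique()` reports 3 distinct calls before the rewrite and 2 after it.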
+
+**Support Function Return Type**: The infrastructure allows support functions to return nodes other than `Aggref`. This design decision enables future optimizations, such as:
+- Converting `COUNT(NULL)` to `'0'::bigint`
+- More aggressive constant folding for aggregates
+
+## Performance Considerations
+
+The optimization provides significant performance benefits:
+
+1. **Reduced tuple deformation**: `COUNT(*)` doesn't need to extract column values from tuples
+2. **Fewer NULL checks**: No need to check individual column values
+3. **Better cache utilization**: Less data movement means better CPU cache usage
+
+For wide tables the gain can be larger still: to extract a column, PostgreSQL generally has to deform every column that precedes it in the tuple, so `COUNT(column)` on a column near the end of a wide row costs more than on one near the start.
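+
+The cost of reaching a late column can be modelled as a walk over the preceding columns. This toy assumes every column is variable-width, so the start of column k is only known after reading columns 0 to k-1; real PostgreSQL caches fixed offsets where it can, so treat this as a worst-case simplification. `toy_column_offset()` is a hypothetical helper.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+
+/*
+ * Toy model of deforming a row of variable-width columns: the offset of
+ * column `target` is the sum of the widths of all earlier columns.
+ * *visited reports how many earlier columns had to be read.
+ */
+size_t
+toy_column_offset(const size_t *widths, size_t ncols, size_t target,
+                  size_t *visited)
+{
+    size_t      offset = 0;
+
+    assert(target < ncols);
+    *visited = 0;
+    for (size_t i = 0; i < target; i++)
+    {
+        offset += widths[i];    /* each preceding length must be read */
+        (*visited)++;
+    }
+    return offset;
+}
+```
+
+With widths `{4, 10, 7, 3, 12}`, reaching the last column means visiting the four columns before it (offset 24), whereas `COUNT(*)` visits none.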
+
+## Edge Cases and Limitations
+
+The optimization only applies when:
+
+1. The argument is provably non-nullable (a column with a NOT NULL constraint, or a non-NULL constant such as `1`)
+2. There are no `ORDER BY` clauses in the aggregate
+3. There are no `DISTINCT` clauses in the aggregate
+
+Cases that are **not** optimized (yet):
+
+- `COUNT(nullable_col)` where the column might be NULL (even if filtered by `WHERE nullable_col IS NOT NULL` in the same query)
+- `COUNT(col ORDER BY col)` - the ORDER BY prevents optimization
+- `COUNT(DISTINCT col)` - DISTINCT prevents optimization
+
+The limitation with `WHERE` clauses is due to the timing of the optimization (during constant folding, before relation information is fully available).
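+
+The `DISTINCT` restriction is semantic rather than merely conservative: even for a NOT NULL column, `COUNT(DISTINCT col)` counts distinct values, not rows. The toy function below (`toy_count_distinct()`, hypothetical) computes that count by sorting a copy of the values.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* qsort comparator for long values */
+static int
+cmp_long(const void *a, const void *b)
+{
+    long        x = *(const long *) a;
+    long        y = *(const long *) b;
+
+    return (x > y) - (x < y);
+}
+
+/* COUNT(DISTINCT col) over a NOT NULL column: number of distinct values. */
+size_t
+toy_count_distinct(const long *values, size_t n)
+{
+    long       *sorted;
+    size_t      distinct = 0;
+
+    if (n == 0)
+        return 0;
+    sorted = malloc(n * sizeof(long));
+    assert(sorted != NULL);
+    memcpy(sorted, values, n * sizeof(long));
+    qsort(sorted, n, sizeof(long), cmp_long);
+    for (size_t i = 0; i < n; i++)
+    {
+        if (i == 0 || sorted[i] != sorted[i - 1])
+            distinct++;
+    }
+    free(sorted);
+    return distinct;
+}
+```
+
+For the values `{3, 1, 3, 2}` it returns 3, while `COUNT(*)` over the same four rows is 4, so the rewrite would be wrong.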
+
+## Current Status
+
+The patch was **committed** by David Rowley on November 26, 2025. It is available in the PostgreSQL master branch and is expected to ship in PostgreSQL 19 (PostgreSQL 18 was released in September 2025, before this commit).
+
+## Conclusion
+
+This optimization represents a significant improvement to PostgreSQL's query planner, automatically fixing a common performance anti-pattern without requiring application changes. The new `SupportRequestSimplifyAggref` infrastructure also opens the door for future aggregate optimizations.
+
+For developers and DBAs:
+- **No action required**: The optimization happens automatically
+- **Performance benefit**: Once running a release that includes this change, existing queries using `COUNT(1)` or `COUNT(not_null_col)` get faster without modification
+- **Best practice**: While the planner now optimizes these cases, `COUNT(*)` remains the clearest and most idiomatic way to count rows
+
+This change demonstrates PostgreSQL's commitment to improving query performance automatically, reducing the burden on developers to know every optimization detail while still allowing experts to write optimal queries when needed.
+
+## References
+
+- [Discussion Thread](https://www.postgresql.org/message-id/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com)
+- Related: `SupportRequestSimplify` for regular function expressions