
Commit e1e7fc2

Add blog: PostgreSQL planner COUNT(*) optimization (Week 04)
1 parent 96acc18 commit e1e7fc2

5 files changed

Lines changed: 364 additions & 0 deletions

File tree

src/SUMMARY.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,8 @@
33
# 🇬🇧 English
44

55
- [2026](./en/2026/README.md)
6+
- [Week 04](./en/2026/04/README.md)
7+
- [PostgreSQL Planner Optimization: Automatic COUNT(*) Conversion](./en/2026/04/planner-count-optimization.md)
68
- [Week 03](./en/2026/03/README.md)
79
- [Extended Statistics Import/Export Functions](./en/2026/03/extended-statistics-import-functions.md)
810
- [pg_plan_advice: Query Plan Control](./en/2026/03/pg-plan-advice.md)
@@ -12,6 +14,8 @@
1214
# 🇨🇳 中文
1315

1416
- [2026](./cn/2026/README.md)
17+
- [第 04 周](./cn/2026/04/README.md)
18+
- [PostgreSQL 查询规划器优化:自动 COUNT(*) 转换](./cn/2026/04/planner-count-optimization.md)
1519
- [第 03 周](./cn/2026/03/README.md)
1620
- [扩展统计信息导入/导出功能](./cn/2026/03/extended-statistics-import-functions.md)
1721
- [pg_plan_advice:查询计划控制](./cn/2026/03/pg-plan-advice.md)

src/cn/2026/04/README.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
# 第 04 周(2026)
2+
3+
2026 年第 04 周的 PostgreSQL 邮件列表讨论。
4+
5+
🇬🇧 [English Version](../../../en/2026/04/README.md)
6+
7+
## 文章
8+
9+
- [PostgreSQL 查询规划器优化:自动 COUNT(*) 转换](./planner-count-optimization.md)
Lines changed: 171 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,171 @@
1+
# PostgreSQL 查询规划器优化:自动 COUNT(*) 转换
2+
3+
## 引言
4+
5+
2025 年 10 月,PostgreSQL 提交者 David Rowley 提出了一个重要的查询规划器优化,能够自动将 `COUNT(1)` 和 `COUNT(not_null_col)` 表达式转换为 `COUNT(*)`。这个优化解决了一个常见的性能反模式:开发者认为 `COUNT(1)` 等同于 `COUNT(*)`,但实际上 `COUNT(*)` 更高效。该补丁于 2025 年 11 月提交,并引入了用于聚合函数简化的新基础设施。
6+
7+
## 为什么这很重要
8+
9+
`COUNT(*)` 和 `COUNT(column)` 之间的性能差异可能非常显著,特别是对于大表。当计算特定列时,PostgreSQL 必须:
10+
11+
1. **解构元组**以提取列值
12+
2. **检查 NULL 值**(即使对于 NOT NULL 列,检查仍然会发生)
13+
3. **通过聚合函数处理列数据**
14+
15+
相比之下,`COUNT(*)` 可以在不访问单个列值的情况下计算行数,从而获得显著更好的性能。David Rowley 的基准测试显示,在包含 100 万行的表上,使用 `COUNT(*)` 而不是 `COUNT(not_null_col)` 可以获得约 **37% 的性能提升**。
16+
17+
## 技术分析
18+
19+
### 基础设施:SupportRequestSimplifyAggref
20+
21+
该补丁引入了一个名为 `SupportRequestSimplifyAggref` 的新基础设施,类似于现有的用于常规函数表达式(`FuncExpr`)的 `SupportRequestSimplify`。由于聚合使用 `Aggref` 节点,因此需要一个单独的机制。
22+
23+
关键组件包括:
24+
25+
1. **新的支持节点类型**:`supportnodes.h` 中的 `SupportRequestSimplifyAggref`
26+
2. **简化函数**:`clauses.c` 中的 `simplify_aggref()`,在常量折叠期间调用聚合的支持函数
27+
3. **增强的可空性检查**:扩展 `expr_is_nonnullable()` 以处理 `Const` 节点,而不仅仅是 `Var` 节点
28+
29+
### 实现细节
30+
31+
优化在查询规划的常量折叠阶段执行,具体在 `eval_const_expressions_mutator()` 中。当遇到 `Aggref` 节点时,规划器会:
32+
33+
1. 检查聚合函数是否通过 `pg_proc.prosupport` 注册了支持函数
34+
2. 使用 `SupportRequestSimplifyAggref` 请求调用支持函数
35+
3. 如果支持函数返回简化的节点,则替换原始的 `Aggref`
36+
37+
对于 `COUNT` 聚合,支持函数(`int8_agg_support_simplify()`)会检查:
38+
39+
- 参数是否不可为空(使用 `expr_is_nonnullable()`)
40+
- 聚合中是否没有 `ORDER BY` 或 `DISTINCT` 子句
41+
- 如果两个条件都满足,则将 `COUNT(ANY)` 转换为 `COUNT(*)`
42+
43+
### 代码示例
44+
45+
`int8.c` 中的核心简化逻辑:
46+
47+
```c
static Node *
int8_agg_support_simplify(SupportRequestSimplifyAggref *req)
{
    Aggref *aggref = req->aggref;

    /* 只处理 COUNT */
    if (aggref->aggfnoid != INT8_AGG_COUNT_OID)
        return NULL;

    /* 必须恰好有一个参数 */
    if (list_length(aggref->args) != 1)
        return NULL;

    /* 没有 ORDER BY 或 DISTINCT */
    if (aggref->aggorder != NIL || aggref->aggdistinct != NIL)
        return NULL;

    /* 检查参数是否不可为空 */
    if (!expr_is_nonnullable(req->root,
                             (Expr *) linitial(aggref->args),
                             true))
        return NULL;

    /* 转换为 COUNT(*) */
    return make_count_star_aggref(aggref);
}
```
75+
76+
## 补丁演进
77+
78+
该补丁经历了四次迭代,每次都在改进实现:
79+
80+
### 版本 1(初始提案)
81+
- 引入基本基础设施
82+
- 使用 `SysCache` 获取 `pg_proc` 元组
83+
84+
### 版本 2(代码清理)
85+
- 用 `get_func_support()` 函数替换 `SysCache` 查找
86+
- 更清晰、更高效的方法
87+
88+
### 版本 3(移除实验性代码)
89+
- 移除了处理 `COUNT(NULL)` 优化的 `#ifdef NOT_USED` 块
90+
- 清理了未使用的包含文件
91+
- 改进了注释
92+
93+
### 版本 4(最终版本)
94+
- 在提交 `b140c8d7a` 后重新基于
95+
- 修复了支持函数总是返回 `Aggref` 的假设
96+
- 允许支持函数返回其他节点类型(例如常量),以实现更激进的优化
97+
- 这种灵活性使得未来的优化成为可能,例如将 `COUNT(NULL)` 转换为 `'0'::bigint`
98+
99+
## 社区见解
100+
101+
### 审查者反馈
102+
103+
**Corey Huinker** 提供了积极的反馈:
104+
- +1 支持自动查询改进
105+
- 指出我们无法教育所有人 `COUNT(1)` 是反模式,所以让它不再是反模式是正确的做法
106+
- 确认补丁可以干净地应用且测试通过
107+
108+
**Matheus Alcantara** 也进行了审查和测试:
109+
- 确认基准测试中约 30% 的性能提升
110+
- 验证了代码放置与现有的 `SupportRequestSimplify` 基础设施一致
111+
- +1 支持这个想法
112+
113+
### 设计决策
114+
115+
**优化的时机**:优化在常量折叠期间发生,这是规划过程的早期阶段。David 考虑过是否应该在稍后(在 `add_base_clause_to_rel()` 之后)进行,以捕获如下情况:
116+
117+
```sql
118+
SELECT count(nullable_col) FROM t WHERE nullable_col IS NOT NULL;
119+
```
120+
121+
但是,它必须在 `preprocess_aggref()` 之前发生,该函数将具有相同转换函数的聚合分组。当前的位置与常规函数的 `SupportRequestSimplify` 一致。
122+
123+
**支持函数返回类型**:基础设施允许支持函数返回 `Aggref` 以外的节点。这个设计决策使得未来的优化成为可能,例如:
124+
- 将 `COUNT(NULL)` 转换为 `'0'::bigint`
125+
- 对聚合进行更激进的常量折叠
126+
127+
## 性能考虑
128+
129+
该优化提供了显著的性能优势:
130+
131+
1. **减少元组解构**:`COUNT(*)` 不需要从元组中提取列值
132+
2. **更少的 NULL 检查**:不需要检查单个列值
133+
3. **更好的缓存利用率**:更少的数据移动意味着更好的 CPU 缓存使用
134+
135+
对于具有多列的表,性能提升可能更加显著,因为 `COUNT(column)` 可能需要解构许多列才能到达目标列。
136+
137+
## 边界情况和限制
138+
139+
优化仅在以下情况下应用:
140+
141+
1. 列被证明不可为空(NOT NULL 约束或常量)
142+
2. 聚合中没有 `ORDER BY` 子句
143+
3. 聚合中没有 `DISTINCT` 子句
144+
145+
**尚未**优化的情况:
146+
147+
- `COUNT(nullable_col)`,其中列可能为 NULL(即使在同一查询中通过 `WHERE nullable_col IS NOT NULL` 过滤)
148+
- `COUNT(col ORDER BY col)` - ORDER BY 阻止优化
149+
- `COUNT(DISTINCT col)` - DISTINCT 阻止优化
150+
151+
`WHERE` 子句的限制是由于优化的时机(在常量折叠期间,在关系信息完全可用之前)。
152+
153+
## 当前状态
154+
155+
该补丁由 David Rowley 于 2025 年 11 月 26 日**提交**。它可在 PostgreSQL master 分支中使用;由于 PostgreSQL 18 已于 2025 年 9 月发布,该优化预计将包含在 PostgreSQL 19 中。
156+
157+
## 结论
158+
159+
这个优化代表了 PostgreSQL 查询规划器的重大改进,自动修复了常见的性能反模式,而无需更改应用程序。新的 `SupportRequestSimplifyAggref` 基础设施也为未来的聚合优化打开了大门。
160+
161+
对于开发者和 DBA:
162+
- **无需操作**:优化会自动发生
163+
- **性能优势**:使用 `COUNT(1)` 或 `COUNT(not_null_col)` 的现有查询将自动变得更快
164+
- **最佳实践**:虽然规划器现在会优化这些情况,但 `COUNT(*)` 仍然是计算行数最清晰、最符合习惯的方式
165+
166+
这一变化体现了 PostgreSQL 对自动改进查询性能的承诺,减少了开发者了解每个优化细节的负担,同时仍然允许专家在需要时编写最优查询。
167+
168+
## 参考资料
169+
170+
- [讨论线程](https://www.postgresql.org/message-id/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com)
171+
- 相关:用于常规函数表达式的 `SupportRequestSimplify`

src/en/2026/04/README.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
# Week 04 (2026)
2+
3+
PostgreSQL mailing list discussions for Week 04, 2026.
4+
5+
🇨🇳 [中文版本](../../../cn/2026/04/README.md)
6+
7+
## Articles
8+
9+
- [PostgreSQL Planner Optimization: Automatic COUNT(*) Conversion](./planner-count-optimization.md)
Lines changed: 171 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,171 @@
1+
# PostgreSQL Planner Optimization: Automatic COUNT(*) Conversion
2+
3+
## Introduction
4+
5+
In October 2025, PostgreSQL committer David Rowley proposed a query planner optimization that automatically converts `COUNT(1)` and `COUNT(not_null_col)` expressions to `COUNT(*)`. It addresses a common performance anti-pattern: developers write `COUNT(1)` assuming it performs the same as `COUNT(*)`. The two return the same result, but `COUNT(*)` is cheaper to execute. The patch was committed in November 2025 and introduces new infrastructure for simplifying aggregate function calls.
6+
7+
## Why This Matters
8+
9+
The performance difference between `COUNT(*)` and `COUNT(column)` can be substantial, especially for large tables. When counting a specific column, PostgreSQL must:
10+
11+
1. **Deform the tuple** to extract the column value
12+
2. **Check for NULL values** (even for NOT NULL columns, the check still occurs)
13+
3. **Process the column data** through the aggregate function
14+
15+
In contrast, `COUNT(*)` can count rows without accessing individual column values, which makes it noticeably cheaper. David Rowley's benchmarks showed an approximately **37% performance improvement** when using `COUNT(*)` instead of `COUNT(not_null_col)` on a table with 1 million rows.
16+
17+
## Technical Analysis
18+
19+
### The Infrastructure: SupportRequestSimplifyAggref
20+
21+
The patch introduces a new support request type, `SupportRequestSimplifyAggref`, modeled on the existing `SupportRequestSimplify` used for regular function expressions (`FuncExpr`). Because aggregates are represented by `Aggref` nodes rather than `FuncExpr` nodes, a separate mechanism was needed.
22+
23+
The key components include:
24+
25+
1. **New support node type**: `SupportRequestSimplifyAggref` in `supportnodes.h`
26+
2. **Simplification function**: `simplify_aggref()` in `clauses.c` that calls the aggregate's support function during constant folding
27+
3. **Enhanced nullability checking**: Extended `expr_is_nonnullable()` to handle `Const` nodes, not just `Var` nodes
28+
29+
### Implementation Details
30+
31+
The optimization is performed during the constant folding phase of query planning, specifically in `eval_const_expressions_mutator()`. When an `Aggref` node is encountered, the planner:
32+
33+
1. Checks if the aggregate function has a support function registered via `pg_proc.prosupport`
34+
2. Calls the support function with a `SupportRequestSimplifyAggref` request
35+
3. If the support function returns a simplified node, replaces the original `Aggref`
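
The three steps above can be sketched as a small, self-contained C model. This is only an illustration of the dispatch pattern, not PostgreSQL's code: every `Toy*` type, the `toy_*` functions, and the function identifiers are hypothetical stand-ins, and the toy support function skips the nullability checks that the real one performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node tags (the real planner uses NodeTag). */
typedef enum { TOY_AGGREF, TOY_CONST } ToyNodeTag;

/* Hypothetical function identifiers standing in for pg_proc OIDs. */
enum { TOY_COUNT_ANY = 1, TOY_COUNT_STAR = 2, TOY_SUM = 3 };

typedef struct ToyNode
{
    ToyNodeTag  tag;
    int         aggfnoid;   /* meaningful only when tag == TOY_AGGREF */
} ToyNode;

/* A support function returns a replacement node, or NULL for "no change". */
typedef ToyNode *(*ToySupportFn) (ToyNode *aggref);

/* Toy stand-in for a pg_proc row: function id plus optional support function. */
typedef struct ToyProcEntry
{
    int          fnoid;
    ToySupportFn prosupport;    /* NULL when none is registered */
} ToyProcEntry;

static ToyNode toy_count_star_node = {TOY_AGGREF, TOY_COUNT_STAR};

/* Toy support function: rewrites COUNT(any) unconditionally (no NULL checks). */
static ToyNode *
toy_count_support(ToyNode *aggref)
{
    if (aggref->aggfnoid == TOY_COUNT_ANY)
        return &toy_count_star_node;
    return NULL;
}

/* Step 1: look up the support function. Step 2: call it. Step 3: replace. */
static ToyNode *
toy_simplify_aggref(ToyNode *node, const ToyProcEntry *catalog, size_t ncatalog)
{
    assert(node != NULL);

    if (node->tag != TOY_AGGREF)
        return node;            /* only aggregate nodes are candidates */

    for (size_t i = 0; i < ncatalog; i++)
    {
        ToyNode    *replacement;

        if (catalog[i].fnoid != node->aggfnoid)
            continue;
        if (catalog[i].prosupport == NULL)
            return node;        /* no support function registered */

        replacement = catalog[i].prosupport(node);
        return replacement != NULL ? replacement : node;
    }
    return node;                /* unknown function: leave untouched */
}
```

Returning the original node whenever the support function declines keeps the rewrite strictly opt-in: aggregates without a registered support function pass through unchanged.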
36+
37+
For the `COUNT` aggregate specifically, the support function (`int8_agg_support_simplify()`) checks:
38+
39+
- Whether the argument is non-nullable (using `expr_is_nonnullable()`)
40+
- Whether there are no `ORDER BY` or `DISTINCT` clauses in the aggregate
41+
- If both conditions are met, converts `COUNT(ANY)` to `COUNT(*)`
42+
43+
### Code Example
44+
45+
The core simplification logic in `int8.c`:
46+
47+
```c
static Node *
int8_agg_support_simplify(SupportRequestSimplifyAggref *req)
{
    Aggref *aggref = req->aggref;

    /* Only handle COUNT */
    if (aggref->aggfnoid != INT8_AGG_COUNT_OID)
        return NULL;

    /* Must have exactly one argument */
    if (list_length(aggref->args) != 1)
        return NULL;

    /* No ORDER BY or DISTINCT */
    if (aggref->aggorder != NIL || aggref->aggdistinct != NIL)
        return NULL;

    /* Check if argument is non-nullable */
    if (!expr_is_nonnullable(req->root,
                             (Expr *) linitial(aggref->args),
                             true))
        return NULL;

    /* Convert to COUNT(*) */
    return make_count_star_aggref(aggref);
}
```
75+
76+
## Patch Evolution
77+
78+
The patch went through four iterations, each refining the implementation:
79+
80+
### Version 1 (Initial Proposal)
81+
- Introduced the basic infrastructure
82+
- Used `SysCache` to fetch `pg_proc` tuples
83+
84+
### Version 2 (Code Cleanup)
85+
- Replaced `SysCache` lookup with `get_func_support()` function
86+
- Cleaner and more efficient approach
87+
88+
### Version 3 (Removed Experimental Code)
89+
- Removed `#ifdef NOT_USED` block that handled `COUNT(NULL)` optimization
90+
- Cleaned up unused includes
91+
- Improved comments
92+
93+
### Version 4 (Final Version)
94+
- Rebased after commit `b140c8d7a`
95+
- Fixed assumption that support function always returns an `Aggref`
96+
- Allows support functions to return other node types (e.g., constants) for more aggressive optimizations
97+
- This flexibility enables future optimizations like converting `COUNT(NULL)` to `'0'::bigint`
98+
99+
## Community Insights
100+
101+
### Reviewer Feedback
102+
103+
**Corey Huinker** provided positive feedback:
104+
- +1 for the automatic query improvement
105+
- Noted that we can't educate everyone that `COUNT(1)` is an anti-pattern, so making it not an anti-pattern is the right approach
106+
- Confirmed the patch applies cleanly and tests pass
107+
108+
**Matheus Alcantara** also reviewed and tested:
109+
- Confirmed ~30% performance improvement in benchmarks
110+
- Validated that the code placement is consistent with existing `SupportRequestSimplify` infrastructure
111+
- +1 for the idea
112+
113+
### Design Decisions
114+
115+
**Timing of Optimization**: The optimization happens during constant folding, which is early in the planning process. David considered whether it should happen later (after `add_base_clause_to_rel()`) to catch cases like:
116+
117+
```sql
118+
SELECT count(nullable_col) FROM t WHERE nullable_col IS NOT NULL;
119+
```
120+
121+
However, it must happen before `preprocess_aggref()`, which groups aggregates with the same transition function. The current placement is consistent with `SupportRequestSimplify` for regular functions.
122+
123+
**Support Function Return Type**: The infrastructure allows support functions to return nodes other than `Aggref`. This design decision enables future optimizations, such as:
124+
- Converting `COUNT(NULL)` to `'0'::bigint`
125+
- More aggressive constant folding for aggregates
126+
127+
## Performance Considerations
128+
129+
The optimization provides significant performance benefits:
130+
131+
1. **Reduced tuple deformation**: `COUNT(*)` doesn't need to extract column values from tuples
132+
2. **Fewer NULL checks**: No need to check individual column values
133+
3. **Better cache utilization**: Less data movement means better CPU cache usage
134+
135+
For tables with many columns, the performance gain can be even more substantial, as `COUNT(column)` might require deforming many columns to reach the target column.
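
To make the "deforming many columns" cost concrete, here is a toy model in C. It assumes a made-up row layout in which every attribute is a 1-byte length followed by its payload, so reaching attribute *n* means walking over the *n − 1* attributes before it. PostgreSQL's real tuple format differs (it uses a NULL bitmap and can cache offsets for fixed-width columns), and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy row layout (not PostgreSQL's real format): each attribute is stored
 * as a 1-byte length followed by that many payload bytes.  A length of 0
 * stands in for NULL in this sketch.
 */
static const unsigned char *
toy_fetch_attr(const unsigned char *row, int natts, int attno, int *steps)
{
    const unsigned char *p = row;

    assert(attno >= 1 && attno <= natts);

    *steps = 0;
    for (int i = 1; i < attno; i++)
    {
        p += 1 + p[0];          /* skip the length byte plus the payload */
        (*steps)++;
    }
    return p;                   /* points at the target's length byte */
}

/* COUNT(column)-style loop: locate the column in every row, then test it. */
static long
toy_count_column(const unsigned char *const *rows, size_t nrows,
                 int natts, int attno, long *total_steps)
{
    long        count = 0;

    *total_steps = 0;
    for (size_t r = 0; r < nrows; r++)
    {
        int         steps;
        const unsigned char *attr;

        attr = toy_fetch_attr(rows[r], natts, attno, &steps);
        *total_steps += steps;
        if (attr[0] != 0)       /* toy NULL check */
            count++;
    }
    return count;
}

/* COUNT(*)-style loop: the row contents are never inspected. */
static long
toy_count_star(size_t nrows)
{
    return (long) nrows;
}
```

In this model, counting the third column of two rows walks four attributes in total, while the `COUNT(*)` path does no per-row work beyond incrementing the counter.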
136+
137+
## Edge Cases and Limitations
138+
139+
The optimization only applies when:
140+
141+
1. The argument is provably non-nullable (a column with a NOT NULL constraint, or a non-NULL constant such as `1`)
142+
2. There are no `ORDER BY` clauses in the aggregate
143+
3. There are no `DISTINCT` clauses in the aggregate
144+
145+
Cases that are **not** optimized (yet):
146+
147+
- `COUNT(nullable_col)` where the column might be NULL (even if filtered by `WHERE nullable_col IS NOT NULL` in the same query)
148+
- `COUNT(col ORDER BY col)` - the ORDER BY prevents optimization
149+
- `COUNT(DISTINCT col)` - DISTINCT prevents optimization
150+
151+
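The limitation with `WHERE` clauses comes from the timing of the optimization: it runs during constant folding, before the planner has attached the query's `WHERE` quals to their relations, so an `IS NOT NULL` filter cannot yet be used to prove the argument non-nullable.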
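
The eligibility rules in this section can be summarized as a single predicate. The following C sketch is a simplified model under stated assumptions, not the planner's code: `ToyCountCall` and `toy_can_become_count_star()` are hypothetical names, and the `where_filters_nulls` field exists only to show that the current implementation does not consult it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a COUNT(arg) call as seen at constant folding. */
typedef struct ToyCountCall
{
    bool        arg_not_null;           /* NOT NULL column or non-NULL constant */
    bool        where_filters_nulls;    /* WHERE arg IS NOT NULL is present */
    bool        has_order_by;           /* COUNT(arg ORDER BY ...) */
    bool        has_distinct;           /* COUNT(DISTINCT arg) */
} ToyCountCall;

/* Returns true when the call may be rewritten to COUNT(*). */
static bool
toy_can_become_count_star(const ToyCountCall *call)
{
    assert(call != NULL);

    /* ORDER BY or DISTINCT inside the aggregate blocks the rewrite. */
    if (call->has_order_by || call->has_distinct)
        return false;

    /*
     * Only constraint-level or constant non-nullability counts.  The
     * where_filters_nulls flag is deliberately ignored, mirroring the
     * limitation described above.
     */
    return call->arg_not_null;
}
```

For example, `COUNT(1)` maps to `{true, false, false, false}` and qualifies, while `COUNT(nullable_col)` with a `WHERE nullable_col IS NOT NULL` filter maps to `{false, true, false, false}` and does not.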
152+
153+
## Current Status
154+
155+
The patch was **committed** by David Rowley on November 26, 2025. It is available on the PostgreSQL master branch and, because PostgreSQL 18 was already released in September 2025, is expected to ship in PostgreSQL 19.
156+
157+
## Conclusion
158+
159+
This optimization represents a significant improvement to PostgreSQL's query planner, automatically fixing a common performance anti-pattern without requiring application changes. The new `SupportRequestSimplifyAggref` infrastructure also opens the door for future aggregate optimizations.
160+
161+
For developers and DBAs:
162+
- **No action required**: The optimization happens automatically
163+
- **Performance benefit**: Existing queries using `COUNT(1)` or `COUNT(not_null_col)` will automatically get faster
164+
- **Best practice**: While the planner now optimizes these cases, `COUNT(*)` remains the clearest and most idiomatic way to count rows
165+
166+
This change demonstrates PostgreSQL's commitment to improving query performance automatically, reducing the burden on developers to know every optimization detail while still allowing experts to write optimal queries when needed.
167+
168+
## References
169+
170+
- [Discussion Thread](https://www.postgresql.org/message-id/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com)
171+
- Related: `SupportRequestSimplify` for regular function expressions
