Commit 2fb7325

committed
JSONPath String Methods
1 parent d1fd5d8 commit 2fb7325

7 files changed

Lines changed: 168 additions & 0 deletions

src/SUMMARY.md

Lines changed: 4 additions & 0 deletions
@@ -3,6 +3,8 @@
33
# 🇬🇧 English
44

55
- [2026](./en/2026/README.md)
6+
- [Week 13](./en/2026/13/README.md)
7+
- [JSONPath String Methods: Cleaning JSON Inside the Path—and a Long Debate About Immutability](./en/2026/13/jsonpath-string-methods.md)
68
- [Week 12](./en/2026/12/README.md)
79
- [Reduce Planning Time for Large NOT IN Lists Containing NULL](./en/2026/12/not-in-null-planning-optimization.md)
810
- [Week 11](./en/2026/11/README.md)
@@ -31,6 +33,8 @@
3133
# 🇨🇳 中文
3234

3335
- [2026](./cn/2026/README.md)
36+
- [第 13 周](./cn/2026/13/README.md)
37+
- [JSONPath 字符串方法:在路径里清洗 JSON,以及一场关于不可变性的长跑讨论](./cn/2026/13/jsonpath-string-methods.md)
3438
- [第 12 周](./cn/2026/12/README.md)
3539
- [缩短含 NULL 的大规模 NOT IN 列表的规划时间](./cn/2026/12/not-in-null-planning-optimization.md)
3640
- [第 11 周](./cn/2026/11/README.md)

src/cn/2026/13/README.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
# 2026 年第 13 周
2+
3+
2026 年第 13 周 PostgreSQL 邮件列表讨论。
4+
5+
## 文章
6+
7+
- [JSONPath 字符串方法:在路径里清洗 JSON,以及一场关于不可变性的长跑讨论](./jsonpath-string-methods.md)

src/cn/2026/13/jsonpath-string-methods.md

Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
1+
# JSONPath 字符串方法:在路径里清洗 JSON,以及一场关于不可变性的长跑讨论
2+
3+
## 引言
4+
5+
处理“脏”JSON 时,常常要在比较之前做去空白、大小写转换、按分隔符拆分等操作。现在你可以在 `jsonb_path_query()` 等函数外围用 SQL 完成这些步骤,但不一定能在 **JSONPath 表达式内部**直接完成。Florents Tselai 在 `pgsql-hackers` 上发布了一系列补丁,为 JSONPath 增加常见的字符串方法:`lower()`、`upper()`、`initcap()`、`ltrim()` / `rtrim()` / `btrim()`、`replace()`、`split_part()`,实现上委托给 PostgreSQL 内置的字符串处理函数。
6+
7+
讨论很快从“能写什么表达式”扩展到更底层的问题:这些方法与 **易变性与不可变性**(依赖区域设置时的行为)、PostgreSQL 的 JSONPath 究竟应对齐 **SQL 标准** 还是互联网 RFC,以及现有带 `*_tz` 后缀的 JSONPath 入口在命名上的历史包袱。该工作登记在 [Commitfest 5270](https://commitfest.postgresql.org/patch/5270/);[邮件串](https://www.postgresql.org/message-id/flat/CA%2Bv5N40sJF39m0v7h%3DQN86zGp0CUf9F1WKasnZy9nNVj_VhCZQ%40mail.gmail.com) 从 2024 年延续到 2025 年,补丁版本迭代了很多轮。
8+
9+
## 为何重要
10+
11+
JSONPath 是 `jsonb_path_*` 系列函数内嵌的领域语言。在路径里直接提供字符串原语,可以减少外层 SQL 嵌套,让意图集中在路径上,也更接近人们熟悉的“管道式”文本处理——只是作用在 JSON 标量上。对数据清洗场景(空白、大小写、分隔符拆分),这类能力在易用性上很有吸引力。
12+
13+
真正的难点在于:**大小写映射和许多字符串操作依赖区域设置(locale)**。PostgreSQL 的规划与优化依赖正确的易变性标注:若把仍可能随区域或系统环境变化的结果标成 `immutable`,会破坏优化器的基本假设。线程里大部分篇幅正是在处理这一矛盾。
14+
15+
## 技术分析
16+
17+
### 补丁在做什么
18+
19+
首版补丁让这些方法转发到已在 `pg_proc` 注册的实现;在 `JsonPathParseItem` 上为需要比传统左右操作数更灵活入参的方法增加了 `method_args`(`arg0`、`arg1`),并增加 `jspGetArgX()` 访问器。作者还加入了 `README.jsonpath`,说明今后如何新增方法,方便后续贡献者。提案中的用法示例如下:
20+
21+
```sql
22+
SELECT jsonb_path_query('" hElLo WorlD "', '$.btrim().lower().upper().lower().replace("hello","bye") starts with "bye"');
23+
SELECT jsonb_path_query('"abc~(at)~def~@~ghi"', '$.split_part("~(at)~", 2)');
24+
```
25+
26+
首帖列出的待决事项包括:若将来 **SQL/JSON 标准**定义了同名方法如何避免冲突(相关讨论里提到过 `pg_` 等前缀)、与现有 JSONPath 代码一致的 **默认排序规则**、尚缺的用户文档,以及类似 `CREATE JSONPATH FUNCTION` 的可扩展性设想。
27+
28+
### 补丁演进(概览)
29+
30+
可下载的系列从 **v1** 到 **v18**;自 **v6** 起每个版本拆成两个补丁:一个重命名 JSONPath 方法实参相关的词法/语法标记,另一个承载字符串方法本体。中间多轮修订涉及测试、语法,以及针对 `jsonpath_scan.l` 等文件的变基与冲突消解。较新的版本在 `doc/src/sgml/func/func-json.sgml` 中补充了 **SGML 文档**,并为 `jsonpath`、`jsonb_jsonpath` 等增加了 **回归测试**。
31+
32+
## 社区观点
33+
34+
- **Tom Lane** 很早就指出 [不可变性问题](https://www.postgresql.org/message-id/145894.1727298237%40sss.pgh.pa.us):依赖区域设置的字符串操作无法保证 JSONPath 运算处处不可变;他引用提交 `cb599b9dd`,并对比 JSONPath 中为时区敏感日期时间引入的 `_tz` 分裂——他认为这种为每一种易变来源再复制一套入口的做法难以持续扩展。
35+
36+
- **Alexander Korotkov** 提出 **“灵活的易变性”** 设想:是否可以有辅助逻辑分析 JSONPath 参数是否为常量、路径中方法是否都“安全”,从而在受限情形下把 `jsonb_path_query()` 标为 `immutable`,否则标为 `stable`。他还询问这些新名字是否出现在 **SQL 标准**(或草案)中。
37+
38+
- **Florents Tselai** 提到 2019 年的相关讨论,并勾勒 **启发式**(若路径上各段都安全,则整体可视为不可变);他引用 [RFC 9535](https://www.rfc-editor.org/rfc/rfc9535.html#name-function-extensions) 中的“函数扩展”,认为厂商扩展在规范上有依据,而易变性属于实现细节。
39+
40+
- **David E. Wheeler** 说明:PostgreSQL 的 JSONPath 跟踪的是 **SQL 标准中的 SQL/JSON**,而不是 RFC 9535;公开 RFC 中的扩展机制与 SQL 标准文本中的规则未必一致。他同时关心 **可扩展钩子**(词法、语法、执行器);Florents 概括了步骤:新增 `JsonPathItemType`、修改 `jsonpath_scan.l` / `jsonpath_gram.y`、在 `executeItemOptUnwrapTarget` 中分发。
41+
42+
- **Robert Haas** 把问题部分归结为 **通用策略上的缺口**(依赖操作系统时间或区域设置的函数通常标为 `stable`,而不是 `immutable`),并类比现有的 **`jsonb_path_exists` 与 `jsonb_path_exists_tz`**:可以考虑让带后缀的一组函数承担更多“非纯”行为,而在无后缀版本中报错;他同时承认 `_tz` 这个名字对“区域设置”并不贴切。
43+
44+
- **David Wheeler** 将 Tom 所说的“难扩展”理解为:不会为每一种易变来源都造一套 `json_path_exists_*`;并讨论是否 **改名** 或泛化。在欧洲 PostgreSQL 社区会议上的讨论之后,Florents 总结了较务实的路线:把新行为放在 **`jsonb_path_*_tz` 家族**下、在非 `_tz` 变体中拒绝使用并 **写清文档**——在命名上与现有 `_tz` 保持一致,尽管语义上已不只在说时区。
45+
46+
- **命名与 API 膨胀**:David 提议是否引入 `_stable` 一类后缀;**Robert** 描述了弃用声明、GUC 开关、多年迁移等繁重流程,并认为抱怨难以避免。**Tom Lane** 认为不必为迁移投入那么大精力,用户若执意保留旧名可以包一层 **包装函数**,并反问:为何不能 **新增一组更好的名字且永不删除旧名**?**Robert** 表示若大家能接受多出来的符号也可以。**Florents** 则指出再并行一套五个 `jsonb_path_*` 函数会显著增加 API 表面积。**David** 还从索引化角度追问:现实中 `_tz` 路径是否适合索引场景;讨论中也提到生成列以及未来虚拟生成列等用法。
47+
48+
## 技术细节
49+
50+
### 不可变性与区域设置
51+
52+
PostgreSQL 区分 `immutable`(相同输入永远产生相同结果,规划器可以提前求值,也可用于索引表达式)与 `stable`(在单条语句内结果一致,但不同语句之间可能变化,例如随会话设置或环境而变)。区域数据可能随操作系统或 ICU 更新而改变,因此依赖区域的函数通常不能标为 `immutable`。JSONPath 若深度参与表达式求值与优化,必须与核心 SQL 函数遵守同一套规则。
53+
54+
### 标准立场
55+
56+
讨论中区分了两类文献:
57+
58+
- **RFC 9535**(IETF JSONPath):公开,描述扩展点。
59+
- **SQL/JSON 与 SQL 标准中的 JSONPath**(PostgreSQL 对齐目标):文本不公开,扩展规则可能与 RFC 不同。
60+
61+
即便规范允许厂商扩展方法名,未来标准若占用同名仍有风险——首帖提出的命名策略问题仍未消失。
62+
63+
### 实现层面
64+
65+
除易变性外,新增方法会牵动 JSONPath 词法分析器、语法、执行器与测试。将“重命名标记”和“功能本体”拆成两个补丁,有利于在版本号变多之后仍保持评审可控。
66+
67+
## 现状
68+
69+
补丁系列在多轮变基中持续演进(抓取数据中可见至 **v18**)。工作条目见 [Commitfest 5270](https://commitfest.postgresql.org/patch/5270/),也可在 [GitHub PR](https://github.com/Florents-Tselai/postgres/pull/18) 浏览。2025 年 5 月的讨论在实现策略(`*_tz` 表面、非后缀版本报错、文档)与命名取舍上趋于务实,但本邮件串片段中未见最终合入主线的结论。
70+
71+
## 结语
72+
73+
JSONPath 字符串方法回应的是真实的产品需求:在路径表达式内完成规范化与拆分。社区回应的焦点与其说在“要不要这个功能”,不如说在 **易变性标注是否正确**、**与 SQL/JSON 的关系**,以及 **`jsonb_path_*` 与 `_tz` 后缀** 的长期可维护性。这条线程很好地展示了 PostgreSQL 在扩展内嵌 DSL 时,如何在标准符合性、优化器正确性与 API 卫生之间取舍。

src/cn/2026/README.md

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,8 @@
44

55
## 各周
66

7+
- [第 13 周](/cn/2026/13/index.html)
8+
- [JSONPath 字符串方法:在路径里清洗 JSON,以及一场关于不可变性的长跑讨论](/cn/2026/13/jsonpath-string-methods.html)
79
- [第 12 周](/cn/2026/12/index.html)
810
- [缩短含 NULL 的大规模 NOT IN 列表的规划时间](/cn/2026/12/not-in-null-planning-optimization.html)
911
- [第 11 周](/cn/2026/11/index.html)

src/en/2026/13/README.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
# Week 13 (2026)
2+
3+
PostgreSQL mailing list discussions for Week 13, 2026.
4+
5+
## Articles
6+
7+
- [JSONPath String Methods: Cleaning JSON Inside the Path—and a Long Debate About Immutability](./jsonpath-string-methods.md)

src/en/2026/13/jsonpath-string-methods.md

Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
1+
# JSONPath String Methods: Cleaning JSON Inside the Path—and a Long Debate About Immutability
2+
3+
## Introduction
4+
5+
Working with messy JSON often means trimming, case-folding, and splitting strings before comparisons. Today you can do that in SQL around `jsonb_path_query()` and friends, but not always *inside* the JSONPath expression itself. Florents Tselai posted a patch series to `pgsql-hackers` that adds familiar string helpers—`lower()`, `upper()`, `initcap()`, `ltrim()` / `rtrim()` / `btrim()`, `replace()`, and `split_part()`—as JSONPath methods, delegating to PostgreSQL’s built-in string functions.
6+
7+
The thread quickly grew into a broader discussion: how these operations interact with **volatility and immutability** (locale-dependent behavior), whether PostgreSQL’s JSONPath should track the **SQL standard** or Internet RFCs, and what to do about the naming wart around the existing `*_tz` JSONPath entry points. The work is registered on [Commitfest 5270](https://commitfest.postgresql.org/patch/5270/); the [mailing list thread](https://www.postgresql.org/message-id/flat/CA%2Bv5N40sJF39m0v7h%3DQN86zGp0CUf9F1WKasnZy9nNVj_VhCZQ%40mail.gmail.com) spans 2024 through 2025 with many patch revisions.
8+
9+
## Why This Matters
10+
11+
JSONPath is the embedded language used by `jsonb_path_*` routines. Adding string primitives in-path reduces nested SQL, keeps intent local to the path, and matches how people already think about Unix-style text pipelines—only applied to JSON scalars. For data cleaning (whitespace, casing, delimiter splitting), the feature is ergonomically strong.
12+
13+
The catch is that **case mapping and many string operations depend on locale**. PostgreSQL’s planner relies on correct volatility labels: an `immutable` function may be folded to a constant at plan time and used in index expressions, so labeling something `immutable` when its result can change with locale or OS updates risks stale plans and inconsistent indexes. That tension drives most of the thread after the initial patch post.
14+
15+
## Technical Analysis
16+
17+
### What the patch adds
18+
19+
The author’s first revision introduces methods that forward to `pg_proc`-registered implementations, extends `JsonPathParseItem` with `method_args` (`arg0`, `arg1`) for methods that need more than the usual left/right operand pattern, and adds `jspGetArgX()` accessors. A `README.jsonpath` file documents how to add new methods for future contributors. Example shapes from the proposal:
20+
21+
```sql
22+
SELECT jsonb_path_query('" hElLo WorlD "', '$.btrim().lower().upper().lower().replace("hello","bye") starts with "bye"');
23+
SELECT jsonb_path_query('"abc~(at)~def~@~ghi"', '$.split_part("~(at)~", 2)');
24+
```
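
For comparison, the same clean-up can be written today with the existing built-in string functions wrapped around a JSONPath call. The following is a minimal sketch that runs on a stock PostgreSQL with `jsonb_path_query_first()`; it does not use the proposed methods, and the `~@~` delimiter is taken from the `split_part()` example in the PostgreSQL documentation rather than from the patch.

```sql
-- Extract the JSON string as text (#>> '{}' unwraps a scalar), then apply
-- the built-in string functions that the proposed methods delegate to.
SELECT replace(
         lower(btrim(jsonb_path_query_first('" hElLo WorlD "'::jsonb, '$') #>> '{}')),
         'hello', 'bye'
       ) LIKE 'bye%' AS starts_with_bye;   -- true

-- Field 2 of a delimiter-separated string, using the existing split_part().
SELECT split_part(
         jsonb_path_query_first('"abc~@~def~@~ghi"'::jsonb, '$') #>> '{}',
         '~@~', 2
       ) AS second_field;                  -- def
```

The nesting reads inside-out, which is the ergonomic cost the in-path methods are meant to remove.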
25+
26+
Open items in the first post included: possible **name collisions** if the SQL/JSON standard later defines methods with the same names (prefixes such as `pg_` were mentioned in an earlier related thread), **default collation** behavior (consistent with existing JSONPath code), missing user-facing documentation, and a half-formed idea of `CREATE JSONPATH FUNCTION`-style extensibility.
27+
28+
### Patch evolution (high level)
29+
30+
The fetchable series runs from **v1** through **v18**, with **v6** onward split into two patches per version: one renaming JSONPath method-argument tokens, the other carrying the string-method implementation. Intermediate revisions iterate on tests, grammar, and rebases against `jsonpath_scan.l` and related files. Later revisions add substantial **SGML documentation** under `doc/src/sgml/func/func-json.sgml` and **regression tests** for `jsonpath`, `jsonb_jsonpath`, and related SQL.
31+
32+
## Community Insights
33+
34+
- **Tom Lane** [raised immutability early](https://www.postgresql.org/message-id/145894.1727298237%40sss.pgh.pa.us): string methods tied to locale mean JSONPath operations are not universally immutable, pointing to commit `cb599b9dd` and contrasting with the `_tz` split used for time-zone-sensitive datetime behavior in JSONPath—a pattern he argued does not scale cleanly to many sources of mutability.
35+
36+
- **Alexander Korotkov** floated **“flexible mutability”**: a companion could analyze whether the JSONPath argument is constant and whether all methods used are “safe,” so `jsonb_path_query()` might be labeled `immutable` in restricted cases and `stable` otherwise. He also asked whether the new names appear in the **SQL standard** (or a draft).
37+
38+
- **Florents Tselai** noted prior discussion from 2019, sketched a **heuristic** (if all elements are safe, treat the path as immutable), and cited [RFC 9535](https://www.rfc-editor.org/rfc/rfc9535.html#name-function-extensions) “function extensions” as covering vendor additions—while acknowledging mutability as an implementation concern.
39+
40+
- **David E. Wheeler** clarified that **PostgreSQL’s JSONPath follows the SQL/JSON track in the SQL standard**, not RFC 9535; extension facilities in the public RFC are not automatically the same as SQL’s rules. He remained interested in **extensibility hooks** (lexer, parser, executor) and asked what hooks would look like; Florents outlined steps: new `JsonPathItemType`, lexer/parser changes in `jsonpath_scan.l` / `jsonpath_gram.y`, and a dispatch hook from `executeItemOptUnwrapTarget`.
41+
42+
- **Robert Haas** reframed the issue as partly a **general policy gap** (functions that depend on OS time or locale are usually labeled `stable` rather than `immutable`), compared it to the existing **`jsonb_path_exists` vs `jsonb_path_exists_tz`** split, and suggested extending the `_tz` family to accept locale-dependent operations and **erroring in the non-suffixed functions**, while admitting that the `_tz` name becomes misleading.
43+
44+
- **David Wheeler** read Tom’s “doesn’t scale” comment as “we will not add `json_path_exists_foo` for every mutable concern,” and wondered about **renaming** or generalizing. After discussion at a **PostgreSQL community event**, Florents summarized a **pragmatic path**: add the new behavior under the **`jsonb_path_*_tz` family**, reject it in non-`_tz` variants, and document clearly—keeping naming aligned with existing `_tz` functions despite the semantic stretch.
45+
46+
- **Naming clutter**: David suggested introducing `_stable`-style names; **Robert** outlined a heavy deprecation/GUC path and predicted complaints regardless. **Tom Lane** saw little value in elaborate migration machinery, noted that **wrapper functions** can preserve old names for stubborn apps, and asked what is wrong with **new names alongside old ones without removal**. **Robert** accepted the extra clutter if consensus prefers it. **Florents** noted that a third parallel set of five `jsonb_path_*` functions would add API surface. **David** questioned whether `_tz` usage in the wild is indexable in practice; the thread also touches on generated columns, and future virtual generated columns, as the place where timestamps extracted from JSON often land.
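
The `_tz` split and the wrapper-function idea from the discussion above can be sketched with functions that already exist. The first query is the example from the PostgreSQL documentation for `jsonb_path_exists_tz()`; the wrapper name `path_exists_stable` and its parameter names are hypothetical, chosen only to illustrate what a user-side alias might look like.

```sql
-- Comparing a timestamptz string with a date depends on the session time
-- zone, so this comparison is accepted by the *_tz variant only.
SELECT jsonb_path_exists_tz(
  '["2015-08-01 12:00:00-05"]',
  '$[*] ? (@.datetime() < "2015-08-02".datetime())'
);

-- Hypothetical user-defined alias: a thin STABLE wrapper that gives the
-- same behavior a name without the "_tz" suffix.
CREATE FUNCTION path_exists_stable(doc jsonb, jpath jsonpath)
RETURNS boolean
LANGUAGE sql
STABLE
AS $$ SELECT jsonb_path_exists_tz(doc, jpath) $$;

SELECT path_exists_stable(
  '["2015-08-01 12:00:00-05"]',
  '$[*] ? (@.datetime() < "2015-08-02".datetime())'
);
```

Under the pragmatic path summarized in the thread, locale-dependent string methods would presumably follow the same pattern: accepted by the `_tz` family and rejected by the non-suffixed functions.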
47+
48+
## Technical Details
49+
50+
### Immutability and locale
51+
52+
PostgreSQL distinguishes `immutable` (the same inputs always produce the same output, so the planner may pre-evaluate calls and the function may be used in index expressions) from `stable` (results are consistent within a single statement but may change between statements, e.g., with session settings or environment). Locale definitions can change when the OS or ICU data is updated, so functions that depend on them are typically not `immutable`. JSONPath integration must respect the same rules as core SQL functions if paths are embedded in expression evaluation and optimization.
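
The labels can be inspected directly in the catalog. The sketch below assumes a stock PostgreSQL installation; the temporary table `docs` and the index name are illustrative only.

```sql
-- provolatile is 'i' (immutable), 's' (stable), or 'v' (volatile).
SELECT p.oid::regprocedure AS signature, p.provolatile
FROM pg_proc AS p
WHERE p.proname IN ('jsonb_path_exists', 'jsonb_path_exists_tz')
ORDER BY p.proname;

-- Only immutable functions are allowed in index expressions, which is why
-- the label matters in practice.
CREATE TEMP TABLE docs (body jsonb);
CREATE INDEX docs_has_name ON docs ((jsonb_path_exists(body, '$.name')));

-- The same index built on jsonb_path_exists_tz() would be rejected, because
-- functions in index expressions must be marked IMMUTABLE:
-- CREATE INDEX docs_has_name_tz ON docs ((jsonb_path_exists_tz(body, '$.name')));
```

This is the practical cost of moving locale-dependent methods under the `_tz` family: paths that use them could not back an expression index.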
53+
54+
### Standards positioning
55+
56+
The thread distinguishes:
57+
58+
- **RFC 9535** (JSONPath, IETF): public, documents extension points.
59+
- **SQL/JSON and SQL-standard JSONPath** (what PostgreSQL targets): not fully public; extension and naming rules may differ.
60+
61+
Vendor-specific method names may still need care to avoid future standard collisions—prefixing or naming conventions remain an open design concern from the first post.
62+
63+
### Implementation surface
64+
65+
Beyond volatility, adding methods touches the JSONPath lexer, grammar, executor, and tests. Splitting patches (rename tokens vs feature body) keeps review manageable once the series grows.
66+
67+
## Current Status
68+
69+
The patch series remained under active development through multiple rebases (through **v18** in the downloaded artifacts). It is listed on [Commitfest 5270](https://commitfest.postgresql.org/patch/5270/), with a [GitHub branch/PR](https://github.com/Florents-Tselai/postgres/pull/18) for readers who prefer that view. The mailing list discussion in May 2025 converged on practical next steps (surface under `*_tz`, strict errors elsewhere, documentation) and on naming trade-offs, but this snapshot of the thread does not show the patch being committed.
70+
71+
## Conclusion
72+
73+
JSONPath string methods address a real ergonomics gap: in-path normalization and splitting of JSON string data. The community response centers less on the feature’s utility than on **correct volatility**, **alignment with SQL/JSON**, and **API sprawl** around `jsonb_path_*` and the `_tz` suffix. The thread is a useful snapshot of how PostgreSQL balances standards compliance, planner correctness, and long-term maintainability when extending a DSL that lives inside core.

src/en/2026/README.md

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,8 @@ PostgreSQL Weekly posts for 2026.
44

55
## Weeks
66

7+
- [Week 13](/en/2026/13/index.html)
8+
- [JSONPath String Methods: Cleaning JSON Inside the Path—and a Long Debate About Immutability](/en/2026/13/jsonpath-string-methods.html)
79
- [Week 12](/en/2026/12/index.html)
810
- [Reduce Planning Time for Large NOT IN Lists Containing NULL](/en/2026/12/not-in-null-planning-optimization.html)
911
- [Week 11](/en/2026/11/index.html)
