Commit 9c50a71

Fix path navigation on map columns for spath command (#5149)
* Fix path navigation bug in qualified name resolver and add spath tests
  Signed-off-by: Chen Dai <daichen@amazon.com>
* Add spath UT and IT with other commands
  Signed-off-by: Chen Dai <daichen@amazon.com>
* Move aliasing logic from resolver to project item expansion
  Signed-off-by: Chen Dai <daichen@amazon.com>
* Update javadoc and readme
  Signed-off-by: Chen Dai <daichen@amazon.com>
---------
Signed-off-by: Chen Dai <daichen@amazon.com>
1 parent 7a6a3a5 commit 9c50a71

5 files changed

Lines changed: 161 additions & 16 deletions

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

Lines changed: 10 additions & 1 deletion
@@ -453,7 +453,16 @@ private List<RexNode> expandProjectFields(
             }
             matchingFields.forEach(f -> expandedFields.add(context.relBuilder.field(f)));
           } else if (addedFields.add(fieldName)) {
-            expandedFields.add(rexVisitor.analyze(field, context));
+            RexNode resolved = rexVisitor.analyze(field, context);
+            /*
+             * Dotted path access is resolved into ITEM(map, path) function call without aliasing.
+             * Re-apply the alias so the projected column retains the user-visible name.
+             * TODO: Introduce path navigation semantics without relying on projection-time aliasing.
+             */
+            if (resolved.getKind() == SqlKind.ITEM) {
+              resolved = context.relBuilder.alias(resolved, fieldName);
+            }
+            expandedFields.add(resolved);
           }
         }
         case AllFields ignored -> {
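
The comment in this hunk describes two steps: a dotted name such as `doc.user.name` is resolved into an `ITEM(map, path)` call, and the projection then re-applies the user-visible name as an alias. The following is a minimal, Calcite-free sketch of that idea. The names `PathAliasSketch`, `Resolved`, and `resolve` are hypothetical and do not appear in the commit, and trying the longest column-name prefix first is an assumption made for illustration, not the resolver's documented behavior. Backtick-quoted parts such as `` doc.`tags{}` `` are not handled.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical model of path resolution plus aliasing; not the Calcite RexNode API. */
final class PathAliasSketch {

  /** A resolved projection item: an optional map key and the name shown to the user. */
  record Resolved(String column, String key, String alias) {
    String render() {
      return key == null
          ? column
          : "ITEM(" + column + ", '" + key + "') AS `" + alias + "`";
    }
  }

  /** Splits a dotted name into a known column and the remaining map key. */
  static Resolved resolve(String qualifiedName, Set<String> columns) {
    List<String> parts = List.of(qualifiedName.split("\\."));
    // Assumption: prefer the longest prefix that names an existing column.
    for (int len = parts.size(); len >= 1; len--) {
      String column = String.join(".", parts.subList(0, len));
      if (columns.contains(column)) {
        if (len == parts.size()) {
          return new Resolved(column, null, column); // plain column, no ITEM call
        }
        String key = String.join(".", parts.subList(len, parts.size()));
        // The alias keeps the full user-visible name, mirroring the projection-time aliasing.
        return new Resolved(column, key, qualifiedName);
      }
    }
    throw new IllegalArgumentException("Unknown field: " + qualifiedName);
  }

  public static void main(String[] args) {
    // prints: ITEM(doc, 'user.name') AS `doc.user.name`
    System.out.println(resolve("doc.user.name", Set.of("doc", "doc_auto")).render());
  }
}
```

Keeping the alias separate from the `ITEM` expression is what lets commands other than `fields` (for example `where` or `sort`) reuse the bare expression without carrying a projection name.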

core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java

Lines changed: 2 additions & 4 deletions
@@ -95,7 +95,7 @@ private static RexNode resolveInNonJoinCondition(
   private static String joinParts(List<String> parts, int start, int length) {
     StringBuilder sb = new StringBuilder();
     for (int i = 0; i < length; i++) {
-      if (start < i) {
+      if (i > 0) {
         sb.append(".");
       }
       sb.append(parts.get(start + i));
@@ -289,9 +289,7 @@ private static RexNode resolveFieldAccess(
       return field;
     } else {
       String itemName = joinParts(parts, length + start, parts.size() - length);
-      return context.relBuilder.alias(
-          createItemAccess(field, itemName, context),
-          String.join(QualifiedName.DELIMITER, parts.subList(start, parts.size())));
+      return createItemAccess(field, itemName, context);
     }
   }
 
297295

docs/user/ppl/cmd/spath.md

Lines changed: 10 additions & 11 deletions
@@ -143,25 +143,25 @@ fetched rows / total rows = 3/3
 
 ## Example 5: Auto-extract mode
 
-When `path` is omitted, `spath` extracts all fields from the JSON into a map. All values are stringified, and null values are preserved:
+When `path` is omitted, `spath` extracts all fields from the JSON into a map. You can access individual values using dotted path navigation, where `doc.user.name` resolves to the map key `user.name`. For keys containing special characters like `{}`, use backtick quoting:
 
 ```ppl
 source=structured
-| spath input=doc_auto output=result
-| fields doc_auto result
+| spath input=doc_auto output=doc
+| fields doc_auto, doc.user.name, doc.user.age, doc.`tags{}`, doc.active
 ```
 
 The query returns the following results:
 
 ```text
 fetched rows / total rows = 3/3
-+---------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
-| doc_auto | result |
-|---------------------------------------------------------------------------------+------------------------------------------------------------------------------------|
-| {"user":{"name":"John","age":30},"tags":["java","sql"],"active":true} | {'user.age': '30', 'tags{}': '[java, sql]', 'user.name': 'John', 'active': 'true'} |
-| {"user":{"name":"Jane","age":25},"tags":["python"],"active":null} | {'user.age': '25', 'tags{}': 'python', 'user.name': 'Jane', 'active': 'null'} |
-| {"user":{"name":"Bob","age":35},"tags":["go","rust","sql"],"user.name":"Bobby"} | {'user.age': '35', 'tags{}': '[go, rust, sql]', 'user.name': '[Bob, Bobby]'} |
-+---------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
++---------------------------------------------------------------------------------+---------------+--------------+-----------------+------------+
+| doc_auto | doc.user.name | doc.user.age | doc.tags{} | doc.active |
+|---------------------------------------------------------------------------------+---------------+--------------+-----------------+------------|
+| {"user":{"name":"John","age":30},"tags":["java","sql"],"active":true} | John | 30 | [java, sql] | true |
+| {"user":{"name":"Jane","age":25},"tags":["python"],"active":null} | Jane | 25 | python | null |
+| {"user":{"name":"Bob","age":35},"tags":["go","rust","sql"],"user.name":"Bobby"} | [Bob, Bobby] | 35 | [go, rust, sql] | null |
++---------------------------------------------------------------------------------+---------------+--------------+-----------------+------------+
 ```
 
 The flattening rules demonstrated in this example:
@@ -171,4 +171,3 @@ The flattening rules demonstrated in this example:
 - Duplicate logical keys merge into arrays: in the third row, both `"user": {"name": "Bob"}` (nested) and `"user.name": "Bobby"` (direct dotted key) resolve to the same key `user.name`, so their values merge into `'[Bob, Bobby]'`
 - All values are strings: numeric `30` becomes `'30'`, boolean `true` becomes `'true'`, and arrays become strings like `'[java, sql]'`
 - Null values are preserved: in the second row, `"active": null` is kept as `'active': 'null'` in the map
-
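
The flattening rules listed in the documentation hunk can be approximated with a short sketch. This is not the `JSON_EXTRACT_ALL` implementation. It is a hypothetical model (`FlattenSketch`, `flatten`, and `collect` are made-up names) inferred only from the example output above. It assumes already-parsed JSON held in `Map` and `List` objects, that array elements are collected under the key suffixed with `{}`, and that a key with a single collected value is shown without brackets. Key ordering in the real output is not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the documented flattening rules; not the actual implementation. */
final class FlattenSketch {

  /** Flattens nested maps and lists into dotted keys with stringified values. */
  static Map<String, String> flatten(Map<String, Object> json) {
    Map<String, List<String>> collected = new LinkedHashMap<>();
    json.forEach((key, value) -> collect(key, value, collected));

    Map<String, String> result = new LinkedHashMap<>();
    // A single value is kept as-is; several values under one key render as "[a, b]".
    collected.forEach(
        (key, values) -> result.put(key, values.size() == 1 ? values.get(0) : values.toString()));
    return result;
  }

  private static void collect(String key, Object value, Map<String, List<String>> out) {
    if (value instanceof Map<?, ?> nested) {
      // Nested objects extend the key with a dot: user -> user.name
      nested.forEach((k, v) -> collect(key + "." + k, v, out));
    } else if (value instanceof List<?> array) {
      // Array elements are gathered under the key suffixed with {}: tags -> tags{}
      array.forEach(element -> collect(key + "{}", element, out));
    } else {
      // Scalars and null are stringified; null becomes the string "null".
      out.computeIfAbsent(key, k -> new ArrayList<>()).add(String.valueOf(value));
    }
  }

  public static void main(String[] args) {
    Map<String, Object> user = new LinkedHashMap<>();
    user.put("name", "Bob");
    user.put("age", 35);
    Map<String, Object> doc = new LinkedHashMap<>();
    doc.put("user", user);
    doc.put("tags", List.of("go", "rust", "sql"));
    doc.put("user.name", "Bobby"); // direct dotted key that collides with user.name
    System.out.println(flatten(doc));
  }
}
```

Run on the third row of the example, the sketch yields `[Bob, Bobby]` for `user.name` and `[go, rust, sql]` for `tags{}`, matching the documented table.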

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java

Lines changed: 53 additions & 0 deletions
@@ -8,6 +8,7 @@
 import static org.opensearch.sql.util.MatcherUtils.rows;
 import static org.opensearch.sql.util.MatcherUtils.schema;
 import static org.opensearch.sql.util.MatcherUtils.verifyDataRows;
+import static org.opensearch.sql.util.MatcherUtils.verifyDataRowsInOrder;
 import static org.opensearch.sql.util.MatcherUtils.verifySchema;
 
 import java.io.IOException;
@@ -48,6 +49,17 @@ public void init() throws Exception {
             + " \"malformed_doc\": \"{\\\"user\\\":{\\\"name\\\":\"}");
     client().performRequest(autoExtractDoc);
 
+    // Auto-extract mode: 2-doc index for spath + command (eval/where/stats/sort) tests
+    Request cmdDoc1 = new Request("PUT", "/test_spath_cmd/_doc/1?refresh=true");
+    cmdDoc1.setJsonEntity(
+        "{\"doc\": \"{\\\"user\\\":{\\\"name\\\":\\\"John\\\",\\\"age\\\":30}}\"}");
+    client().performRequest(cmdDoc1);
+
+    Request cmdDoc2 = new Request("PUT", "/test_spath_cmd/_doc/2?refresh=true");
+    cmdDoc2.setJsonEntity(
+        "{\"doc\": \"{\\\"user\\\":{\\\"name\\\":\\\"Alice\\\",\\\"age\\\":25}}\"}");
+    client().performRequest(cmdDoc2);
+
     // Auto-extract mode: null input handling (doc 1 establishes mapping, doc 2 has null)
     Request nullDoc1 = new Request("PUT", "/test_spath_null/_doc/1?refresh=true");
     nullDoc1.setJsonEntity("{\"doc\": \"{\\\"n\\\": 1}\"}");
@@ -163,4 +175,45 @@ public void testSpathAutoExtractMalformedJson() throws IOException {
     verifySchema(result, schema("result", "struct"));
     verifyDataRows(result, rows(new JSONObject("{}")));
   }
+
+  @Test
+  public void testSpathAutoExtractWithEval() throws IOException {
+    JSONObject result =
+        executeQuery(
+            "source=test_spath_cmd | spath input=doc"
+                + " | eval name = doc.user.name | fields name");
+    verifySchema(result, schema("name", "string"));
+    verifyDataRows(result, rows("Alice"), rows("John"));
+  }
+
+  @Test
+  public void testSpathAutoExtractWithWhere() throws IOException {
+    JSONObject result =
+        executeQuery(
+            "source=test_spath_cmd | spath input=doc"
+                + " | where doc.user.name = 'John' | fields doc.user.name");
+    verifySchema(result, schema("doc.user.name", "string"));
+    verifyDataRows(result, rows("John"));
+  }
+
+  @Test
+  public void testSpathAutoExtractWithStats() throws IOException {
+    JSONObject result =
+        executeQuery(
+            "source=test_spath_cmd | spath input=doc"
+                + " | stats sum(doc.user.age) by doc.user.name");
+    verifySchema(result, schema("sum(doc.user.age)", "double"), schema("doc.user.name", "string"));
+    verifyDataRows(result, rows(25, "Alice"), rows(30, "John"));
+  }
+
+  @Test
+  public void testSpathAutoExtractWithSort() throws IOException {
+    // spath auto-extract + sort by path navigation on result
+    JSONObject result =
+        executeQuery(
+            "source=test_spath_cmd | spath input=doc"
+                + " | sort doc.user.name | fields doc.user.name");
+    verifySchema(result, schema("doc.user.name", "string"));
+    verifyDataRowsInOrder(result, rows("Alice"), rows("John"));
+  }
 }

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLSpathTest.java

Lines changed: 86 additions & 0 deletions
@@ -58,4 +58,90 @@ public void testSpathAutoExtractModeWithOutput() {
                 + " LogicalTableScan(table=[[scott, EMP]])\n")
         .expectSparkSQL("SELECT JSON_EXTRACT_ALL(`ENAME`) `result`\n" + "FROM `scott`.`EMP`");
   }
+
+  @Test
+  public void testSpathAutoExtractModeWithEval() {
+    withPPLQuery(
+            "source=EMP | spath input=ENAME output=result"
+                + " | eval age = result.user.age + 1 | fields age")
+        .expectLogical(
+            "LogicalProject(age=[+(SAFE_CAST(ITEM(JSON_EXTRACT_ALL($1),"
+                + " 'user.age')), 1.0E0)])\n"
+                + " LogicalTableScan(table=[[scott, EMP]])\n")
+        .expectSparkSQL(
+            "SELECT TRY_CAST(JSON_EXTRACT_ALL(`ENAME`)['user.age']"
+                + " AS DOUBLE) + 1.0E0 `age`\n"
+                + "FROM `scott`.`EMP`");
+  }
+
+  @Test
+  public void testSpathAutoExtractModeWithStats() {
+    withPPLQuery("source=EMP | spath input=ENAME output=result | stats count() by result.user.name")
+        .expectLogical(
+            "LogicalProject(count()=[$1], result.user.name=[$0])\n"
+                + " LogicalAggregate(group=[{0}], count()=[COUNT()])\n"
+                + " LogicalProject(result.user.name=[ITEM(JSON_EXTRACT_ALL($1),"
+                + " 'user.name')])\n"
+                + " LogicalTableScan(table=[[scott, EMP]])\n")
+        .expectSparkSQL(
+            "SELECT COUNT(*) `count()`,"
+                + " JSON_EXTRACT_ALL(`ENAME`)['user.name'] `result.user.name`\n"
+                + "FROM `scott`.`EMP`\n"
+                + "GROUP BY JSON_EXTRACT_ALL(`ENAME`)['user.name']");
+  }
+
+  @Test
+  public void testSpathAutoExtractModeWithWhere() {
+    withPPLQuery("source=EMP | spath input=ENAME output=result" + " | where result.active = 'true'")
+        .expectLogical(
+            "LogicalFilter(condition=[=(ITEM($8, 'active'),"
+                + " 'true')])\n"
+                + " LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3],"
+                + " HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7],"
+                + " result=[JSON_EXTRACT_ALL($1)])\n"
+                + " LogicalTableScan(table=[[scott, EMP]])\n")
+        .expectSparkSQL(
+            "SELECT *\n"
+                + "FROM (SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`,"
+                + " `SAL`, `COMM`, `DEPTNO`, JSON_EXTRACT_ALL(`ENAME`) `result`\n"
+                + "FROM `scott`.`EMP`) `t`\n"
+                + "WHERE `result`['active'] = 'true'");
+  }
+
+  @Test
+  public void testSpathAutoExtractModeWithFields() {
+    withPPLQuery(
+            "source=EMP | spath input=ENAME output=result"
+                + " | fields result.user.name, result.user.age")
+        .expectLogical(
+            "LogicalProject(result.user.name=[ITEM(JSON_EXTRACT_ALL($1), 'user.name')],"
+                + " result.user.age=[ITEM(JSON_EXTRACT_ALL($1), 'user.age')])\n"
+                + " LogicalTableScan(table=[[scott, EMP]])\n")
+        .expectSparkSQL(
+            "SELECT JSON_EXTRACT_ALL(`ENAME`)['user.name'] `result.user.name`,"
+                + " JSON_EXTRACT_ALL(`ENAME`)['user.age'] `result.user.age`\n"
+                + "FROM `scott`.`EMP`");
+  }
+
+  @Test
+  public void testSpathAutoExtractModeWithSort() {
+    withPPLQuery("source=EMP | spath input=ENAME output=result" + " | sort result.user.name")
+        .expectLogical(
+            "LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3],"
+                + " HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], result=[$8])\n"
+                + " LogicalSort(sort0=[$9], dir0=[ASC-nulls-first])\n"
+                + " LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3],"
+                + " HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7],"
+                + " result=[JSON_EXTRACT_ALL($1)],"
+                + " $f9=[ITEM(JSON_EXTRACT_ALL($1), 'user.name')])\n"
+                + " LogicalTableScan(table=[[scott, EMP]])\n")
+        .expectSparkSQL(
+            "SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`,"
+                + " `SAL`, `COMM`, `DEPTNO`, `result`\n"
+                + "FROM (SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`,"
+                + " `SAL`, `COMM`, `DEPTNO`, JSON_EXTRACT_ALL(`ENAME`) `result`,"
+                + " JSON_EXTRACT_ALL(`ENAME`)['user.name'] `$f9`\n"
+                + "FROM `scott`.`EMP`\n"
+                + "ORDER BY 10) `t0`");
+  }
 }
