
Commit 7d6d983

Update graphlookup syntax (#5209)
* Update graphlookup syntax

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* remove push mode

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* remove unused tokens

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
1 parent a873e47 commit 7d6d983

10 files changed

Lines changed: 241 additions & 239 deletions


core/src/main/java/org/opensearch/sql/ast/tree/GraphLookup.java

Lines changed: 2 additions & 2 deletions
@@ -23,8 +23,8 @@
 /**
  * AST node for graphLookup command. Performs BFS graph traversal on a lookup table.
  *
- * <p>Example: source=employees | graphLookup employees fromField=manager toField=name maxDepth=3
- * depthField=level direction=uni as hierarchy
+ * <p>Example: source=employees | graphLookup employees start=reportsTo edge=manager-->name
+ * maxDepth=3 depthField=level as hierarchy
  */
 @Getter
 @Setter

docs/user/ppl/cmd/graphlookup.md

Lines changed: 53 additions & 50 deletions
@@ -8,51 +8,64 @@ The `graphLookup` command performs recursive graph traversal on a collection usi
 The `graphLookup` command has the following syntax:

 ```syntax
-graphLookup <lookupIndex> startField=<startField> fromField=<fromField> toField=<toField> [maxDepth=<maxDepth>] [depthField=<depthField>] [direction=(uni | bi)] [supportArray=(true | false)] [batchMode=(true | false)] [usePIT=(true | false)] [filter=(<condition>)] as <outputField>
+graphLookup <lookupIndex> start=<startField> edge=<fromField><operator><toField> [maxDepth=<maxDepth>] [depthField=<depthField>] [supportArray=(true | false)] [batchMode=(true | false)] [usePIT=(true | false)] [filter=(<condition>)] as <outputField>
 ```

 The following are examples of the `graphLookup` command syntax:

 ```syntax
-source = employees | graphLookup employees startField=reportsTo fromField=reportsTo toField=name as reportingHierarchy
-source = employees | graphLookup employees startField=reportsTo fromField=reportsTo toField=name maxDepth=2 as reportingHierarchy
-source = employees | graphLookup employees startField=reportsTo fromField=reportsTo toField=name depthField=level as reportingHierarchy
-source = employees | graphLookup employees startField=reportsTo fromField=reportsTo toField=name direction=bi as connections
-source = travelers | graphLookup airports startField=nearestAirport fromField=connects toField=airport supportArray=true as reachableAirports
-source = airports | graphLookup airports startField=airport fromField=connects toField=airport supportArray=true as reachableAirports
-source = employees | graphLookup employees startField=reportsTo fromField=reportsTo toField=name filter=(status = 'active' AND age > 18) as reportingHierarchy
+source = employees | graphLookup employees start=reportsTo edge=reportsTo-->name as reportingHierarchy
+source = employees | graphLookup employees start=reportsTo edge=reportsTo-->name maxDepth=2 as reportingHierarchy
+source = employees | graphLookup employees start=reportsTo edge=reportsTo-->name depthField=level as reportingHierarchy
+source = employees | graphLookup employees start=reportsTo edge=reportsTo<->name as connections
+source = travelers | graphLookup airports start=nearestAirport edge=connects-->airport supportArray=true as reachableAirports
+source = airports | graphLookup airports start=airport edge=connects-->airport supportArray=true as reachableAirports
+source = employees | graphLookup employees start=reportsTo edge=reportsTo-->name filter=(status = 'active' AND age > 18) as reportingHierarchy
 ```
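
The before/after examples in this hunk imply a mechanical mapping: `startField` becomes `start`, and `fromField`, `toField`, and `direction` fold into a single `edge` value. The following is a minimal Python sketch of that mapping for single-line queries. The helper name `migrate_graphlookup` is hypothetical and not part of this commit or of OpenSearch. It is a plain text rewrite rather than a PPL parser, so it also collapses repeated whitespace anywhere in the command.

```python
import re

# Hypothetical helper (not part of the commit): matches the four old-style parameters.
_OLD_PARAM = re.compile(r"\b(startField|fromField|toField|direction)=(\S+)")


def migrate_graphlookup(command: str) -> str:
    """Rewrite an old-syntax, single-line graphLookup command into the new syntax."""
    params = dict(_OLD_PARAM.findall(command))
    missing = {"startField", "fromField", "toField"} - params.keys()
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")

    # direction=bi maps to '<->'; direction=uni (the old default) maps to '-->'.
    operator = "<->" if params.get("direction") == "bi" else "-->"
    edge = f"edge={params['fromField']}{operator}{params['toField']}"

    def rewrite(match: re.Match) -> str:
        name, value = match.group(1), match.group(2)
        if name == "startField":
            return f"start={value}"
        if name == "fromField":
            return edge
        return ""  # toField and direction are folded into the edge value

    rewritten = _OLD_PARAM.sub(rewrite, command)
    # Collapse the gaps left by the removed parameters.
    return re.sub(r"\s{2,}", " ", rewritten).strip()


old = ("source = employees | graphLookup employees startField=reportsTo "
       "fromField=reportsTo toField=name direction=bi as connections")
print(migrate_graphlookup(old))
```

Applied to the old bidirectional example, the sketch produces the same text as the corresponding added line in this hunk.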

 ## Parameters

 The `graphLookup` command supports the following parameters.

-| Parameter | Required/Optional | Description                                                                                                                                                                                                                        |
-| --- | --- |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `<lookupIndex>` | Required | The name of the index to perform the graph traversal on. Can be the same as the source index for self-referential graphs.                                                                                                          |
-| `startField=<startField>` | Required | The field in the source documents whose value is used to start the recursive search. The value of this field is matched against `toField` in the lookup index. We support both single value and array values as starting points. |
-| `fromField=<fromField>` | Required | The field in the lookup index documents that contains the value to recurse on. After matching a document, the value of this field is used to find the next set of documents. It supports both single value and array values.       |
-| `toField=<toField>` | Required | The field in the lookup index documents to match against. Documents where `toField` equals the current traversal value are included in the results.                                                                                |
-| `maxDepth=<maxDepth>` | Optional | The maximum recursion depth of hops. Default is `0`. A value of `0` means only the direct connections to the statr values are returned. A value of `1` means 1 hop connections (initial match plus one recursive step), and so on. |
-| `depthField=<depthField>` | Optional | The name of the field to add to each traversed document indicating its recursion depth. If not specified, no depth field is added. Depth starts at `0` for the first level of matches.                                             |
-| `direction=(uni \| bi)` | Optional | The traversal direction. `uni` (default) performs unidirectional traversal following edges in the forward direction only. `bi` performs bidirectional traversal, following edges in both directions.                               |
+| Parameter | Required/Optional | Description |
+|---|---|---|
+| `<lookupIndex>` | Required | The name of the index to perform the graph traversal on. Can be the same as the source index for self-referential graphs. |
+| `start=<startField>` | Required | The field in the source documents whose value is used to initiate the recursive search. The value of this field is matched against `toField` in the lookup index. Supports both single values and array values as starting points. |
+| `edge=<fromField><operator><toField>` | Required | Defines the traversal path between nodes, specifying the connection fields and the direction of traversal. See [Edge Sub-parameters](#edge-sub-parameters) below. |
+| `maxDepth=<maxDepth>` | Optional | The maximum recursion depth (number of hops). Default is `0`. A value of `0` returns only direct connections to the start values. A value of `1` returns the initial matches plus one additional recursive step, and so on. |
+| `depthField=<depthField>` | Optional | The name of the field added to each traversed document to indicate its recursion depth. If not specified, no depth field is added. Depth starts at `0` for the first level of matches. |
 | `supportArray=(true \| false)` | Optional | When `true`, disables early visited-node filter pushdown to OpenSearch. Default is `false`. Set to `true` when `fromField` or `toField` contains array values to ensure correct traversal behavior. See [Array Field Handling](#array-field-handling) for details. |
 | `batchMode=(true \| false)` | Optional | When `true`, collects all start values from all source rows and performs a single unified BFS traversal. Default is `false`. The output changes to two arrays: `[Array<sourceRows>, Array<lookupResults>]`. See [Batch Mode](#batch-mode) for details. |
-| `usePIT=(true \| false)` | Optional | When `true`, enables PIT (Point In Time) search for the lookup table, allowing paginated retrieval of complete results without the `max_result_window` size limit. Default is `false`. See [PIT Search](#pit-search) for details. |
-| `filter=(<condition>)` | Optional | A filter condition to restrict which lookup table documents participate in the graph traversal. Only documents matching the condition are considered as candidates during BFS. Parentheses around the condition are required. Example: `filter=(status = 'active' AND age > 18)`. |
-| `as <outputField>` | Required | The name of the output array field that will contain all documents found during the graph traversal. |
+| `usePIT=(true \| false)` | Optional | When `true`, enables Point In Time (PIT) search for the lookup index, allowing paginated retrieval of complete results without the `max_result_window` size limit. Default is `false`. See [PIT Search](#pit-search) for details. |
+| `filter=(<condition>)` | Optional | A filter condition that restricts which lookup index documents participate in the graph traversal. Only documents matching the condition are considered as candidates during BFS. Parentheses around the condition are required. Example: `filter=(status = 'active' AND age > 18)`. |
+| `as <outputField>` | Required | The name of the output array field that will contain all documents discovered during the graph traversal. |
+
+### Edge Sub-parameters
+
+The `edge` parameter uses the syntax `edge=<fromField><operator><toField>` and consists of three components:
+
+| Component | Description |
+|---|---|
+| `fromField` | The field in the lookup index documents used for recursion. After a document is matched, the value of this field is used to find the next set of connected documents. Supports both single values and array values. |
+| `toField` | The field in the lookup index documents to match against. Documents where `toField` equals the current traversal value are included in the results. |
+| `operator` | Specifies the direction of traversal. `-->` performs a **unidirectional** traversal from `fromField` to `toField` only. `<->` performs a **bidirectional** traversal between `fromField` and `toField`. |
+
+**Examples:**
+
+- **Unidirectional:** `edge=reportsTo-->name` — traverses from `reportsTo` to `name` in one direction only.
+- **Bidirectional:** `edge=reportsTo<->name` — traverses between `reportsTo` and `name` in both directions.
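
To illustrate how an `edge` value breaks down into the three components described in the added table, here is a small Python sketch. The `Edge` type and `parse_edge` function are hypothetical names used only for illustration. The actual parsing happens in the PPL grammar touched by this commit, and the sketch assumes there is no whitespace around the operator.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """Illustrative container for the three components of an edge value."""
    from_field: str
    to_field: str
    bidirectional: bool


def parse_edge(value: str) -> Edge:
    """Split '<fromField><operator><toField>' on '-->' or '<->'."""
    for operator, bidirectional in (("<->", True), ("-->", False)):
        from_field, found, to_field = value.partition(operator)
        # partition() returns an empty separator when the operator is absent.
        if found and from_field and to_field:
            return Edge(from_field, to_field, bidirectional)
    raise ValueError(f"expected <fromField>--><toField> or <fromField><-><toField>, got {value!r}")


print(parse_edge("reportsTo-->name"))
print(parse_edge("reportsTo<->name"))
```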

 ## How It Works

 The `graphLookup` command performs a breadth-first search (BFS) traversal:

-1. For each source document, extract the value of `startField`
+1. For each source document, extract the value of `start`
 2. Query the lookup index to find documents where `toField` matches the start value
 3. Add matched documents to the result array
 4. Extract `fromField` values from matched documents to continue traversal
 5. Repeat steps 2-4 until no new documents are found or `maxDepth` is reached

-For bidirectional traversal (`direction=bi`), the algorithm also follows edges in the reverse direction by additionally matching `fromField` values.
+For bidirectional traversal (`<->`), the algorithm also follows edges in the reverse direction by additionally matching `fromField` values.
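
The five steps above can be sketched in Python against an in-memory list of dictionaries standing in for the lookup index. This is only an illustration of the documented behavior, not the OpenSearch implementation: the function name `graph_lookup`, its parameters, and the sample data are assumptions, and the bidirectional branch is one plausible reading of "additionally matching `fromField` values".

```python
def _values(doc, field):
    """Return a field's values as a list, so scalars and arrays are handled alike."""
    value = doc.get(field)
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple, set)) else [value]


def graph_lookup(start_values, lookup_docs, from_field, to_field,
                 max_depth=0, depth_field=None, bidirectional=False):
    """BFS over lookup_docs; max_depth=0 returns only the direct matches."""
    frontier = set(start_values)      # step 1: values extracted from the source row
    seen_values = set(frontier)
    included = set()                  # indexes of lookup docs already in the result
    results = []
    depth = 0
    while frontier and depth <= max_depth:
        next_frontier = set()
        for index, doc in enumerate(lookup_docs):
            if index in included:
                continue
            # Step 2: match toField against the current traversal values.
            matched = any(v in frontier for v in _values(doc, to_field))
            if bidirectional and not matched:
                # Assumed reverse step: also match fromField.
                matched = any(v in frontier for v in _values(doc, from_field))
            if not matched:
                continue
            included.add(index)
            row = dict(doc)
            if depth_field is not None:
                row[depth_field] = depth
            results.append(row)       # step 3
            next_frontier.update(_values(doc, from_field))  # step 4
            if bidirectional:
                next_frontier.update(_values(doc, to_field))
        frontier = next_frontier - seen_values
        seen_values |= frontier
        depth += 1                    # step 5: repeat until exhausted or past max_depth
    return results


# Hypothetical sample data, not taken from the documentation.
employees = [
    {"name": "Dev", "reportsTo": None},
    {"name": "Eliot", "reportsTo": "Dev"},
    {"name": "Ron", "reportsTo": "Eliot"},
]
# Ron's row supplies start=reportsTo, so the start value is "Eliot".
print(graph_lookup(["Eliot"], employees, "reportsTo", "name",
                   max_depth=1, depth_field="level"))
```

With the default `max_depth=0`, the same call returns only Eliot's document, which matches the documented meaning of "direct connections to the start values".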

 ## Example 1: Employee Hierarchy Traversal

@@ -72,9 +85,8 @@ The following query finds the reporting chain for each employee:
 ```ppl ignore
 source = employees
 | graphLookup employees
-startField=reportsTo
-fromField=reportsTo
-toField=name
+start=reportsTo
+edge=reportsTo-->name
 as reportingHierarchy
 ```

@@ -102,9 +114,8 @@ The following query adds a depth field to track how many levels each manager is
 ```ppl ignore
 source = employees
 | graphLookup employees
-startField=reportsTo
-fromField=reportsTo
-toField=name
+start=reportsTo
+edge=reportsTo-->name
 depthField=level
 as reportingHierarchy
 ```
@@ -133,9 +144,8 @@ The following query limits traversal to 2 levels using `maxDepth=1`:
 ```ppl ignore
 source = employees
 | graphLookup employees
-startField=reportsTo
-fromField=reportsTo
-toField=name
+start=reportsTo
+edge=reportsTo-->name
 maxDepth=1
 as reportingHierarchy
 ```
@@ -174,9 +184,8 @@ The following query finds reachable airports from each airport:
 ```ppl ignore
 source = airports
 | graphLookup airports
-startField=airport
-fromField=connects
-toField=airport
+start=airport
+edge=connects-->airport
 as reachableAirports
 ```

@@ -209,9 +218,8 @@ The following query finds reachable airports for each traveler:
 ```ppl ignore
 source = travelers
 | graphLookup airports
-startField=nearestAirport
-fromField=connects
-toField=airport
+start=nearestAirport
+edge=connects-->airport
 as reachableAirports
 ```

@@ -235,10 +243,8 @@ The following query performs bidirectional traversal to find both managers and c
 source = employees
 | where name = 'Ron'
 | graphLookup employees
-startField=reportsTo
-fromField=reportsTo
-toField=name
-direction=bi
+start=reportsTo
+edge=reportsTo<->name
 as connections
 ```

@@ -279,9 +285,8 @@ Use `batchMode=true` when:
 ```ppl ignore
 source = travelers
 | graphLookup airports
-startField=nearestAirport
-fromField=connects
-toField=airport
+start=nearestAirport
+edge=connects-->airport
 batchMode=true
 maxDepth=2
 as reachableAirports
@@ -324,9 +329,8 @@ Use `usePIT=true` when:
 ```ppl ignore
 source = employees
 | graphLookup employees
-startField=reportsTo
-fromField=reportsTo
-toField=name
+start=reportsTo
+edge=reportsTo-->name
 usePIT=true
 as reportingHierarchy
 ```
@@ -342,9 +346,8 @@ The following query traverses only active employees in the reporting hierarchy:
 ```ppl ignore
 source = employees
 | graphLookup employees
-startField=reportsTo
-fromField=reportsTo
-toField=name
+start=reportsTo
+edge=reportsTo-->name
 filter=(status = 'active')
 as reportingHierarchy
 ```
