Commit fe95703

[Feature] Add grammar bundle generation API for PPL language features (#5162)
* initial commit for grammar API
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* remove unused etag
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* cleanup on unused code
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* cleanup on comments and debug logging
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* modify tests
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* a few mode cleanup
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* Read ANTLR version from runtime
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* add antlr version
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* spotless fix
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* Address PR review: fix hash truncation, antlrVersion API, immutability, and tests
  - Hash full 32-bit ints in grammarHash to avoid collisions with ANTLR 4.13.2 ATN serialization
  - Use RuntimeMetaData.getRuntimeVersion() instead of unreliable JAR manifest lookup
  - Make GrammarBundle immutable with @Value instead of @Data
  - Update THIRD-PARTY to reflect ANTLR 4.13.2
  - Harden tests with JSON parsing and add antlrVersion assertion
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* Mark grammar API as experimental
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* Address review: ATN v4 guard, startRuleIndex by name, test hardening
  - Assert ATN serialization version 4 for both lexer and parser to enforce antlr4ng compatibility contract
  - Resolve startRuleIndex by looking up "root" rule name instead of hardcoding 0
  - Fix MockRestChannel.bytesOutput() to return real BytesStreamOutput
  - Document nullable elements in literalNames/symbolicNames Javadoc
  - Rename test methods to follow testXxx() convention per ppl/plugin modules
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* Reduce invalidateCache() visibility from public to protected
  Consistent with buildBundle() which is also @VisibleForTesting protected.
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* add more necessary fields
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* adjusting ignore token set to be lexical/internal only
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* addressed fix-now comments
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* fix test duplication
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* Polish grammar bundle builder and stabilize grammar endpoint doctest
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* address issue: transport action wrapper
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* Refactor PPL grammar bundle loading to static holder singleton
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* Revert grammar endpoint doc example to clean JSON format
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* Fix typo renameClasue in grammar bundle builder
  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>

---------
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
1 parent 734394d commit fe95703
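
The commit message mentions hashing "full 32-bit ints" in grammarHash to avoid collisions. The builder that computes the hash is not shown on this page, so the following is only a sketch of the idea, assuming SHA-256 over each ATN value written as four big-endian bytes and the `sha256:` prefix seen in the documented response. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; not the commit's actual bundle builder code. */
final class GrammarHashSketch {

  /** Feeds every int to SHA-256 as 4 bytes so values above 255 stay distinct. */
  static String hashFullInts(int[] serializedAtn) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES);
      for (int value : serializedAtn) {
        buffer.clear();
        buffer.putInt(value); // big-endian by default
        digest.update(buffer.array());
      }
      StringBuilder hex = new StringBuilder("sha256:");
      for (byte b : digest.digest()) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is mandatory on every Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // 1 and 257 share the same low byte; hashing only the low byte would confuse them.
    System.out.println(hashFullInts(new int[] {4, 1}));
    System.out.println(hashFullInts(new int[] {4, 257}));
  }
}
```

Writing all four bytes of each value keeps two serialized ATNs that differ only in their high-order bits from producing the same digest input.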

13 files changed

Lines changed: 1242 additions & 5 deletions

THIRD-PARTY

Lines changed: 3 additions & 3 deletions
@@ -467,15 +467,15 @@ DAMAGE.
 
 ------
 
-** ANTLR; version 4.7.1 -- https://github.com/antlr/antlr4
+** ANTLR; version 4.13.2 -- https://github.com/antlr/antlr4
 /*
- * Copyright (c) 2012-2017 The ANTLR Project. All rights reserved.
+ * Copyright (c) 2012-2024 The ANTLR Project. All rights reserved.
  * Use of this file is governed by the BSD 3-clause license that
  * can be found in the LICENSE.txt file in the project root.
  */
 
 [The "BSD 3-clause license"]
-Copyright (c) 2012-2017 The ANTLR Project. All rights reserved.
+Copyright (c) 2012-2024 The ANTLR Project. All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions

docs/user/ppl/interfaces/endpoint.md

Lines changed: 32 additions & 0 deletions
@@ -293,3 +293,35 @@ Exceeding these limits returns an error.
 - In the simple array format, `["*"]` highlights all fields. Specific field names like `["firstname", "lastname"]` scope highlighting to those fields only.
 - In the object format, each key in the `fields` object is a field name or wildcard. Each value is an object of per-field highlight options. Supported per-field options: `fragment_size`, `number_of_fragments`, `type` (`plain`, `unified`, `fvh`), `pre_tags`, `post_tags`, `require_field_match`, `no_match_size`, `order`. Use `{}` for defaults. Example: `{"title": {"fragment_size": 200}, "body": {"type": "plain"}}`.
 - Highlights may include fields that are not explicitly projected in the other columns. For example, using `{"*": {}}` highlights all fields that matched the search query, including fields not selected by `| fields`. In the example above, the `address` field appears in `_highlight` because it contains a match ("880 Holmes Lane") even though only `account_number`, `firstname`, and `lastname` are projected as separate columns.
+
+## Grammar (Experimental)
+
+### Description
+
+You can send an HTTP GET request to endpoint **/_plugins/_ppl/_grammar** to fetch serialized PPL grammar metadata used by autocomplete clients.
+
+### Example
+
+```bash ppl ignore
+curl -sS -X GET localhost:9200/_plugins/_ppl/_grammar
+```
+
+Expected output (trimmed):
+
+```json
+{
+  "bundleVersion": "1.0",
+  "antlrVersion": "4.13.2",
+  "grammarHash": "sha256:...",
+  "startRuleIndex": 0,
+  "lexerSerializedATN": [4, ...],
+  "parserSerializedATN": [4, ...],
+  "lexerRuleNames": ["SEARCH", "..."],
+  "parserRuleNames": ["root", "..."],
+  "literalNames": [null, "'SEARCH'", "..."],
+  "symbolicNames": [null, "SEARCH", "..."],
+  "tokenDictionary": {"PIPE": 196, "...": 0},
+  "ignoredTokens": [472, 473, "..."],
+  "rulesToVisit": [200, 201, "..."]
+}
+```
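
The curl call documented in this hunk could also be issued from a JVM client. The sketch below uses the JDK's `java.net.http` client and assumes an unsecured node at `http://localhost:9200`; the class name, the `Accept` header, and the error handling are illustrative choices and not part of the commit.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client sketch for GET /_plugins/_ppl/_grammar. */
final class GrammarEndpointClientSketch {

  /** Builds the GET request against the grammar endpoint of the given node. */
  static HttpRequest buildRequest(String baseUrl) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/_plugins/_ppl/_grammar"))
        .GET()
        .header("Accept", "application/json")
        .build();
  }

  /** Sends the request and returns the raw JSON body; needs a reachable cluster. */
  static String fetch(String baseUrl) throws IOException, InterruptedException {
    HttpResponse<String> response =
        HttpClient.newHttpClient()
            .send(buildRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      throw new IOException("Unexpected status " + response.statusCode());
    }
    return response.body();
  }

  public static void main(String[] args) {
    // Only prints the request line, so the example runs without a cluster.
    HttpRequest request = buildRequest("http://localhost:9200");
    System.out.println(request.method() + " " + request.uri());
  }
}
```

On a cluster with security enabled, the request would additionally need credentials, as the permission tests in this commit suggest.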

integ-test/src/test/java/org/opensearch/sql/security/PPLPermissionsIT.java

Lines changed: 43 additions & 0 deletions
@@ -339,6 +339,23 @@ private JSONObject executeQueryAsUser(String query, String username) throws IOEx
     return new JSONObject(org.opensearch.sql.legacy.TestUtils.getResponseBody(response, true));
   }
 
+  /** Executes a grammar metadata request as a specific user with basic authentication. */
+  private JSONObject executeGrammarAsUser(String username) throws IOException {
+    Request request = new Request("GET", "/_plugins/_ppl/_grammar");
+
+    RequestOptions.Builder restOptionsBuilder = RequestOptions.DEFAULT.toBuilder();
+    restOptionsBuilder.addHeader(
+        "Authorization",
+        "Basic "
+            + java.util.Base64.getEncoder()
+                .encodeToString((username + ":" + STRONG_PASSWORD).getBytes()));
+    request.setOptions(restOptionsBuilder);
+
+    Response response = client().performRequest(request);
+    assertEquals(200, response.getStatusLine().getStatusCode());
+    return new JSONObject(org.opensearch.sql.legacy.TestUtils.getResponseBody(response, true));
+  }
+
   @Test
   public void testUserWithBankPermissionCanAccessBankIndex() throws IOException {
     // Test that bank_user can access bank index - this should work with the fix
@@ -512,6 +529,32 @@ public void testBankUserWithEvalCommand() throws IOException {
     verifyColumn(result, columnName("full_name"));
   }
 
+  @Test
+  public void testUserWithPPLPermissionCanAccessGrammarEndpoint() throws IOException {
+    JSONObject result = executeGrammarAsUser(BANK_USER);
+    assertTrue(result.has("bundleVersion"));
+    assertTrue(result.has("antlrVersion"));
+    assertTrue(result.has("grammarHash"));
+    assertTrue(result.has("tokenDictionary"));
+  }
+
+  @Test
+  public void testUserWithoutPPLPermissionCannotAccessGrammarEndpoint() throws IOException {
+    try {
+      executeGrammarAsUser(NO_PPL_USER);
+      fail("Expected security exception for user without PPL permission");
+    } catch (ResponseException e) {
+      assertEquals(403, e.getResponse().getStatusLine().getStatusCode());
+      String responseBody =
+          org.opensearch.sql.legacy.TestUtils.getResponseBody(e.getResponse(), false);
+      assertTrue(
+          "Response should contain permission error message",
+          responseBody.contains("no permissions")
+              || responseBody.contains("Forbidden")
+              || responseBody.contains("cluster:admin/opensearch/ppl"));
+    }
+  }
+
   // Negative test cases for missing permissions
 
   @Test
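
The test helper `executeGrammarAsUser` builds an HTTP Basic `Authorization` header inline. As a minimal standalone sketch of that step (the class and method names are hypothetical, and the explicit UTF-8 charset is an assumption; the test itself calls `getBytes()` with the platform default):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that mirrors the header built in the integration test. */
final class BasicAuthHeaderSketch {

  /** Returns "Basic " followed by base64("username:password"). */
  static String basicAuthHeader(String username, String password) {
    String credentials = username + ":" + password;
    return "Basic "
        + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    System.out.println(basicAuthHeader("user", "pass")); // prints "Basic dXNlcjpwYXNz"
  }
}
```

Pinning the charset keeps the encoded credentials identical across platforms when the user name or password contains non-ASCII characters.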
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "ppl.grammar": {
+    "documentation": {
+      "url": "https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/interfaces/endpoint.md",
+      "description": "PPL Grammar Endpoint for Autocomplete"
+    },
+    "stability": "experimental",
+    "url": {
+      "paths": [
+        {
+          "path": "/_plugins/_ppl/_grammar",
+          "methods": ["GET"]
+        }
+      ]
+    },
+    "params": {}
+  }
+}
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+---
+"PPL grammar endpoint returns expected response shape":
+  - do:
+      ppl.grammar: {}
+  - is_true: bundleVersion
+  - is_true: antlrVersion
+  - is_true: grammarHash
+  - match: {startRuleIndex: 0}
+  - gt: {lexerSerializedATN.0: 0}
+  - is_true: lexerRuleNames.0
+  - is_true: channelNames.0
+  - is_true: modeNames.0
+  - gt: {parserSerializedATN.0: 0}
+  - is_true: parserRuleNames.0
+  - is_true: symbolicNames.1
+  - is_true: tokenDictionary.PIPE
+  - gt: {ignoredTokens.0: 0}
+  - gt: {rulesToVisit.0: 0}

plugin/src/main/java/org/opensearch/sql/plugin/SQLPlugin.java

Lines changed: 2 additions & 0 deletions
@@ -94,6 +94,7 @@
 import org.opensearch.sql.opensearch.storage.OpenSearchDataSourceFactory;
 import org.opensearch.sql.opensearch.storage.script.CompoundedScriptEngine;
 import org.opensearch.sql.plugin.config.OpenSearchPluginModule;
+import org.opensearch.sql.plugin.rest.RestPPLGrammarAction;
 import org.opensearch.sql.plugin.rest.RestPPLQueryAction;
 import org.opensearch.sql.plugin.rest.RestPPLStatsAction;
 import org.opensearch.sql.plugin.rest.RestQuerySettingsAction;
@@ -163,6 +164,7 @@ public List<RestHandler> getRestHandlers(
 
     return Arrays.asList(
         new RestPPLQueryAction(),
+        new RestPPLGrammarAction(),
         new RestSqlAction(settings, injector),
         new RestSqlStatsAction(settings, restController),
         new RestPPLStatsAction(settings, restController),
Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
+/*
+ * Copyright OpenSearch Contributors
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package org.opensearch.sql.plugin.rest;
+
+import static org.opensearch.rest.RestRequest.Method.GET;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableList;
+import java.io.IOException;
+import java.util.List;
+import lombok.extern.log4j.Log4j2;
+import org.opensearch.common.annotation.ExperimentalApi;
+import org.opensearch.core.action.ActionListener;
+import org.opensearch.core.rest.RestStatus;
+import org.opensearch.core.xcontent.XContentBuilder;
+import org.opensearch.rest.BaseRestHandler;
+import org.opensearch.rest.BytesRestResponse;
+import org.opensearch.rest.RestChannel;
+import org.opensearch.rest.RestRequest;
+import org.opensearch.sql.plugin.transport.PPLQueryAction;
+import org.opensearch.sql.plugin.transport.TransportPPLQueryRequest;
+import org.opensearch.sql.plugin.transport.TransportPPLQueryResponse;
+import org.opensearch.sql.ppl.autocomplete.GrammarBundle;
+import org.opensearch.sql.ppl.autocomplete.PPLGrammarBundleBuilder;
+import org.opensearch.transport.client.node.NodeClient;
+
+/*
+ * REST handler for {@code GET /_plugins/_ppl/_grammar}.
+ *
+ * @opensearch.experimental
+ */
+@ExperimentalApi
+@Log4j2
+public class RestPPLGrammarAction extends BaseRestHandler {
+
+  private static final String ENDPOINT_PATH = "/_plugins/_ppl/_grammar";
+
+  @Override
+  public String getName() {
+    return "ppl_grammar_action";
+  }
+
+  @Override
+  public List<Route> routes() {
+    return ImmutableList.of(new Route(GET, ENDPOINT_PATH));
+  }
+
+  @Override
+  protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient client)
+      throws IOException {
+
+    return channel -> {
+      try {
+        authorizeRequest(
+            client,
+            new ActionListener<>() {
+              @Override
+              public void onResponse(TransportPPLQueryResponse ignored) {
+                try {
+                  GrammarBundle bundle = getBundle();
+                  XContentBuilder builder = channel.newBuilder();
+                  serializeBundle(builder, bundle);
+                  channel.sendResponse(new BytesRestResponse(RestStatus.OK, builder));
+                } catch (Exception e) {
+                  log.error("Error building or serializing PPL grammar", e);
+                  sendErrorResponse(channel, e);
+                }
+              }
+
+              @Override
+              public void onFailure(Exception e) {
+                log.error("PPL grammar authorization failed", e);
+                sendErrorResponse(channel, e);
+              }
+            });
+      } catch (Exception e) {
+        log.error("Error authorizing PPL grammar request", e);
+        sendErrorResponse(channel, e);
+      }
+    };
+  }
+
+  @VisibleForTesting
+  protected void authorizeRequest(
+      NodeClient client, ActionListener<TransportPPLQueryResponse> listener) {
+    client.execute(
+        PPLQueryAction.INSTANCE, new TransportPPLQueryRequest("", null, ENDPOINT_PATH), listener);
+  }
+
+  private void sendErrorResponse(RestChannel channel, Exception e) {
+    try {
+      channel.sendResponse(new BytesRestResponse(channel, e));
+    } catch (IOException ioException) {
+      log.error("Failed to send PPL grammar error response", ioException);
+    }
+  }
+
+  /** Gets the grammar bundle. Override in tests to inject a custom or failing bundle provider. */
+  @VisibleForTesting
+  protected GrammarBundle getBundle() {
+    return PPLGrammarBundleBuilder.getBundle();
+  }
+
+  private void serializeBundle(XContentBuilder builder, GrammarBundle bundle) throws IOException {
+    builder.startObject();
+
+    // Identity & versioning
+    builder.field("bundleVersion", bundle.getBundleVersion());
+    builder.field("antlrVersion", bundle.getAntlrVersion());
+    builder.field("grammarHash", bundle.getGrammarHash());
+    builder.field("startRuleIndex", bundle.getStartRuleIndex());
+
+    // Lexer ATN & metadata
+    builder.field("lexerSerializedATN", bundle.getLexerSerializedATN());
+    builder.field("lexerRuleNames", bundle.getLexerRuleNames());
+    builder.field("channelNames", bundle.getChannelNames());
+    builder.field("modeNames", bundle.getModeNames());
+
+    // Parser ATN & metadata
+    builder.field("parserSerializedATN", bundle.getParserSerializedATN());
+    builder.field("parserRuleNames", bundle.getParserRuleNames());
+
+    // Vocabulary
+    builder.field("literalNames", bundle.getLiteralNames());
+    builder.field("symbolicNames", bundle.getSymbolicNames());
+
+    // Autocomplete configuration
+    builder.field("tokenDictionary", bundle.getTokenDictionary());
+    builder.field("ignoredTokens", bundle.getIgnoredTokens());
+    builder.field("rulesToVisit", bundle.getRulesToVisit());
+
+    builder.endObject();
+  }
+}
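
`getBundle()` delegates to `PPLGrammarBundleBuilder.getBundle()`, which the commit message describes as a "static holder singleton". The builder's source is not shown on this page, so the following is only a sketch of that general idiom (initialization-on-demand holder) with a placeholder `String` payload; every name in it is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of the static holder idiom; not the real builder. */
final class LazyBundleSketch {

  /** Counts how often the expensive build step runs, for demonstration only. */
  static final AtomicInteger BUILD_COUNT = new AtomicInteger();

  private LazyBundleSketch() {}

  /** The JVM initializes this nested class once, on first access, in a thread-safe way. */
  private static final class Holder {
    static final String BUNDLE = build();
  }

  /** Stand-in for the costly grammar bundle construction. */
  private static String build() {
    BUILD_COUNT.incrementAndGet();
    return "bundle";
  }

  /** Triggers Holder initialization on the first call and reuses the value afterwards. */
  static String getBundle() {
    return Holder.BUNDLE;
  }

  public static void main(String[] args) {
    getBundle();
    getBundle();
    System.out.println("builds: " + BUILD_COUNT.get());
  }
}
```

The idiom relies on JVM class-initialization guarantees, so it needs no explicit locking or volatile field.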

plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryAction.java

Lines changed: 9 additions & 1 deletion
@@ -100,13 +100,21 @@ protected void doExecute(
                   + " false"));
       return;
     }
+
+    TransportPPLQueryRequest transportRequest = TransportPPLQueryRequest.fromActionRequest(request);
+    if (transportRequest.isGrammarRequest()) {
+      // Authorization is enforced by this transport action before returning grammar metadata in
+      // REST.
+      listener.onResponse(new TransportPPLQueryResponse("{}"));
+      return;
+    }
+
     Metrics.getInstance().getNumericalMetric(MetricName.PPL_REQ_TOTAL).increment();
     Metrics.getInstance().getNumericalMetric(MetricName.PPL_REQ_COUNT_TOTAL).increment();
 
     QueryContext.addRequestId();
 
     PPLService pplService = injector.getInstance(PPLService.class);
-    TransportPPLQueryRequest transportRequest = TransportPPLQueryRequest.fromActionRequest(request);
     // in order to use PPL service, we need to convert TransportPPLQueryRequest to PPLQueryRequest
     PPLQueryRequest transformedRequest = transportRequest.toPPLQueryRequest();
     QueryContext.setProfile(transformedRequest.profile());

plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryRequest.java

Lines changed: 10 additions & 1 deletion
@@ -119,7 +119,16 @@ public String getRequest() {
    * @return true if it is an explain request
    */
   public boolean isExplainRequest() {
-    return path.endsWith("/_explain");
+    return path != null && path.endsWith("/_explain");
+  }
+
+  /**
+   * Check if request is for grammar metadata endpoint.
+   *
+   * @return true if it is a grammar metadata request
+   */
+  public boolean isGrammarRequest() {
+    return path != null && path.endsWith("/_grammar");
   }
 
   /** Decide on the formatter by the requested format. */
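
This hunk adds a null guard to `isExplainRequest()` and uses the same guarded suffix check in the new `isGrammarRequest()`. A minimal standalone sketch of the pattern, with a hypothetical class that takes the path as a parameter instead of reading a field:

```java
/** Hypothetical standalone version of the null-safe path checks in this hunk. */
final class PathChecksSketch {

  /** True only for a non-null path ending in "/_grammar". */
  static boolean isGrammarPath(String path) {
    return path != null && path.endsWith("/_grammar");
  }

  /** True only for a non-null path ending in "/_explain". */
  static boolean isExplainPath(String path) {
    return path != null && path.endsWith("/_explain");
  }

  public static void main(String[] args) {
    System.out.println(isGrammarPath("/_plugins/_ppl/_grammar")); // true
    System.out.println(isExplainPath(null)); // false, no NullPointerException
  }
}
```

Because `&&` short-circuits, `endsWith` is never called on a null path, so the check returns false where the unguarded version would throw.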
