[Feature] Add efficient filtering (knn.filter) support for vectorSearch() #5331
Open
mengweieric wants to merge 17 commits into opensearch-project:feature/vector-search-p0 from
- Enforce exactly one of k, max_distance, or min_score
- Validate k is in [1, 10000] range
- Add 6 tests: mutual exclusivity (3 combos), k too small, k too large, k boundary values (1 and 10000)

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
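A minimal Java sketch of the two option rules in this commit, assuming options arrive as a Map<String, String> of raw values. KnnOptionRules and its error messages are hypothetical and are not the PR's actual validation code.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the vectorSearch() option rules; not the PR's code. */
final class KnnOptionRules {
  static final int MIN_K = 1;
  static final int MAX_K = 10000;
  private static final List<String> SEARCH_MODES = List.of("k", "max_distance", "min_score");

  private KnnOptionRules() {}

  /** Throws IllegalArgumentException unless exactly one mode is set and k (if set) is in range. */
  static void validate(Map<String, String> options) {
    // Count how many of the mutually exclusive search modes the user supplied.
    long modes = SEARCH_MODES.stream().filter(options::containsKey).count();
    if (modes != 1) {
      throw new IllegalArgumentException(
          "Exactly one of k, max_distance, or min_score must be specified, found " + modes);
    }
    String rawK = options.get("k");
    if (rawK != null) {
      int k;
      try {
        k = Integer.parseInt(rawK.trim());
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("k must be an integer: " + rawK, e);
      }
      // Inclusive bounds: k=1 and k=10000 are both accepted.
      if (k < MIN_K || k > MAX_K) {
        throw new IllegalArgumentException(
            "k must be in [" + MIN_K + ", " + MAX_K + "], got " + k);
      }
    }
  }
}
```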
VectorSearchQueryBuilder now accepts an options map and rejects pushDownLimit when LIMIT exceeds k. Radial modes (max_distance, min_score) have no LIMIT restriction.
- Create VectorSearchIndexTest: 7 tests covering buildKnnQueryJson() for top-k, max_distance, min_score, nested fields, multi-element and single-element vectors, and numeric option rendering
- Add edge case tests to VectorSearchTableFunctionImplementationTest: NaN vector component, empty option key/value, negative k, NaN for max_distance and min_score (6 new tests)
- Add VectorSearchQueryBuilderTest: min_score radial mode LIMIT, pushDownSort delegation to parent (2 new tests)
- Extract buildKnnQueryJson() as package-private for direct testing
Test the too-many (5) and zero-argument paths in VectorSearchTableFunctionResolver to complement the existing too-few (2) test.
- Cap radial mode (max_distance/min_score) results at maxResultWindow to prevent unbounded result sets
- Reject ORDER BY on non-_score fields and _score ASC in vectorSearch, since knn results are naturally sorted by _score DESC
- Add 12 integration tests: 4 _explain DSL shape verification tests and 8 validation error path tests
- Add multi-sort expression test: ORDER BY _score DESC, name ASC correctly rejects the non-_score field (VectorSearchQueryBuilderTest)
- Add case-insensitive argument name lookup test to verify TABLE='x' resolves the same as table='x' (Implementation test)
- Add non-numeric option fallback test: verifies string options are quoted in JSON output (VectorSearchIndexTest)
- Add 4 integration tests: ORDER BY _score DESC succeeds, ORDER BY non-score rejects, ORDER BY _score ASC rejects, LIMIT within k succeeds (VectorSearchIT, now 16 tests)
The base OpenSearchIndexScanQueryBuilder.pushDownSort() pushes sort.getCount() as a limit when non-zero. Our override validated _score DESC and returned true, but did not preserve this contract. SQL always sets count=0, so this path is not reachable today, but PPL or future callers may set a non-zero count to combine sort and limit in one LogicalSort node. Preserve the behavior defensively. Add a focused test: LogicalSort(count=7) with _score DESC verifies that the count is pushed down as the request size.
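The sort rules from the commits above (only ORDER BY _score DESC is accepted, and a non-zero sort count is still pushed down as the request size) can be sketched as follows. ScoreSortRules, SortItem, and Request are hypothetical stand-ins for the plugin's LogicalSort and request-builder types, not the real classes.

```java
import java.util.List;

/** Hypothetical sketch of vectorSearch() sort pushdown; not the PR's code. */
final class ScoreSortRules {
  /** Minimal stand-in for one ORDER BY item. */
  static final class SortItem {
    final String field;
    final boolean descending;

    SortItem(String field, boolean descending) {
      this.field = field;
      this.descending = descending;
    }
  }

  /** Minimal stand-in for the outgoing search request. */
  static final class Request {
    Integer size; // null means the sort did not set a size
  }

  private ScoreSortRules() {}

  /**
   * Accepts only _score DESC (knn hits already arrive in that order) and, mirroring the base
   * pushDownSort contract, applies a non-zero count as the request size.
   */
  static boolean pushDownSort(List<SortItem> items, int count, Request request) {
    for (SortItem item : items) {
      if (!"_score".equals(item.field)) {
        throw new IllegalArgumentException(
            "vectorSearch only supports ORDER BY _score DESC, got field: " + item.field);
      }
      if (!item.descending) {
        throw new IllegalArgumentException(
            "vectorSearch only supports ORDER BY _score DESC, got _score ASC");
      }
    }
    if (count != 0) {
      request.size = count; // sort+limit combined in one node
    }
    return true;
  }
}
```

Rejecting a conflicting ORDER BY, rather than silently ignoring it, keeps the reported order consistent with what the knn query actually returns.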
- Unit test: compound AND predicate survives pushdown into bool.filter
- Integration test: compound WHERE (term + range) produces bool query
- Integration test: radial max_distance with WHERE produces bool query
…SearchIndex
…on in VectorSearchQueryBuilder
…fficient mode
…matting
Radial search (max_distance or min_score) can return unbounded results. Add build-time validation that rejects radial queries without an explicit LIMIT clause, with a clear error message guiding the user.
Summary
Adds a filter_type=post|efficient option to vectorSearch() so WHERE clauses can be placed inside the knn clause (knn.filter) for efficient filtering during ANN search, or outside as bool.filter for post-filtering (default). Also adds mandatory LIMIT enforcement for radial search.
What this PR adds
FilterType enum and option parsing
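A minimal sketch of the directive parsing covered in this section. The enum name and fromString() follow the PR description; case-insensitive matching, the error text, and the hypothetical KnnOptions helper (including defaulting to POST when the option is absent) are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch of the filter_type directive; parsing details are assumptions. */
enum FilterType {
  POST,      // default: knn in bool.must, WHERE in bool.filter
  EFFICIENT; // WHERE embedded in knn.filter

  static FilterType fromString(String value) {
    if (value == null) {
      throw new IllegalArgumentException("filter_type must not be null");
    }
    switch (value.trim().toLowerCase(Locale.ROOT)) {
      case "post":
        return POST;
      case "efficient":
        return EFFICIENT;
      default:
        throw new IllegalArgumentException(
            "Invalid filter_type '" + value + "'; expected 'post' or 'efficient'");
    }
  }
}

/** Hypothetical helper for handling the SQL-layer directive. */
final class KnnOptions {
  private KnnOptions() {}

  /** Returns POST when filter_type is absent, otherwise the parsed value. */
  static FilterType filterType(Map<String, String> options) {
    String raw = options.get("filter_type");
    return raw == null ? FilterType.POST : FilterType.fromString(raw);
  }

  /** Copies the options without filter_type, which is not a knn query parameter. */
  static Map<String, String> withoutSqlDirectives(Map<String, String> options) {
    Map<String, String> knnOnly = new LinkedHashMap<>(options); // caller's map is untouched
    knnOnly.remove("filter_type");
    return knnOnly;
  }
}
```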
- FilterType enum (POST, EFFICIENT) with fromString() validation
- filter_type added to allowed option keys in VectorSearchTableFunctionImplementation
- filter_type is stripped from options before knn JSON generation, since it is a SQL-layer directive, not a knn parameter
Efficient filter pushdown
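The two query shapes this section describes can be sketched with plain JSON strings. KnnQueryShapes is hypothetical: the PR builds OpenSearch QueryBuilder objects and uses a Function<QueryBuilder, QueryBuilder> callback, while this sketch only shows where the WHERE clause ends up in each mode. The field name embedding and the term filter in the test are illustrative.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical string-based sketch of post vs. efficient filtering; not the PR's code. */
final class KnnQueryShapes {
  private KnnQueryShapes() {}

  /** Renders {"knn":{field:{"vector":[...], options..., "filter":...}}}; filterJson may be null. */
  static String knnJson(String field, float[] vector, Map<String, String> options, String filterJson) {
    StringJoiner vec = new StringJoiner(",", "[", "]");
    for (float v : vector) {
      vec.add(Float.toString(v));
    }
    StringBuilder body = new StringBuilder("{\"vector\":").append(vec);
    for (Map.Entry<String, String> e : options.entrySet()) {
      body.append(",\"").append(e.getKey()).append("\":").append(renderValue(e.getValue()));
    }
    if (filterJson != null) {
      body.append(",\"filter\":").append(filterJson); // efficient mode: WHERE inside knn.filter
    }
    body.append('}');
    return "{\"knn\":{\"" + field + "\":" + body + "}}";
  }

  /** Post mode: knn in bool.must, WHERE in bool.filter. */
  static String postFilter(String knnJson, String whereJson) {
    return "{\"bool\":{\"must\":[" + knnJson + "],\"filter\":[" + whereJson + "]}}";
  }

  /** efficient=true corresponds to filter_type=efficient; false to the post default. */
  static String withWhere(boolean efficient, String field, float[] vector,
      Map<String, String> options, String whereJson) {
    return efficient
        ? knnJson(field, vector, options, whereJson)
        : postFilter(knnJson(field, vector, options, null), whereJson);
  }

  // Plain decimal numbers render bare; anything else becomes a quoted, minimally escaped string.
  private static String renderValue(String raw) {
    if (raw.matches("-?\\d+(\\.\\d+)?")) {
      return raw;
    }
    return "\"" + raw.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }
}
```

Because one rendering method takes an optional filter argument, both modes share the same knn body and differ only in where the WHERE JSON is attached.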
- VectorSearchQueryBuilder.pushDownFilter() branches on filter type:
  - POST (default): knn in bool.must + WHERE in bool.filter (post-filtering)
  - EFFICIENT: rebuilds the knn query with WHERE embedded in knn.filter via callback
- Function<QueryBuilder, QueryBuilder> callback keeps JSON serialization in VectorSearchIndex
- buildKnnQueryJson() collapsed to accept an optional filter JSON parameter, so there is no duplication
Build-time validation
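A sketch of the build-time check covered in this section, assuming the builder already knows whether a WHERE clause was pushed down. FilterTypeCheck and its error message are hypothetical.

```java
import java.util.Map;

/** Hypothetical sketch of the build-time filter_type check; not the PR's code. */
final class FilterTypeCheck {
  private FilterTypeCheck() {}

  /**
   * @param options the vectorSearch() options as written by the user
   * @param hasPushedDownFilter whether a WHERE clause was pushed into the search request
   */
  static void validate(Map<String, String> options, boolean hasPushedDownFilter) {
    String filterType = options.get("filter_type");
    // Applies to post and efficient alike: the directive is meaningless without a filter.
    if (filterType != null && !hasPushedDownFilter) {
      throw new IllegalArgumentException(
          "filter_type='" + filterType
              + "' was specified, but the query has no WHERE clause that can be pushed down");
    }
  }
}
```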
- build() override rejects an explicit filter_type without a pushdownable WHERE clause
- Applies to both post and efficient: specifying the directive without a filter is always an error
Radial search LIMIT requirement
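A hypothetical sketch of how the radial LIMIT requirement could combine with the top-k rule from the commit history (LIMIT must not exceed k). LimitRules is not the PR's code; clamping an oversized radial LIMIT to maxResultWindow instead of rejecting it, and using k as the size when a top-k query has no LIMIT, are assumptions.

```java
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical sketch of LIMIT handling for vectorSearch(); not the PR's code. */
final class LimitRules {
  private LimitRules() {}

  /**
   * Returns the request size. An empty limit means the query had no LIMIT clause. Assumes the
   * options were already validated to contain exactly one of k, max_distance, or min_score.
   */
  static int resolveSize(Map<String, String> options, OptionalInt limit, int maxResultWindow) {
    boolean radial = options.containsKey("max_distance") || options.containsKey("min_score");
    if (radial) {
      if (!limit.isPresent()) {
        throw new IllegalArgumentException(
            "Radial vector search (max_distance or min_score) requires an explicit LIMIT clause");
      }
      // Assumption: an oversized LIMIT is clamped to maxResultWindow rather than rejected.
      return Math.min(limit.getAsInt(), maxResultWindow);
    }
    int k = Integer.parseInt(options.get("k"));
    if (limit.isPresent() && limit.getAsInt() > k) {
      throw new IllegalArgumentException("LIMIT " + limit.getAsInt() + " exceeds k=" + k);
    }
    // Assumption: without LIMIT, a top-k query returns k rows.
    return limit.isPresent() ? limit.getAsInt() : k;
  }
}
```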
- Radial search (max_distance or min_score) without an explicit LIMIT clause is rejected at build time with a clear error message
- Radial results are capped at maxResultWindow rows
Engine support
knn.filter is supported for the lucene and faiss engines (HNSW, IVF). Engine compatibility is not validated by the SQL plugin; unsupported engines reject the query at execution time.
SQL syntax
Test plan
- ./gradlew spotlessCheck (PASS)
- ./gradlew :opensearch:test (PASS)
- ./gradlew :integ-test:integTest -Dtests.class="*VectorSearchIT" (PASS, 25 tests)