fix(query): clarify condition resolution semantics for label queries by contrueCT · Pull Request #2994 · apache/hugegraph

contrueCT · 2026-04-13T11:15:30Z

Purpose of the PR

close [Improve]: clarify ConditionQuery.condition() semantics for missing, conflicting, and multi-value conditions #2992

ConditionQuery.condition() currently mixes several different meanings in one API, including:

no condition
conflicting conditions resolved to empty
a unique resolved value
a raw multi-value result
an exception for ambiguous resolved values

This PR keeps the legacy condition() behavior unchanged, adds explicit condition-resolution APIs, and migrates the high-risk LABEL call sites to use the clearer semantics.

Main Changes

Add explicit condition-resolution APIs to ConditionQuery
- containsCondition(Object key)
- conditionValues(Object key)
- conditionValue(Object key)
Keep the legacy condition() method backward-compatible
Document the semantic differences between the legacy API and the new explicit APIs
Migrate LABEL-related high-risk callers to the new APIs in:
- graph/index transactions
- serializers
- traversers
- in-memory / hstore paths
Preserve the old behavior for non-LABEL legacy usages in this first step
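
To make the intended split between the three new methods concrete, here is a minimal stand-alone sketch. It is not the code in this PR: the class `ConditionResolutionSketch`, its `eq`/`in` builders, and the choice to return `Optional` from `conditionValue` are assumptions made for illustration, and the real `ConditionQuery` signatures and return types may differ. Each relation on a key is modelled as the set of values it allows, and the resolved result is the intersection of those sets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Toy stand-in, NOT HugeGraph's ConditionQuery: every relation on a key is
// reduced to the set of values it allows (EQ -> one value, IN -> several).
final class ConditionResolutionSketch {

    private final List<Object> keys = new ArrayList<>();
    private final List<Set<Object>> allowed = new ArrayList<>();

    // Hypothetical builder: EQ is treated as IN with a single value.
    ConditionResolutionSketch eq(Object key, Object value) {
        return this.in(key, value);
    }

    // Hypothetical builder: records one relation on the key.
    ConditionResolutionSketch in(Object key, Object... values) {
        this.keys.add(key);
        this.allowed.add(new LinkedHashSet<>(Arrays.asList(values)));
        return this;
    }

    // True if at least one relation mentions the key, even if the
    // relations conflict with each other.
    boolean containsCondition(Object key) {
        return this.keys.contains(key);
    }

    // Intersection of all relations on the key. Empty when the key is
    // absent or when the relations conflict.
    Set<Object> conditionValues(Object key) {
        Set<Object> resolved = null;
        for (int i = 0; i < this.keys.size(); i++) {
            if (!this.keys.get(i).equals(key)) {
                continue;
            }
            if (resolved == null) {
                resolved = new LinkedHashSet<>(this.allowed.get(i));
            } else {
                resolved.retainAll(this.allowed.get(i));
            }
        }
        return resolved == null ? new LinkedHashSet<>() : resolved;
    }

    // Assumption of this sketch: present only when exactly one value
    // survives the intersection; absent for none or for several.
    Optional<Object> conditionValue(Object key) {
        Set<Object> values = this.conditionValues(key);
        return values.size() == 1
               ? Optional.of(values.iterator().next())
               : Optional.empty();
    }
}
```

With this model, a caller can tell "no LABEL condition" (`containsCondition` is false) apart from "conflicting LABEL conditions" (`containsCondition` is true but `conditionValues` is empty), which is the distinction the legacy `condition()` return value could not express on its own.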

Verifying these changes

Added and extended regression coverage for the new semantics:

QueryTest#testConditionWithoutLabel
QueryTest#testConditionWithEqAndIn
QueryTest#testConditionWithSingleInValues
QueryTest#testConditionWithConflictingEqAndIn
QueryTest#testConditionWithMultipleMatchedInValues

Added a targeted regression for the label-index fallback path:

VertexCoreTest#testCollectMatchedIndexesByJointLabelsWithIndexedProperties

This test verifies:

a multi-label query can conservatively fall back and still match the indexed label
conflicting label conditions produce no matched indexes
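
The two outcomes above can be pictured with a small stand-alone sketch. It is not the logic in `GraphIndexTransaction`: the `labelIndexes` map, the `matchIndexes` method, and the rule that a query without any LABEL condition considers every label are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a hypothetical label -> index-names registry stands in
// for the schema lookups done by the real index-matching code.
final class LabelIndexFallbackSketch {

    static List<String> matchIndexes(Map<String, List<String>> labelIndexes,
                                     boolean hasLabelCondition,
                                     Set<String> resolvedLabels) {
        List<String> matched = new ArrayList<>();
        if (!hasLabelCondition) {
            // Assumption: with no LABEL condition every label is a candidate.
            for (List<String> indexes : labelIndexes.values()) {
                matched.addAll(indexes);
            }
            return matched;
        }
        // Conflicting conditions resolve to an empty set, so the loop body
        // never runs and nothing is matched. With several resolved labels,
        // each one is tried and labels without indexes contribute nothing.
        for (String label : resolvedLabels) {
            matched.addAll(labelIndexes.getOrDefault(
                           label, Collections.<String>emptyList()));
        }
        return matched;
    }
}
```

For example, with an index registered only for `person`, a query resolved to the labels `person` and `dog` still matches the `person` index, while a query whose label conditions conflict matches nothing.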

Existing label-query regressions were also rechecked to ensure no behavior regression:

EdgeCoreTest#testQueryInEdgesOfVertexByLabels
EdgeCoreTest#testQueryInEdgesOfVertexByConflictingLabels
EdgeCoreTest#testQueryInEdgesOfVertexBySortkey
VertexCoreTest#testQueryByJointLabels

Need tests and can be verified as follows:
- mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='QueryTest#testConditionWithoutLabel+testConditionWithEqAndIn+testConditionWithSingleInValues+testConditionWithConflictingEqAndIn+testConditionWithMultipleMatchedInValues' test
- mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='EdgeCoreTest#testQueryInEdgesOfVertexByLabels+testQueryInEdgesOfVertexByConflictingLabels+testQueryInEdgesOfVertexBySortkey' test
- mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='VertexCoreTest#testQueryByJointLabels+testCollectMatchedIndexesByJointLabelsWithIndexedProperties' test

Does this PR potentially affect the following parts?

Dependencies (add/update license info & regenerate_known_dependencies.sh)
Modify configurations
The public API
Other affects (typed here)
Nope

Documentation Status

Doc - TODO
Doc - Done
Doc - No Need

codecov · 2026-05-06T03:21:33Z

Codecov Report

❌ Patch coverage is 48.87218% with 136 lines in your changes missing coverage. Please review.
✅ Project coverage is 29.71%. Comparing base (1f61c48) to head (b4a135f).

Files with missing lines	Patch %	Lines
...he/hugegraph/backend/tx/GraphIndexTransaction.java	30.68%	116 Missing and 6 partials ⚠️
...apache/hugegraph/backend/query/ConditionQuery.java	92.42%	1 Missing and 4 partials ⚠️
...e/hugegraph/backend/serializer/TextSerializer.java	0.00%	5 Missing ⚠️
...g/apache/hugegraph/backend/store/ram/RamTable.java	0.00%	2 Missing ⚠️
...hugegraph/backend/serializer/BinarySerializer.java	90.90%	1 Missing ⚠️
...e/hugegraph/traversal/algorithm/HugeTraverser.java	0.00%	1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (1f61c48) and HEAD (b4a135f): HEAD has 2 fewer uploads than BASE (3 on BASE, 1 on HEAD).

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #2994      +/-   ##
============================================
- Coverage     36.11%   29.71%   -6.40%     
- Complexity      338      375      +37     
============================================
  Files           803      803              
  Lines         68234    68445     +211     
  Branches       8964     9003      +39     
============================================
- Hits          24640    20341    -4299     
- Misses        40936    45770    +4834     
+ Partials       2658     2334     -324


imbajin

I found one correctness issue in the latest revision. The CI failures were posted separately as a PR-level reminder.

CI/status checks are failing on the latest head (cc9af24929e42af1c90e1f55f3e60adc351e0318). Could you check the failed jobs before the next review round?

Failed checks include:

build-server (memory, 11): https://github.com/apache/hugegraph/actions/runs/26448131941/job/77861497015

imbajin

I don't see a clear blocking correctness issue in the latest head, and the previous LABEL-resolution comments look addressed. One remaining merge risk is that the latest checks are still red: hstore failed in VertexCoreTest#testQueryByDateProperty.

Since this PR also touches HstoreStore, could you rerun or clarify whether the hstore failure is an existing flaky/environment issue?

Add explicit condition resolution APIs to ConditionQuery while preserving the legacy condition() behavior. Introduce containsCondition(Object), conditionValues(Object), and conditionValue(Object) so callers can distinguish missing, empty, unique, and multi-value results without overloading null semantics. Migrate LABEL-specific consumers in graph/index transactions, serializers, traversers, and stores to use the new APIs for unique-label resolution and conservative fallback behavior. Extend QueryTest and VertexCoreTest to cover absent, conflicting, and multi-value label conditions as well as collectMatchedIndexes() behavior for multi-label and conflicting label queries.

contrueCT · 2026-06-05T18:37:11Z

Thanks for your patience. The hstore CI failure exposed an existing latent issue in hstore's range-index query path. For range-index scans with limit/paging, the upper layer assumed that backend scan results were globally ordered by the range-index key and that the returned page state could be reused as a HugeGraph range cursor. In hstore, multi-node/tablet scans can return entries in backend iterator order, and the page state is an internal storage cursor, so those assumptions may lead to unstable ordering or skipped results.

This PR keeps the fix intentionally scoped: hstore range-index queries whose visible result depends on limit/offset/paging are sorted and sliced in the index layer, while unbounded scans still use the original streaming path to avoid disturbing count, joint-index, and cleanup paths.

I think this is enough for the current PR, but the underlying hstore scan/page-state contract should be handled in a dedicated follow-up, ideally by defining whether range scans must be globally ordered and fixing the hstore iterator/page-state semantics at the storage-client layer.
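
As a rough illustration of "sorted and sliced in the index layer", the sketch below collects entries in arbitrary backend order, sorts them by the range-index key, and only then applies offset and limit. It is a simplified model, not the PR's implementation: the `Entry` type, the `sortAndSlice` method, the tie-break on element id, and the convention that a negative limit means "no limit" are all assumptions of this example, and it expects a non-negative offset.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortedRangeSliceSketch {

    // Hypothetical index entry: the indexed field value plus the element id.
    static final class Entry {
        final long fieldValue;
        final String elementId;

        Entry(long fieldValue, String elementId) {
            this.fieldValue = fieldValue;
            this.elementId = elementId;
        }
    }

    // Sorts a full scan result by field value (ties broken by element id),
    // then applies offset and limit exactly once. A negative limit means
    // "no limit" in this sketch.
    static List<String> sortAndSlice(List<Entry> backendOrder,
                                     long offset, long limit) {
        List<Entry> sorted = new ArrayList<>(backendOrder);
        sorted.sort(Comparator.comparingLong((Entry e) -> e.fieldValue)
                              .thenComparing(e -> e.elementId));
        long end = limit < 0
                   ? sorted.size()
                   : Math.min(sorted.size(), offset + limit);
        List<String> ids = new ArrayList<>();
        for (long i = offset; i < end; i++) {
            ids.add(sorted.get((int) i).elementId);
        }
        return ids;
    }
}
```

The trade-off is the one described above: the whole matched range is held in memory before slicing, which is why restricting this path to queries whose visible result depends on limit/offset/paging keeps the cost away from unbounded streaming scans.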

imbajin

Blocking: yes. Summary: HStore range-index offset queries can skip too many sorted results. Evidence: static review of GraphIndexTransaction/query offset handling.

imbajin · 2026-06-06T17:33:34Z

+        ConditionQuery scanQuery = query.copy();
+        scanQuery.page(null);
+        scanQuery.limit(Query.NO_LIMIT);


‼️ Reset offset before the full sorted scan

Evidence: querySortedRangeIndexes() copies the original query, clears page, and changes only limit before calling super.query(scanQuery). For range() / offset queries, scanQuery still carries the original offset, so the backend iterator can skip those entries during the full scan. The returned BatchIdHolder still keeps the original query, and QueryList.IndexQuery.each() applies bindQuery.skipOffsetIfNeeded(ids) again to the sorted ids.

Impact: HStore range-index queries with offset/range can drop too many results after this fallback path is selected. Please reset the scan query offset before reading all range-index entries, then let the sorted holder apply the original offset once.

Suggested change

-        ConditionQuery scanQuery = query.copy();
-        scanQuery.page(null);
-        scanQuery.limit(Query.NO_LIMIT);
+        ConditionQuery scanQuery = query.copy();
+        scanQuery.page(null);
+        scanQuery.offset(0L);
+        scanQuery.limit(Query.NO_LIMIT);

Good catch, thanks. Fixed in 2df4802 by resetting scanQuery.offset(0L) before the full sorted scan, so the fallback reads the complete matched range first and lets the original query apply offset/limit once after sorting. I also added range-offset coverage to VertexCoreTest#testQueryByDateProperty for the double-skip case. Local checks passed: git diff --check, hugegraph-core compile, and VertexCoreTest#testQueryByDateProperty with the rocksdb core-test profile.

contrueCT · 2026-06-07T03:08:41Z

Thanks. I fixed this by resetting scanQuery.offset(0L) before the full sorted range-index scan, so the fallback now reads the complete matched range first and lets the original query apply offset/limit only once after sorting. I also added range-offset coverage to VertexCoreTest#testQueryByDateProperty to guard the double-skip case. Local checks passed with git diff --check, hugegraph-core compile, and VertexCoreTest#testQueryByDateProperty under the rocksdb core-test profile.
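
The double-skip case can be reproduced in miniature. The sketch below is a deliberately simplified model with hypothetical names (`skip`, `offsetAppliedTwice`, `offsetAppliedOnce`); it treats both skips as operating on the same ordered list, whereas in the real path the first skip happens inside the backend scan before sorting.

```java
import java.util.List;

final class DoubleOffsetSketch {

    // Drops the first `offset` items, mimicking one application of a
    // query offset. Returns a view of the input list.
    static <T> List<T> skip(List<T> items, int offset) {
        return items.subList(Math.min(offset, items.size()), items.size());
    }

    // Before the fix (simplified): the scan query still carries the offset,
    // and the holder bound to the original query applies it again.
    static <T> List<T> offsetAppliedTwice(List<T> ids, int offset) {
        return skip(skip(ids, offset), offset);
    }

    // After the fix (simplified): the scan runs with offset 0, and the
    // original query applies its offset once to the sorted ids.
    static <T> List<T> offsetAppliedOnce(List<T> ids, int offset) {
        return skip(skip(ids, 0), offset);
    }
}
```

With five ids `a` to `e` and an offset of 2, applying the offset twice leaves only `e`, while applying it once leaves `c`, `d`, `e`, which is the result the query asked for.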

dosubot Bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) on Apr 13, 2026