Un-ignore test #452
Conversation
Force-pushed 54cae02 to 7ec7dd9 (Compare)
Seems like the #[ignore = "reason"] is a bit outdated; updating this. It seems to be the exact same issue across multiple tests, and it has something to do with how snapshots are handled.
Spoke to Claude about this and it might be a GitHub caching issue. I'll try invalidating the cache to see if it re-downloads the data. But besides q30, the rest of the issues are snapshot issues which I need to research further.
I think q30 is failing for something else, see this PR that just got merged upstream: |
```sql
SELECT "UserID", "SearchPhrase", COUNT(*) FROM hits GROUP BY "UserID", "SearchPhrase" LIMIT 10;
--SELECT "UserID", "SearchPhrase", COUNT(*) FROM hits GROUP BY "UserID", "SearchPhrase" LIMIT 10;
-- previous query did not have an ORDER BY, so the results were non-deterministic. Adding ORDER BY to make the test deterministic.
```
@gabotechs what do you think of this approach? Certain queries can return correct results while not being an exact match of vanilla DataFusion. The most straightforward approach to me is to update the queries so they have only one correct answer. This avoids updating the testing logic everywhere.
I commented out the old query as a reference; if this looks good I'll update the remaining breaking tests.
These queries are standard industry benchmarks; it might be better to stick to them, otherwise we would be testing a different thing.
Changing the queries is not a valid solution since they are standardized. The main issue that comes to mind is that record batches aren't reaching the final coordinator node. Assuming the comparison logic itself is correct, if the rows aren't there to be re-ordered into a deterministic order, the comparison will always fail.
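To illustrate the point above, here is a minimal sketch (not the PR's actual code; the function name and row representation are hypothetical) of an order-insensitive comparison that sorts a row-wise rendering of each result set before comparing. If some record batches never reach the coordinator, the sorted vectors still differ, so re-ordering alone cannot save the comparison:

```rust
// Hypothetical sketch: order-insensitive result comparison.
// Rows are rendered as Vec<String> for simplicity; the real code
// would operate on Arrow record batches.
fn rows_equal_unordered(a: &[Vec<String>], b: &[Vec<String>]) -> bool {
    let mut a: Vec<Vec<String>> = a.to_vec();
    let mut b: Vec<Vec<String>> = b.to_vec();
    // Sort both sides into a deterministic order before comparing.
    a.sort();
    b.sort();
    a == b
}

fn main() {
    let full = vec![vec!["1".to_string()], vec!["2".to_string()]];
    let reordered = vec![vec!["2".to_string()], vec!["1".to_string()]];
    // Simulates a record batch that never reached the coordinator.
    let missing = vec![vec!["2".to_string()]];
    assert!(rows_equal_unordered(&full, &reordered));
    assert!(!rows_equal_unordered(&full, &missing));
    println!("ok");
}
```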
Did steps 1 & 2, and it seems like this comes down to the engines picking two different 10-row slices. Going to add a flag to the comparison functions that checks for the limit operator; if it exists we cannot guarantee that the results are exactly the same. This boils down to skipping the full comparison for those queries.
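The flag described above could look something like the following sketch. The struct and the text-based detection heuristic are illustrative assumptions, not the PR's actual API; only the field name mirrors the code quoted later in the thread:

```rust
// Hypothetical sketch of a comparison config carrying the flag.
#[derive(Default)]
struct ComparisonConfig {
    uses_undeterministic_limit_operator: bool,
}

// A LIMIT without an ORDER BY means any N-row slice is a correct
// answer, so two engines can legitimately return different rows.
// Real code would inspect the physical plan, not the SQL text.
fn flag_limit(sql: &str) -> ComparisonConfig {
    let upper = sql.to_uppercase();
    ComparisonConfig {
        uses_undeterministic_limit_operator: upper.contains("LIMIT")
            && !upper.contains("ORDER BY"),
    }
}

fn main() {
    let cfg = flag_limit(r#"SELECT "UserID" FROM hits GROUP BY "UserID" LIMIT 10"#);
    assert!(cfg.uses_undeterministic_limit_operator);
    let cfg = flag_limit("SELECT x FROM t ORDER BY x LIMIT 10");
    assert!(!cfg.uses_undeterministic_limit_operator);
    println!("ok");
}
```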
Force-pushed 83663c6 to 152d8b3 (Compare)
```rust
// to correctly return different result sets (e.g. LIMIT without ORDER BY, or ORDER BY
// with unbroken ties at the LIMIT boundary). Running the full comparison on these
// queries would surface false positives even though both engines are right.
if config.uses_undeterministic_limit_operator || config.non_deterministic_sort {
```
This has the drawback of not actually checking whether the results are correct/identical, but I don't see another way of handling this. The record batches returned by run(ctx, query) have already been filtered/truncated.
Force-pushed 152d8b3 to a20e110 (Compare)
Commits:
- …gful checks for nondeterministic results"
- Revert "Rename file to read-me"
- fix remaining tpcds_correctness test
- revised ignore reasons
- fix 3, ignore 1
- revert broken ci test
- re-run 30 on CI
- revised querys instead of test logic
- fix limit tests, 4 test remain
- resolves 30; 42, 70, 86 remain
Force-pushed a20e110 to e3a2e84 (Compare)
```rust
// to correctly return different result sets (e.g. LIMIT without ORDER BY, or ORDER BY
// with unbroken ties at the LIMIT boundary). Running the full comparison on these
// queries would surface false positives even though both engines are right.
if config.uses_undeterministic_limit_operator || config.non_deterministic_sort {
```
Moved this check a little lower so that it runs after the schema check and the row count check.
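The ordering described above can be sketched roughly as follows (illustrative signature and names, not the PR's actual function): the cheap invariants still run for every query, and only the full value-by-value comparison is skipped for flagged queries.

```rust
// Hypothetical sketch of the check ordering: schema first, then row
// count, then (only for deterministic queries) the full comparison.
fn compare_results(
    schema_a: &str,
    schema_b: &str,
    rows_a: usize,
    rows_b: usize,
    nondeterministic: bool,
) -> Result<(), String> {
    if schema_a != schema_b {
        return Err("schema mismatch".into());
    }
    if rows_a != rows_b {
        return Err("row count mismatch".into());
    }
    if nondeterministic {
        // Both engines may legitimately return different row sets here,
        // so skip the value-by-value comparison.
        return Ok(());
    }
    // ...full row-by-row comparison would go here...
    Ok(())
}

fn main() {
    assert!(compare_results("s", "s", 10, 10, true).is_ok());
    assert!(compare_results("s", "t", 10, 10, true).is_err());
    assert!(compare_results("s", "s", 10, 9, true).is_err());
    println!("ok");
}
```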
@gabotechs this is good to review now 😃
see #451


Color code: originally ignored tests
Change log
ClickBench correctness q3: shifted precision back by one digit; see comment for details.
q0 & q6 are still ignored because DataFusion's planner is smart enough to resolve the answer through statistics.
Still ignored (3)