Un-ignore test #452 (Open)

Rich-T-kid wants to merge 2 commits into datafusion-contrib:main from Rich-T-kid:rich-T-kid/un-ignore-test

Conversation

Contributor

@Rich-T-kid Rich-T-kid commented May 13, 2026

see #451
Color-coded view of the originally ignored tests (screenshots):
Image 5-13-26 at 2 28 PM
Image 5-13-26 at 2 28 PM (1)

Change log

  • Added PerTestConfig in src/test_utils/property_based.rs with two booleans:
    • uses_undeterministic_limit_operator — LIMIT without ORDER BY (q17).
    • non_deterministic_sort — ORDER BY with unbroken ties at the LIMIT boundary (q21, q22, q24, q31, q32).
  • compare_result_set takes &PerTestConfig and short-circuits to Ok(()) when either flag is set, with a comment explaining that a strict set comparison would false-positive on SQL-legal answer variation.
  • Each call site is now explicit; only flagged queries opt out.
  • Threaded the same param through test_tpcds_query for symmetry; all tpcds correctness call sites pass PerTestConfig::default().
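The config described above can be sketched roughly as follows. This is an illustrative reconstruction from the PR description, not the PR's exact code: the field and function names (`PerTestConfig`, `uses_undeterministic_limit_operator`, `non_deterministic_sort`, `compare_result_set`) come from the description, while the `Vec<Vec<String>>` result-set stand-in replaces the real RecordBatch-based signature.

```rust
/// Per-test opt-outs for queries whose correct answers can legally vary.
#[derive(Default)]
pub struct PerTestConfig {
    /// LIMIT without ORDER BY (e.g. q17): any N-row slice is a legal answer.
    pub uses_undeterministic_limit_operator: bool,
    /// ORDER BY with unbroken ties at the LIMIT boundary (q21, q22, q24, q31, q32).
    pub non_deterministic_sort: bool,
}

pub fn compare_result_set(
    config: &PerTestConfig,
    expected: &[Vec<String>],
    actual: &[Vec<String>],
) -> Result<(), String> {
    // A strict set comparison would false-positive on SQL-legal answer
    // variation, so flagged queries opt out of the detailed check.
    if config.uses_undeterministic_limit_operator || config.non_deterministic_sort {
        return Ok(());
    }
    if expected != actual {
        return Err("result sets differ".to_string());
    }
    Ok(())
}
```

Unflagged queries still get the full comparison; only the explicitly flagged ones short-circuit.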

ClickBench correctness q3: shifted precision back by one digit; see the inline comment for details.
q0 & q6 are still ignored because DataFusion's planner is smart enough to resolve the answer through statistics.

Still ignored (3)

  • ClickBench q42: compare_ordering reports a LexOrdering PartialEq inequality even though the Debug output is byte-identical on both sides. The fields needed to debug this are missing from the output, so the current diff shows identical structs even though the comparison fails.
  • TPC-DS q70 & q86: GROUP BY rollup(s_state, s_county) with two grouping() calls in the projection. The trailing two projection columns swap positions between runs because the planner iterates a HashMap to figure out which columns the rank window function needs. That swap propagates upward through SortExec / SortPreservingMergeExec as @5/@6 index flips. These tests are very flaky and flip between passing and failing.
  • TPC-DS q30: works locally but fails on CI. It might make sense to change the query to match what was done in upstream DataFusion, i.e. changing the query. Will wait for @gabotechs' thoughts on this.

@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/un-ignore-test branch from 54cae02 to 7ec7dd9 on May 13, 2026 at 04:19
Contributor Author

Rich-T-kid commented May 13, 2026

Seems like test_tpcds_shard03_q30 has a schema mismatch: the datasource we are using has c_last_review_date but the query expects c_last_review_date_sk. It seems DataFusion renamed the column.

@Rich-T-kid
Contributor Author

Seems like the #[ignore = "reason"] annotations are a bit outdated; updating them. It appears to be the exact same issue across multiple tests, and it has something to do with how snapshots are handled.

@Rich-T-kid
Contributor Author

datasource that we are using uses c_last_review_date but the query expects c_last_review_date_sk.
Taking a look at the q30.sql file, this seems to be true:

-- SQLBench-DS query 30 derived from TPC-DS query 30 under the terms of the TPC Fair Use Policy.
-- TPC-DS queries are Copyright 2021 Transaction Processing Performance Council.
-- This query was generated at scale factor 1.
with customer_total_return as
 (select wr_returning_customer_sk as ctr_customer_sk
        ,ca_state as ctr_state, 
 	sum(wr_return_amt) as ctr_total_return
 from web_returns
     ,date_dim
     ,customer_address
 where wr_returned_date_sk = d_date_sk 
   and d_year =2000
   and wr_returning_addr_sk = ca_address_sk 
 group by wr_returning_customer_sk
         ,ca_state)
  select  c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
       ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
       ,c_last_review_date_sk,ctr_total_return
 from customer_total_return ctr1
     ,customer_address
     ,customer
 where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
 			  from customer_total_return ctr2 
                  	  where ctr1.ctr_state = ctr2.ctr_state)
       and ca_address_sk = c_current_addr_sk
       and ca_state = 'KS'
       and ctr1.ctr_customer_sk = c_customer_sk
 order by c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
                  ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
                  ,c_last_review_date_sk,ctr_total_return
 LIMIT 100;

Spoke to Claude about this and it might be a GitHub caching issue; I'll try invalidating the cache to see if it re-downloads the data. Besides q30, the rest of the issues are snapshot issues which I need to research further.

@gabotechs
Collaborator

I think q30 is failing for something else, see this PR that just got merged upstream:

Comment thread: testdata/clickbench/queries/q17.sql (Outdated)

SELECT "UserID", "SearchPhrase", COUNT(*) FROM hits GROUP BY "UserID", "SearchPhrase" LIMIT 10;
--SELECT "UserID", "SearchPhrase", COUNT(*) FROM hits GROUP BY "UserID", "SearchPhrase" LIMIT 10;
-- previous query did not have an ORDER BY, so the results were non-deterministic. Adding ORDER BY to make the test deterministic.
Contributor Author

@gabotechs what do you think of this approach? Certain queries can return correct results while not being an exact match of vanilla DataFusion. The most straightforward approach to me is to update the queries to have only one correct answer, which avoids updating the testing logic everywhere.
I commented out the old query as a reference; if this looks good I'll update the remaining breaking tests.

Collaborator

These queries are standard industry benchmarks; it might be better to stick to them, otherwise we would be testing a different thing.

@Rich-T-kid
Contributor Author

Rich-T-kid commented May 13, 2026

Changing the queries is not a valid solution since they are standardized. The main issue that comes to mind is that record batches aren't reaching the final coordinator node. Assuming the comparison logic itself is correct, if the rows aren't there to be re-ordered into a deterministic order, the comparison will always fail.
For example: take a query that returns 100 rows and run it twice, once on vanilla DF and once on DDF. Since there's no ORDER BY, the engine can decide how to apply the LIMIT, and either result is correct. The problem when we go to order the final results is that the data may have already been filtered before the final record batch was produced, which is likely what's causing the mismatches we're seeing.
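The 100-row example above can be made concrete with a toy model (illustrative Rust only, not DataFusion or DDF code; `limit_slice` is a made-up helper standing in for each engine's LIMIT behavior):

```rust
use std::collections::HashSet;

// Toy model of LIMIT without ORDER BY: each engine may legally return a
// different n-row slice of the same table, so a strict comparison of the
// two result sets can fail even though both answers are correct.
fn limit_slice(rows: &[i64], n: usize, from_end: bool) -> HashSet<i64> {
    if from_end {
        // One engine happens to surface the last rows first...
        rows.iter().rev().take(n).copied().collect()
    } else {
        // ...while the other takes the first rows it sees.
        rows.iter().take(n).copied().collect()
    }
}
```

Both slices satisfy the query (n rows drawn from the table), yet comparing them element-for-element reports a mismatch.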

Fix

  1. The initial approach I want to try is looking at the raw output of both queries, then looking at the raw output after applying a LIMIT 10.
  2. Dig into the comparison logic. This assumes the record batches produced by the queries contain the same rows as in step 1; if not, comparing batches that contain different rows will always be wrong.
  3. Double-check how vanilla DF handles LIMITs. Does it have free range to select any rows? Since DDF is backed by vanilla DF it should do the exact same processing, but if vanilla DF allows non-deterministic limits we may need to change the approach.

@Rich-T-kid
Contributor Author

Did steps 1 & 2, and it seems this comes down to the engines picking two different 10-row slices. Going to add a flag to the comparison functions that checks for the limit operator; if it exists, we cannot guarantee that the results are exactly the same. This boils down to skipping the detailed_batch_comparison() call while keeping the row count check & schema validation.
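A sketch of that ordering of checks, under the assumption described in this thread (schema and row-count validation always run, only the detailed comparison is skipped). Names and the `(column name, values)` stand-in for RecordBatch are illustrative, not the PR's exact code:

```rust
// Hypothetical shape of the comparison: validate schema, then row count,
// and only then decide whether a detailed value comparison is meaningful.
fn compare_batches(
    limit_without_order_by: bool,
    expected: &[(String, Vec<i64>)], // (column name, values) stand-in for RecordBatch
    actual: &[(String, Vec<i64>)],
) -> Result<(), String> {
    // 1. Schema validation: column names must match regardless of flags.
    let schema = |b: &[(String, Vec<i64>)]| {
        b.iter().map(|(name, _)| name.clone()).collect::<Vec<_>>()
    };
    if schema(expected) != schema(actual) {
        return Err("schema mismatch".to_string());
    }
    // 2. Row-count check: even a non-deterministic LIMIT must return N rows.
    let rows = |b: &[(String, Vec<i64>)]| b.first().map_or(0, |(_, v)| v.len());
    if rows(expected) != rows(actual) {
        return Err("row count mismatch".to_string());
    }
    // 3. Detailed comparison: skipped when both engines may legally pick
    //    different row slices.
    if limit_without_order_by {
        return Ok(());
    }
    if expected != actual {
        return Err("batches differ".to_string());
    }
    Ok(())
}
```

Running the cheap structural checks first means a flagged query still fails loudly on a wrong schema or row count, which is exactly the residual signal the skip preserves.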

@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/un-ignore-test branch 3 times, most recently from 83663c6 to 152d8b3 on May 13, 2026 at 18:51
Comment thread: src/test_utils/property_based.rs (Outdated)
// to correctly return different result sets (e.g. LIMIT without ORDER BY, or ORDER BY
// with unbroken ties at the LIMIT boundary). Running the full comparison on these
// queries would surface false positives even though both engines are right.
if config.uses_undeterministic_limit_operator || config.non_deterministic_sort {
Contributor Author

@Rich-T-kid Rich-T-kid May 13, 2026

This has the drawback of not actually checking whether the results are correct/identical, but I don't see another way of handling this. The record batches returned by run(ctx, query) have already been filtered/truncated.

@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/un-ignore-test branch from 152d8b3 to a20e110 on May 13, 2026 at 18:58
…gful checks for nondeterministic results"

Revert "Rename file to read-me"

fix remaining tpcds_correctness test

revised ignore reasons

fix 3, ignore 1

revert broken ci test

re-run 30 on CI

revised querys instead of test logic

fix limit tests

4 test remain

 resolves 30, 42,70,86 remain
@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/un-ignore-test branch from a20e110 to e3a2e84 on May 13, 2026 at 19:07
// to correctly return different result sets (e.g. LIMIT without ORDER BY, or ORDER BY
// with unbroken ties at the LIMIT boundary). Running the full comparison on these
// queries would surface false positives even though both engines are right.
if config.uses_undeterministic_limit_operator || config.non_deterministic_sort {
Contributor Author

Moved this function a little lower so that it runs after a schema check and a row count check.

@Rich-T-kid
Contributor Author

@gabotechs this is good to review now 😃
