@@ -8,33 +8,33 @@ Feature: Pipeline tests using the planets dataset
   Some validation of entity attributes is performed: SQL expressions and Python filter
   functions are used, and templatable business rules feature in the transformations.

-  # Scenario: Validate and filter planets (spark)
-  # Given I submit the planets file planets_demo.csv for processing
-  # And A spark pipeline is configured
-  # And I add initial audit entries for the submission
-  # Then the latest audit record for the submission is marked with processing status file_transformation
-  # When I run the file transformation phase
-  # Then the planets entity is stored as a parquet after the file_transformation phase
-  # And the latest audit record for the submission is marked with processing status data_contract
-  # When I run the data contract phase
-  # Then there is 1 record rejection from the data_contract phase
-  # And the planets entity is stored as a parquet after the data_contract phase
-  # And the latest audit record for the submission is marked with processing status business_rules
-  # When I run the business rules phase
-  # Then The rules restrict "planets" to 1 qualifying record
-  # And At least one row from "planets" has generated error code "HIGH_DENSITY"
-  # And At least one row from "planets" has generated error code "WEAK_ESCAPE"
-  # And the planets entity is stored as a parquet after the business_rules phase
-  # And the latest audit record for the submission is marked with processing status error_report
-  # When I run the error report phase
-  # Then An error report is produced
-  # And The entity "planets" does not contain an entry for "Jupiter" in column "planet"
-  # And The entity "planets" contains an entry for "Neptune" in column "planet"
-  # And The statistics entry for the submission shows the following information
-  # | parameter | value |
-  # | record_count | 9 |
-  # | number_record_rejections | 18 |
-  # | number_warnings | 0 |
+  Scenario: Validate and filter planets (spark)
+    Given I submit the planets file planets_demo.csv for processing
+    And A spark pipeline is configured
+    And I add initial audit entries for the submission
+    Then the latest audit record for the submission is marked with processing status file_transformation
+    When I run the file transformation phase
+    Then the planets entity is stored as a parquet after the file_transformation phase
+    And the latest audit record for the submission is marked with processing status data_contract
+    When I run the data contract phase
+    Then there is 1 record rejection from the data_contract phase
+    And the planets entity is stored as a parquet after the data_contract phase
+    And the latest audit record for the submission is marked with processing status business_rules
+    When I run the business rules phase
+    Then The rules restrict "planets" to 1 qualifying record
+    And At least one row from "planets" has generated error code "HIGH_DENSITY"
+    And At least one row from "planets" has generated error code "WEAK_ESCAPE"
+    And the planets entity is stored as a parquet after the business_rules phase
+    And the latest audit record for the submission is marked with processing status error_report
+    When I run the error report phase
+    Then An error report is produced
+    And The entity "planets" does not contain an entry for "Jupiter" in column "planet"
+    And The entity "planets" contains an entry for "Neptune" in column "planet"
+    And The statistics entry for the submission shows the following information
+      | parameter                | value |
+      | record_count             | 9     |
+      | number_record_rejections | 18    |
+      | number_warnings          | 0     |

   Scenario: Handle a file with no extension provided (spark)
     Given I submit the planets file planets_no_extension for processing
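
The recurring "stored as a parquet" assertion in the scenario above could be backed by a behave step definition along these lines. This is a minimal sketch: the `context.pipeline_root` and `context.submission_id` attributes and the `<submission>/<phase>/<entity>.parquet` layout are assumptions for illustration, not taken from this repo.

```python
# Hypothetical behave step backing the parquet assertions above.
# The context attributes and the directory layout are assumptions.
from pathlib import Path

from behave import then
from pyspark.sql import SparkSession


@then("the {entity} entity is stored as a parquet after the {phase} phase")
def step_entity_stored_as_parquet(context, entity, phase):
    parquet_path = (
        Path(context.pipeline_root) / context.submission_id / phase / f"{entity}.parquet"
    )
    assert parquet_path.exists(), f"no parquet written for {entity} after {phase}"

    # Reading the data back proves it is valid parquet, not just a file on disk.
    spark = SparkSession.builder.getOrCreate()
    row_count = spark.read.parquet(str(parquet_path)).count()
    assert row_count > 0, f"{entity} parquet after the {phase} phase is empty"
```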
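The "HIGH_DENSITY" and "WEAK_ESCAPE" error codes suggest rules over the planets' density and escape-velocity attributes. Below is a sketch of how templatable, SQL-expression business rules might tag rows in Spark; the column names (`density`, `escape_velocity`) and thresholds are illustrative assumptions, since the real rule definitions would live in the pipeline's configuration.

```python
# Sketch of templatable, SQL-expression business rules in PySpark.
# Column names and thresholds are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
planets = spark.read.parquet("planets.parquet")  # assumed input location

# Each rule is a template: an error code plus a SQL predicate that is
# true when a row violates the rule.
rules = [
    ("HIGH_DENSITY", "density > {max_density}", {"max_density": 5.5}),
    ("WEAK_ESCAPE", "escape_velocity < {min_escape}", {"min_escape": 5.0}),
]

flagged = planets
for code, template, params in rules:
    predicate = template.format(**params)  # render the rule template
    flagged = flagged.withColumn(code, F.when(F.expr(predicate), F.lit(code)))

# Gather the codes that fired on each row; rows with any code become
# record rejections and feed the error report.
flagged = flagged.withColumn(
    "error_codes",
    F.filter(
        F.array(*[F.col(code) for code, _, _ in rules]),
        lambda c: c.isNotNull(),
    ),
)
rejections = flagged.where(F.size("error_codes") > 0)
```

Rendering the predicate from a template keeps the rule set data-driven: adding a new error code means adding a configuration entry rather than writing new Spark code.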