docs/user_guidance/auditing.md (5 additions, 4 deletions)
@@ -3,16 +3,17 @@ tags:
   - Auditing
 ---

-The Auditing objects within the DVE are used to help control and store information about a given submission and what stage it's currently at. In addition to the above, it's also used to store statistics about the submission and the number of validations it has triggered etc. So, for users not interested in using the Error reports stage, you could source information directly from the audit tables.
+The Auditing objects within the DVE are used to help control and store information about submitted data and what stage it's currently at. In addition to the above, it's also used to store statistics about the submission and the number of validations it has triggered etc. So, for users not interested in using the Error reports stage, you could source information directly from the audit tables.

 ## Audit Tables
+
 Currently, these are the audit tables that can be accessed within the DVE:

 | Table Name | Purpose |
 | --------------------- | ------- |
-| processing_status | Contains information about the submission and what the current processing status is. |
-| submission_info | Contains information about the submitted file. |
-| submission_statistics | Contains validation statistics for each submission. |
+|`processing_status`| Contains information about the submission and what the current processing status is. |
+|`submission_info`| Contains information about the submitted file. |
+|`submission_statistics`| Contains validation statistics for each submission. |
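For the paragraph above on sourcing status information directly from the audit tables, here is a minimal sketch of what such a query could look like against a DuckDB backend; the database path and the `submission_id` and `status` columns are assumptions for illustration, not the documented schema of these tables:

```python
import duckdb

# Minimal sketch: reading submission status straight from the audit tables
# instead of the Error reports stage. The column names used here
# (submission_id, status) are assumptions, not the documented schema.
con = duckdb.connect("dve_audit.duckdb")  # hypothetical audit database path

status = con.execute(
    "SELECT status FROM processing_status WHERE submission_id = ?",
    ["sub-0001"],
).fetchall()

stats = con.execute(
    "SELECT * FROM submission_statistics WHERE submission_id = ?",
    ["sub-0001"],
).fetchall()

print(status, stats, sep="\n")
```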
docs/user_guidance/business_rules.md (5 additions, 2 deletions)
@@ -2,6 +2,9 @@
 title: Business Rules
 tags:
   - Business Rules
+  - dischema
+  - Rule Store
+  - Reference Data
 ---

 The Business Rules section contain the rules you want to apply to your dataset. Rule logic might include...
@@ -14,7 +17,7 @@ All rules are written in `SQL`. Depending on which [backend implementation](./im

 When writing the rules, you need to be aware that the expressions are wrapped in `NOT` expression. So, you should write the rules as though you are looking for non problematic values.

-When rules are being applied, [Complex Rules](./business_rules.md#complex-rules) are always applied before [Rules](./business_rules.md#rules) and [Filters](./business_rules.md#filters)
+When rules are being applied, [Complex Rules](./business_rules.md#complex-rules) are always applied before [Rules](./business_rules.md#rules) and [Filters](./business_rules.md#filters).

 This page is meant to give you greater details on how you can write your Business Rules. If you want a summary of how the Business Rules work, then please refer to the [Getting Started](./getting_started.md#rules-configuration-introduction) page.

@@ -125,7 +128,7 @@ If you need to perform more complex rules, with pre-steps, then see the [Complex

 ### Types of rejections

-You may have noticed the three type of rejections in the example above. For any given rule you can reject a record, the whole file (submission) or just raise a warning. More details in table below:
+You may have noticed the field "failure_type" in the example above. For any given rule (filter) you can reject a record, the whole file (submission) or just raise a warning. Here are the details around the currently supported Rejection Types:

 | Rejection Type | Behaviour | How to set in the rule |
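To make the `NOT`-wrapping and failure handling described above concrete, here is a rough sketch assuming a DuckDB backend; the `movies` table, its columns, and the rule expression are illustrative only and this is not how the DVE itself executes rules:

```python
import duckdb

# Sketch of the NOT-wrapping described above: a rule is written for the
# non-problematic case, and failing records are selected by negating it.
# Illustrative only; not the DVE's internal execution.
con = duckdb.connect()
con.execute("CREATE TABLE movies (title VARCHAR, runtime_minutes INTEGER)")
con.execute("INSERT INTO movies VALUES ('Short Film', 90), ('Marathon Cut', 300)")

rule_expression = "runtime_minutes < 240"  # written as the "good" condition

failing = con.execute(
    f"SELECT * FROM movies WHERE NOT ({rule_expression})"
).fetchall()

# Depending on the rule's failure_type, rows like these would reject the
# record, reject the whole submission, or only raise a warning.
print(failing)  # [('Marathon Cut', 300)]
```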
docs/user_guidance/data_contract.md (1 addition, 1 deletion)
@@ -6,7 +6,7 @@ tags:
   - Domain Types
 ---

-The Data Contract describes the structure (models) of your data and controls how the data should be typecasted. We use [Pydantic](https://docs.pydantic.dev/1.10/) to generate and validate the models. This page is meant to give you greater details on how you should write your Data Contract. If you want a summary of how the Data Contract works, please refer to the [Getting Started](./getting_started.md#rules-configuration-introduction) page.
+The Data Contract defines the structure (models) of your data and controls how it is typecast. We use [Pydantic](https://docs.pydantic.dev/1.10/) to generate and validate the models. This page is meant to give you greater details on how you should write your Data Contract. If you want a summary of how the Data Contract works, please refer to the [Getting Started](./getting_started.md#rules-configuration-introduction) page.
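Since the contract leans on Pydantic for model generation and typecasting, a small sketch of the kind of model an entity could map to may help; the `Movie` fields are invented for illustration rather than generated from a real dischema:

```python
from datetime import date
from pydantic import BaseModel, ValidationError

# Illustrative only: roughly the kind of model a contract entity maps to.
# Field names are invented, not taken from an actual dischema.
class Movie(BaseModel):
    title: str
    release_date: date
    runtime_minutes: int

# Pydantic typecasts compatible values ("120" -> 120, "1999-10-15" -> date).
movie = Movie(title="Example", release_date="1999-10-15", runtime_minutes="120")
print(movie.runtime_minutes)  # 120 (int)

try:
    Movie(title="Bad", release_date="not-a-date", runtime_minutes=100)
except ValidationError as exc:
    print(exc)  # a record like this would fail typecasting at the contract stage
```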
docs/user_guidance/file_transformation.md (1 addition, 1 deletion)
@@ -69,7 +69,7 @@ The File Transformation stage within the DVE is used to convert submitted files
 }
 ```

-The secondary use of the File Transformation stage is the ability to normalise your data into multiple entities. Imagine you had something like Hospital and Patient data in a single submission. You could split this out into seperate entities so that the validated outputs of the data could be loaded into seperate tables. For example:
+The secondary use of the File Transformation stage is the ability to normalise your data into multiple entities. Imagine you had something like Hospital and Patient data in a single submission. You could split this out into separate entities so that the validated outputs of the data could be loaded into separate tables (parquet). For example:
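A minimal sketch of the normalisation idea in the paragraph above, assuming invented hospital and patient columns and using plain DuckDB parquet output rather than the DVE's own transformation machinery:

```python
import duckdb

# Sketch of splitting one submission into separate entities persisted as
# separate parquet outputs. Column names are invented for illustration.
con = duckdb.connect()
con.execute("""
    CREATE TABLE submission AS
    SELECT * FROM (VALUES
        ('H001', 'St Example', 'P001', 'Alex'),
        ('H001', 'St Example', 'P002', 'Sam')
    ) AS t(hospital_id, hospital_name, patient_id, patient_name)
""")

# One parquet file per entity, ready to load into separate tables.
con.execute(
    "COPY (SELECT DISTINCT hospital_id, hospital_name FROM submission) "
    "TO 'hospital.parquet' (FORMAT PARQUET)"
)
con.execute(
    "COPY (SELECT hospital_id, patient_id, patient_name FROM submission) "
    "TO 'patient.parquet' (FORMAT PARQUET)"
)
```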
docs/user_guidance/getting_started.md (3 additions, 3 deletions)
@@ -9,7 +9,7 @@ tags:

 ## Rules Configuration Introduction

-To use the DVE you will need to create a dischema document. The dischema document describes how the DVE should validate your data. It's divided into two primary parts. The first part is the `contract` (data contract) - this describes the structure of your data and controls how the data should be typecasted. For example, here is a dischema document describing how the DVE might validate data about a movies dataset:
+To use the DVE you will need to create a dischema document. The dischema document describes how the DVE should validate your data. It's divided into two primary parts. The first part is the `contract` (data contract) - this defines the structure of your data and determines how it is modeled and typecast. For example, here is a dischema document describing how the DVE may validate data about a movies dataset:

 !!! example "Example `movies.dischema.json`"

@@ -72,7 +72,7 @@ For each dataset definition, you will need to provide a `reader_config` which de

 To learn more about how you can construct your Data Contract please read [here](data_contract.md).

-The second part of the dischema are the `business_rules` *or* `tranformations`. This section describes the validation rules you want to apply to entities defined within the `contract`. For example, with our `movies` dataset above, we may want to check that movies in this dataset are less than 4 hours long. The expression to write this check is written in SQL and that syntax may change slightly depending on the SQL backend you've choosen (we currently support [DuckDB](implementations/duckdb.md) and [Spark SQL](implementations/spark.md)).
+The second part of the dischema is the `tranformations` (business_rules). This section describes the validation rules you want to apply to entities defined within the `contract`. For example, with our `movies` dataset above, we may want to check that movies in this dataset are less than 4 hours long. The expression to write this check is written in SQL and that syntax may change slightly depending on the SQL backend you've chosen (we currently support [DuckDB](implementations/duckdb.md) and [Spark SQL](implementations/spark.md)).
 !!! example "Example `movies.dischema.json`"

 ```json
@@ -90,7 +90,7 @@ The second part of the dischema are the `business_rules` *or* `tranformations`.
 }
 }
 ```
-You may look at the expression above and think "Hang on! That's the opposite of what you want! You're only getting movies less than 4 hours!", however, all validation rules are wrapped inside a `NOT` expression. So, you write the rules as though you are looking for non problematic values.
+You may look at the expression above and think "Hang on! That's the opposite of what you want! You're only getting movies less than 4 hours!", __however, all validation rules are wrapped inside a `NOT` expression__. So, you write the rules as though you are looking for non-problematic values.

 We also offer a feature called `complex_rules`. These are rules where you need to transform the data before you can apply the rule. For instance, you may want to perform a join, aggregate the data, or perform a filter. The complex rules allow you to combine "pre-steps" before you perform the validation.
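A rough sketch of the pre-step idea behind `complex_rules`, assuming a DuckDB backend and invented table and column names; it illustrates the concept (transform first, then validate) rather than the dischema syntax:

```python
import duckdb

# Sketch of the pre-step idea behind complex_rules: transform first (here an
# aggregation), then validate the transformed data. Names are invented and
# this is not the dischema syntax itself.
con = duckdb.connect()
con.execute("CREATE TABLE ratings (movie_id INTEGER, rating INTEGER)")
con.execute("INSERT INTO ratings VALUES (1, 5), (1, 4), (2, 11)")

# Pre-step: aggregate to one row per movie.
con.execute("""
    CREATE VIEW avg_ratings AS
    SELECT movie_id, AVG(rating) AS avg_rating FROM ratings GROUP BY movie_id
""")

# Validation on the pre-step output, again written for the non-problematic
# case and negated to find failing rows.
failing = con.execute(
    "SELECT * FROM avg_ratings WHERE NOT (avg_rating BETWEEN 0 AND 10)"
).fetchall()
print(failing)  # movie 2 has an out-of-range average rating
```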
-The connection passed into the `DuckDBRefDataLoader` object will then be able use various DuckDB readers to load data from an existing table on the connection OR loading data from reference data persisted in either `parquet` or `pyarrow` format.
+The connection passed into the `DuckDBRefDataLoader` object will then be able to use various DuckDB readers to load data from an existing table on the connection OR from reference data persisted in either `parquet` or `pyarrow` format.

 If you want to learn more about the reference data loaders, you can view the advanced user guidance [here](../../advanced_guidance/package_documentation/refdata_loaders.md).
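A small sketch of the plain DuckDB mechanisms the paragraph above refers to (an existing table or view on the connection, a parquet file, a registered pyarrow table); the file path and table names are assumptions, and the `DuckDBRefDataLoader` API itself is covered in the linked advanced guidance:

```python
import duckdb
import pyarrow as pa

# Sketch of plain DuckDB ways to make reference data queryable on a
# connection. Illustrative usage only; the DuckDBRefDataLoader's own API is
# described in the advanced guidance linked above.
con = duckdb.connect()

# From a parquet file persisted on disk (hypothetical path).
con.execute(
    "CREATE VIEW ref_codes AS SELECT * FROM read_parquet('ref_codes.parquet')"
)

# From an in-memory pyarrow table registered on the same connection.
countries = pa.table({"code": ["GB", "FR"], "name": ["United Kingdom", "France"]})
con.register("ref_countries", countries)

print(con.execute("SELECT * FROM ref_countries").fetchall())
```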