
Commit ebd7ef5

docs: updated content and fixed auto code doc pathing
1 parent adfd5ed commit ebd7ef5

11 files changed

Lines changed: 48 additions & 39 deletions


docs/advanced_guidance/index.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+---
+title: Advanced Guidance
+---
+
 <div class="grid cards" markdown>

 - :material-file-code:{ .lg .middle } __DVE Code Reference Documentation__

docs/advanced_guidance/package_documentation/auditing.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@

-::: src.dve.core_engine.backends.base.auditing.BaseAuditingManager
+::: dve.core_engine.backends.base.auditing.BaseAuditingManager
 options:
 heading_level: 3
 members:
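
The path changes in this file (and in the reader pages below) all follow the same pattern: mkdocstrings resolves `:::` identifiers against the importable package name, so the repository-layout `src.` prefix has to be dropped. As a rough illustration only (assuming the package installs under the name `dve` and exposes the class named in the directive), the corrected identifier should be importable as written:

```python
# Minimal sketch: a ::: identifier must match the object's import path.
# Assumes the package is installed as "dve"; the module and class names come from the diff above.
import importlib

module = importlib.import_module("dve.core_engine.backends.base.auditing")
print(module.BaseAuditingManager)  # the object the ::: directive documents
```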

docs/advanced_guidance/package_documentation/index.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+---
+title: Package Documentation
+---
+
 <div class="grid cards" markdown>

 - :material-language-python:{ .lg .middle } __Pipeline__

docs/advanced_guidance/package_documentation/readers.md

Lines changed: 11 additions & 11 deletions
@@ -2,35 +2,35 @@

 === "Base"

-::: src.dve.core_engine.backends.readers.csv.CSVFileReader
+::: dve.core_engine.backends.readers.csv.CSVFileReader
 options:
 heading_level: 3
 merge_init_into_class: true
 members: false

 === "DuckDB"

-::: src.dve.core_engine.backends.implementations.duckdb.readers.csv.DuckDBCSVReader
+::: dve.core_engine.backends.implementations.duckdb.readers.csv.DuckDBCSVReader
 options:
 heading_level: 3
 members:
 - __init__

-::: src.dve.core_engine.backends.implementations.duckdb.readers.csv.PolarsToDuckDBCSVReader
+::: dve.core_engine.backends.implementations.duckdb.readers.csv.PolarsToDuckDBCSVReader
 options:
 heading_level: 3
 members:
 - __init__

-::: src.dve.core_engine.backends.implementations.duckdb.readers.csv.DuckDBCSVRepeatingHeaderReader
+::: dve.core_engine.backends.implementations.duckdb.readers.csv.DuckDBCSVRepeatingHeaderReader
 options:
 heading_level: 3
 members:
 - __init__

 === "Spark"

-::: src.dve.core_engine.backends.implementations.spark.readers.csv.SparkCSVReader
+::: dve.core_engine.backends.implementations.spark.readers.csv.SparkCSVReader
 options:
 heading_level: 3
 members:
@@ -40,15 +40,15 @@

 === "DuckDB"

-::: src.dve.core_engine.backends.implementations.duckdb.readers.json.DuckDBJSONReader
+::: dve.core_engine.backends.implementations.duckdb.readers.json.DuckDBJSONReader
 options:
 heading_level: 3
 members:
 - __init__

 === "Spark"

-::: src.dve.core_engine.backends.implementations.spark.readers.json.SparkJSONReader
+::: dve.core_engine.backends.implementations.spark.readers.json.SparkJSONReader
 options:
 heading_level: 3
 members:
@@ -58,29 +58,29 @@

 === "Base"

-::: src.dve.core_engine.backends.readers.xml.BasicXMLFileReader
+::: dve.core_engine.backends.readers.xml.BasicXMLFileReader
 options:
 heading_level: 3
 merge_init_into_class: true
 members: false

 === "DuckDB"

-::: src.dve.core_engine.backends.implementations.duckdb.readers.xml.DuckDBXMLStreamReader
+::: dve.core_engine.backends.implementations.duckdb.readers.xml.DuckDBXMLStreamReader
 options:
 heading_level: 3
 members:
 - __init__

 === "Spark"

-::: src.dve.core_engine.backends.implementations.spark.readers.xml.SparkXMLStreamReader
+::: dve.core_engine.backends.implementations.spark.readers.xml.SparkXMLStreamReader
 options:
 heading_level: 3
 members:
 - __init__

-::: src.dve.core_engine.backends.implementations.spark.readers.xml.SparkXMLReader
+::: dve.core_engine.backends.implementations.spark.readers.xml.SparkXMLReader
 options:
 heading_level: 3
 members:

docs/user_guidance/auditing.md

Lines changed: 5 additions & 4 deletions
@@ -3,16 +3,17 @@ tags:
 - Auditing
 ---

-The Auditing objects within the DVE are used to help control and store information about a given submission and what stage it's currently at. In addition to the above, it's also used to store statistics about the submission and the number of validations it has triggered etc. So, for users not interested in using the Error reports stage, you could source information directly from the audit tables.
+The Auditing objects within the DVE are used to help control and store information about submitted data and what stage it's currently at. In addition to the above, it's also used to store statistics about the submission and the number of validations it has triggered etc. So, for users not interested in using the Error reports stage, you could source information directly from the audit tables.

 ## Audit Tables
+
 Currently, these are the audit tables that can be accessed within the DVE:

 | Table Name | Purpose |
 | --------------------- | ------- |
-| processing_status | Contains information about the submission and what the current processing status is. |
-| submission_info | Contains information about the submitted file. |
-| submission_statistics | Contains validation statistics for each submission. |
+| `processing_status` | Contains information about the submission and what the current processing status is. |
+| `submission_info` | Contains information about the submitted file. |
+| `submission_statistics` | Contains validation statistics for each submission. |

 ## Audit Objects
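
Since the page above points users at querying the audit tables directly, a rough sketch of what that could look like with the DuckDB backend follows. Only the table names come from the documentation; the database file name and the use of `SELECT *` are assumptions for illustration.

```python
# Hedged sketch: sourcing audit information directly instead of using the Error reports stage.
# "audit.duckdb" is an assumed file name; the real location depends on your pipeline setup.
import duckdb

con = duckdb.connect("audit.duckdb")
status = con.execute("SELECT * FROM processing_status").fetchall()
stats = con.execute("SELECT * FROM submission_statistics").fetchall()
print(status, stats)
```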

docs/user_guidance/business_rules.md

Lines changed: 5 additions & 2 deletions
@@ -2,6 +2,9 @@
 title: Business Rules
 tags:
 - Business Rules
+- dischema
+- Rule Store
+- Reference Data
 ---

 The Business Rules section contain the rules you want to apply to your dataset. Rule logic might include...
@@ -14,7 +17,7 @@ All rules are written in `SQL`. Depending on which [backend implementation](./im

 When writing the rules, you need to be aware that the expressions are wrapped in `NOT` expression. So, you should write the rules as though you are looking for non problematic values.

-When rules are being applied, [Complex Rules](./business_rules.md#complex-rules) are always applied before [Rules](./business_rules.md#rules) and [Filters](./business_rules.md#filters)
+When rules are being applied, [Complex Rules](./business_rules.md#complex-rules) are always applied before [Rules](./business_rules.md#rules) and [Filters](./business_rules.md#filters).

 This page is meant to give you greater details on how you can write your Business Rules. If you want a summary of how the Business Rules work, then please refer to the [Getting Started](./getting_started.md#rules-configuration-introduction) page.

@@ -125,7 +128,7 @@ If you need to perform more complex rules, with pre-steps, then see the [Complex

 ### Types of rejections

-You may have noticed the three type of rejections in the example above. For any given rule you can reject a record, the whole file (submission) or just raise a warning. More details in table below:
+You may have noticed the field "failure_type" in the example above. For any given rule (filter) you can reject a record, the whole file (submission) or just raise a warning. Here are the details around the currently supported Rejection Types:

 | Rejection Type | Behaviour | How to set in the rule |
 | -------------- | --------- | ---------------------- |
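
The rewritten paragraph above refers to a `failure_type` field on each rule. As a loose illustration only, not the real dischema schema, a filter carrying that field might be sketched like this (key names, the column name, and the allowed values are assumptions):

```python
# Hypothetical sketch of a filter definition carrying a "failure_type" field.
# All names and values here are assumptions; check the dischema reference for the real schema.
movie_length_filter = {
    "expression": "runtime_mins < 240",  # written for NON-problematic rows (the engine NOT-wraps it)
    "failure_type": "record",            # assumed value; per the docs a rule can reject a record,
                                         # reject the whole file (submission), or just raise a warning
}
```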

docs/user_guidance/data_contract.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ tags:
 - Domain Types
 ---

-The Data Contract describes the structure (models) of your data and controls how the data should be typecasted. We use [Pydantic](https://docs.pydantic.dev/1.10/) to generate and validate the models. This page is meant to give you greater details on how you should write your Data Contract. If you want a summary of how the Data Contract works, please refer to the [Getting Started](./getting_started.md#rules-configuration-introduction) page.
+The Data Contract defines the structure (models) of your data and controls how it is typecast. We use [Pydantic](https://docs.pydantic.dev/1.10/) to generate and validate the models. This page is meant to give you greater details on how you should write your Data Contract. If you want a summary of how the Data Contract works, please refer to the [Getting Started](./getting_started.md#rules-configuration-introduction) page.

 !!! Note
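
For readers following along, the kind of model a Data Contract ultimately produces is a plain Pydantic model. A minimal sketch, borrowing the movies example from this commit's getting-started changes (the field names are assumptions, not a real contract):

```python
# Minimal sketch of the sort of model a Data Contract generates via Pydantic.
# Field names are illustrative only.
from pydantic import BaseModel

class Movie(BaseModel):
    title: str
    runtime_mins: int  # typecast from the raw input according to the contract

print(Movie(title="Heat", runtime_mins="170"))  # Pydantic coerces "170" to an int
```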

docs/user_guidance/file_transformation.md

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ The File Transformation stage within the DVE is used to convert submitted files
 }
 ```

-The secondary use of the File Transformation stage is the ability to normalise your data into multiple entities. Imagine you had something like Hospital and Patient data in a single submission. You could split this out into seperate entities so that the validated outputs of the data could be loaded into seperate tables. For example:
+The secondary use of the File Transformation stage is the ability to normalise your data into multiple entities. Imagine you had something like Hospital and Patient data in a single submission. You could split this out into seperate entities so that the validated outputs of the data could be loaded into seperate tables (parquet). For example:

 === "DuckDB"
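
As a rough illustration of the "(parquet)" addition above, and not the DVE's actual API, splitting one submission into per-entity parquet outputs with DuckDB might look like the following (table name, columns, and output paths are all assumptions):

```python
# Hedged sketch: one submission normalised into separate entities, each written to its own parquet file.
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE submission AS SELECT 'St Elsewhere' AS hospital_name, 'Jane Doe' AS patient_name")
con.execute("COPY (SELECT hospital_name FROM submission) TO 'hospital.parquet' (FORMAT parquet)")
con.execute("COPY (SELECT patient_name FROM submission) TO 'patient.parquet' (FORMAT parquet)")
```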

docs/user_guidance/getting_started.md

Lines changed: 3 additions & 3 deletions
@@ -9,7 +9,7 @@ tags:

 ## Rules Configuration Introduction

-To use the DVE you will need to create a dischema document. The dischema document describes how the DVE should validate your data. It's divided into two primary parts. The first part is the `contract` (data contract) - this describes the structure of your data and controls how the data should be typecasted. For example, here is a dischema document describing how the DVE might validate data about a movies dataset:
+To use the DVE you will need to create a dischema document. The dischema document describes how the DVE should validate your data. It's divided into two primary parts. The first part is the `contract` (data contract) - this defines the structure of your data and determines how it is modeled and typecast. For example, here is a dischema document describing how the DVE may validate data about a movies:

 !!! example "Example `movies.dischema.json`"

@@ -72,7 +72,7 @@ For each dataset definition, you will need to provide a `reader_config` which de

 To learn more about how you can construct your Data Contract please read [here](data_contract.md).

-The second part of the dischema are the `business_rules` *or* `tranformations`. This section describes the validation rules you want to apply to entities defined within the `contract`. For example, with our `movies` dataset above, we may want to check that movies in this dataset are less than 4 hours long. The expression to write this check is written in SQL and that syntax may change slightly depending on the SQL backend you've choosen (we currently support [DuckDB](implementations/duckdb.md) and [Spark SQL](implementations/spark.md)).
+The second part of the dischema are the `tranformations` (business_rules). This section describes the validation rules you want to apply to entities defined within the `contract`. For example, with our `movies` dataset above, we may want to check that movies in this dataset are less than 4 hours long. The expression to write this check is written in SQL and that syntax may change slightly depending on the SQL backend you've choosen (we currently support [DuckDB](implementations/duckdb.md) and [Spark SQL](implementations/spark.md)).
 !!! example "Example `movies.dischema.json`"

 ```json
@@ -90,7 +90,7 @@ The second part of the dischema are the `business_rules` *or* `tranformations`.
 }
 }
 ```
-You may look at the expression above and think "Hang on! That's the opposite of what you want! You're only getting movies less than 4 hours!", however, all validation rules are wrapped inside a `NOT` expression. So, you write the rules as though you are looking for non problematic values.
+You may look at the expression above and think "Hang on! That's the opposite of what you want! You're only getting movies less than 4 hours!", __however, all validation rules are wrapped inside a `NOT` expression__. So, you write the rules as though you are looking for non problematic values.

 We also offer a feature called `complex_rules`. These are rules where you need to transform the data before you can apply the rule. For instance, you may want to perform a join, aggregate the data, or perform a filter. The complex rules allow you to combine "pre-steps" before you perform the validation.
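
To make the newly bolded point concrete: the engine negates whatever expression you write, so a rule describing the movies you want to keep is what flags the ones you don't. A rough sketch of that behaviour, with an assumed column name from the movies example:

```python
# Hedged sketch of the NOT-wrapping behaviour described above.
# "runtime_mins" is an assumed column name, not the real dataset schema.
rule = "runtime_mins < 240"   # written for non-problematic rows (movies under 4 hours)
applied = f"NOT ({rule})"     # what the engine effectively evaluates to find failures
print(applied)                # NOT (runtime_mins < 240)
```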

docs/user_guidance/implementations/duckdb.md

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ DuckDBRefDataLoader.connection = db_con
 DuckDBRefDataLoader.dataset_config_uri = Path("path", "to", "my", "rules").as_posix()
 ```

-The connection passed into the `DuckDBRefDataLoader` object will then be able use various DuckDB readers to load data from an existing table on the connection OR loading data from reference data persisted in either `parquet` or `pyarrow` format.
+The connection passed into the `DuckDBRefDataLoader` object will then be able to use various DuckDB readers to load data from an existing table on the connection OR loading data from reference data persisted in either `parquet` or `pyarrow` format.

 If you want to learn more about the reference data loaders, you can view the advanced user guidance [here](../../advanced_guidance/package_documentation/refdata_loaders.md).
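
For context on the corrected sentence, a rough sketch of how a DuckDB connection can pull reference data from parquet; the file name is an assumption, and the real loading goes through the DVE's reference data loaders linked above:

```python
# Hedged sketch: a DuckDB connection reading reference data persisted as parquet.
# "lookup_codes.parquet" is an assumed file; the DVE's own readers wrap this kind of call.
import duckdb

db_con = duckdb.connect()
ref_data = db_con.execute("SELECT * FROM read_parquet('lookup_codes.parquet')").arrow()
print(ref_data.num_rows)
```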
