
Commit ebd7ef5

docs: updated content and fixed auto code doc pathing
1 parent adfd5ed commit ebd7ef5

11 files changed

Lines changed: 48 additions & 39 deletions


docs/advanced_guidance/index.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+---
+title: Advanced Guidance
+---
+
 <div class="grid cards" markdown>

 - :material-file-code:{ .lg .middle } __DVE Code Reference Documentation__

docs/advanced_guidance/package_documentation/auditing.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@

-::: src.dve.core_engine.backends.base.auditing.BaseAuditingManager
+::: dve.core_engine.backends.base.auditing.BaseAuditingManager
 options:
 heading_level: 3
 members:
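
The path changes in this file (and in the reader pages below) all follow the same pattern: mkdocstrings resolves `:::` identifiers against the importable package name, so the repository-layout `src.` prefix has to be dropped. As a rough illustration only (assuming the package installs under the name `dve` and exposes the class named in the directive), the corrected identifier should be importable as written:

```python
# Minimal sketch: a ::: identifier must match the object's import path.
# Assumes the package is installed as "dve"; the module and class names come from the diff above.
import importlib

module = importlib.import_module("dve.core_engine.backends.base.auditing")
print(module.BaseAuditingManager)  # the object the ::: directive documents
```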

docs/advanced_guidance/package_documentation/index.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+---
+title: Package Documentation
+---
+
 <div class="grid cards" markdown>

 - :material-language-python:{ .lg .middle } __Pipeline__

docs/advanced_guidance/package_documentation/readers.md

Lines changed: 11 additions & 11 deletions
@@ -2,35 +2,35 @@

 === "Base"

-::: src.dve.core_engine.backends.readers.csv.CSVFileReader
+::: dve.core_engine.backends.readers.csv.CSVFileReader
 options:
 heading_level: 3
 merge_init_into_class: true
 members: false

 === "DuckDB"

-::: src.dve.core_engine.backends.implementations.duckdb.readers.csv.DuckDBCSVReader
+::: dve.core_engine.backends.implementations.duckdb.readers.csv.DuckDBCSVReader
 options:
 heading_level: 3
 members:
 - __init__

-::: src.dve.core_engine.backends.implementations.duckdb.readers.csv.PolarsToDuckDBCSVReader
+::: dve.core_engine.backends.implementations.duckdb.readers.csv.PolarsToDuckDBCSVReader
 options:
 heading_level: 3
 members:
 - __init__

-::: src.dve.core_engine.backends.implementations.duckdb.readers.csv.DuckDBCSVRepeatingHeaderReader
+::: dve.core_engine.backends.implementations.duckdb.readers.csv.DuckDBCSVRepeatingHeaderReader
 options:
 heading_level: 3
 members:
 - __init__

 === "Spark"

-::: src.dve.core_engine.backends.implementations.spark.readers.csv.SparkCSVReader
+::: dve.core_engine.backends.implementations.spark.readers.csv.SparkCSVReader
 options:
 heading_level: 3
 members:
@@ -40,15 +40,15 @@

 === "DuckDB"

-::: src.dve.core_engine.backends.implementations.duckdb.readers.json.DuckDBJSONReader
+::: dve.core_engine.backends.implementations.duckdb.readers.json.DuckDBJSONReader
 options:
 heading_level: 3
 members:
 - __init__

 === "Spark"

-::: src.dve.core_engine.backends.implementations.spark.readers.json.SparkJSONReader
+::: dve.core_engine.backends.implementations.spark.readers.json.SparkJSONReader
 options:
 heading_level: 3
 members:
@@ -58,29 +58,29 @@

 === "Base"

-::: src.dve.core_engine.backends.readers.xml.BasicXMLFileReader
+::: dve.core_engine.backends.readers.xml.BasicXMLFileReader
 options:
 heading_level: 3
 merge_init_into_class: true
 members: false

 === "DuckDB"

-::: src.dve.core_engine.backends.implementations.duckdb.readers.xml.DuckDBXMLStreamReader
+::: dve.core_engine.backends.implementations.duckdb.readers.xml.DuckDBXMLStreamReader
 options:
 heading_level: 3
 members:
 - __init__

 === "Spark"

-::: src.dve.core_engine.backends.implementations.spark.readers.xml.SparkXMLStreamReader
+::: dve.core_engine.backends.implementations.spark.readers.xml.SparkXMLStreamReader
 options:
 heading_level: 3
 members:
 - __init__

-::: src.dve.core_engine.backends.implementations.spark.readers.xml.SparkXMLReader
+::: dve.core_engine.backends.implementations.spark.readers.xml.SparkXMLReader
 options:
 heading_level: 3
 members:

docs/user_guidance/auditing.md

Lines changed: 5 additions & 4 deletions
@@ -3,16 +3,17 @@ tags:
 - Auditing
 ---

-The Auditing objects within the DVE are used to help control and store information about a given submission and what stage it's currently at. In addition to the above, it's also used to store statistics about the submission and the number of validations it has triggered etc. So, for users not interested in using the Error reports stage, you could source information directly from the audit tables.
+The Auditing objects within the DVE are used to help control and store information about submitted data and what stage it's currently at. In addition to the above, it's also used to store statistics about the submission and the number of validations it has triggered etc. So, for users not interested in using the Error reports stage, you could source information directly from the audit tables.

 ## Audit Tables
+
 Currently, these are the audit tables that can be accessed within the DVE:

 | Table Name | Purpose |
 | --------------------- | ------- |
-| processing_status | Contains information about the submission and what the current processing status is. |
-| submission_info | Contains information about the submitted file. |
-| submission_statistics | Contains validation statistics for each submission. |
+| `processing_status` | Contains information about the submission and what the current processing status is. |
+| `submission_info` | Contains information about the submitted file. |
+| `submission_statistics` | Contains validation statistics for each submission. |

 ## Audit Objects
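
Since the page above points users at querying the audit tables directly, a rough sketch of what that could look like with the DuckDB backend follows. Only the table names come from the documentation; the database file name and the use of `SELECT *` are assumptions for illustration.

```python
# Hedged sketch: sourcing audit information directly instead of using the Error reports stage.
# "audit.duckdb" is an assumed file name; the real location depends on your pipeline setup.
import duckdb

con = duckdb.connect("audit.duckdb")
status = con.execute("SELECT * FROM processing_status").fetchall()
stats = con.execute("SELECT * FROM submission_statistics").fetchall()
print(status, stats)
```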

docs/user_guidance/business_rules.md

Lines changed: 5 additions & 2 deletions
@@ -2,6 +2,9 @@
 title: Business Rules
 tags:
 - Business Rules
+- dischema
+- Rule Store
+- Reference Data
 ---

 The Business Rules section contain the rules you want to apply to your dataset. Rule logic might include...
@@ -14,7 +17,7 @@ All rules are written in `SQL`. Depending on which [backend implementation](./im

 When writing the rules, you need to be aware that the expressions are wrapped in `NOT` expression. So, you should write the rules as though you are looking for non problematic values.

-When rules are being applied, [Complex Rules](./business_rules.md#complex-rules) are always applied before [Rules](./business_rules.md#rules) and [Filters](./business_rules.md#filters)
+When rules are being applied, [Complex Rules](./business_rules.md#complex-rules) are always applied before [Rules](./business_rules.md#rules) and [Filters](./business_rules.md#filters).

 This page is meant to give you greater details on how you can write your Business Rules. If you want a summary of how the Business Rules work, then please refer to the [Getting Started](./getting_started.md#rules-configuration-introduction) page.

@@ -125,7 +128,7 @@ If you need to perform more complex rules, with pre-steps, then see the [Complex

 ### Types of rejections

-You may have noticed the three type of rejections in the example above. For any given rule you can reject a record, the whole file (submission) or just raise a warning. More details in table below:
+You may have noticed the field "failure_type" in the example above. For any given rule (filter) you can reject a record, the whole file (submission) or just raise a warning. Here are the details around the currently supported Rejection Types:

 | Rejection Type | Behaviour | How to set in the rule |
 | -------------- | --------- | ---------------------- |
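
The rewritten paragraph above refers to a `failure_type` field on each rule. As a loose illustration only, not the real dischema schema, a filter carrying that field might be sketched like this (key names, the column name, and the allowed values are assumptions):

```python
# Hypothetical sketch of a filter definition carrying a "failure_type" field.
# All names and values here are assumptions; check the dischema reference for the real schema.
movie_length_filter = {
    "expression": "runtime_mins < 240",  # written for NON-problematic rows (the engine NOT-wraps it)
    "failure_type": "record",            # assumed value; per the docs a rule can reject a record,
                                         # reject the whole file (submission), or just raise a warning
}
```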

docs/user_guidance/data_contract.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ tags:
 - Domain Types
 ---

-The Data Contract describes the structure (models) of your data and controls how the data should be typecasted. We use [Pydantic](https://docs.pydantic.dev/1.10/) to generate and validate the models. This page is meant to give you greater details on how you should write your Data Contract. If you want a summary of how the Data Contract works, please refer to the [Getting Started](./getting_started.md#rules-configuration-introduction) page.
+The Data Contract defines the structure (models) of your data and controls how it is typecast. We use [Pydantic](https://docs.pydantic.dev/1.10/) to generate and validate the models. This page is meant to give you greater details on how you should write your Data Contract. If you want a summary of how the Data Contract works, please refer to the [Getting Started](./getting_started.md#rules-configuration-introduction) page.

 !!! Note
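
For readers following along, the kind of model a Data Contract ultimately produces is a plain Pydantic model. A minimal sketch, borrowing the movies example from this commit's getting-started changes (the field names are assumptions, not a real contract):

```python
# Minimal sketch of the sort of model a Data Contract generates via Pydantic.
# Field names are illustrative only.
from pydantic import BaseModel

class Movie(BaseModel):
    title: str
    runtime_mins: int  # typecast from the raw input according to the contract

print(Movie(title="Heat", runtime_mins="170"))  # Pydantic coerces "170" to an int
```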

docs/user_guidance/file_transformation.md

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ The File Transformation stage within the DVE is used to convert submitted files
 }
 ```

-The secondary use of the File Transformation stage is the ability to normalise your data into multiple entities. Imagine you had something like Hospital and Patient data in a single submission. You could split this out into seperate entities so that the validated outputs of the data could be loaded into seperate tables. For example:
+The secondary use of the File Transformation stage is the ability to normalise your data into multiple entities. Imagine you had something like Hospital and Patient data in a single submission. You could split this out into seperate entities so that the validated outputs of the data could be loaded into seperate tables (parquet). For example:

 === "DuckDB"
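
As a rough illustration of the "(parquet)" addition above, and not the DVE's actual API, splitting one submission into per-entity parquet outputs with DuckDB might look like the following (table name, columns, and output paths are all assumptions):

```python
# Hedged sketch: one submission normalised into separate entities, each written to its own parquet file.
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE submission AS SELECT 'St Elsewhere' AS hospital_name, 'Jane Doe' AS patient_name")
con.execute("COPY (SELECT hospital_name FROM submission) TO 'hospital.parquet' (FORMAT parquet)")
con.execute("COPY (SELECT patient_name FROM submission) TO 'patient.parquet' (FORMAT parquet)")
```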

docs/user_guidance/getting_started.md

Lines changed: 3 additions & 3 deletions
@@ -9,7 +9,7 @@ tags:

 ## Rules Configuration Introduction

-To use the DVE you will need to create a dischema document. The dischema document describes how the DVE should validate your data. It's divided into two primary parts. The first part is the `contract` (data contract) - this describes the structure of your data and controls how the data should be typecasted. For example, here is a dischema document describing how the DVE might validate data about a movies dataset:
+To use the DVE you will need to create a dischema document. The dischema document describes how the DVE should validate your data. It's divided into two primary parts. The first part is the `contract` (data contract) - this defines the structure of your data and determines how it is modeled and typecast. For example, here is a dischema document describing how the DVE may validate data about a movies:

 !!! example "Example `movies.dischema.json`"

@@ -72,7 +72,7 @@ For each dataset definition, you will need to provide a `reader_config` which de

 To learn more about how you can construct your Data Contract please read [here](data_contract.md).

-The second part of the dischema are the `business_rules` *or* `tranformations`. This section describes the validation rules you want to apply to entities defined within the `contract`. For example, with our `movies` dataset above, we may want to check that movies in this dataset are less than 4 hours long. The expression to write this check is written in SQL and that syntax may change slightly depending on the SQL backend you've choosen (we currently support [DuckDB](implementations/duckdb.md) and [Spark SQL](implementations/spark.md)).
+The second part of the dischema are the `tranformations` (business_rules). This section describes the validation rules you want to apply to entities defined within the `contract`. For example, with our `movies` dataset above, we may want to check that movies in this dataset are less than 4 hours long. The expression to write this check is written in SQL and that syntax may change slightly depending on the SQL backend you've choosen (we currently support [DuckDB](implementations/duckdb.md) and [Spark SQL](implementations/spark.md)).
 !!! example "Example `movies.dischema.json`"

 ```json
@@ -90,7 +90,7 @@ The second part of the dischema are the `business_rules` *or* `tranformations`.
 }
 }
 ```
-You may look at the expression above and think "Hang on! That's the opposite of what you want! You're only getting movies less than 4 hours!", however, all validation rules are wrapped inside a `NOT` expression. So, you write the rules as though you are looking for non problematic values.
+You may look at the expression above and think "Hang on! That's the opposite of what you want! You're only getting movies less than 4 hours!", __however, all validation rules are wrapped inside a `NOT` expression__. So, you write the rules as though you are looking for non problematic values.

 We also offer a feature called `complex_rules`. These are rules where you need to transform the data before you can apply the rule. For instance, you may want to perform a join, aggregate the data, or perform a filter. The complex rules allow you to combine "pre-steps" before you perform the validation.
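
To make the newly bolded point concrete: the engine negates whatever expression you write, so a rule describing the movies you want to keep is what flags the ones you don't. A rough sketch of that behaviour, with an assumed column name from the movies example:

```python
# Hedged sketch of the NOT-wrapping behaviour described above.
# "runtime_mins" is an assumed column name, not the real dataset schema.
rule = "runtime_mins < 240"   # written for non-problematic rows (movies under 4 hours)
applied = f"NOT ({rule})"     # what the engine effectively evaluates to find failures
print(applied)                # NOT (runtime_mins < 240)
```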

docs/user_guidance/implementations/duckdb.md

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ DuckDBRefDataLoader.connection = db_con
 DuckDBRefDataLoader.dataset_config_uri = Path("path", "to", "my", "rules").as_posix()
 ```

-The connection passed into the `DuckDBRefDataLoader` object will then be able use various DuckDB readers to load data from an existing table on the connection OR loading data from reference data persisted in either `parquet` or `pyarrow` format.
+The connection passed into the `DuckDBRefDataLoader` object will then be able to use various DuckDB readers to load data from an existing table on the connection OR loading data from reference data persisted in either `parquet` or `pyarrow` format.

 If you want to learn more about the reference data loaders, you can view the advanced user guidance [here](../../advanced_guidance/package_documentation/refdata_loaders.md).
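
For context on the corrected sentence, a rough sketch of how a DuckDB connection can pull reference data from parquet; the file name is an assumption, and the real loading goes through the DVE's reference data loaders linked above:

```python
# Hedged sketch: a DuckDB connection reading reference data persisted as parquet.
# "lookup_codes.parquet" is an assumed file; the DVE's own readers wrap this kind of call.
import duckdb

db_con = duckdb.connect()
ref_data = db_con.execute("SELECT * FROM read_parquet('lookup_codes.parquet')").arrow()
print(ref_data.num_rows)
```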
