scd2
Here are 31 public repositories matching this topic...
A modern banking data pipeline built with Dagster and dbt.
Updated Jan 31, 2026 - Python
SCD2 implementation using PySpark
Updated Mar 10, 2019 - Jupyter Notebook
End-to-end data pipeline system built as part of the Coursera open-source Data Engineering program. It unifies diverse data sources, implements SCD2 historical tracking, and orchestrates workflows using industry-standard tools.
Updated May 25, 2026 - Python
P&C insurance claims lakehouse: Azure ADLS + Databricks (PySpark/Delta) + Snowflake + dbt, real-time FNOL fraud signals via Kafka, Airflow-orchestrated, Terraform-provisioned, OIDC-secured, with data contracts, lineage, and ADRs throughout.
Updated May 19, 2026 - Makefile
Advanced Healthcare Claims Pipeline using Snowflake, Snowpipe, Streams, Tasks, SCD Type 2, and AWS S3. Automates ingestion, CDC, dimensional modeling, and data quality checks for healthcare patient and claims data.
Updated Nov 10, 2025
Fortune-500-grade banking analytics platform: OLTP -> medallion lakehouse -> Kimball star schema -> semantic layer -> 9-tab executive dashboard + 5 ML models (churn, fraud, segmentation, forecasting). Production-ready, governed, fully tested.
Updated Apr 30, 2026 - Python
This is a data engineering pipeline built on Databricks + Delta Lake + PySpark that ingests travel booking and customer master data, applies SCD Type 2 logic, and delivers analytics-ready tables. It includes data quality enforcement, dimension versioning, fact aggregation, and performance tuning.
Updated Oct 8, 2025 - Jupyter Notebook
Modern data stack reference: dbt + BigQuery + Airflow (Cloud Composer) with medallion layering, SCD2 snapshots, exposures, freshness SLAs, and 45× cost reduction via partition + cluster + incremental tuning.
Updated Apr 23, 2026 - Python
This repo contains details about a travel booking project executed on Databricks.
Updated May 9, 2026 - Python
Production-grade parameterized ETL pipeline implementing SCD Type 2 for travel booking data using Databricks, Delta Lake, and ADLS — includes data quality checks, incremental fact table build, Z-Order optimization, and SQL reporting.
Updated Apr 6, 2026 - Jupyter Notebook
Production-grade CDC pipeline: MySQL → Debezium → Kinesis → S3 → AWS Glue (PySpark) → Redshift + Postgres + OpenSearch. Multi-sink fanout with SCD2, idempotency tracking, and 13 modular Terraform modules.
Updated Apr 23, 2026 - Python
Batch retail data lakehouse on Databricks: Delta Live Tables (bronze → silver → gold), Unity Catalog, synthetic data generator, and an executive analytics dashboard.
Updated Apr 2, 2026 - Python
End-to-end Medicare data engineering pipeline: API ingestion, PostgreSQL 17, dbt, dimensional modeling (Kimball/SCD2), Apache Airflow orchestration, and Evidence.dev dashboard. Built on a QEMU/KVM Rocky Linux VM.
Updated Apr 28, 2026 - PLpgSQL
Production-style dbt case study with SCD-style modeling, point-in-time joins, incremental marts, tests, and analyst-facing SQL.
Updated May 26, 2026
End-to-end BI platform for a fictional Peruvian piquillo pepper agro-exporter. SQL Server DW with SCD2, ETL with stored procedures, Power BI dashboard with RLS.
Updated May 24, 2026 - TSQL