This project demonstrates practical data extraction, transformation, and loading (ETL) techniques using Python. The assignment focuses on working with JSON data from both GitHub-hosted files and public REST APIs, and exporting the cleaned datasets into CSV files.
The main objectives of this assignment were to:
- Extract JSON data from a publicly hosted GitHub repository
- Consume live REST API endpoints using Python
- Transform nested JSON data into structured DataFrames
- Export cleaned and organized datasets into CSV format
- Demonstrate practical ETL workflow concepts using Python
- requests: for sending HTTP requests and extracting API data
- pandas: for data manipulation and DataFrame operations
- json: for parsing and formatting JSON responses
- datetime: for generating timestamped output files
- GitHub: for hosting raw JSON datasets
- Mockaroo: for generating synthetic employee datasets
- DummyJSON API: for accessing sample REST API data
```
LUXDEV_WEEK9_ASSIGNMENT/
│
├── data/ # Raw and processed datasets
│ ├── cart_data_*.csv
│ ├── employee_raw_data_mock.json
│ ├── mock_employee_data_*.csv
│ └── products_data_*.csv
│
├── src/ # Python source files
│ ├── carts_Api_dummy.py # Cart API extraction pipeline
│ ├── employee_mock_git.py # GitHub raw JSON extraction
│ └── product_dummy.py # Product API extraction pipeline
│
├── Assignment_instructions.txt
├── README.md
└── requirements.txt
```
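The `*` in the CSV filenames above stands for a timestamp produced with `datetime`. A minimal sketch of how such a path could be built; the `timestamped_csv_path` helper and the `%Y%m%d_%H%M%S` format are illustrative assumptions, not taken from the project's source files.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_csv_path(prefix: str, out_dir: str = "data",
                         now: Optional[datetime] = None) -> Path:
    """Build a path such as data/products_data_20240102_030405.csv.

    Hypothetical helper: the timestamp format is an assumption.
    Passing `now` explicitly makes the result deterministic (useful in tests).
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")  # sortable, filesystem-safe stamp
    return Path(out_dir) / f"{prefix}_{stamp}.csv"


if __name__ == "__main__":
    print(timestamped_csv_path("products_data"))
```

Because the stamp runs from year down to second, the files in `data/` sort chronologically by name, and each run writes a new file instead of overwriting the previous one.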
A synthetic employee dataset was generated using Mockaroo and exported in JSON format. The generated JSON dataset was uploaded to a public GitHub repository. A raw GitHub URL was then used to access the file programmatically from Python.
The extracted JSON data was:
- Retrieved using the requests library
- Parsed into Python objects
- Converted into a Pandas DataFrame
- Cleaned and structured appropriately
- Exported into CSV format
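The GitHub extraction steps above can be sketched as follows. This assumes the file holds a JSON array of flat employee objects; `RAW_URL` is a placeholder (the real URL is not given here), and the cleaning rules (lower-casing column names, dropping exact duplicate rows) are illustrative assumptions rather than the project's actual logic.

```python
import pandas as pd
import requests

# Placeholder: substitute the real raw.githubusercontent.com URL of
# employee_raw_data_mock.json; it is not stated in this README.
RAW_URL = ("https://raw.githubusercontent.com/<user>/<repo>/<branch>/"
           "employee_raw_data_mock.json")


def fetch_json(url: str, timeout: float = 10.0):
    """Download a JSON document and parse it into Python objects."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()


def records_to_clean_frame(records: list) -> pd.DataFrame:
    """Turn a list of flat JSON objects into a cleaned DataFrame."""
    df = pd.DataFrame(records)
    # Assumed cleaning: trim and lower-case column names, drop exact duplicates.
    df.columns = [str(c).strip().lower() for c in df.columns]
    return df.drop_duplicates().reset_index(drop=True)


if __name__ == "__main__":
    employees = records_to_clean_frame(fetch_json(RAW_URL))
    employees.to_csv("data/mock_employee_data.csv", index=False)
```

Keeping the download and the transformation in separate functions lets the cleaning step be tested on a small in-memory list without any network access.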
Data was extracted from the DummyJSON products and carts API endpoints. The following steps were performed for each endpoint:
- Sent a GET request to the API
- Extracted the JSON response
- Flattened nested JSON structures
- Normalized nested product data
- Converted the response into a DataFrame
- Exported the final dataset into CSV format
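For the products endpoint, the request-and-normalize steps above might look like the sketch below. It assumes the `https://dummyjson.com/products` endpoint, whose response wraps the items in a `products` key; the `normalize_products` helper and the `_` column separator are illustrative choices, not necessarily what `product_dummy.py` does.

```python
import pandas as pd
import requests

PRODUCTS_URL = "https://dummyjson.com/products"  # assumed public DummyJSON endpoint


def normalize_products(payload: dict) -> pd.DataFrame:
    """Flatten the 'products' list of a DummyJSON-style response.

    Nested dicts (e.g. 'dimensions') become columns such as 'dimensions_width';
    list-valued fields (e.g. 'tags', 'reviews') are left as Python lists.
    """
    return pd.json_normalize(payload["products"], sep="_")


if __name__ == "__main__":
    response = requests.get(PRODUCTS_URL, timeout=10)
    response.raise_for_status()
    products = normalize_products(response.json())
    products.to_csv("data/products_data.csv", index=False)
```

`pd.json_normalize` only expands nested dictionaries; list fields such as reviews need either their own table or the `record_path` approach used for carts.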
The project follows a simplified ETL (Extract, Transform, Load) pipeline:
- Retrieved JSON data from GitHub and REST APIs
- Cleaned and normalized nested JSON structures
- Structured the data into tabular DataFrames
- Flattened cart-product relationships
- Exported transformed datasets into CSV files for further analysis
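The cart-product flattening step can be sketched with the `record_path` and `meta` arguments of `pd.json_normalize`, producing one row per (cart, product) pair. This assumes the `https://dummyjson.com/carts` endpoint, where each cart carries `id`, `userId`, and a `products` list; the `flatten_carts` name and the `product_`/`cart_` prefixes are hypothetical choices for illustration.

```python
import pandas as pd
import requests

CARTS_URL = "https://dummyjson.com/carts"  # assumed public DummyJSON endpoint


def flatten_carts(payload: dict) -> pd.DataFrame:
    """Return one row per (cart, product) pair.

    record_path walks into each cart's 'products' list, and meta copies the
    cart-level fields onto every product row. The prefixes keep the product
    'id' and the cart 'id' from colliding.
    """
    return pd.json_normalize(
        payload["carts"],
        record_path="products",
        meta=["id", "userId"],
        record_prefix="product_",
        meta_prefix="cart_",
    )


if __name__ == "__main__":
    response = requests.get(CARTS_URL, timeout=10)
    response.raise_for_status()
    cart_items = flatten_carts(response.json())
    cart_items.to_csv("data/cart_data.csv", index=False)
```

Without the prefixes, pandas raises a `ValueError` about a conflicting metadata name, because both the cart and its products have an `id` field.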
Through this assignment, the following concepts were practiced:
- Working with REST APIs using Python
- Consuming and parsing JSON data
- Data normalization and transformation
- Using Pandas for tabular data handling
- Exporting datasets into CSV format
- Building beginner-level ETL pipelines
This assignment demonstrates how Python can be used for real-world data extraction and transformation workflows.