First ML Project

This is my first Machine Learning project.

Goal

Predict house prices from various property features using a supervised learning model.

Features Used

Rooms: Number of rooms in the property
Distance: Distance from the central business district (in kilometers)
Postcode: Postal code of the property
Bedroom2: Number of bedrooms (as reported by the real estate agent)
Bathroom: Number of bathrooms
Car: Number of car spots
Landsize: Land size in square meters
BuildingArea: Building size in square meters
YearBuilt: Year the house was built
Lattitude: Geographic latitude
Longtitude: Geographic longitude
Propertycount: Number of properties in the same suburb
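
As a minimal sketch, the feature columns listed above could be selected from a pandas DataFrame as follows. The names `FEATURES` and `select_features` are hypothetical and not taken from melb_model.py, the sample rows are made up for illustration, and dropping rows with missing values is an assumption rather than the project's confirmed cleaning step.

```python
import pandas as pd

# Column names spelled exactly as in the list above.
FEATURES = [
    "Rooms", "Distance", "Postcode", "Bedroom2", "Bathroom", "Car",
    "Landsize", "BuildingArea", "YearBuilt", "Lattitude", "Longtitude",
    "Propertycount",
]


def select_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the feature columns, dropping rows with missing values."""
    return df[FEATURES].dropna()


# Two made-up rows, purely for illustration (not real dataset values).
# The second row has a missing "Car" value, so it is dropped.
sample = pd.DataFrame({
    "Rooms": [2, 3], "Distance": [2.5, 7.0], "Postcode": [3067, 3040],
    "Bedroom2": [2, 3], "Bathroom": [1, 2], "Car": [1, None],
    "Landsize": [202.0, 450.0], "BuildingArea": [95.0, 140.0],
    "YearBuilt": [1970, 1995], "Lattitude": [-37.80, -37.75],
    "Longtitude": [144.99, 144.90], "Propertycount": [4019, 6500],
    "Price": [1000000, 1200000],
})

X = select_features(sample)
print(X.shape)
```

Keeping the feature names in one list makes it easier to use the same columns at training time and at prediction time.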

Model

A Random Forest Regressor was trained to predict house prices.

Algorithm Type: Ensemble method (combines multiple decision trees).
Metric: Mean Absolute Error (MAE).
Average Accuracy: Approximately 85.08%.
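
The training and evaluation step described above might look roughly like the sketch below. It runs on synthetic data rather than the real dataset, and the hyperparameters (`n_estimators=100`, `random_state=0`) and the 80/20 split are assumptions, not values confirmed by melb_model.py.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 3 numeric features, and a noisy target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 50_000 * X[:, 0] + 10_000 * X[:, 1] + rng.normal(0, 5_000, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The forest averages the predictions of many decision trees.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# MAE: the average absolute difference between actual and predicted values.
mae = mean_absolute_error(y_test, predictions)
manual_mae = float(np.mean(np.abs(y_test - predictions)))
print(f"MAE: {mae:,.0f}")
```

MAE is expressed in the same units as the target, so for house prices it reads as an average error in currency.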

Dataset

Name: Melbourne Housing Dataset.
Source: Kaggle — Melbourne Housing Market dataset.
Size: ~13,580 rows and 21 columns.
Target Variable: Price
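
A hedged sketch of loading the data and separating the target variable is shown below. The helper `load_and_split` is hypothetical, the 80/20 split is an assumption, and a tiny made-up CSV stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split


def load_and_split(csv_source, target="Price", test_size=0.2, seed=0):
    """Read a CSV, separate the target column, and split into train/test."""
    df = pd.read_csv(csv_source)
    y = df[target]
    X = df.drop(columns=[target])
    return train_test_split(X, y, test_size=test_size, random_state=seed)


# Tiny made-up CSV standing in for the real file (values are not real data).
# With the real dataset, csv_source would be the path to the downloaded CSV.
fake_csv = io.StringIO(
    "Rooms,Distance,Price\n"
    "2,2.5,900000\n"
    "3,7.0,1100000\n"
    "4,12.0,1300000\n"
    "3,5.5,1250000\n"
    "1,1.0,600000\n"
)

X_train, X_test, y_train, y_test = load_and_split(fake_csv)
print(X_train.shape, X_test.shape)
```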

Visualizations

Actual vs Predicted Prices: Scatter plot showing model predictions against real values.
Error Distribution: Histogram showing how prediction errors are spread.
Top 20 Features: Histogram displaying the top 20 features after encoding.
Residual Graph: Scatter plot showing the residuals.
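
Three of the plots above could be sketched as follows. This version uses Matplotlib only, the function name `plot_diagnostics` is hypothetical, residuals are taken as actual minus predicted (an assumption about the sign convention), and the input values are made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def plot_diagnostics(y_true, y_pred):
    """Draw actual-vs-predicted, error histogram, and residual scatter plots."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Actual vs predicted: points on the dashed diagonal are perfect predictions.
    axes[0].scatter(y_true, y_pred, alpha=0.6)
    lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
    axes[0].plot(lims, lims, linestyle="--")
    axes[0].set(title="Actual vs Predicted", xlabel="Actual", ylabel="Predicted")

    # Error distribution: how the prediction errors are spread.
    axes[1].hist(residuals, bins=20)
    axes[1].set(title="Error Distribution", xlabel="Error", ylabel="Count")

    # Residual graph: residuals plotted against the predicted values.
    axes[2].scatter(y_pred, residuals, alpha=0.6)
    axes[2].axhline(0, linestyle="--")
    axes[2].set(title="Residuals", xlabel="Predicted", ylabel="Residual")

    fig.tight_layout()
    return fig, residuals


# Made-up values, for illustration only.
fig, residuals = plot_diagnostics([100, 200, 300, 400], [110, 190, 320, 390])
fig.savefig("diagnostics.png")
```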

Deployment (Streamlit app)

The file app.py contains the UI for the app, which is built with Streamlit.
The steps to launch it are as follows:

Make sure the PKL export code is present in your model's code; the section is clearly marked with comments in melb_model.py.
Run that file; this will create the pkl file.
Launch the app using the command streamlit run app.py
Follow the link that gets displayed to open the app in your browser of choice.
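
The save and load steps above can be sketched with Python's standard pickle module. The file name `melb_model.pkl`, the temporary directory, and the tiny synthetic training data are all assumptions for illustration; the actual export code lives in melb_model.py.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train a tiny model on made-up data so there is something to save.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 3.0 * X[:, 0] + X[:, 1]
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Save step: what running the model script would do (file name is assumed).
model_path = Path(tempfile.mkdtemp()) / "melb_model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load step: what the app would do before making predictions.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

# The reloaded model should give the same predictions as the original.
same = bool(np.allclose(model.predict(X), loaded.predict(X)))
print(same)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.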

Tools & Libraries

Python: The language used for the model and the app.
Pandas: Used for loading, cleaning, and manipulating the dataset.
NumPy: Provides efficient numerical operations and array handling, which Pandas and Scikit-learn both depend on internally.
Scikit-learn: Used for machine learning tasks — splitting data, training models (RandomForestRegressor), and evaluating performance (Mean Absolute Error).
Matplotlib: Handles data visualization, used to create plots and charts to see how the model performs (e.g., actual vs predicted).
Seaborn: Built on top of Matplotlib, used for statistical visualizations and improving the appearance of plots (e.g., the error distribution histogram).

Notes

The dataset is for educational and non-commercial use.
