NexarObs/First-Machine-Learning-Project

First ML Project

This is my first Machine Learning project.

Goal

Predict house prices from various property features using a supervised learning model.

Features Used

  • Rooms: Number of rooms in the property
  • Distance: Distance from the central business district (in kilometers)
  • Postcode: Postal code of the property
  • Bedroom2: Number of bedrooms (as reported by the real estate agent)
  • Bathroom: Number of bathrooms
  • Car: Number of car spots
  • Landsize: Land size in square meters
  • BuildingArea: Building size in square meters
  • YearBuilt: Year the house was built
  • Lattitude: Geographic latitude
  • Longtitude: Geographic longitude
  • Propertycount: Number of properties in the same suburb
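
As a rough illustration of how these columns might be separated from the target, here is a minimal sketch. The helper name `split_features_target`, the choice to drop rows with missing values, and the tiny made-up frame are all assumptions for the example, not the project's actual preprocessing.

```python
import pandas as pd

# Column names exactly as listed above (the spellings follow the dataset headers).
FEATURES = [
    "Rooms", "Distance", "Postcode", "Bedroom2", "Bathroom", "Car",
    "Landsize", "BuildingArea", "YearBuilt", "Lattitude", "Longtitude",
    "Propertycount",
]
TARGET = "Price"


def split_features_target(df: pd.DataFrame):
    """Return (X, y) restricted to the listed columns.

    Assumption for this sketch: rows with any missing value in the used
    columns are dropped. The real project may impute them instead.
    """
    used = df[FEATURES + [TARGET]].dropna()
    return used[FEATURES], used[TARGET]


# Tiny made-up frame standing in for the real CSV (values are illustrative only).
demo = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Distance": [2.5, 7.9, 13.0],
    "Postcode": [3067, 3040, 3150],
    "Bedroom2": [2, 3, 4],
    "Bathroom": [1, 2, 2],
    "Car": [1, 1, 2],
    "Landsize": [202.0, 450.0, 650.0],
    "BuildingArea": [79.0, None, 142.0],  # one missing value on purpose
    "YearBuilt": [1900, 1975, 2005],
    "Lattitude": [-37.80, -37.75, -37.88],
    "Longtitude": [144.99, 144.90, 145.16],
    "Propertycount": [4019, 6543, 15321],
    "Price": [1_035_000, 850_000, 1_200_000],
})

X, y = split_features_target(demo)
print(X.shape)  # the row with the missing BuildingArea is dropped
```

With the real data, `demo` would be replaced by the frame loaded from the Kaggle CSV via `pd.read_csv`.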

Model

Random Forest Regressor was used to train the model and predict house prices.

  • Algorithm Type: Ensemble method (combines multiple decision trees).
  • Metric: Mean Absolute Error (MAE).
  • Average Accuracy: Approximately 85.08%.
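
The training and evaluation loop can be sketched as follows. This is a minimal example on synthetic data, not the project's actual script: the split ratio, `n_estimators`, and the random seeds are assumptions chosen only to keep the example small and reproducible.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data: 3 numeric features, price-like target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 100_000 * X[:, 0] + 50_000 * X[:, 1] + rng.normal(0, 10_000, size=200)

# Hold out 25% of the rows for validation (ratio is an assumption).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A random forest averages the predictions of many decision trees.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# MAE is the mean of |actual - predicted| over the validation rows.
preds = model.predict(X_val)
mae = mean_absolute_error(y_val, preds)
print(f"Validation MAE: {mae:,.0f}")
```

MAE is expressed in the same unit as the target, so on the real dataset it reads directly as an average error in price.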

Dataset

  • Name: Melbourne Housing Dataset.
  • Source: Kaggle — Melbourne Housing Market dataset.
  • Size: ~13,580 rows and 21 columns.
  • Target Variable: Price

Visualizations

  • Actual vs Predicted Prices: Scatter plot showing model predictions against real values.
  • Error Distribution: Histogram showing how prediction errors are spread.
  • Top 20 Features: Histogram displaying the top 20 features after proper encoding.
  • Residual Graph: Scatter plot showing the residuals (the differences between actual and predicted prices).
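
Three of the plots above could be produced along these lines. This is a sketch using Matplotlib only, with made-up prices; the variable names, the sign convention `actual - predicted`, and the figure layout are assumptions, and the project itself may use Seaborn for some of these.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np

# Made-up actual and predicted prices standing in for real model output.
rng = np.random.default_rng(1)
actual = rng.uniform(300_000, 1_500_000, size=100)
predicted = actual + rng.normal(0, 80_000, size=100)
residuals = actual - predicted  # sign convention assumed for this sketch

fig, (ax_scatter, ax_hist, ax_resid) = plt.subplots(1, 3, figsize=(15, 4))

# Actual vs predicted: points on the dashed diagonal are perfect predictions.
ax_scatter.scatter(actual, predicted, alpha=0.6)
lims = [actual.min(), actual.max()]
ax_scatter.plot(lims, lims, color="red", linestyle="--")
ax_scatter.set_xlabel("Actual price")
ax_scatter.set_ylabel("Predicted price")

# Error distribution: histogram of the residuals.
ax_hist.hist(residuals, bins=20)
ax_hist.set_xlabel("Residual")
ax_hist.set_ylabel("Count")

# Residual graph: residuals against predictions, with a zero reference line.
ax_resid.scatter(predicted, residuals, alpha=0.6)
ax_resid.axhline(0, color="red", linestyle="--")
ax_resid.set_xlabel("Predicted price")
ax_resid.set_ylabel("Residual")

fig.tight_layout()
```

Calling `fig.savefig("diagnostics.png")` afterwards would write the figure to disk; the file name is only an example.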

Deployment (Streamlit app)

  • The file app.py contains the UI for the app.
  • The UI is built with Streamlit.
  • To launch it:
  1. Make sure the pickle-export code is present in your model script; in melb_model.py, that section is clearly marked with comments.
  2. Run that file; this creates the .pkl file.
  3. Launch the app with the command streamlit run app.py
  4. Open the link displayed in the terminal to use the app in your browser of choice.
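
Steps 1 and 2 amount to serializing the trained model so the app can load it later. A minimal sketch of that round trip, assuming the standard `pickle` module is used: the file name `melb_model.pkl`, the temporary directory, and the stand-in model trained on made-up data are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stand-in model trained on made-up data (the real one comes from melb_model.py).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = X.sum(axis=1)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Hypothetical file name; a temp directory keeps this sketch free of side effects.
model_path = Path(tempfile.mkdtemp()) / "melb_model.pkl"

# What the model script would do: write the fitted model to disk.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# What the app would do at startup: load the model back.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

# The reloaded model should reproduce the original predictions.
same = np.allclose(model.predict(X), loaded.predict(X))
print(same)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.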

Tools & Libraries

  • Python: The programming language used throughout the project.
  • Pandas: Used for loading, cleaning, and manipulating the dataset.
  • NumPy: Provides efficient numerical operations and array handling, which Pandas and Scikit-learn both depend on internally.
  • Scikit-learn: Used for machine learning tasks — splitting data, training models (RandomForestRegressor), and evaluating performance (Mean Absolute Error).
  • Matplotlib: Handles data visualization, used to create plots and charts to see how the model performs (e.g., actual vs predicted).
  • Seaborn: Built on top of Matplotlib, used for statistical visualizations and improving the appearance of plots (e.g., the error distribution histogram).

Notes

The dataset is for educational and non-commercial use.

About

My very first machine learning project: house price prediction, with a GUI built with Streamlit.
