This project implements an end-to-end machine learning pipeline for predicting house prices. It demonstrates the full path from raw data → insights → predictive models → deployable artifacts.
- Build a predictive model for house prices.
- Showcase data cleaning, feature engineering, and exploratory data analysis (EDA).
- Compare baseline (Linear Regression) with advanced ML models (Random Forest, XGBoost, Gradient Boosting).
- Evaluate models using industry-standard metrics (R², MAE, RMSE).
- Deliver clear visualizations and business-ready insights.
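The three evaluation metrics can be written out directly from their standard definitions. The sketch below is a minimal NumPy implementation; the function name `regression_metrics` and the toy prices are illustrative and do not come from the project. In practice, `sklearn.metrics.r2_score` and `mean_absolute_error`, plus the square root of `mean_squared_error`, compute the same quantities.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute R², MAE and RMSE from their textbook definitions (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))                # mean absolute error
    rmse = np.sqrt(np.mean(residuals ** 2))         # root mean squared error
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return {"r2": r2, "mae": mae, "rmse": rmse}


# Made-up prices in thousands of dollars, not project results.
metrics = regression_metrics([200, 300, 400], [210, 290, 420])
print({k: round(float(v), 2) for k, v in metrics.items()})
# → {'r2': 0.97, 'mae': 13.33, 'rmse': 14.14}
```

MAE and RMSE are in the target's units (here, thousands of dollars), while R² is unitless; RMSE weights large errors more heavily than MAE, which is why both are usually reported.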
- EDA & Visualization: Correlation heatmaps, distribution plots, actual vs predicted plots, feature importance.
- Feature Engineering: Derived features such as `house_age` and `price_per_sqft`.
- Modeling: Linear Regression, Random Forest, Gradient Boosting, and XGBoost with hyperparameter tuning.
- Evaluation: R², MAE, and RMSE, with residual plots.
- Reproducibility: A clean Colab/Jupyter notebook with a modular pipeline.
- Scalability: Extensible to larger datasets and deployable as a Flask/FastAPI service on a cloud platform.
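A minimal pandas sketch of the two derived features. It assumes hypothetical input columns `price`, `sqft`, and `year_built` and a fixed `reference_year` (for example, the sale year). The project's actual column names may differ. Because `price_per_sqft` is computed from the target, it is best kept for EDA; using it as a model input would leak the answer into the features.

```python
import pandas as pd


def add_derived_features(df: pd.DataFrame, reference_year: int) -> pd.DataFrame:
    """Return a copy of df with house_age and price_per_sqft added.

    Assumes hypothetical columns 'price', 'sqft' and 'year_built'.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    # Age of the house relative to a fixed reference year.
    out["house_age"] = reference_year - out["year_built"]
    # Price per square foot; derived from the target, so use it for EDA only.
    out["price_per_sqft"] = out["price"] / out["sqft"]
    return out


# Toy rows for illustration only.
houses = pd.DataFrame(
    {"price": [300000, 450000], "sqft": [1500, 2000], "year_built": [1990, 2010]}
)
features = add_derived_features(houses, reference_year=2024)
print(features[["house_age", "price_per_sqft"]])
```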
- Language: Python (NumPy, Pandas, Matplotlib, Seaborn)
- ML Libraries: Scikit-learn, XGBoost
- Tools: Jupyter/Colab, GitHub
- Extensions (optional): Flask/FastAPI for API deployment, Docker for containerization, AWS/GCP for cloud scaling
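The Modeling bullet mentions hyperparameter tuning. One common way to do this with scikit-learn is `GridSearchCV`. The sketch below tunes a Random Forest on synthetic data from `make_regression`, which stands in for the housing features; the grid values are assumptions chosen only to keep the run short. Because `xgboost.XGBRegressor` follows the scikit-learn estimator API, it can be passed to `GridSearchCV` the same way.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the housing data (an assumption; swap in the real features).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Small, illustrative grid; real ranges should come from experimentation.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 8]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores, so RMSE is negated
    cv=3,
)
search.fit(X_train, y_train)  # refits the best configuration on all training data

print("best params:", search.best_params_)
print("cv RMSE:", -search.best_score_)
# The regressor's own .score() returns R² on the held-out split.
print("test R²:", search.best_estimator_.score(X_test, y_test))
```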
- Top predictive features: Location, square footage, number of rooms.
- Tree-based models (Random Forest, XGBoost) outperformed linear regression.
- Strong correlation between square footage & house price.
- Visualizations provide clear business insights for real estate pricing strategies.
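The kind of comparison behind these findings can be sketched as follows. This is an illustrative setup, not the project's code. It uses synthetic data with an added non-linear term, so it says nothing about the real housing results. It compares Linear Regression against two tree ensembles on a held-out split and ranks features by the Random Forest's impurity-based `feature_importances_`. XGBoost is left out to avoid an extra dependency, and the feature names `x0` to `x3` are placeholders.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the sine term is a non-linear effect a linear model cannot fit.
X, y = make_regression(n_samples=500, n_features=4, noise=5.0, random_state=42)
y = y + 50 * np.sin(X[:, 0])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "R2": r2_score(y_test, pred),
        "MAE": mean_absolute_error(y_test, pred),
        "RMSE": np.sqrt(mean_squared_error(y_test, pred)),
    }
    r = results[name]
    print(f"{name:18s} R²={r['R2']:.3f}  MAE={r['MAE']:.2f}  RMSE={r['RMSE']:.2f}")

# Impurity-based importances from the fitted Random Forest, highest first.
rf = models["Random Forest"]
for idx in np.argsort(rf.feature_importances_)[::-1]:
    print(f"x{idx}: {rf.feature_importances_[idx]:.3f}")
```

Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` on the test split is a common cross-check before reporting "top predictive features".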