A curated collection of beginner-friendly data science projects with real datasets, clear explanations, and working code. Learn by building.
This repository is designed for anyone getting started with data science -- students, career switchers, and self-learners. Each project is a standalone Jupyter notebook that you can clone and run immediately.
You will learn:
- Data Cleaning and Preprocessing -- preparing real-world messy data
- Exploratory Data Analysis -- visualizations and statistical insights
- Machine Learning -- classification, regression, and anomaly detection
- Deep Learning -- CNNs, transfer learning, and NLP models
- Computer Vision -- detection, recognition, and pose estimation
Start from the top and work your way down. Projects are ordered by difficulty within each level.
Get comfortable with pandas, scikit-learn, and basic ML workflows.
| # | Project | What You'll Learn | Category |
|---|---|---|---|
| 1 | Titanic Survival Prediction | EDA, data cleaning, feature engineering, 7 classifiers, GridSearchCV | Classification |
| 2 | Iris Flower Classification | Image classification with CNNs, data loading | Classification |
| 3 | Customer Churn | Logistic regression from scratch, prediction on new data | Classification |
| 4 | Heart Failure Prediction | Feature analysis, multiple classifiers, model evaluation | Classification |
| 5 | Rental Prices of AirBnb | Linear regression, outlier analysis, label encoding | Regression |
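
Several projects above mention logistic regression "from scratch". As a rough idea of what that involves, here is a minimal sketch using only the Python standard library. It is not the code from any notebook in this repo: the function names, the learning rate, the epoch count, and the synthetic two-cluster data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Map a real number to (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            pred = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias)
            error = pred - yi  # derivative of log loss w.r.t. the linear output
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        # Step against the average gradient
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias, threshold=0.5):
    """Return 0/1 labels by thresholding the predicted probability."""
    return [
        int(sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) >= threshold)
        for xi in X
    ]


# Synthetic toy data (an assumption, not a repo dataset): two separated clusters
random.seed(0)
X = [[random.gauss(-2, 0.5), random.gauss(-2, 0.5)] for _ in range(50)]
X += [[random.gauss(2, 0.5), random.gauss(2, 0.5)] for _ in range(50)]
y = [0] * 50 + [1] * 50

weights, bias = train_logistic_regression(X, y)
accuracy = sum(p == t for p, t in zip(predict(X, weights, bias), y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

In practice you would hold out a test set and compare against `sklearn.linear_model.LogisticRegression`; the sketch skips both to stay short.
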
Learn to work with text data, preprocessing pipelines, and NLP techniques.
| # | Project | What You'll Learn | Category |
|---|---|---|---|
| 6 | Message Spam Filtering | TF-IDF, text preprocessing, SVM classification | NLP |
| 7 | Cyber-Bullying Prediction | NLP pipeline, GridSearchCV, model comparison | NLP |
| 8 | Sentiment Analysis | Logistic regression from scratch, Twitter data, NLTK | NLP |
| 9 | AirBnb Reviews Sentimental Analysis | Full NLP pipeline: preprocessing, ML, deep learning, LLMs | NLP |
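
TF-IDF appears in the spam filtering project. The sketch below shows the basic idea in plain Python, assuming the simplest textbook definition (term frequency times `log(N / document frequency)`) and whitespace tokenization. The example messages are made up. scikit-learn's `TfidfVectorizer` applies smoothing and L2 normalization by default, so its numbers will differ from these.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real pipelines do much more."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Count, for each term, how many documents contain it at least once
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up messages for illustration only
messages = [
    "win a free prize now",
    "are you free for lunch now",
    "claim your prize today",
]
scores = tf_idf(messages)
for message, weights_by_term in zip(messages, scores):
    top_term = max(weights_by_term, key=weights_by_term.get)
    print(f"{message!r}: highest-weighted term is {top_term!r}")
```

Terms that occur in every document get a weight of zero under this definition, which is why common words contribute little to a classifier trained on these features.
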
Work with images, neural networks, and pre-trained models.
| # | Project | What You'll Learn | Category |
|---|---|---|---|
| 10 | Gender Classification | EfficientNetV2, transfer learning, Keras | Classification |
| 11 | Face Detection | Haar cascades, MTCNN, OpenCV | Computer Vision |
| 12 | Face Recognition | LBPH algorithm, real-time webcam recognition | Computer Vision |
| 13 | Eye Disease Detection | ResNet34, data augmentation pipeline, medical imaging | Computer Vision |
| 14 | Alzheimer Detection | Clinical data analysis, Random Forest on medical data | Computer Vision |
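
The Face Recognition project uses the LBPH (Local Binary Patterns Histograms) algorithm. To give a feel for the core step, here is a simplified sketch in plain Python: each interior pixel is compared with its 8 neighbors to form an 8-bit code, and the codes are counted in a 256-bin histogram. The function names and the tiny image are hypothetical. OpenCV's LBPH recognizer additionally splits the image into a grid of cells and has configurable radius and neighbor settings, none of which is modeled here.

```python
def lbp_code(image, row, col):
    """Compute the 8-bit local binary pattern code for one interior pixel.

    Neighbors are visited clockwise from the top-left corner (an arbitrary
    but fixed ordering); a neighbor contributes a 1 bit when it is at least
    as bright as the center pixel.
    """
    center = image[row][col]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for d_row, d_col in offsets:
        bit = int(image[row + d_row][col + d_col] >= center)
        code = (code << 1) | bit
    return code


def lbp_histogram(image):
    """Count LBP codes over all interior pixels of a grayscale image."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram


# Tiny made-up grayscale image (list of rows, values 0-255)
image = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
histogram = lbp_histogram(image)
print("interior pixels counted:", sum(histogram))
```

Two face images can then be compared by measuring the distance between their histograms; the recognizer in the project handles that step through OpenCV.
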
Tackle more complex real-world problems.
| # | Project | What You'll Learn | Category |
|---|---|---|---|
| 15 | Network Intrusion Detection System | Ensemble methods, XGBoost, KDD Cup dataset | Anomaly Detection |
| 16 | Object Detection | YOLOv8, Faster R-CNN, RetinaNet, Detectron2 | Computer Vision |
| 17 | Pose Estimation | YOLOv8, MediaPipe, activity classification | Computer Vision |
| 18 | Robotics and Computer Integrated Manufacturing | MobileNetV2, transfer learning, industrial imaging | Robotics |
| # | Project | Category | Difficulty |
|---|---|---|---|
| 1 | Titanic Survival Prediction | Classification | Beginner |
| 2 | Iris Flower Classification | Classification | Beginner |
| 3 | Customer Churn | Classification | Beginner |
| 4 | Heart Failure Prediction | Classification | Beginner |
| 5 | Rental Prices of AirBnb | Regression | Beginner |
| 6 | Message Spam Filtering | NLP | Beginner |
| 7 | Cyber-Bullying Prediction | NLP | Beginner |
| 8 | Sentiment Analysis | NLP | Intermediate |
| 9 | AirBnb Reviews Sentimental Analysis | NLP | Intermediate |
| 10 | Gender Classification | Classification | Intermediate |
| 11 | Face Detection | Computer Vision | Intermediate |
| 12 | Face Recognition | Computer Vision | Intermediate |
| 13 | Eye Disease Detection | Computer Vision | Intermediate |
| 14 | Alzheimer Detection | Computer Vision | Intermediate |
| 15 | Network Intrusion Detection System | Anomaly Detection | Advanced |
| 16 | Object Detection | Computer Vision | Advanced |
| 17 | Pose Estimation | Computer Vision | Advanced |
| 18 | Robotics and Computer Integrated Manufacturing | Robotics | Advanced |
- Python 3.9+
- Jupyter Notebook or JupyterLab
pip install pandas numpy matplotlib seaborn scikit-learn jupyter

| Project Type | Install |
|---|---|
| Deep Learning | pip install tensorflow keras |
| Computer Vision | pip install opencv-python |
| NLP | pip install nltk |
| Object Detection | pip install ultralytics |
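
If you are unsure which of these packages are already installed, a small check like the one below can help. It is only a sketch: the `PACKAGES` mapping and the `find_missing` helper are names invented here, not part of the repo. Note that the pip name and the import name differ for some packages (`scikit-learn` is imported as `sklearn`, `opencv-python` as `cv2`).

```python
import importlib.util

# pip package name -> module name used in `import` statements
PACKAGES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "tensorflow": "tensorflow",
    "opencv-python": "cv2",
    "nltk": "nltk",
    "ultralytics": "ultralytics",
}


def find_missing(packages):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


missing = find_missing(PACKAGES)
if missing:
    print("Missing packages: pip install " + " ".join(missing))
else:
    print("All listed packages are importable.")
```

This only checks that a module can be found, not that its version matches what a given project expects.
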
Each project has its own requirements.txt for exact dependencies:
cd "Project Folder Name"
pip install -r requirements.txt
jupyter notebook

git clone https://github.com/tkarim45/Beginner-Data-Science-Projects.git
cd Beginner-Data-Science-Projects
# Pick a project and run it
cd "Iris Flower Classification"
pip install -r requirements.txt
jupyter notebook

Contributions are welcome! Please read our Contributing Guide before submitting a PR.
Quick rules:
- One project per PR
- Include a README, requirements.txt, and working notebook
- Host datasets larger than 10 MB externally
- Do not commit model binaries
This project is licensed under the MIT License -- use it freely for learning, teaching, or building.
If this repo helped you, consider giving it a star -- it helps others find it too.
