PhishGuard AI is a real-world phishing detection system that combines machine learning, URL intelligence, and security heuristics to identify malicious links with high accuracy and clear explanations.
Built with a production-oriented mindset, it goes beyond simple classification by providing interpretable risk scores, attack pattern detection, and secure backend architecture.
- 🔍 Real-time URL Scanning: analyze any URL instantly with a trained ML model plus rule-based intelligence
- 🧠 Hybrid Detection Engine, which combines:
  - Machine Learning (Random Forest)
  - Heuristic Security Rules
  - Domain Intelligence
- 🎯 High Accuracy Detection: handles tricky phishing URLs like:
  - google.com.secure-login.xyz
  - paypal.com.login.verify.ru
- 📊 Explainable Results
  - Risk score (0–100%)
  - Threat classification (Safe / Suspicious / Critical)
  - Human-readable explanations of why a URL is dangerous
- 🖼️ Live Website Preview (Secure)
  - Screenshot rendering via proxy
  - SSRF-protected backend
- 📦 Bulk URL Scanning
  - Scan multiple URLs simultaneously using real backend logic (no fake/demo fallback)
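
As a rough illustration of bulk scanning that reuses the single-URL logic, the sketch below fans a list of URLs out over a thread pool. The names `scan_bulk` and `scan_one` are hypothetical and are not taken from the project's backend; `scan_one` stands in for whatever function scores a single URL.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_bulk(urls, scan_one, max_workers=8):
    """Scan URLs concurrently with the same function used for single scans.

    Results come back in input order. A failure on one URL is recorded as an
    error entry instead of aborting the whole batch.
    """
    def safe_scan(url):
        try:
            return {"url": url, "result": scan_one(url), "error": None}
        except Exception as exc:  # keep the batch going on per-URL failures
            return {"url": url, "result": None, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input iterable.
        return list(pool.map(safe_scan, urls))
```

Passing the scoring function in as a parameter is one way to make sure bulk and single scans cannot drift apart.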
All URLs pass through a centralized feature pipeline:
- URL length, entropy, digit ratios
- Subdomain patterns & depth
- Suspicious keywords (login, verify, secure, etc.)
- Domain structure & TLD analysis
- Character distribution patterns
Defined in:
shared/features.py
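
The feature families listed above could be computed along the following lines. This is a minimal sketch using only the standard library, not the contents of `shared/features.py`: the function names, the keyword tuple, the feature keys, and the naive subdomain-depth rule are all assumptions made for illustration.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical keyword list; the README names only "login, verify, secure, etc."
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(url: str) -> dict:
    """Return a flat feature dict for one URL (illustrative schema only)."""
    # urlparse only fills in the hostname when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = (parsed.hostname or "").lower()
    labels = [part for part in host.split(".") if part]
    lowered = url.lower()
    return {
        "url_length": len(url),
        "entropy": shannon_entropy(url),
        "digit_ratio": sum(ch.isdigit() for ch in url) / len(url) if url else 0.0,
        # Naive depth: every label left of the last two counts as a subdomain.
        "subdomain_depth": max(len(labels) - 2, 0),
        "keyword_hits": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
        "tld": labels[-1] if labels else "",
        "hyphen_count": host.count("-"),
    }
```

Treating the last two labels as the registered domain is a simplification; suffixes such as `co.uk` would need a public-suffix list to handle correctly.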
- Model: Random Forest Classifier
- Trained on:
  - 🔴 Phishing URLs (PhishTank)
  - 🟢 Legitimate URLs (Tranco Top Domains)
- Balanced dataset with real-world examples
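
The README does not say how the dataset is balanced. One common approach is to downsample the larger class, sketched below under that assumption. The function name `balance_dataset` and the label encoding (1 for phishing, 0 for legitimate) are hypothetical, not taken from `model/build_dataset.py`.

```python
import random


def balance_dataset(phishing_urls, legit_urls, seed=42):
    """Downsample the larger class so both classes have equal size.

    Returns a shuffled list of (url, label) pairs, with label 1 for phishing
    and 0 for legitimate. The label encoding is an assumption.
    """
    rng = random.Random(seed)
    # Deduplicate while keeping order so repeated URLs do not skew a class.
    phishing = list(dict.fromkeys(phishing_urls))
    legit = list(dict.fromkeys(legit_urls))
    n = min(len(phishing), len(legit))
    rows = [(url, 1) for url in rng.sample(phishing, n)]
    rows += [(url, 0) for url in rng.sample(legit, n)]
    rng.shuffle(rows)
    return rows
```

A fixed seed keeps the sampled subset reproducible between training runs.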
Final score = ML prediction + security heuristics
Enhancements include:
- Brand impersonation detection
- Suspicious subdomain detection
- Trusted domain soft-adjustment (no blind trust)
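
To make the idea of combining the ML prediction with heuristics concrete, here is a minimal sketch. The brand list, the trusted list, the weights (+40, +15, halving), and the function names are all invented for illustration; the project's actual rules and weights are not given in this README.

```python
from urllib.parse import urlparse

# Hypothetical lists and weights, chosen only for illustration.
BRANDS = {"google": "google.com", "paypal": "paypal.com", "amazon": "amazon.com"}
TRUSTED_DOMAINS = set(BRANDS.values())


def registered_domain(host: str) -> str:
    """Naive registered domain: the last two labels (ignores suffixes like co.uk)."""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def hybrid_score(url: str, ml_probability: float) -> tuple[float, list[str]]:
    """Blend an ML phishing probability (0..1) with rule-based adjustments.

    Returns a risk score in 0..100 and the reasons behind each adjustment.
    """
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = (parsed.hostname or "").lower()
    domain = registered_domain(host)
    subdomain_labels = host.split(".")[:-2]
    score = ml_probability * 100
    reasons = []

    # Brand impersonation: a brand name appears in the host, but the
    # registered domain is not that brand's own domain.
    for brand, official in BRANDS.items():
        if brand in host and domain != official:
            score += 40
            reasons.append(f"'{brand}' appears in the host but the domain is {domain}")
            break

    # Suspicious subdomains: deep nesting is common in deceptive URLs.
    if len(subdomain_labels) >= 2:
        score += 15
        reasons.append(f"{len(subdomain_labels)} subdomain levels")

    # Soft adjustment: a trusted domain lowers the score but never zeroes it.
    if domain in TRUSTED_DOMAINS:
        score *= 0.5
        reasons.append(f"{domain} is on the trusted list (score halved)")

    return max(0.0, min(100.0, score)), reasons
```

Because the trusted-domain step scales the score instead of overriding it, a strongly suspicious ML prediction on a trusted domain still surfaces as non-zero risk. The collected `reasons` are the kind of raw material a human-readable explanation could be built from.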
Secure backend:
- FastAPI backend
- SSRF protection for screenshot endpoint
- Safe URL handling and validation

Project structure:
phishguard/
│
├── backend/ # FastAPI backend
├── frontend/ # Web UI
├── model/ # Training + dataset pipeline
├── shared/ # Feature extraction logic
│
├── requirements.txt
└── README.md
Clone the repository and install dependencies:
git clone <your-repo-url>
cd phishguard
pip install -r requirements.txt
You need:
- PhishTank URLs (phishing dataset)
- Tranco top domains (legitimate dataset)
Place them in:
model/phishtank_urls.csv
model/legit_urls.csv
Build the dataset and train the model:
python model/build_dataset.py
python model/train_model.py
Start the backend:
uvicorn backend.app:app --reload
Open:
frontend/index.html
Security notes:
- SSRF protection implemented in screenshot endpoint
- No blind trust for known domains
- Backend validates all URLs before processing
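
A typical SSRF guard for a screenshot-style endpoint checks that the target resolves only to public addresses before any request is made. The sketch below shows that general pattern with the standard library; it is not the project's implementation, and `resolve_host` and `is_safe_target` are hypothetical names. The resolver is injectable so the check can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def resolve_host(host: str) -> list[str]:
    """Resolve a hostname to its IP addresses using the system resolver."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def is_safe_target(url: str, resolver=resolve_host) -> bool:
    """Return True only for http(s) URLs whose host resolves to public IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        addresses = resolver(parsed.hostname)
    except OSError:  # covers DNS failures (socket.gaierror)
        return False
    if not addresses:
        return False
    try:
        # Reject if ANY resolved address is non-public (loopback, private,
        # link-local, and similar ranges all have is_global == False).
        return all(ipaddress.ip_address(addr).is_global for addr in addresses)
    except ValueError:  # address string could not be parsed
        return False
```

A check like this is only one layer. The later fetch should connect to the address that was validated, and redirects should be re-checked, since a hostname can resolve differently between the check and the request.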
Example results:

| URL | Risk |
|---|---|
| google.com | 0% (Safe) |
| amazon.com | 0% (Safe) |
| google.com.secure-login.xyz | 90%+ (Critical) |
| paypal.com.login.verify.ru | 95%+ (Critical) |

Tech stack:
- Backend: FastAPI
- Frontend: Vanilla JS + HTML/CSS
- ML: Scikit-learn (Random Forest)
- Data: Pandas, real-world datasets
Planned improvements:
- Model calibration (confidence tuning)
- Hard-negative dataset expansion
- Cloud deployment (API + UI)
- Browser extension integration
Built as a full-stack AI security project focusing on:
- real-world applicability
- system design
- ML + security integration
This is not just a classifier: it is a phishing intelligence system designed to reflect how real detection pipelines work.
If you found this useful, consider starring ⭐ the repo.