Real-time, intent-based zero-day phishing detection powered by a hybrid ML ensemble and a distributed microservices architecture.
Phishing Sentinel protects users from sophisticated phishing attacks by analyzing the structural DNA of a webpage (DOM) and its lexical identity (URL) simultaneously — catching malicious pages before they're ever reported to a blocklist.
- Hybrid Detection Engine — Two specialized models analyze both URL lexical patterns (41 features) and DOM structural intent in parallel, delivering a fused confidence score.
- Zero-Day Readiness — Detects malicious intent from page structure and URL entropy alone, even for freshly registered domains with no prior threat history.
- Tiered Threat Levels — Rather than binary block/allow decisions, the system returns a probabilistic confidence score mapped to Green / Yellow / Red alert tiers.
- Real-Time Observation Log — A live React dashboard polls the Go backend to surface active threats, scan history, and per-user security statistics.
- Context-Isolated Sessions — JWT-authenticated API ensures every user sees only their own scan history and statistics.
- Cross-Context Token Sync — The dashboard pushes auth tokens to the Chrome Extension via `externally_connectable`, enabling seamless single-login across both surfaces.
- Edge-First Privacy — DOM extraction happens locally in the browser extension before being forwarded; raw page content never persists beyond the analysis request.
- Interactive Demo — Unauthenticated visitors can simulate a full phishing interception and view mock analytics without creating an account.
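
The tiered threat levels described above can be pictured as a simple threshold mapping. The sketch below is illustrative only: `threat_tier` is a hypothetical helper, and the 0.5 and 0.8 cut-offs are assumptions, not the thresholds the project actually uses.

```python
def threat_tier(phishing_probability: float,
                yellow_at: float = 0.5,
                red_at: float = 0.8) -> str:
    """Map a phishing probability in [0, 1] to an alert tier.

    The default cut-offs are assumed values chosen for illustration.
    """
    if not 0.0 <= phishing_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if phishing_probability >= red_at:
        return "Red"
    if phishing_probability >= yellow_at:
        return "Yellow"
    return "Green"


tier = threat_tier(0.94)
```

Keeping the cut-offs as parameters makes it easy to trade false positives against false negatives without retraining the model.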
Phishing Sentinel is structured as a monorepo of four decoupled layers:

    phishing-sentinel/
    ├── api/                        # Go/Gin API Gateway
    │   ├── cmd/
    │   │   ├── main.go             # Server entry point, routing, middleware
    │   │   └── types.go            # Request/response structs & ML service client
    │   └── internal/
    │       ├── db.go               # GORM + Supabase PostgreSQL initialization
    │       └── model.go            # User and ScanLog data models
    ├── dashboard/                  # React/Vite Web Dashboard
    │   └── src/
    │       ├── components/
    │       │   └── Demo/           # Interactive demo (no login required)
    │       ├── pages/
    │       │   ├── Dashboard/      # Threat log & statistics view
    │       │   └── Login/          # Auth + token sync to extension
    ├── extension/                  # Manifest V3 Chrome Extension
    │   └── src/
    │       ├── background/         # Service worker (PNA/CORS-safe request handling)
    │       ├── content/            # DOM scraping & metadata extraction
    │       └── popup/              # Extension UI
    └── ml-service/                 # Python/FastAPI Inference Engine
        ├── src/
        │   ├── main.py             # FastAPI server & /analyze endpoint
        │   ├── preprocess.py       # 16-feature extraction pipeline
        │   └── train.py            # Model training script
        ├── models/                 # Serialized .pkl model & scaler artifacts
        └── datasets/               # Training data sources


Request flow:

    Browser Tab
        │
        ▼
    Chrome Extension (Content Script)
        → Extracts URL + DOM metadata locally
        │
        ▼
    Go API Gateway :8080 (/api/analyze)
        → Validates JWT
        → Forwards payload to ML service
        → Persists result to Supabase
        → Returns verdict to extension
        │
        ▼
    Python FastAPI ML Service :8000 (/analyze)
        → Runs preprocess.py (feature extraction)
        → Inference via Balanced Random Forest
        → Returns { verdict, phishing_probability, analyzed_features }
        │
        ▼
    React Dashboard
        → Polls /api/logs & /api/stats
        → Renders live threat feed and trust score

A Balanced Random Forest Classifier (n_estimators=100, class_weight='balanced') trained on a labeled dataset of 79,760 websites.
Accuracy : 94.58%
Precision: 0.94 (phishing) / 0.95 (legitimate)
Recall : 0.92 (phishing) / 0.96 (legitimate)
F1-Score : 0.93 (phishing) / 0.96 (legitimate)
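Setting `class_weight='balanced'` weights each class inversely to its frequency in the training data, so samples from the minority phishing class carry proportionally more weight when each tree chooses its splits. This reduces false negatives, the more dangerous failure mode for a security tool.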
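
To make the weighting concrete, scikit-learn documents the `balanced` heuristic as `n_samples / (n_classes * class_count)`. The sketch below reproduces that arithmetic in plain Python. The 70/30 class split is a made-up example, not the actual distribution of the 79,760-site dataset, and `balanced_class_weights` is a hypothetical helper name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)}."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical 70/30 split, used only to illustrate the formula.
labels = ["legitimate"] * 70 + ["phishing"] * 30
weights = balanced_class_weights(labels)
```

With this split the phishing class receives a weight of 100 / (2 × 30) ≈ 1.67 and the legitimate class 100 / (2 × 70) ≈ 0.71, so each phishing sample counts a little more than twice as much.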
Lexical / URL Features
| Feature | Description |
|---|---|
| `url_entropy` | Shannon entropy of the URL string |
| `has_ip_address` | Flags raw IP addresses used in place of a hostname |
| `subdomain_count` | Number of subdomain levels |
| `url_length` | Total character length of the URL |
| `param_count` | Number of query string parameters |
| `sensitive_keywords` | Presence of terms like login, verify, secure, account |
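
As a rough illustration of how the lexical features in this table could be computed, here is a standard-library sketch. It is not the project's `preprocess.py`: the `url_features` function, the naive "last two labels are the registered domain" rule, and the encoding of `sensitive_keywords` as a 0/1 flag are all assumptions made for the example.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

SENSITIVE_KEYWORDS = ("login", "verify", "secure", "account")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_ip_address(host: str) -> bool:
    """True when the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    parts = urlsplit(url)
    host = parts.hostname or ""
    has_ip = is_ip_address(host)
    labels = host.split(".") if host and not has_ip else []
    lowered = url.lower()
    return {
        "url_entropy": shannon_entropy(url),
        "has_ip_address": int(has_ip),
        # Assumption: the last two labels form the registered domain.
        "subdomain_count": max(len(labels) - 2, 0),
        "url_length": len(url),
        "param_count": len(parse_qsl(parts.query, keep_blank_values=True)),
        "sensitive_keywords": int(any(k in lowered for k in SENSITIVE_KEYWORDS)),
    }


features = url_features("https://secure-login.paypa1.com/verify")
```

The two-label rule miscounts hosts under multi-part suffixes such as `.co.uk`; a production extractor would normally consult the Public Suffix List instead.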
Structural / DOM Intent Features
| Feature | Description |
|---|---|
| `external_link_ratio` | Ratio of links pointing outside the page's own domain |
| `empty_links_ratio` | Ratio of `href="#"` or `javascript:void(0)` anchors |
| `suspicious_form_action` | Form action attribute submitting to a third-party host |
| `hidden_iframe_count` | Number of zero-dimension or hidden iframe elements |
| `script_to_content_ratio` | Ratio of `<script>` tags to visible content nodes |
| `password_field_count` | Number of `<input type="password">` fields |
| `input_field_count` | Total number of form input fields |
| `meta_refresh_present` | Presence of automatic page redirect via `<meta http-equiv="refresh">` |
| `external_resource_ratio` | Ratio of externally hosted CSS/JS/image resources |
| `dom_nesting_depth` | Maximum depth of the DOM tree |
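
A handful of these structural features can be approximated with Python's built-in `html.parser`. The sketch below covers five of them and is only an illustration: `DomFeatureParser`, `dom_features`, and the exact rules for what counts as a "hidden" iframe are assumptions, not the logic in `preprocess.py`.

```python
from html.parser import HTMLParser


class DomFeatureParser(HTMLParser):
    """Counts a few structural signals while streaming through the markup."""

    def __init__(self):
        super().__init__()
        self.anchors = 0
        self.empty_anchors = 0
        self.input_fields = 0
        self.password_fields = 0
        self.hidden_iframes = 0
        self.meta_refresh = False

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "a":
            self.anchors += 1
            href = attr.get("href", "").strip().lower()
            if href == "#" or href.startswith("javascript:void(0)"):
                self.empty_anchors += 1
        elif tag == "input":
            self.input_fields += 1
            if attr.get("type", "").lower() == "password":
                self.password_fields += 1
        elif tag == "iframe":
            # Assumed definition of "hidden": zero size, hidden attribute, or CSS.
            style = attr.get("style", "").replace(" ", "").lower()
            zero_size = attr.get("width") == "0" or attr.get("height") == "0"
            css_hidden = "display:none" in style or "visibility:hidden" in style
            if zero_size or css_hidden or "hidden" in attr:
                self.hidden_iframes += 1
        elif tag == "meta" and attr.get("http-equiv", "").lower() == "refresh":
            self.meta_refresh = True


def dom_features(html: str) -> dict:
    parser = DomFeatureParser()
    parser.feed(html)
    parser.close()
    ratio = parser.empty_anchors / parser.anchors if parser.anchors else 0.0
    return {
        "empty_links_ratio": ratio,
        "hidden_iframe_count": parser.hidden_iframes,
        "password_field_count": parser.password_fields,
        "input_field_count": parser.input_fields,
        "meta_refresh_present": int(parser.meta_refresh),
    }


sample = """
<html><head><meta http-equiv="refresh" content="0; url=http://example.test/"></head>
<body>
<a href="#">Help</a><a href="/about">About</a>
<form><input type="text" name="user"><input type="password" name="pw"></form>
<iframe width="0" height="0" src="http://example.test/frame"></iframe>
</body></html>
"""
features = dom_features(sample)
```

Features that depend on the page's own origin, such as `external_link_ratio` and `suspicious_form_action`, would also need the page URL passed in, which this sketch omits.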
The dashboard includes a no-login demo mode accessible directly from the Login page. It walks visitors through a complete phishing interception simulation.
Demo flow:
- Visitor clicks "Try the Demo" on the Login page.
- A simulated browser viewport renders a fake phishing page (spoofed PayPal login).
- After a 2-second scan animation, the Sentinel blocking overlay appears — identical to what the Chrome Extension renders in a real browser.
- Dismissing the overlay reveals a mock analytics dashboard populated with realistic scan history and threat statistics.
- A CTA at the bottom prompts the visitor to create an account.

Mounting the demo in `Login.jsx`:

    import { useState } from "react";
    import Demo from "../../components/Demo/Demo";

    export default function Login() {
      const [showDemo, setShowDemo] = useState(false);

      return (
        <>
          {showDemo && <Demo onClose={() => setShowDemo(false)} />}
          {/* your existing login form ... */}
          <button onClick={() => setShowDemo(true)}>
            Try the Demo — no account needed
          </button>
        </>
      );
    }

The `Demo` component is self-contained and manages its own state. It renders as a fixed full-screen modal and accepts a single `onClose` prop.

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| `POST` | `/register` | No | Create a new user account |
| `POST` | `/login` | No | Authenticate and receive a JWT |
| `GET` | `/health` | No | Service health check |
| `POST` | `/api/analyze` | JWT | Submit URL + DOM for ML analysis |
| `GET` | `/api/stats` | JWT | Fetch per-user scan statistics |
| `GET` | `/api/logs` | JWT | Fetch paginated scan history (newest first) |
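
The protected endpoints can be exercised from any HTTP client. Below is a minimal Python sketch that builds a `POST /api/analyze` request with the `url`, `dom_content` and `metadata` fields. It assumes the JWT is sent as a `Bearer` token in the `Authorization` header (a common convention that this README does not confirm) and that the gateway is running on `localhost:8080`. `build_analyze_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_analyze_request(api_url: str, token: str,
                          page_url: str, dom_content: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated analysis request."""
    payload = {"url": page_url, "dom_content": dom_content, "metadata": {}}
    return urllib.request.Request(
        f"{api_url}/api/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the gateway reads the JWT from a Bearer header.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_analyze_request(
    "http://localhost:8080",
    "<jwt-from-login>",
    "https://secure-login.paypa1.com/verify",
    "<html>...</html>",
)

# To send it against a running gateway:
# with urllib.request.urlopen(request) as response:
#     verdict = json.load(response)
```

Separating request construction from sending keeps the example runnable without a live server.
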

Request

    {
      "url": "https://secure-login.paypa1.com/verify",
      "dom_content": "<html>...</html>",
      "metadata": {}
    }

Response

    {
      "is_spoof": true,
      "confidence_score": 0.94,
      "threat_level": "Phishing",
      "detected_anomalies": []
    }

| Tool | Version |
|---|---|
| Go | 1.21+ |
| Python | 3.10+ |
| Node.js | 18+ |
| Supabase project | (for PostgreSQL) |

Start the ML service:

    cd ml-service
    python -m venv venv
    source venv/bin/activate   # Windows: venv\Scripts\activate
    pip install -r requirements.txt
    python src/main.py
    # Listening on http://localhost:8000

Create `api/.env`:

    DATABASE_URI="host=<supabase-host> user=postgres.<project-id> password=<password> dbname=postgres port=5432 sslmode=require"
    JWT_SECRET="your-secure-secret"

Start the API gateway:

    cd api
    go mod tidy
    go run cmd/main.go
    # Listening on http://localhost:8080

Create `dashboard/.env`:

    VITE_API_URL="http://localhost:8080"

Start the dashboard:

    cd dashboard
    npm install
    npm run dev

Load the Chrome extension:

- Navigate to `chrome://extensions/` in Chrome.
- Enable Developer Mode (top-right toggle).
- Click Load unpacked and select the `extension/` directory.
- Copy the generated Extension ID and paste it into `Login.jsx` to enable cross-context token sync.

Place your dataset CSV in `ml-service/datasets/`, then:

    cd ml-service
    python src/train.py

Output artifacts saved to `ml-service/models/`:

- `phishing_sentinel_model_v2.pkl` — Serialized classifier
- `scaler.pkl` — Feature scaler

Security notes:

- Passwords are currently stored as plain text in this build. Hash with `bcrypt` before any production deployment.
- The CORS policy is set to `*` for local development. Restrict `Access-Control-Allow-Origin` to specific origins in production.
- The ML service is not directly exposed; all external traffic routes through the authenticated Go gateway.
- JWT tokens expire after 24 hours and are validated on every protected route.
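
To show what "expire after 24 hours and are validated on every protected route" involves, here is a standard-library sketch of issuing and checking an HS256-signed token with an `exp` claim. It is a teaching aid only: the gateway itself uses golang-jwt, the `sub` claim and the helper names are assumptions, and the sketch does not inspect the header's `alg` field the way a maintained JWT library would.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

TOKEN_TTL_SECONDS = 24 * 60 * 60  # 24-hour lifetime


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(message: str, secret: bytes) -> bytes:
    return hmac.new(secret, message.encode("ascii"), hashlib.sha256).digest()


def issue_token(user_id: str, secret: bytes, now: Optional[float] = None) -> str:
    issued_at = int(time.time() if now is None else now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = _b64url(json.dumps(
        {"sub": user_id, "exp": issued_at + TOKEN_TTL_SECONDS}).encode())
    signature = _b64url(_sign(f"{header}.{claims}", secret))
    return f"{header}.{claims}.{signature}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    header, claims, signature = token.split(".")
    expected = _sign(f"{header}.{claims}", secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(claims))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload


token = issue_token("user-1", b"your-secure-secret")
claims = verify_token(token, b"your-secure-secret")
```

The optional `now` argument exists only so expiry can be tested deterministically without waiting a day.
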
| Layer | Technology |
|---|---|
| API Gateway | Go, Gin, GORM, golang-jwt |
| Database | Supabase (PostgreSQL) |
| ML Service | Python, FastAPI, scikit-learn, pandas |
| Dashboard | React, Vite, Tailwind CSS |
| Extension | Vanilla JS, Manifest V3 |
MIT License. See LICENSE for details.