An end-to-end machine learning project that detects whether a URL is safe or malicious using NLP and classification techniques.
Malicious URLs are a primary vector for phishing, malware delivery, and cyberattacks. This project identifies potentially harmful URLs by analyzing URL text patterns and classifying them as:
- `good` → Safe URL
- `bad` → Malicious / Suspicious URL
The system is composed of three parts:
| Component | Description |
|---|---|
| MalURL_Model | Jupyter notebook for model training + dataset |
| MalURL_Backend | Flask REST API that loads the model and serves predictions |
| MalURL_Frontend | React web interface for user interaction |
ML-based-Malicious-URL-Detector/
├── MalURL_Backend/
│   ├── main.py                         # Flask API server
│   ├── trained_model.pkl               # Serialized Logistic Regression model
│   └── vectorizer.pkl                  # Serialized TF-IDF vectorizer
│
├── MalURL_Frontend/
│   ├── public/
│   ├── src/
│   ├── package.json
│   └── package-lock.json
│
├── MalURL_Model/
│   ├── Detecting Malicious URL.ipynb   # Training notebook
│   └── urldata.csv                     # Labeled URL dataset
│
└── README.md
User Input (URL)
        ↓
React Frontend
        ↓  HTTP POST /verify
Flask Backend
        ↓
TF-IDF Vectorizer → Logistic Regression Model
        ↓
Prediction: "good" or "bad"
        ↓
SweetAlert2 Popup
- The model is trained on a labeled dataset of URLs (`good`/`bad`)
- URLs are tokenized and transformed into numerical features via TF-IDF
- A Logistic Regression classifier is trained and achieves ~96% accuracy
- The model and vectorizer are serialized as `.pkl` files
- The Flask API loads these files and exposes a `/verify` endpoint
- The React frontend sends URLs to the API and displays results
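
The serving steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of the project's `main.py`: the helper names `load_artifacts` and `create_app`, the app-factory layout, and the 400 error response are all assumptions, and the Flask-CORS setup is left out for brevity.

```python
from flask import Flask, jsonify, request


def load_artifacts(model_path="trained_model.pkl", vectorizer_path="vectorizer.pkl"):
    """Load the serialized classifier and TF-IDF vectorizer from disk."""
    import joblib  # imported lazily so create_app can be exercised with stand-ins

    return joblib.load(model_path), joblib.load(vectorizer_path)


def create_app(model, vectorizer):
    """Build a Flask app that classifies URLs with the given model and vectorizer."""
    app = Flask(__name__)

    @app.route("/verify", methods=["POST"])
    def verify():
        payload = request.get_json(silent=True)
        url = payload.get("url") if isinstance(payload, dict) else None
        if not isinstance(url, str) or not url.strip():
            # Assumption: the README does not document an error response.
            return jsonify({"error": "missing 'url'"}), 400
        features = vectorizer.transform([url])  # same TF-IDF mapping as in training
        label = model.predict(features)[0]      # expected to be "good" or "bad"
        return jsonify({"result": str(label)})

    return app


# To serve locally on port 5000 (hypothetical entry point):
#   create_app(*load_artifacts()).run(port=5000)
```

Passing the model and vectorizer into the factory keeps the route logic separate from file loading, which makes it possible to try the endpoint with Flask's test client before the `.pkl` files exist.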
- React
- Axios
- SweetAlert2
- React Router DOM
- Python / Flask
- Flask-CORS
- Joblib
- Scikit-learn
- Pandas & NumPy
- TF-IDF Vectorizer
- Logistic Regression
- Jupyter Notebook
- Python 3.8+
- Node.js 16+
- npm
# 1. Navigate to the backend folder
cd MalURL_Backend
# 2. Create and activate a virtual environment
# Windows
python -m venv venv
venv\Scripts\activate
# macOS / Linux
python3 -m venv venv
source venv/bin/activate
# 3. Install dependencies
pip install flask flask-cors joblib scikit-learn pandas numpy
# 4. Start the server
python main.py
The backend will be running at: http://localhost:5000
# 1. Navigate to the frontend folder
cd MalURL_Frontend
# 2. Install dependencies
npm install
# 3. Start the React app
npm start
The frontend will be running at: http://localhost:3000
`POST /verify` checks whether a given URL is safe or malicious.
Request Body
{
  "url": "http://example.com"
}
Response
{
  "result": "good"
}
or
{
  "result": "bad"
}
The notebook at `MalURL_Model/Detecting Malicious URL.ipynb` covers the full ML pipeline:
- Loading and exploring `urldata.csv`
- URL preprocessing and tokenization
- TF-IDF vectorization
- Train/test split
- Logistic Regression training
- Model evaluation (~96% accuracy)
- Saving model and vectorizer as `.pkl` files
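
As a rough sketch of the pipeline steps listed above (not a copy of the notebook), the following uses scikit-learn's `TfidfVectorizer` and `LogisticRegression`. The tokenization rule (runs of letters and digits), the hyperparameters, the `train` and `save` helpers, and the CSV column names in the comment are assumptions. The toy URLs are invented only to make the sketch executable and say nothing about the ~96% figure.

```python
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up toy data for illustration only; the real labels live in urldata.csv.
TOY_URLS = [
    "http://example.com/home", "https://example.org/docs/index.html",
    "https://news.example.net/today", "http://shop.example.com/cart",
    "https://example.edu/courses", "http://login-verify.example.xyz/update",
    "http://free-prize.example.top/claim.exe", "http://secure-bank.example.ru/login.php",
    "http://example.biz/account/confirm/password", "http://win-now.example.click/gift",
]
TOY_LABELS = ["good"] * 5 + ["bad"] * 5


def train(urls, labels, test_size=0.2, seed=42):
    """Fit TF-IDF + Logistic Regression; return (vectorizer, model, test accuracy)."""
    x_train, x_test, y_train, y_test = train_test_split(
        urls, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    # Assumed tokenization: lowercase, then split into alphanumeric runs.
    vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z0-9]+")
    model = LogisticRegression(max_iter=1000)
    model.fit(vectorizer.fit_transform(x_train), y_train)
    accuracy = model.score(vectorizer.transform(x_test), y_test)
    return vectorizer, model, accuracy


def save(vectorizer, model, out_dir="."):
    """Serialize both artifacts under the file names the backend expects."""
    joblib.dump(model, f"{out_dir}/trained_model.pkl")
    joblib.dump(vectorizer, f"{out_dir}/vectorizer.pkl")


# With the real dataset (column names "url" and "label" are assumptions):
#   import pandas as pd
#   df = pd.read_csv("urldata.csv")
#   vectorizer, model, accuracy = train(df["url"].tolist(), df["label"].tolist())
#   save(vectorizer, model, "../MalURL_Backend")
```

Expressing the tokenizer as a `token_pattern` regex rather than a custom Python function keeps the pickled vectorizer free of references to notebook-defined code, which tends to make it easier to load from a separate process such as the Flask server.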
- Phishing URL detection
- Cybersecurity education and research
- Secure browsing assistants
- URL screening / validation tools
- ML + cybersecurity portfolio projects
- Advanced feature engineering (domain age, WHOIS data, etc.)
- Ensemble methods (Random Forest, XGBoost) for improved accuracy
- URL reputation and domain-based features
- Docker support and cloud deployment
- Batch URL scanning
- Model confidence scores in API response
- Logging, analytics, and monitoring dashboard
- Improved frontend UI/UX
- Ensure `trained_model.pkl` and `vectorizer.pkl` are present in `MalURL_Backend/` before starting the server.
- The frontend communicates with the backend on `localhost`; update the endpoint URL before deploying to production.
- CORS is enabled for local development; review CORS settings before any public deployment.
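
One way to avoid a hard-coded endpoint is to read the API base URL from configuration. The sketch below shows the idea with a small Python client instead of the React code, which this README does not show. The environment variable name `MALURL_API_BASE` and both helper functions are hypothetical; the request and response shapes follow the `/verify` description above.

```python
import json
import os
import urllib.request

# Hypothetical variable name; falls back to the local backend from the setup steps.
API_BASE = os.environ.get("MALURL_API_BASE", "http://localhost:5000")


def build_verify_request(url, api_base=API_BASE):
    """Build the POST /verify request carrying {"url": ...} as JSON."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{api_base.rstrip('/')}/verify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_url(url, api_base=API_BASE, timeout=5.0):
    """Send the request and return the "result" field ("good" or "bad")."""
    request = build_verify_request(url, api_base)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["result"]


# With the backend running:
#   print(check_url("http://example.com"))
```

The same pattern carries over to the frontend: keep the base URL in one configurable place so a production deployment only needs a configuration change.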
Kushan Bhagya
This project is intended for educational and portfolio purposes only.