kushanbhagya/ML-based-Malicious-URL-Detector

ML-based Malicious URL Detector

An end-to-end machine learning project that detects whether a URL is safe or malicious using NLP and classification techniques.


📌 Overview

Malicious URLs are a primary vector for phishing, malware delivery, and cyberattacks. This project identifies potentially harmful URLs by analyzing URL text patterns and classifying them as:

  • ✅ good: Safe URL
  • ❌ bad: Malicious / Suspicious URL

The system is composed of three parts:

| Component | Description |
| --- | --- |
| MalURL_Model | Jupyter notebook for model training + dataset |
| MalURL_Backend | Flask REST API that loads the model and serves predictions |
| MalURL_Frontend | React web interface for user interaction |

🗂️ Project Structure

ML-based-Malicious-URL-Detector/
├── MalURL_Backend/
│   ├── main.py               # Flask API server
│   ├── trained_model.pkl     # Serialized Logistic Regression model
│   └── vectorizer.pkl        # Serialized TF-IDF vectorizer
│
├── MalURL_Frontend/
│   ├── public/
│   ├── src/
│   ├── package.json
│   └── package-lock.json
│
├── MalURL_Model/
│   ├── Detecting Malicious URL.ipynb   # Training notebook
│   └── urldata.csv                     # Labeled URL dataset
│
└── README.md

⚙️ How It Works

User Input (URL)
      ↓
React Frontend
      ↓  HTTP POST /verify
Flask Backend
      ↓
TF-IDF Vectorizer  →  Logistic Regression Model
      ↓
Prediction: "good" or "bad"
      ↓
SweetAlert2 Popup

  1. The model is trained on a labeled dataset of URLs (good / bad)
  2. URLs are tokenized and transformed into numerical features via TF-IDF
  3. A Logistic Regression classifier is trained and achieves ~96% accuracy
  4. The model and vectorizer are serialized as .pkl files
  5. The Flask API loads these files and exposes a /verify endpoint
  6. The React frontend sends URLs to the API and displays results
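
Steps 5 and 6 can be sketched roughly as follows. This is a minimal illustration, not the contents of main.py: the `create_app` and `load_artifacts` helpers, the 400 response for a missing `url` field, and the error message are assumptions made for the example. Only the `/verify` route, the `{"url": ...}` request body, the `{"result": ...}` response, and the two .pkl file names come from this README. Flask-CORS setup is left out for brevity.

```python
from pathlib import Path

from flask import Flask, jsonify, request


def create_app(model, vectorizer):
    """Build a Flask app that serves predictions from an already-loaded model."""
    app = Flask(__name__)

    @app.route("/verify", methods=["POST"])
    def verify():
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict):
            payload = {}
        url = payload.get("url")
        if not isinstance(url, str) or not url.strip():
            # Assumed behaviour: reject requests without a usable "url" field.
            return jsonify({"error": "missing 'url'"}), 400

        features = vectorizer.transform([url])  # TF-IDF features for one URL
        label = model.predict(features)[0]      # expected to be "good" or "bad"
        return jsonify({"result": str(label)})

    return app


def load_artifacts(backend_dir="."):
    """Load the serialized model and vectorizer (hypothetical helper)."""
    import joblib  # imported lazily so create_app() can be used without it

    base = Path(backend_dir)
    model = joblib.load(base / "trained_model.pkl")
    vectorizer = joblib.load(base / "vectorizer.pkl")
    return model, vectorizer
```

With the .pkl files in place, something like `create_app(*load_artifacts()).run(port=5000)` would start a comparable server; passing the model in as an argument also makes the route easy to exercise with stub objects.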

🧰 Tech Stack

Frontend

  • React
  • Axios
  • SweetAlert2
  • React Router DOM

Backend

  • Python / Flask
  • Flask-CORS
  • Joblib
  • Scikit-learn

Machine Learning

  • Pandas & NumPy
  • TF-IDF Vectorizer
  • Logistic Regression
  • Jupyter Notebook

🚀 Getting Started

Prerequisites

  • Python 3.8+
  • Node.js 16+
  • npm

Backend Setup

# 1. Navigate to the backend folder
cd MalURL_Backend

# 2. Create and activate a virtual environment
# Windows
python -m venv venv
venv\Scripts\activate

# macOS / Linux
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install flask flask-cors joblib scikit-learn pandas numpy

# 4. Start the server
python main.py

The backend will be running at: http://localhost:5000


Frontend Setup

# 1. Navigate to the frontend folder
cd MalURL_Frontend

# 2. Install dependencies
npm install

# 3. Start the React app
npm start

The frontend will be running at: http://localhost:3000


📡 API Reference

POST /verify

Checks whether a given URL is safe or malicious.

Request Body

{
  "url": "http://example.com"
}

Response

{
  "result": "good"
}

or

{
  "result": "bad"
}
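
For a quick check without the React frontend, the endpoint can be called from a short Python script. The sketch below uses only the standard library; the helper names `build_verify_request` and `check_url` and the 5-second timeout are illustrative choices, and the base address assumes the backend is running locally as described in the setup section.

```python
import json
import urllib.request

API_BASE = "http://localhost:5000"  # local backend address from the setup section


def build_verify_request(url, api_base=API_BASE):
    """Create the POST /verify request carrying {"url": ...} as JSON."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{api_base}/verify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_url(url, api_base=API_BASE, timeout=5):
    """Send the request and return the "result" field ("good" or "bad")."""
    req = build_verify_request(url, api_base)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["result"]
```

Calling `check_url("http://example.com")` requires the Flask server to be running; `build_verify_request` can be inspected on its own without any network access.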

🧪 Model Training

The notebook at MalURL_Model/Detecting Malicious URL.ipynb covers the full ML pipeline:

  • Loading and exploring urldata.csv
  • URL preprocessing and tokenization
  • TF-IDF vectorization
  • Train/test split
  • Logistic Regression training
  • Model evaluation (~96% accuracy)
  • Saving model and vectorizer as .pkl files
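
The pipeline above can be approximated with the sketch below. It is not the notebook's code. The URLs are made-up placeholders rather than rows from urldata.csv, the alphanumeric `token_pattern` is an assumed tokenization, and the split ratio, seed, and helper names are illustrative. With a toy sample this small, the returned accuracy says nothing about the ~96% figure reported for the real dataset.

```python
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Made-up placeholder rows, NOT taken from urldata.csv.
URLS = [
    "https://example.com/docs/index.html",
    "https://example.org/blog/2021/hello-world",
    "https://example.net/products/list",
    "https://example.com/about/team",
    "http://example.test/login/verify-account.php?id=123",
    "http://example.test/secure/update-password.exe",
    "http://example.invalid/free-gift/claim-now.php",
    "http://example.invalid/bank/confirm-login.html",
]
LABELS = ["good", "good", "good", "good", "bad", "bad", "bad", "bad"]


def train_url_classifier(urls, labels, test_size=0.25, seed=0):
    """Split, vectorize with TF-IDF, fit Logistic Regression, report accuracy."""
    train_urls, test_urls, y_train, y_test = train_test_split(
        urls, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    # Assumed tokenization: lowercase runs of letters and digits.
    vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z0-9]+")
    x_train = vectorizer.fit_transform(train_urls)

    model = LogisticRegression(max_iter=1000)
    model.fit(x_train, y_train)

    predictions = model.predict(vectorizer.transform(test_urls))
    return vectorizer, model, accuracy_score(y_test, predictions)


def save_artifacts(vectorizer, model, out_dir):
    """Serialize both objects under the file names the backend expects."""
    out = Path(out_dir)
    joblib.dump(model, out / "trained_model.pkl")
    joblib.dump(vectorizer, out / "vectorizer.pkl")
```

Saving the fitted vectorizer alongside the model matters because the backend must transform incoming URLs with the same vocabulary and IDF weights that were used during training.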

💡 Example Use Cases

  • Phishing URL detection
  • Cybersecurity education and research
  • Secure browsing assistants
  • URL screening / validation tools
  • ML + cybersecurity portfolio projects

🔮 Future Improvements

  • Advanced feature engineering (domain age, WHOIS data, etc.)
  • Ensemble methods (Random Forest, XGBoost) for improved accuracy
  • URL reputation and domain-based features
  • Docker support and cloud deployment
  • Batch URL scanning
  • Model confidence scores in API response
  • Logging, analytics, and monitoring dashboard
  • Improved frontend UI/UX

⚠️ Notes

  • Ensure trained_model.pkl and vectorizer.pkl are present in MalURL_Backend/ before starting the server.
  • The frontend communicates with the backend on localhost; update the endpoint URL before deploying to production.
  • CORS is enabled for local development; review CORS settings before any public deployment.
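
The first note can be turned into a small pre-flight check. The helper below is a hypothetical convenience and not part of the repository; it only assumes the two file names and the MalURL_Backend/ folder named in this README.

```python
from pathlib import Path

REQUIRED_ARTIFACTS = ("trained_model.pkl", "vectorizer.pkl")


def missing_artifacts(backend_dir="MalURL_Backend"):
    """Return the names of required .pkl files that are absent from backend_dir."""
    base = Path(backend_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (base / name).is_file()]
```

Running `missing_artifacts()` from the repository root before `python main.py` returns an empty list when both files are present, and otherwise lists whichever ones still need to be exported from the notebook.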

👤 Author

Kushan Bhagya


📄 License

This project is intended for educational and portfolio purposes only.

About

An end-to-end machine learning web application that detects whether a URL is safe or malicious. The project includes a React frontend for URL submission, a Flask backend API for prediction, and a trained NLP-based classification model built with TF-IDF vectorization and Logistic Regression.
