📩 SMS Spam Classifier

A machine learning-based web application that classifies SMS messages as either "Spam" or "Ham" (legitimate) with high precision. The application is built using Python and deployed via Streamlit, utilizing a Multinomial Naive Bayes classifier trained on the SMS Spam Collection dataset.

🚀 Project Overview

Unwanted spam messages are a nuisance and a potential security threat. This project aims to filter these messages out by analyzing their text content. The system uses Natural Language Processing (NLP) techniques to preprocess text and a machine learning model to predict the category of the message.

Input: Raw SMS text.
Output: Classification (Spam/Not Spam).
Key Metric: The model was selected for its precision of 1.0 on the evaluation data, meaning that no legitimate message in that data was classified as spam.
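
To make the metric concrete: precision is TP / (TP + FP), the share of messages flagged as spam that really are spam. The sketch below computes it, along with recall, from two label lists. The labels are invented for illustration (1 = spam, 0 = ham is an assumed encoding) and are not results from this project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def precision(tp, fp):
    """Share of messages flagged as spam that really are spam."""
    if tp + fp == 0:
        raise ValueError("no message was predicted as spam")
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual spam messages that were caught."""
    if tp + fn == 0:
        raise ValueError("no spam message in the data")
    return tp / (tp + fn)


# Illustrative labels only: three spam and five ham messages.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]  # one spam message slips through

tp, fp, fn = confusion_counts(y_true, y_pred)
print("precision:", precision(tp, fp))
print("recall:", recall(tp, fn))
```

With no false positives the precision is 1.0 even though one spam message is missed, which is the trade-off a precision-first filter accepts.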

🛠️ Tech Stack

Language: Python 3.x
Web Framework: Streamlit
Machine Learning: Scikit-learn (MultinomialNB, TF-IDF)
Natural Language Processing: NLTK (PorterStemmer, Stopwords, Tokenization)
Data Manipulation: Pandas, NumPy
Visualization: Matplotlib, Seaborn, WordCloud

📂 Project Structure

SMS-Spam-detection.ipynb: Jupyter Notebook for EDA, preprocessing, and model training.
app1.py: Main Streamlit application script.
spam.csv: Dataset used for training.
vectorizer.pkl: Saved TF-IDF Vectorizer (generated by notebook).
model.pkl: Saved Naive Bayes Model (generated by notebook).
requirements.txt: List of project dependencies.
README.md: Project documentation.

⚙️ How It Works

1. Model Training (`SMS-Spam-detection.ipynb`)

The notebook covers the entire data science pipeline:

Data Cleaning: Dropping empty columns, renaming features, and handling duplicates.
Exploratory Data Analysis (EDA): Analyzing message lengths, class distribution (imbalanced), and visualizing frequent words using WordClouds.
Preprocessing:
- Lowercasing text.
- Tokenization (breaking text into words).
- Removing special characters, punctuation, and stopwords.
- Stemming (reducing words to their root form).
Vectorization: Using TfidfVectorizer (max_features=3000) to convert text into numerical feature vectors.
Model Selection: After testing multiple algorithms (Logistic Regression, SVM, Random Forest, etc.), Multinomial Naive Bayes was chosen for offering the best balance of accuracy (~97%) and precision (1.0).
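
The preprocessing, vectorization, and training steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: to run without downloading NLTK corpora it swaps in a regex tokenizer and a small inline stopword set where the notebook uses NLTK tokenization and NLTK's stopword list. The function names, the toy messages, and the 1 = spam encoding are all assumptions.

```python
import re

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Assumption: a tiny stand-in stopword set; the notebook uses NLTK's English list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "you", "your", "at",
             "for", "on", "in", "of", "and", "i", "we", "now"}

stemmer = PorterStemmer()


def transform_text(text):
    """Lowercase, tokenize, drop punctuation and stopwords, then stem."""
    # Regex stand-in for NLTK tokenization: keeps alphanumeric runs only,
    # which also removes special characters and punctuation.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [tok for tok in tokens if tok not in STOPWORDS]
    return " ".join(stemmer.stem(tok) for tok in kept)


# Toy corpus invented for illustration; 1 = spam, 0 = ham (assumed encoding).
messages = [
    "WINNER!! You have won a free prize, claim now",
    "Free entry: win cash prizes, text WIN to claim",
    "Are we still meeting for lunch at noon?",
    "I will call you when I get home",
]
labels = [1, 1, 0, 0]

# Same vocabulary cap as described above.
vectorizer = TfidfVectorizer(max_features=3000)
X = vectorizer.fit_transform([transform_text(m) for m in messages])

model = MultinomialNB()
model.fit(X, labels)


def predict(message):
    """Return the predicted label for one raw message."""
    features = vectorizer.transform([transform_text(message)])
    return int(model.predict(features)[0])


print(transform_text("Claim your FREE prize now!!!"))
print(predict("Claim your free cash prize"))
```

The fitted `vectorizer` and `model` are the two objects that would be pickled as vectorizer.pkl and model.pkl for the app to load.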

2. The Application (`app1.py`)

The Streamlit app provides a user interface for the model:

Loads the pre-trained vectorizer.pkl and model.pkl files.
Accepts user input via a text area.
Preprocesses the input using the same pipeline defined in training.
Displays the prediction ("Spam" or "Not Spam").
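
A minimal sketch of how such an app might be wired together is shown below. It is not the contents of app1.py: the helper names (`load_artifacts`, `classify`, `run_app`), the widget labels, and the 1 = spam encoding are assumptions, and the `preprocess` argument stands for whatever text-cleaning function was used during training.

```python
import pickle
from pathlib import Path


def load_artifacts(directory="."):
    """Load the fitted vectorizer and model pickled by the notebook."""
    base = Path(directory)
    with open(base / "vectorizer.pkl", "rb") as fh:
        vectorizer = pickle.load(fh)
    with open(base / "model.pkl", "rb") as fh:
        model = pickle.load(fh)
    return vectorizer, model


def classify(message, vectorizer, model, preprocess):
    """Preprocess, vectorize, and predict; returns the label shown to the user."""
    features = vectorizer.transform([preprocess(message)])
    # Assumption: the model encodes spam as 1 and ham as 0.
    return "Spam" if model.predict(features)[0] == 1 else "Not Spam"


def run_app(preprocess):
    """Streamlit UI; `preprocess` must match the function used in training."""
    import streamlit as st  # imported here so the helpers above work without Streamlit

    vectorizer, model = load_artifacts()
    st.title("SMS Spam Classifier")
    message = st.text_area("Enter the message")
    if st.button("Predict"):
        if message.strip():
            st.header(classify(message, vectorizer, model, preprocess))
        else:
            st.warning("Please enter a message first.")
```

In a real script, a call such as `run_app(transform_text)` at the bottom of the file would render the page when started with `streamlit run`, where `transform_text` is a hypothetical name for the training-time preprocessing function.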

🔧 Installation & Run

Clone the repository:

git clone https://github.com/yourusername/sms-spam-classifier.git
cd sms-spam-classifier

Create a virtual environment (Recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Note: Download the required NLTK data once before running the notebook or the app (recent NLTK releases may also need the 'punkt_tab' resource):

import nltk
nltk.download('punkt')
nltk.download('stopwords')

Run the App:
```
streamlit run app1.py
```

📦 Requirements

streamlit
pandas
numpy
scikit-learn
nltk
matplotlib
seaborn
wordcloud
