📧 Spam Mail Prediction using Logistic Regression

This project predicts whether an incoming email is Spam or Ham (not spam) using a Logistic Regression model. The workflow covers preprocessing the dataset, splitting it into training and testing sets, training the model, and evaluating its predictions.

🚀 Workflow

Mail Data: Collect email datasets containing labeled examples of spam and ham messages.
Data Preprocessing:
- Cleaning the text (removing stop words, punctuation, and numbers).
- Converting text into numerical form using techniques like TF-IDF.
- Handling missing values.
Train-Test Split: Split the processed dataset into training and testing subsets to evaluate model performance.
Model Training: Train a Logistic Regression model using the training dataset.
Prediction:
- Input: New email text.
- Output: Spam or Ham.
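
The steps above can be sketched end to end with scikit-learn. This is a minimal illustration and not the notebook's actual code: the toy messages, the column names `Message` and `Category`, the split ratio, and the vectorizer settings are all assumptions chosen for the example.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the real project would load its CSV file instead.
data = pd.DataFrame({
    "Message": [
        "Win a free prize now, click here!",
        "Congratulations, you won 1000 dollars cash",
        "Free entry to win tickets, text now",
        "Urgent: claim your free reward today",
        "Cheap loans, limited offer, apply now",
        "Are we still meeting for lunch tomorrow?",
        "Please review the attached project report",
        "Can you call me when you get home?",
        "Thanks for the notes from the meeting",
        None,  # a missing message, to illustrate the missing-value step
    ],
    "Category": ["spam"] * 5 + ["ham"] * 5,
})

# Handle missing values by replacing them with empty strings.
data["Message"] = data["Message"].fillna("")

# Train-test split (the 80/20 ratio is an assumption), stratified by label.
X_train, X_test, y_train, y_test = train_test_split(
    data["Message"], data["Category"],
    test_size=0.2, random_state=42, stratify=data["Category"],
)

# TF-IDF lowercases the text, drops English stop words, and keeps only
# alphabetic tokens, which discards punctuation and numbers.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english",
                    token_pattern=r"(?u)\b[a-zA-Z]{2,}\b"),
    LogisticRegression(),
)
model.fit(X_train, y_train)

# Prediction: new email text in, "spam" or "ham" out.
new_email = ["Claim your free prize now"]
prediction = model.predict(new_email)[0]
print(prediction)
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is fitted on the training split only, so no information from the test split leaks into training.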

🛠️ Tech Stack

Language: Python
Libraries:
- pandas – Data manipulation
- numpy – Numerical computations
- scikit-learn – Machine learning algorithms and preprocessing
- matplotlib / seaborn – Visualization

📊 Model Evaluation

The trained Logistic Regression model is evaluated using:

Accuracy
Precision
Recall
F1 Score
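
As a rough sketch, these four metrics can be computed with scikit-learn's `sklearn.metrics` functions. The labels below are hypothetical and exist only to show the calls; treating spam as the positive class is also an assumption.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical true and predicted labels (not the project's results).
y_true = ["spam", "spam", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "ham"]

accuracy = accuracy_score(y_true, y_pred)
# pos_label tells scikit-learn which class counts as "positive".
precision = precision_score(y_true, y_pred, pos_label="spam")
recall = recall_score(y_true, y_pred, pos_label="spam")
f1 = f1_score(y_true, y_pred, pos_label="spam")

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.83 precision=1.00 recall=0.67 f1=0.80
```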

📊 Classification Report

This report summarizes the performance of the classification model. The model was evaluated on a test set of 15 samples, focusing on its ability to distinguish between "Ham" and "Spam" messages.

Detailed Metrics

Class	Precision	Recall	F1-Score	Support
Ham	0.64	1.00	0.78	7
Spam	1.00	0.50	0.67	8

Precision: Of the messages predicted as a given class, the fraction that actually belong to it.
Recall: Of the messages that actually belong to a given class, the fraction that were correctly identified.
F1-Score: The harmonic mean of precision and recall, which balances the two.
Support: The number of occurrences of each class in the test set.
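
The table's values can be reproduced by hand. The recall and support columns imply that all 7 Ham messages were predicted as Ham and that 4 of the 8 Spam messages were predicted as Spam, with the other 4 predicted as Ham. The sketch below recomputes the metrics from those implied counts; the function and variable names are illustrative.

```python
# Counts implied by the recall and support columns of the report.
ham_as_ham, ham_as_spam = 7, 0    # Ham recall 1.00 with support 7
spam_as_spam, spam_as_ham = 4, 4  # Spam recall 0.50 with support 8

def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class from its counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# For Ham, a false positive is a Spam message predicted as Ham.
ham = precision_recall_f1(tp=ham_as_ham, fp=spam_as_ham, fn=ham_as_spam)
# For Spam, a false positive is a Ham message predicted as Spam.
spam = precision_recall_f1(tp=spam_as_spam, fp=ham_as_spam, fn=spam_as_ham)

print("Ham ", [round(v, 2) for v in ham])   # [0.64, 1.0, 0.78]
print("Spam", [round(v, 2) for v in spam])  # [1.0, 0.5, 0.67]
```

Read this way, the model never flags a legitimate message as spam, but it lets half of the spam through.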

Overall Performance

Metric	Score
Accuracy	0.73
Macro Average (F1)	0.72
Weighted Average (F1)	0.72

Accuracy: The fraction of all predictions that were correct.
Macro Average: The unweighted mean of a metric across all classes. The value reported here matches the F1-Score.
Weighted Average: The mean of a metric across classes, weighted by each class's support. The value reported here matches the F1-Score.
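
As a worked check, the overall scores follow from the per-class results. The sketch assumes the counts implied by the detailed metrics (7 of 7 Ham and 4 of 8 Spam classified correctly, with no Ham predicted as Spam) and applies the averages to the F1-Score; the variable names are illustrative.

```python
# Supports and correct predictions implied by the per-class report.
support = {"ham": 7, "spam": 8}
correct = {"ham": 7, "spam": 4}

# Per-class F1 via F1 = 2*TP / (2*TP + FP + FN).
f1 = {
    "ham": 2 * 7 / (2 * 7 + 4 + 0),   # 4 Spam predicted as Ham (FP), 0 FN
    "spam": 2 * 4 / (2 * 4 + 0 + 4),  # 0 FP, 4 Spam missed (FN)
}

total = sum(support.values())
accuracy = sum(correct.values()) / total
macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(round(accuracy, 2), round(macro_f1, 2), round(weighted_f1, 2))
# 0.73 0.72 0.72
```

The macro and weighted averages nearly coincide here because the two classes have almost equal support (7 and 8).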

📌 Future Improvements

Experiment with other models such as Naive Bayes or Random Forest and compare them against Logistic Regression.
Implement real-time email scanning.
Create a web-based UI for user interaction.

🖊️ Author

Husnain Khaliq
