Skip to content

Yash49hdfmj/Email_spam_Classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

README.md for Email Spam Classifier


Email Spam Classifier

This repository contains an implementation of an email spam classifier using a Naive Bayes algorithm. The classifier is designed to distinguish between spam and non-spam (ham) emails based on their content. This is achieved through the use of text preprocessing, feature extraction using a bag-of-words model, and training a Naive Bayes classifier.

Table of Contents

  1. Installation
  2. Usage
  3. Data Description
  4. Model Description
  5. Example Usage
  6. Results
  7. Contributing
  8. License

Installation

To run the spam classifier, you need to have Python 3.x installed along with the following libraries:

  • pandas
  • numpy
  • scikit-learn

You can install the necessary libraries using pip:

pip install pandas numpy scikit-learn

Usage

  1. Clone this repository to your local machine.
  2. Place the spam.csv file in the repository's root directory.
  3. Run the Python script to train the model and classify emails.
python spam_classifier.py

Data Description

The dataset used for this classifier is a collection of labeled email messages. The dataset contains the following columns:

  • Category: The label for each email, either "spam" or "ham".
  • Message: The content of the email.

The dataset is preprocessed by adding an additional column spam which contains binary values:

  • 1: Indicates the email is spam.
  • 0: Indicates the email is not spam (ham).

Model Description

The model is built using the following steps:

  1. Data Preprocessing: The Category column is converted into a numerical format, where spam is represented as 1 and ham as 0.
  2. Train-Test Split: The data is split into training and testing sets.
  3. Feature Extraction: The CountVectorizer is used to convert the text data into a sparse matrix of word counts.
  4. Model Training: A Multinomial Naive Bayes classifier is trained on the transformed training data.
  5. Prediction and Evaluation: The model is tested on the test set to evaluate its performance.

Example Usage

Here’s an example of how you can use the trained model to classify new emails:

# Example ham email
email_ham = ["could you help me?"]
email_ham_count = cv.transform(email_ham)
print("Ham Email Prediction:", model.predict(email_ham_count))

# Example spam email
email_spam = ["free"]
email_spam_count = cv.transform(email_spam)
print("Spam Email Prediction:", model.predict(email_spam_count))

Results

After training, the model can be tested using the test data split. The model's accuracy is measured by comparing its predictions to the true labels of the test set.

# Evaluating the model
x_test_count = cv.transform(x_test)
accuracy = model.score(x_test_count, y_test)
print(f"Model Accuracy: {accuracy}")

Contributing

Contributions to this repository are welcome. You can contribute by:

  • Submitting bug reports and feature requests.
  • Forking the repository and creating a pull request with new features or improvements.

License

This project is licensed under the MIT License - see the LICENSE file for details.


This README file provides an overview of the email spam classifier project, including installation instructions, a description of the data and model, and usage examples. Feel free to modify the content to better suit your project specifics!

About

No description or website provided.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published