caydenrgarrett/ML-Fake-News-Detection

Fake News Detection Project

This project demonstrates a simple fake news detection model using a Passive Aggressive Classifier and TF-IDF vectorization.

Project Steps:

  1. Load Data: The project starts by loading the news data from a CSV file into a pandas DataFrame.
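import pandas as pd

# Load the dataset (path as used in the original notebook)
df = pd.read_csv('/content/news.csv')
print(df.shape)
display(df.head())  # display() is available in Jupyter/Colab; use print() elsewhere

# Extract the ground-truth labels (FAKE / REAL) from the DataFrame
labels = df.label
labels.head()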

Below is a preview of the dataset used for training and testing the model (the output of df.head()):

| Unnamed: 0 | title                                         | text                                                        | label |
|------------|-----------------------------------------------|-------------------------------------------------------------|-------|
| 8476       | You Can Smell Hillary’s Fear                  | Daniel Greenfield, a Shillman Journalism Fello...            | FAKE  |
| 10294      | Watch The Exact Moment Paul Ryan Committed... | Google Pinterest Digg Linkedin Reddit Stumbleu...            | FAKE  |
| 3608       | Kerry to go to Paris in gesture of sympathy   | U.S. Secretary of State John F. Kerry said Mon...            | REAL  |
| 10142      | Bernie supporters on Twitter erupt in anger...| — Kaydee King (@KaydeeKing) November 9, 2016 T...            | FAKE  |
| 875        | The Battle of New York: Why This Primary...   | It's primary day in New York and front-runners...            | REAL  |
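  2. Split Data: The data is split into a training set (80%) and a testing set (20%) to prepare for model training and evaluation. A fixed random_state makes the split reproducible.
from sklearn.model_selection import train_test_split

# Hold out 20% of the articles for testing
x_train, x_test, y_train, y_test = train_test_split(df['text'], labels, test_size=0.2, random_state=7)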
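To make the 80/20 hold-out concrete, here is a minimal pure-Python sketch of a seeded shuffle-and-split. The helper name simple_split and the toy data are hypothetical, and this only illustrates the idea; it is not how scikit-learn implements train_test_split, and it will not reproduce the same split.

```python
import random


def simple_split(items, labels, test_size=0.2, seed=7):
    """Shuffle indices with a fixed seed, then use the first `test_size` share as the test set."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(items) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [items[i] for i in train_idx],
        [items[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


# Toy data for illustration only
texts = [f"article {i}" for i in range(10)]
tags = ["FAKE" if i % 2 else "REAL" for i in range(10)]
x_tr, x_te, y_tr, y_te = simple_split(texts, tags)
print(len(x_tr), len(x_te))
```

Because the seed is fixed, running the sketch twice gives the same partition, which is the role random_state=7 plays in the project's split.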
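  3. Vectorize Text: The text data is transformed into numerical features using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization. Each article becomes a sparse vector in which a term's count is scaled down by the number of documents that contain it, so words that appear everywhere carry little weight. English stop words are removed, and terms found in more than 70% of articles are ignored (max_df=0.7).
from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize a TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)

# Fit the vocabulary on the training set only, then transform both sets
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)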
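The weighting described in the vectorization step can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's implementation: it assumes whitespace tokenization, skips stop-word removal and max_df filtering, and uses the smoothed IDF form ln((1 + n) / (1 + df)) + 1 that scikit-learn documents as its default. The function name tfidf and the sample sentences are made up for the example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF: terms found in fewer documents get a larger weight
        weights = {
            t: count * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


# Made-up sentences for illustration only
docs = [
    "election fraud shock",
    "election results official",
    "election officials confirm results",
]
vecs = tfidf(docs)
# "election" occurs in every document, so it is weighted below the rarer "fraud"
print(vecs[0]["election"] < vecs[0]["fraud"])
```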
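  4. Train Model: A Passive Aggressive Classifier is initialized and trained on the vectorized training data. This is an online learning algorithm: it leaves the weights unchanged when an example is classified correctly with enough margin and corrects them otherwise, which makes it well suited to large datasets.
from sklearn.linear_model import PassiveAggressiveClassifier

# Initialize the Passive Aggressive Classifier and fit it on the TF-IDF features
pac = PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train, y_train)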
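For intuition about the name, here is a minimal sketch of a single Passive Aggressive update in the PA-I variant on dense Python lists. It assumes the two classes are encoded as +1 and -1 (for example FAKE and REAL) and omits the intercept, sparse inputs, shuffling, and multiple passes that a real implementation handles; pa_update is a hypothetical helper, not part of scikit-learn.

```python
def pa_update(w, x, y, C=1.0):
    """One Passive Aggressive (PA-I) step; y is +1 or -1, w and x are equal-length lists."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    if loss == 0.0:
        return w  # passive: the margin is already satisfied
    sq_norm = sum(xi * xi for xi in x)
    if sq_norm == 0.0:
        return w  # an all-zero example carries no information
    tau = min(C, loss / sq_norm)  # aggressive: step size, capped by C
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


w = [0.0, 0.0]
w = pa_update(w, [1.0, 0.0], +1)       # margin violated, so the weights move
w_same = pa_update(w, [1.0, 0.0], +1)  # margin now met, so nothing changes
print(w, w_same == w)
```

The cap C limits how far a single noisy example can move the weights; smaller values make the updates more conservative.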
  5. Evaluate Model: The trained model makes predictions on the test set, and its accuracy is calculated. A confusion matrix is also generated to break performance down into true positives, true negatives, false positives, and false negatives.
from sklearn.metrics import accuracy_score, confusion_matrix

# Make predictions on the test set and calculate accuracy
y_pred = pac.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print(f"Accuracy: {round(score*100,2)}%")
# Confusion matrix with rows and columns ordered FAKE, REAL
cm = confusion_matrix(y_test, y_pred, labels=['FAKE', 'REAL'])

Output:

Accuracy: 92.58%
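To show what the four confusion-matrix cells mean, the sketch below tallies them by hand from two label lists, treating FAKE as the positive class. The labels are made up for illustration and are not taken from the project's data; confusion_counts is a hypothetical helper.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="FAKE"):
    """Count TP, FN, FP and TN, treating `positive` as the positive class."""
    pairs = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    return {
        "TP": pairs[(True, True)],    # actual FAKE, predicted FAKE
        "FN": pairs[(True, False)],   # actual FAKE, predicted REAL
        "FP": pairs[(False, True)],   # actual REAL, predicted FAKE
        "TN": pairs[(False, False)],  # actual REAL, predicted REAL
    }


# Made-up labels for illustration only
y_true_demo = ["FAKE", "FAKE", "REAL", "REAL", "FAKE"]
y_pred_demo = ["FAKE", "REAL", "REAL", "FAKE", "FAKE"]
counts = confusion_counts(y_true_demo, y_pred_demo)
accuracy_demo = (counts["TP"] + counts["TN"]) / len(y_true_demo)
print(counts, accuracy_demo)
```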

Code Highlights:

  • Loading data using pandas.read_csv().
  • Splitting data into training and testing sets using sklearn.model_selection.train_test_split().
  • Initializing and fitting a TfidfVectorizer from sklearn.feature_extraction.text.
  • Initializing and training a PassiveAggressiveClassifier from sklearn.linear_model.
  • Evaluating the model using sklearn.metrics.accuracy_score() and sklearn.metrics.confusion_matrix().

Results:

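The model achieved an accuracy of 92.58% on the test set (1,173 of 1,267 articles classified correctly). The confusion matrix, with FAKE treated as the positive class: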

|             | Predicted FAKE | Predicted REAL |
|-------------|----------------|----------------|
| Actual FAKE | 586 (TP)       | 43 (FN)        |
| Actual REAL | 51 (FP)        | 587 (TN)       |
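The four counts in the confusion matrix are enough to derive accuracy, precision, recall, and F1 for the FAKE class. The short sketch below does the arithmetic on the reported numbers; the variable names are chosen for the example.

```python
# Counts from the confusion matrix above (FAKE is the positive class)
tp, fn, fp, tn = 586, 43, 51, 587

total = tp + fn + fp + tn
accuracy = (tp + tn) / total      # share of all articles classified correctly
precision = tp / (tp + fp)        # of the articles flagged FAKE, how many were fake
recall = tp / (tp + fn)           # of the fake articles, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"Test articles: {total}")
print(f"Accuracy:  {accuracy:.2%}")
print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1 score:  {f1:.2%}")
```

The errors are fairly balanced (43 missed fake articles against 51 real articles flagged as fake), so precision and recall land close to the overall accuracy.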
