This project aims to detect fake news articles using Natural Language Processing (NLP) and Machine Learning techniques within a Jupyter Notebook environment. The model is trained to classify news as FAKE or REAL based on the article content.
Jupyter Notebook (Recommended)
Pandas – data handling
NumPy – numerical operations
Matplotlib / Seaborn – data visualization
NLTK – text preprocessing
Scikit-learn – machine learning tools
This project uses the Fake and Real News Dataset from Kaggle.
After downloading, extract the zip and place the Fake.csv file in your project directory.
Fake.csv contains the following columns:
title – Headline of the news article
text – Full content of the article
subject – Category or topic (e.g., News, Politics)
date – Date of publication
(Assumption: Dataset includes a label column with FAKE or REAL values)
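A minimal sketch of loading and checking the file with pandas. The `label` column follows the assumption stated above. `load_news` is a hypothetical helper name, and the two inline rows are made-up stand-ins so the snippet runs without the Kaggle download.

```python
import io

import pandas as pd

# Columns described above; `label` is assumed, not confirmed by the dataset.
EXPECTED_COLUMNS = ["title", "text", "subject", "date", "label"]


def load_news(source):
    """Read the CSV and check that the expected columns are present."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Drop rows with no article body, since the model classifies on `text`.
    return df.dropna(subset=["text"]).reset_index(drop=True)


# Made-up sample rows standing in for Fake.csv.
sample_csv = io.StringIO(
    "title,text,subject,date,label\n"
    "Headline A,Some article body,News,2017-01-01,FAKE\n"
    "Headline B,Another article body,Politics,2017-01-02,REAL\n"
)
df = load_news(sample_csv)
print(df.shape)
print(df["label"].value_counts())
```

In the notebook, the same helper would be called as `load_news("Fake.csv")`. If the file turns out to have no `label` column, the check raises an error instead of failing later during training.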
Open your terminal or Anaconda Prompt and launch Jupyter Notebook:
    jupyter notebook

Open the notebook file: fake_news_detection.ipynb
Run each cell step-by-step:
Import libraries
Load dataset
Preprocess text
Vectorize text using TF-IDF
Train model
Evaluate results
Accuracy Score of the model
Classification Report with precision, recall, F1-score
Confusion Matrix Heatmap for visual evaluation
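The three outputs listed above can be produced with scikit-learn's metrics functions. This is a sketch on six made-up labels, not the notebook's actual predictions. The variable names `y_true` and `y_pred` are assumptions.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up labels for illustration only.
y_true = ["REAL", "REAL", "FAKE", "FAKE", "REAL", "FAKE"]
y_pred = ["REAL", "FAKE", "FAKE", "FAKE", "REAL", "REAL"]

# Fixing the label order keeps the matrix rows and columns predictable.
labels = ["REAL", "FAKE"]

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true, columns: predicted
report = classification_report(y_true, y_pred, labels=labels)

print(f"Accuracy: {acc:.3f}")
print(report)
print(cm)
```

Passing `labels` explicitly matters because the default order is alphabetical, which would put FAKE before REAL.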
Importing Libraries
Loading the Dataset
Text Cleaning – Lowercasing, removing punctuation and stopwords
TF-IDF Vectorization
Model Training – Logistic Regression
Evaluation – Accuracy, Confusion Matrix, Report
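The cleaning, vectorization, and training steps above could be sketched as follows. This is an illustration under assumptions, not the notebook's code: the short `STOPWORDS` set stands in for NLTK's English stopword list so the snippet runs without a corpus download, `clean_text` is a hypothetical helper name, and the four sentences are a made-up toy corpus.

```python
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny stand-in stopword list; the notebook would likely use
# nltk.corpus.stopwords.words("english") instead.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on", "that"}


def clean_text(text):
    """Lowercase, strip ASCII punctuation, and drop stopwords."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in text.split() if w not in STOPWORDS)


# Made-up toy corpus, for illustration only.
texts = [
    "Shocking! Aliens secretly control the government, insiders claim",
    "Miracle cure that doctors hate is hidden from the public",
    "The city council approved the annual budget on Tuesday",
    "Central bank holds interest rates steady, officials said",
]
labels = ["FAKE", "FAKE", "REAL", "REAL"]

# TF-IDF features feed a logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(preprocessor=clean_text),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["Officials said the budget was approved"]))
```

Wrapping the vectorizer and classifier in one pipeline means new text goes through the same cleaning and vocabulary as the training data. On the real dataset, a held-out split (for example with `train_test_split`) would be needed before fitting so that evaluation uses unseen articles.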
Experiment with other models (SVM, Naive Bayes, Random Forest)
Add LSTM or Transformer-based Deep Learning models
Build a web interface using Flask or Streamlit for live news testing
Fake-News-Detection/
│
├── fake_news_detection.ipynb
├── Fake.csv
├── README.md
Accuracy: 0.985

Classification Report:

              precision    recall  f1-score   support

        Real       0.98      0.99      0.99       780
        Fake       0.99      0.98      0.98       776

    accuracy                           0.98      1556
   macro avg       0.98      0.98      0.98      1556
weighted avg       0.98      0.98      0.98      1556

Confusion Matrix:

            Predicted Real    Predicted Fake
Real                   770                10
Fake                    14               762
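As a worked check, the accuracy, precision, and recall figures can be recomputed from the four confusion matrix counts. The variable names are illustrative.

```python
# Confusion matrix counts from the sample output (rows: true, columns: predicted).
real_as_real, real_as_fake = 770, 10
fake_as_real, fake_as_fake = 14, 762

total = real_as_real + real_as_fake + fake_as_real + fake_as_fake

# Accuracy: correct predictions over all predictions.
accuracy = (real_as_real + fake_as_fake) / total

# Precision: of everything predicted as a class, how much was right.
precision_real = real_as_real / (real_as_real + fake_as_real)
precision_fake = fake_as_fake / (fake_as_fake + real_as_fake)

# Recall: of everything truly in a class, how much was found.
recall_real = real_as_real / (real_as_real + real_as_fake)
recall_fake = fake_as_fake / (fake_as_fake + fake_as_real)

print(f"total={total} accuracy={accuracy:.3f}")
print(f"Real: precision={precision_real:.2f} recall={recall_real:.2f}")
print(f"Fake: precision={precision_fake:.2f} recall={recall_fake:.2f}")
```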
A heatmap showing the confusion matrix:
X-axis: Predicted labels
Y-axis: True labels
Annotated values
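A heatmap along these lines could be drawn with Seaborn. This is a sketch that hard-codes the counts from the sample output. The color map, figure size, and title are arbitrary choices, not taken from the notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Counts from the sample output (rows: true labels, columns: predicted labels).
cm = np.array([[770, 10], [14, 762]])
class_names = ["Real", "Fake"]

fig, ax = plt.subplots(figsize=(4, 3))
sns.heatmap(
    cm,
    annot=True,          # write the count inside each cell
    fmt="d",             # format counts as integers
    cmap="Blues",
    xticklabels=class_names,
    yticklabels=class_names,
    ax=ax,
)
ax.set_xlabel("Predicted labels")
ax.set_ylabel("True labels")
ax.set_title("Confusion Matrix")
fig.tight_layout()
```

In a notebook the figure renders when the cell finishes. Elsewhere, `fig.savefig(...)` writes it to disk.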