GitHub - namann5/Ai_deepfake: AI-powered deepfake & synthetic image detection system — Node.js backend with EXIF forensics, visual artifact analysis, and weighted score aggregation.

💀 Deepfakes are getting scary good. Deepfake fights back — upload any image or video, get a verdict in seconds. Because not everything you see is real.

Department of Computer Science & Engineering (CSED) · Section 2FH

🧠 About the Project

Deepfake is a lightweight, web-based deepfake and AI-generated media verifier built as an academic project. As synthetic media becomes increasingly indistinguishable from reality, tools that help everyday users verify digital content are more critical than ever.

Deepfake analyzes uploaded images and videos through a 3-layer detection pipeline built around a PyTorch deep learning model:

Layer	What it does
🤖 Deep Learning Model	Xception backbone extracts deep visual forgery features; custom dense head produces a binary deepfake probability
🔎 Metadata Forensics	EXIF parsing — camera info detection, AI software flagging, timestamp analysis
⚖️ Score Aggregation	Weighted fusion → final probability score + verdict

The system outputs a probability score (e.g., 72% — Likely Synthetic) with confidence rating and detailed breakdown — no technical expertise required.

⚠️ Light Version — built for academic and demonstration purposes. Not legal or forensic-grade.

✨ Features

Feature	Description	Status
🖼️ Image Upload	Drag-and-drop or click — JPEG, PNG, WEBP supported	✅ Done
🎞️ Video Upload	OpenCV frame sampling + Haar Cascade face detection	✅ Done
🤖 Deep Learning Detection	Xception + custom dense head — trained forgery feature extraction	✅ Done
📊 AI Probability Score	0–100% confidence score with Real / Synthetic verdict	✅ Done
🔎 Metadata Forensics	EXIF parsing — flags missing camera, AI software, timestamps	✅ Done
⚖️ Weighted Score Fusion	Model, artifact, and metadata scores combined into one weighted final score	✅ Done
🗄️ Result Logging	Every scan saved to MongoDB with timestamp & breakdown	✅ Done
🎨 React UI	Reusable components — responsive, fast, state-driven	✅ Done
⚡ Fast Pipeline	End-to-end analysis completes in under 3 seconds	✅ Done
🐳 Docker Support	Full stack containerized — one command to run everything	✅ Done
🔁 CI/CD Pipeline	GitHub Actions — auto test, build, and publish on every push	✅ Done

🧬 ML Stack Architecture

The project contains three ML stacks. Only one is the active primary inference path.

✅ Active Stack — Primary Inference Path

Component	Technology
Service file	`image_server.py`
Framework	PyTorch
Backbone	Xception (`pytorchcv`)
Custom head	`BatchNorm1d → Linear → ReLU → BatchNorm1d → Linear`
Pooling	`AdaptiveAvgPool2d(1)` — resolution-robust feature compression
Video handling	OpenCV frame sampling + Haar Cascade face detection
API layer	FastAPI

Model-layer purpose:

Xception backbone — extracts deep visual features from image texture and patterns
AdaptiveAvgPool2d(1) — makes feature size robust for varied input resolutions
Custom dense head — maps extracted features to a single binary logit
Output — logit converted to deepfake probability, thresholded into REAL or DEEPFAKE
Video — multiple sampled frames scored individually by the image model, then aggregated
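
The last two steps can be sketched in plain Python. This is a minimal illustration, not the code in image_server.py: the 0.5 threshold, the mean as the aggregation rule, and all function names are assumptions, since the description above only says that the logit is thresholded and that frame scores are aggregated.

```python
import math
from statistics import fmean


def logit_to_probability(logit: float) -> float:
    """Map a raw logit to a probability with a numerically stable sigmoid."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


def label(probability: float, threshold: float = 0.5) -> str:
    """Threshold a deepfake probability into REAL or DEEPFAKE (threshold is assumed)."""
    return "DEEPFAKE" if probability >= threshold else "REAL"


def score_video(frame_logits: list[float], threshold: float = 0.5) -> tuple[float, str]:
    """Score each sampled frame, then aggregate by averaging (assumed rule)."""
    if not frame_logits:
        raise ValueError("at least one frame score is required")
    probabilities = [logit_to_probability(z) for z in frame_logits]
    video_probability = fmean(probabilities)
    return video_probability, label(video_probability, threshold)
```

Averaging is only one possible choice. A maximum or a trimmed mean would weight a few suspicious frames differently, so the rule should be checked against the actual service before relying on this sketch.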

🟡 Secondary Stack — Dedicated Video Classifier (Not Primary)

Component	Technology
Service file	`video_server.py`
Framework	TensorFlow / Keras
Model format	`.keras`
Purpose	Direct video-classifier inference from selected frames

The Node.js backend currently routes video through the image endpoint in image_server.py. This stack exists for dedicated video classification but is not the active runtime path.

🔴 Legacy Stack — Archived

Component	Technology
Service file	`deepscan-backend/ml_service/app.py`
Framework	TensorFlow / Keras
Architecture	Reconstructed DenseNet121-based image model
API layer	Flask
Purpose	Older inference service — superseded by the active FastAPI stack

Active vs Legacy at a glance:

  image_server.py (FastAPI + PyTorch + Xception)   ← PRIMARY ✅
  video_server.py (Keras .keras model)              ← Secondary 🟡
  deepscan-backend/ml_service/app.py (Flask + DenseNet121) ← Legacy 🔴

🏗️ System Architecture

┌─────────────────────────────────────────────────────────────┐
│                    LAYER 1 — FRONTEND                       │
│      React.js  ·  Component UI  ·  Upload Widget  ·  Axios  │
└─────────────────────┬───────────────────────────────────────┘
                      │  HTTP Request (multipart/form-data)
                      ▼
┌─────────────────────────────────────────────────────────────┐
│                 LAYER 2 — API GATEWAY                       │
│      Node.js / Express.js  ·  Multer Validator  ·  Router   │
└──────────┬──────────────────────────────┬───────────────────┘
           │                              │
           ▼                              ▼
┌──────────────────────┐      ┌───────────────────────────────┐
│  METADATA FORENSICS  │      │     ML DETECTION ENGINE       │
│  EXIF Parser (exifr) │      │  image_server.py (FastAPI)    │
│  Timestamp Checker   │      │  PyTorch + Xception Backbone  │
│  AI Software Flags   │      │  AdaptiveAvgPool2d(1)         │
│  Camera Info Check   │      │  Custom Dense Head            │
│                      │      │  OpenCV + Haar Cascade (video)│
└──────────┬───────────┘      └──────────────┬────────────────┘
           │                                  │
           └──────────────┬───────────────────┘
                          │  Weighted Score Fusion
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                  SCORE AGGREGATOR                           │
│     Model (50%)  +  Artifact (30%)  +  Metadata (20%)       │
│                  → Final Probability %                      │
│         → Verdict: REAL / UNCERTAIN / SYNTHETIC             │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
              Result returned to UI
         [ 72% — ⚠ LIKELY SYNTHETIC · Medium Confidence ]

          ┌────────────────────────┐
          │        MongoDB         │
          │  Store results & logs  │
          └────────────────────────┘

Docker Architecture

  Browser (localhost:3001)
          │
          ▼
  [ Frontend Container ]     React / Nginx — Port 3001→3000
          │
          │ HTTP API calls
          ▼
  [ Backend Container ]      Node.js + Express — Port 5000
    /api/analyze
    /api/results
          │
          │ HTTP (internal)
          ▼
  [ ML Service Container ]   FastAPI + PyTorch — image_server.py
          │
          │ Mongoose ODM
          ▼
  [ MongoDB Container ]      mongo:7 — Port 27017 — Volume: mongo_data

🔄 Workflow

User Uploads Image / Video
       │
       ▼
┌─────────────┐
│  Validate   │  ← File type · Size limit · Multer
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Preprocess  │  ← Resize · Normalize · Extract Metadata
│             │    (Video: OpenCV frame sampling + Haar Cascade)
└──────┬──────┘
       │
       ├──────────────────────────┐
       ▼                          ▼
┌─────────────┐          ┌────────────────┐
│  ML Model   │          │    Metadata    │
│  Inference  │          │   Forensics    │
│ (Xception + │          │   (exifr)      │
│  PyTorch)   │          │               │
└──────┬──────┘          └───────┬────────┘
       │                          │
       └──────────┬───────────────┘
                  ▼
         ┌────────────────┐
         │Score Aggregator│  ← Weighted fusion
         └───────┬────────┘
                 │
                 ▼
         ┌──────────────┐
         │  Save to DB  │  ← MongoDB logging
         └───────┬──────┘
                 │
                 ▼
        Display Result to User
      ┌─────────────────────────┐
      │  Final Score: 72%       │
      │  Verdict: ⚠ SYNTHETIC   │
      │  Confidence: Medium     │
      │  Breakdown: shown       │
      └─────────────────────────┘

Step-by-step breakdown:

📤 Upload — JPEG, PNG, WEBP, or video · max 5MB accepted
✅ Validate — Multer checks file type and size before processing
🤖 ML Inference — Xception backbone + custom dense head extracts deep forgery features; for video, OpenCV samples frames and Haar Cascade detects faces before per-frame scoring
🔎 Metadata Forensics — EXIF flags missing camera info, AI software signatures, and timestamp anomalies
⚖️ Score Fusion — Model 50% + Artifact 30% + Metadata 20% combined into a single probability score
🗄️ DB Logging — Full result saved to MongoDB for scan history
📊 Response — Score, verdict, confidence level, and detailed breakdown returned to the frontend
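
The score fusion step can be sketched as a weighted average. The 50/30/20 weights come from the description above; the verdict cut-offs (40 and 60) and the function names are hypothetical, because the source does not state where REAL, UNCERTAIN, and SYNTHETIC begin.

```python
# Weights in percent, as stated in the workflow description.
WEIGHTS = {"model": 50, "artifact": 30, "metadata": 20}


def fuse_scores(model: float, artifact: float, metadata: float) -> float:
    """Combine three 0-100 scores into one 0-100 score by weighted average."""
    scores = {"model": model, "artifact": artifact, "metadata": metadata}
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be between 0 and 100, got {value}")
    return sum(WEIGHTS[name] * value for name, value in scores.items()) / 100


def verdict(final_score: float, real_below: float = 40, synthetic_from: float = 60) -> str:
    """Map the fused score to a verdict. Both cut-offs are assumed values."""
    if final_score < real_below:
        return "REAL"
    if final_score >= synthetic_from:
        return "SYNTHETIC"
    return "UNCERTAIN"
```

For example, `fuse_scores(60, 80, 100)` evaluates to 74.0, which this sketch would label SYNTHETIC.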

🛠️ Tech Stack

Layer	Technology	Purpose
Frontend	React.js · Axios · CSS3 · Nginx	Interactive UI, API communication, production serving
Backend	Node.js · Express.js · Multer	REST API, file handling, routing
ML Inference	PyTorch · Xception (pytorchcv) · FastAPI	Deep learning model serving — primary inference path
Video Processing	OpenCV · Haar Cascade	Frame sampling, face-focused preprocessing for video input
Metadata Analysis	exifr	EXIF metadata extraction and forensics scoring
Database	MongoDB · Mongoose	Scan result persistence and history
DevOps	Docker · Docker Compose · GitHub Actions	Containerization, orchestration, CI/CD
Dev Tools	Postman · VS Code · Git · GitHub	API testing, development, version control

🔬 EXIF Metadata Scoring

The metadata forensics layer analyzes the EXIF data that cameras typically embed in photos. AI-generated images usually lack this data, so its absence is treated as a strong synthetic signal.

Base Score: 50  (neutral starting point)

┌─────────────────────────────────────────────────────────────────┐
│                     EXIF SIGNAL TABLE                          │
├────────────────────────────┬────────────────────────────────────┤
│  Signal                    │  Score Impact                      │
├────────────────────────────┼────────────────────────────────────┤
│  No EXIF data at all       │  → score = 85  (instant flag)      │
│  No camera make/model      │  → score += 20                     │
│  Camera make/model found   │  → score -= 20                     │
│  AI software detected *    │  → score += 40                     │
│  No timestamp              │  → score += 10                     │
│  Timestamp found           │  → score -= 10                     │
│  No GPS data               │  → score += 5                      │
│  GPS data found            │  → score -= 10                     │
├────────────────────────────┼────────────────────────────────────┤
│  Final score               │  clamped between 0 and 100         │
└────────────────────────────┴────────────────────────────────────┘

* AI software detection covers:
  Stable Diffusion · Midjourney · DALL-E · Adobe Firefly

💡 Note: Images shared via WhatsApp or Telegram have their EXIF data stripped, which inflates the metadata score even for real photos. For accurate results, upload the original file (for example, copied from the device over USB).
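
The signal table can be expressed as a small scoring function. This is a sketch of the rules as listed, not the project's implementation (which parses EXIF with exifr in the Node.js backend). The tag names `Make`, `Model`, `Software`, `DateTimeOriginal`, `GPSLatitude`, and `GPSLongitude` are standard EXIF tags, but which tags the project actually reads is an assumption, as is treating either a make or a model as "camera found".

```python
# Keywords taken from the AI software list above, lower-cased for matching.
AI_SOFTWARE_KEYWORDS = ("stable diffusion", "midjourney", "dall-e", "adobe firefly")


def metadata_score(exif: dict) -> int:
    """Return a 0-100 synthetic-likelihood score from parsed EXIF tags."""
    if not exif:
        return 85  # no EXIF data at all: instant flag

    score = 50  # neutral starting point

    has_camera = bool(exif.get("Make") or exif.get("Model"))
    score += -20 if has_camera else 20

    software = str(exif.get("Software", "")).lower()
    if any(keyword in software for keyword in AI_SOFTWARE_KEYWORDS):
        score += 40

    score += -10 if exif.get("DateTimeOriginal") else 10

    has_gps = exif.get("GPSLatitude") is not None and exif.get("GPSLongitude") is not None
    score += -10 if has_gps else 5

    return max(0, min(100, score))  # clamp to the 0-100 range
```

A photo with camera, timestamp, and GPS tags scores 50 - 20 - 10 - 10 = 10, while a file whose only tag is an AI software signature reaches 125 before clamping and is reported as 100.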

📡 API Endpoints

`POST /api/analyze`

Upload an image or video for deepfake analysis.

Request

Body    : form-data
Key     : image (type: File)
Allowed : .jpg · .jpeg · .png · .webp · video formats
Max Size: 5MB
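
The limits above can be mirrored in a client-side pre-check before uploading. This is a sketch only: the real validation runs in Multer on the backend, the helper name is hypothetical, and "5MB" is assumed to mean 5 × 1024 × 1024 bytes. The video extension set is left empty by default because the accepted video formats are not listed here.

```python
from pathlib import PurePath

IMAGE_EXTENSIONS = frozenset({".jpg", ".jpeg", ".png", ".webp"})
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # assumption: "5MB" interpreted as 5 MiB


def is_upload_allowed(
    filename: str,
    size_bytes: int,
    video_extensions: frozenset = frozenset(),
) -> bool:
    """Check the file extension and size against the documented limits."""
    extension = PurePath(filename).suffix.lower()
    allowed = IMAGE_EXTENSIONS | video_extensions
    return extension in allowed and 0 < size_bytes <= MAX_UPLOAD_BYTES
```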

Response

{
  "id": "69b03d5a06d049d960e7937c",
  "final_score": 72,
  "verdict": "LIKELY SYNTHETIC",
  "confidence": "Medium",
  "breakdown": {
    "model_score": 68,
    "artifact_score": 75,
    "metadata_score": 85
  },
  "flags": ["No EXIF data found — strong synthetic signal"],
  "analyzed_at": "2026-03-10T15:48:42.235Z"
}
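
A minimal client for this endpoint, using only the Python standard library, might look like the sketch below. The form field name `image` and the port come from this README; the helper names are hypothetical, and error handling is omitted.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib import request

API_URL = "http://localhost:5000/api/analyze"  # backend port from Getting Started


def build_multipart(field: str, filename: str, payload: bytes, content_type: str):
    """Encode one file as a multipart/form-data body; returns (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def analyze(path: str, url: str = API_URL) -> dict:
    """Upload a file under the `image` key and return the parsed JSON response."""
    file = Path(path)
    content_type = mimetypes.guess_type(file.name)[0] or "application/octet-stream"
    body, header = build_multipart("image", file.name, file.read_bytes(), content_type)
    req = request.Request(url, data=body, headers={"Content-Type": header}, method="POST")
    with request.urlopen(req) as response:
        return json.load(response)
```

With the backend running, `analyze("photo.jpg")` should return a dictionary shaped like the response above, so `result["final_score"]` and `result["verdict"]` can be read directly.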

`GET /api/results`

Fetch last 20 scan results from the database.

Response

{
  "results": [ ...array of past scans... ]
}

🚀 Getting Started

Option 1: Docker (Recommended)

See Docker Setup below — one command runs everything.

Option 2: Manual Setup

Prerequisites: Node.js v18+ · Python 3.9+ · MongoDB running locally

git clone https://github.com/namann5/Ai_deepfake.git
cd Ai_deepfake

ML Service (Active — image_server.py):

cd ml_service
pip install torch torchvision pytorchcv fastapi uvicorn opencv-python exifr
uvicorn image_server:app --port 8000

Backend:

cd deepscan-backend
npm install
node server.js

Create .env inside deepscan-backend/:

PORT=5000
MONGODB_URI=mongodb://localhost:27017/deepfake
ML_SERVICE_URL=http://localhost:8000

Frontend:

cd deepscan-frontend
npm install
npm start

Access: Frontend → http://localhost:3000 · Backend → http://localhost:5000 · ML Service → http://localhost:8000

🐳 Docker Setup

Prerequisites: Install Docker Desktop

Start everything (dev mode with hot-reload)

docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d

Stop everything (keep data)

docker compose -f docker-compose.yml -f docker-compose.dev.yml down

Stop and delete all data (fresh start)

docker compose -f docker-compose.yml -f docker-compose.dev.yml down -v

Rebuild after Dockerfile changes

docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d --build

View live logs

docker compose -f docker-compose.yml -f docker-compose.dev.yml logs -f

Access URLs (Docker)

Service	URL
Frontend App	http://localhost:3001
Backend API	http://localhost:5000
Backend Health	http://localhost:5000/
ML Service	http://localhost:8000
All Results	http://localhost:5000/api/results
MongoDB	mongodb://localhost:27017/deepfake

Why Docker?

Before Docker	After Docker
Install Node + Python + MongoDB manually	`docker compose up` — done
4 separate terminals to start	One command starts everything
"Works on my machine" bugs	Identical environment everywhere
Hours to onboard a new dev	~5 minutes from clone to running
Manual MongoDB + ML service setup	All containers start automatically

🔁 CI/CD Pipeline

Every push or pull request to main automatically triggers the GitHub Actions pipeline:

Push to main
     │
     ├── Job 1: Backend Test    → npm ci → run Jest tests
     ├── Job 2: Frontend Build  → npm ci → npm run build
     └── Job 3: Docker Push     → build images → push to GHCR
                                  (only on direct push, not PRs)

Pipeline file: .github/workflows/ci.yml

📐 Project Scope

✅ In Scope

Detection of AI-generated and deepfake images
Video deepfake detection via frame sampling and face detection
Deep learning model inference using PyTorch + Xception backbone
Metadata forensics via EXIF analysis
Probability scores with verdict and confidence rating
Clean web-based UI accessible to non-technical users
Scan history stored in MongoDB
Docker containerization for consistent cross-platform environments
CI/CD automated testing and deployment pipeline

❌ Out of Scope

Real-time live-stream deepfake detection
Legal or forensic-grade accuracy guarantees
Detection of content from AI models released after the training data cutoff
Audio deepfake analysis

📅 Implementation Plan

Phase 1  ██████████  Requirement Analysis & Problem Understanding   ✅ Done
Phase 2  ██████████  Dataset & Pre-trained Model Selection          ✅ Done
Phase 3  ██████████  System Architecture Design                     ✅ Done
Phase 4  ██████████  Backend Development                            ✅ Done
Phase 5  ██████████  Frontend Development                           ✅ Done
Phase 6  ██████████  Frontend–Backend Integration                   ✅ Done
Phase 7  ████░░░░░░  Testing & Performance Evaluation               🔄 In Progress
Phase 8  ██░░░░░░░░  Deployment & Documentation                     🔄 In Progress

👥 Team

Name	Role	ID
Anurag Singh	Backend Development & System Analysis	12515990006
Arpita Raj	Frontend Development & UI Design	12515990007
Harshita Nagpal	Frontend Development & Documentation	12515990016
Naman Singh	Backend Development & Testing	12515990024

Supervisor: Mr. Abhishek Singh (Technical Trainer) · Submitted To: Mr. Sanjay Madaan

📚 References

Research papers on Deepfake Detection
FaceForensics++ dataset — Kaggle
Xception architecture — Chollet, F. (2017)
pytorchcv documentation — pre-trained model library
PyTorch documentation — model training and inference
OpenCV documentation — video processing and face detection
exifr documentation — EXIF metadata parsing
FastAPI documentation — ML model serving
IEEE and ACM digital libraries
Docker documentation — containerization
GitHub Actions documentation — CI/CD

Department of Computer Science & Engineering (CSED) · Section 2FH · Academic Project

Deepfake — Fighting synthetic misinformation, one pixel at a time. 🔍
