f1shyfang/Notreally

NotReal.ly - Deepfake Detection Platform

A multi-modal deepfake detection system featuring a Next.js frontend and a Flask backend. Upload a video and the backend runs a real analysis pipeline: it extracts features (facial-landmark analysis, optional audio analysis, and container metadata), scores them with a trained XGBoost classifier, and returns an authenticity score while reporting live progress.

✨ Features

  • Multi-Modal Analysis: Combines MediaPipe facial-feature analysis, optional audio (MFCC) processing, and ffmpeg container/metadata extraction.
  • Live Progress: Analysis runs in a background thread; the client polls for the real completion percentage and pipeline stage, so the progress bar reflects actual work done.
  • Interactive Dashboard: Charts and a detailed feature/metric breakdown (lazily loaded to keep the initial bundle small).
  • Explainable Output: A human-readable summary highlights which indicators (blink rate, facial jitter, audio patterns) contributed to the verdict.
  • Modern UI: Built with Next.js (App Router), React 19, TypeScript, and Tailwind CSS.

🏗️ Architecture

Notreally/
├── frontend/                 # Next.js React application
│   ├── src/
│   │   ├── app/              # App Router (layout.tsx, page.tsx, globals.css)
│   │   ├── components/       # FileUpload, LoadingProgress, AnalysisDashboard
│   │   └── types/            # analysis.ts (TypeScript types)
│   ├── next.config.ts
│   └── package.json
├── backend/                  # Flask Python API
│   ├── app.py                # Flask app, routes, background analysis worker
│   ├── db.py                 # SQLite job store (jobs table w/ progress + stage)
│   ├── feature_extractor.py  # MediaPipe + librosa + ffmpeg feature extraction
│   ├── predictor.py          # model.pkl loader (cached) + predict_proba
│   ├── train_model.py        # Trains the XGBoost model -> model.pkl
│   ├── model.pkl             # Trained classifier (checked in)
│   ├── notreally.db          # SQLite database
│   ├── requirements.txt
│   ├── Dockerfile
│   └── uploads/              # Uploaded video storage (created at runtime)
├── docker-compose.yml        # Containerized backend
├── DEPLOY.md                 # Docker deployment guide
├── render.yaml / coolify.yaml
└── README.md

🧰 Tech Stack

Frontend

  • Next.js 15.5.18 with the App Router (Turbopack dev/build)
  • React 19.1.0 / React DOM 19.1.0
  • TypeScript 5
  • Tailwind CSS 4 for styling
  • Recharts 3 for data visualization (lazy-loaded in the dashboard)
  • Lucide React for icons
  • Axios for API communication

Backend

  • Flask + Flask-CORS for the REST API
  • OpenCV (opencv-python-headless) for video frame decoding
  • MediaPipe (0.10.14) for facial landmark / Face Mesh analysis
  • Librosa + ffmpeg-python for audio (MFCC) extraction
  • XGBoost + scikit-learn for ML classification, loaded via joblib
  • NumPy (<2), pandas for numeric work
  • SQLite (Python stdlib sqlite3) for job/result storage (in active use, not a planned feature)
  • Gunicorn for production serving

🚀 Quick Start

Prerequisites

  • Node.js 18+ and npm
  • Python 3.8+
  • FFmpeg (required for audio extraction and metadata probing)

Backend Setup

cd backend
pip install -r requirements.txt
python app.py

The backend listens on the port from the PORT env var, defaulting to 52513 (see app.py). It binds to 0.0.0.0. The SQLite database and uploads directory are created automatically on startup, and the model is warmed into memory so the first request is fast.

Frontend Setup

cd frontend
npm install
npm run dev

The frontend defaults to talking to http://localhost:52513. To point it elsewhere, set NEXT_PUBLIC_API_BASE_URL (e.g. in frontend/.env.local).

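With both servers running, the backend API is available at http://localhost:52513. The frontend is served at the URL printed by npm run dev (Next.js defaults to http://localhost:3000).
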
Train the model (optional — a model.pkl is already checked in)

cd backend
# Quotes are needed for the space in the real dataset folder name
python train_model.py "DFD_original sequences" DFD_manipulated_sequences
# This (re)writes backend/model.pkl

📊 Analysis Pipeline

Analysis is asynchronous. POST /api/analyze saves the file, creates a job in SQLite, starts a background thread, and returns a job_id immediately. The client then polls GET /api/results/<job_id> (about once a second) for the live progress percentage and stage.
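
The start-then-poll flow described above can be sketched as follows. This is a minimal illustration, not the code in app.py: the in-memory JOBS dict stands in for the SQLite job store, and update_job, scale_progress, run_analysis, start_job, and the error field are hypothetical names. The stage names and percentages follow the pipeline stages listed in this README.

```python
import threading
import uuid

# Hypothetical in-memory stand-in for the SQLite job store.
JOBS = {}
JOBS_LOCK = threading.Lock()


def update_job(job_id, **fields):
    """Merge fields into the job record while holding the lock."""
    with JOBS_LOCK:
        JOBS[job_id].update(fields)


def scale_progress(done, total, lo=5, hi=80):
    """Map a done/total fraction onto the lo..hi percent band of one stage."""
    if total <= 0:
        return hi
    return lo + int((hi - lo) * min(done, total) / total)


def run_analysis(job_id, total_frames):
    """Placeholder worker: walks the stages and records progress as it goes."""
    try:
        for done in range(1, total_frames + 1):
            # Real per-frame analysis would happen here.
            update_job(job_id, stage="analyzing_video",
                       progress=scale_progress(done, total_frames))
        update_job(job_id, stage="reading_metadata", progress=90)
        update_job(job_id, stage="finalizing", progress=95)
        update_job(job_id, status="completed", stage="completed", progress=100)
    except Exception as exc:
        # The "error" field is an assumption made for this sketch.
        update_job(job_id, status="failed", error=str(exc))


def start_job(total_frames):
    """Create the job, start the worker thread, and return without waiting."""
    job_id = str(uuid.uuid4())
    with JOBS_LOCK:
        JOBS[job_id] = {"status": "processing", "stage": "queued", "progress": 0}
    worker = threading.Thread(target=run_analysis,
                              args=(job_id, total_frames), daemon=True)
    worker.start()
    return job_id, worker
```

Because start_job returns before the worker finishes, a request handler built this way can respond with the job_id right away while a separate polling request reads the latest progress and stage.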

The background worker reports real progress as it proceeds through these stages:

  1. Upload — the file is saved to the uploads directory and a job row is inserted (stage: queued).
  2. Facial analysis (analyzing_video, ~5%→80%): OpenCV reads frames and MediaPipe Face Mesh extracts landmarks. It computes Eye Aspect Ratio for blink detection (smoothed, adaptive threshold, refractory period) and nose-displacement-based facial jitter. To keep long videos bounded, only every Nth frame is analyzed (frame_stride=5) and the number of analyzed frames is capped by NOTREALLY_MAX_FRAMES (default 600).
  3. Audio analysis (analyzing_audio, ~82%): optional — disabled by default. When enabled via NOTREALLY_ENABLE_AUDIO, librosa extracts MFCC mean/std (16 kHz mono, ffmpeg fallback).
  4. Metadata (reading_metadata, ~90%): ffmpeg probes container format, duration, bitrate, codecs, resolution, fps, and audio sample rate.
  5. Classification (finalizing, ~95%): features are assembled into a fixed 10-element vector and scored by the XGBoost model (predict_proba) to produce prob_real/prob_fake and an authenticity score.
  6. Done (completed, 100%): results (authenticity score, confidence, probabilities, legacy feature summary, human-readable verdict) are written to the job row.
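
To make the blink-detection step in stage 2 concrete, here is a minimal sketch of the commonly used Eye Aspect Ratio formula and a blink counter with a refractory period. It is an assumption about the general technique, not the code in feature_extractor.py: the six-point landmark ordering, the fixed threshold of 0.2 (the real pipeline is described as adaptive and smoothed), and the function names are all illustrative.

```python
import math


def eye_aspect_ratio(eye):
    """EAR for six (x, y) eye landmarks ordered p1..p6.

    p1 and p4 are the horizontal corners; (p2, p6) and (p3, p5) are the
    two vertical pairs. The ratio drops toward 0 as the eye closes.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal) if horizontal > 0 else 0.0


def count_blinks(ear_values, threshold=0.2, refractory=3):
    """Count dips below threshold, ignoring `refractory` samples after each blink."""
    blinks = 0
    cooldown = 0
    closed = False
    for ear in ear_values:
        if cooldown > 0:
            # Still inside the refractory window: skip this sample.
            cooldown -= 1
            continue
        if ear < threshold:
            closed = True
        elif closed:
            # The eye reopened after being closed: count one blink.
            blinks += 1
            closed = False
            cooldown = refractory
    return blinks
```

The refractory window keeps a single noisy eyelid movement from being counted as several blinks in quick succession.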

🎯 Key Features

File Upload Component

  • Drag-and-drop interface with file type/size validation (100 MB max)
  • Real upload-progress reporting, then a switch to backend-driven analysis progress

Loading Progress (LoadingProgress.tsx)

  • Two-phase progress bar: a determinate upload phase driven by axios upload events, then an analyzing phase driven by the backend-reported percentage and stage label
  • Falls back to an indeterminate animation only before the first backend progress arrives

Analysis Dashboard

  • Lazy-loaded (next/dynamic, ssr: false) so the heavy Recharts dependency stays out of the initial bundle
  • Authenticity score with confidence, feature breakdown charts, metrics, and an explainable summary

🔌 API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/analyze | Upload a video (multipart/form-data, field file). Returns immediately with { job_id, status, message } and runs analysis in a background thread. |
| GET | /api/results/<job_id> | Poll for status. Returns { job_id, status, results, created_at, filename, progress, stage }. status is processing / completed / failed. |
| GET | /api/health | Health check: { status: "healthy", ... }. |
| GET | / | Root/info endpoint listing available endpoints (used for platform health checks). |
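
A client for the polling endpoint can be sketched as below. This is a hedged illustration rather than the frontend's actual logic (which is written in TypeScript): poll_until_done and its parameters are hypothetical, and fetch_status is any callable that returns the decoded JSON body of GET /api/results/<job_id>.

```python
import time


def poll_until_done(fetch_status, interval=1.0, timeout=600.0,
                    on_progress=None, sleep=time.sleep):
    """Call fetch_status() until the job reports completed or failed.

    Raises TimeoutError if the job is still processing after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status()
        if on_progress is not None:
            # Forward the live percentage and stage label to the caller.
            on_progress(body.get("progress", 0), body.get("stage"))
        if body.get("status") in ("completed", "failed"):
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

Passing the fetch function in keeps the loop independent of any HTTP library; with the standard library it could wrap urllib.request.urlopen and json.load against the results URL for a given job_id.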

🗄️ Data Storage

Jobs and results are persisted in SQLite (notreally.db by default). The jobs table schema:

| Column | Type | Notes |
| --- | --- | --- |
| id | TEXT (PK) | Job UUID |
| filename | TEXT | Original upload name |
| filepath | TEXT | Saved path on disk |
| status | TEXT | processing / completed / failed |
| created_at | TEXT | ISO timestamp |
| results_json | TEXT | JSON-encoded results (null until done) |
| progress | INTEGER | Live completion 0–100 |
| stage | TEXT | Current pipeline stage |

init_db() creates the table and migrates older databases that predate the progress/stage columns. Connections use a 30s timeout, so when the analysis thread and a polling request touch the database at the same time, the second one waits for the lock instead of failing with "database is locked".
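
The create-and-migrate behavior can be sketched with the standard sqlite3 module. This is an illustration under assumptions, not a copy of db.py: the column names come from the schema table in this section, while the exact DDL, the DEFAULT 0 on progress, and the connect helper are hypothetical.

```python
import sqlite3

# Column names follow the jobs table in this section; DDL details are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id TEXT PRIMARY KEY,
    filename TEXT,
    filepath TEXT,
    status TEXT,
    created_at TEXT,
    results_json TEXT,
    progress INTEGER DEFAULT 0,
    stage TEXT
)
"""


def connect(db_path):
    # timeout=30 makes a connection wait up to 30 s for a lock before raising.
    return sqlite3.connect(db_path, timeout=30)


def init_db(db_path):
    """Create the jobs table, then add columns that older databases lack."""
    conn = connect(db_path)
    try:
        conn.execute(SCHEMA)
        # PRAGMA table_info yields one row per column; index 1 is the name.
        existing = {row[1] for row in conn.execute("PRAGMA table_info(jobs)")}
        if "progress" not in existing:
            conn.execute("ALTER TABLE jobs ADD COLUMN progress INTEGER DEFAULT 0")
        if "stage" not in existing:
            conn.execute("ALTER TABLE jobs ADD COLUMN stage TEXT")
        conn.commit()
    finally:
        conn.close()
```

Checking the existing columns before each ALTER TABLE keeps the function safe to run on every startup, whether the database is new, old, or already migrated.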

🔧 Configuration (Environment Variables)

Backend

  • PORT — server port (default 52513)
  • NOTREALLY_DB_PATH — SQLite database path (default backend/notreally.db)
  • NOTREALLY_UPLOAD_DIR — upload directory (default uploads)
  • NOTREALLY_MAX_FRAMES — max frames analyzed per video (default 600)
  • NOTREALLY_ENABLE_AUDIO — set 1/true to enable audio analysis (off by default); legacy NOTREALLY_DISABLE_AUDIO is also honored
  • MODEL_PATH — path to the model pickle (default backend/model.pkl)
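
As an illustration of how these backend variables might be read, and of how the frame cap interacts with frame_stride=5, here is a small sketch. The helper names (env_flag, load_settings, frames_to_analyze) are hypothetical, and the legacy NOTREALLY_DISABLE_AUDIO variable is left out because its precedence is not specified here.

```python
import os


def env_flag(name, default=False, environ=os.environ):
    """Treat 1 or true (case-insensitive) as on; anything else as off."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true")


def load_settings(environ=os.environ):
    """Read backend settings, falling back to the documented defaults."""
    return {
        "port": int(environ.get("PORT", "52513")),
        "upload_dir": environ.get("NOTREALLY_UPLOAD_DIR", "uploads"),
        "max_frames": int(environ.get("NOTREALLY_MAX_FRAMES", "600")),
        "enable_audio": env_flag("NOTREALLY_ENABLE_AUDIO", environ=environ),
    }


def frames_to_analyze(total_frames, frame_stride=5, max_frames=600):
    """Indices of every frame_stride-th frame, capped at max_frames entries."""
    return list(range(0, total_frames, frame_stride))[:max_frames]
```

Under these assumptions, the defaults mean a long video is analyzed only up to frame index 2995 (600 sampled frames at a stride of 5), which bounds the work per upload.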

Frontend

  • NEXT_PUBLIC_API_BASE_URL — backend base URL (default http://localhost:52513)

🐳 Deployment

The backend ships with a Dockerfile and docker-compose.yml. See DEPLOY.md for full instructions. Quick version:

docker-compose up -d
curl http://localhost:52513/api/health

render.yaml and coolify.yaml are provided for those platforms.

🛠️ Development

Adding New Features

  1. Update types in frontend/src/types/
  2. Create/extend components in frontend/src/components/
  3. Add API endpoints in backend/app.py
  4. Extend the analysis pipeline in backend/feature_extractor.py and the feature vector / model as needed

📈 Future Enhancements

  • Real-time video streaming analysis
  • Batch processing for multiple files
  • User authentication and history
  • Advanced ML models (CNN, Transformer)
  • API rate limiting and security hardening
  • Cloud deployment automation

🤝 Contributing

This is a hackathon project, but contributions are welcome:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📄 License

This project was created for educational and hackathon purposes.

About

First place lyra WIT Hackathon
