PrathamGupta423/MAVERIK

MAVERIK

Multi-modal Analysis for Video Extraction, Retrieval, and Indexing with Knowledge

MAVERIK is an end-to-end open-source framework for processing, indexing, and intelligently querying long-form videos using multiple modalities:

  • Audio → Whisper (English speech-to-text), AI4Bharat Indic Conformer model (Hindi speech-to-text) + Gemini 2.0 Flash (translation)
  • Visual objects → YOLOv11m + ByteTrack + BLIP
  • On-screen text → DeepSeek OCR
  • Visual captioning → Gemini 2.0 Flash
  • Dense multi-modal vector database using ChromaDB
  • Agentic query engine (LangChain + Gemini) that plans, searches, evaluates, and refines until high-quality results are obtained
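
The plan, search, evaluate, refine cycle described in the last bullet can be illustrated with a small, self-contained sketch. This is not MAVERIK's implementation: the real engine uses LangChain and Gemini, whereas everything here (`Hit`, `agentic_query`, `toy_search`, `toy_refine`, the `min_score` threshold, and the two-line corpus) is hypothetical and exists only to show the shape of the control loop.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    text: str
    score: float  # relevance in [0, 1]; higher is better


def top_score(hits: List[Hit]) -> float:
    """Best score in a result list, or 0.0 when the list is empty."""
    return max((h.score for h in hits), default=0.0)


def agentic_query(
    question: str,
    search: Callable[[str], List[Hit]],
    refine: Callable[[str, List[Hit]], str],
    min_score: float = 0.7,   # hypothetical quality threshold
    max_rounds: int = 3,      # hard stop so the loop always terminates
) -> List[Hit]:
    query = question          # plan: start from the user's question
    best: List[Hit] = []
    for _ in range(max_rounds):
        hits = search(query)              # search
        if top_score(hits) > top_score(best):
            best = hits                   # keep the best round seen so far
        if top_score(best) >= min_score:  # evaluate
            break
        query = refine(query, hits)       # refine and try again
    return best


# Toy stand-ins for a vector store and an LLM-based query rewriter.
CORPUS = ["a dog runs in the park", "a cat sleeps"]
SYNONYMS = {"canine": "dog", "feline": "cat"}


def toy_search(query: str) -> List[Hit]:
    words = query.lower().split()
    if not words:
        return []
    hits = []
    for doc in CORPUS:
        doc_words = set(doc.split())
        score = sum(w in doc_words for w in words) / len(words)
        if score > 0:
            hits.append(Hit(doc, score))
    return sorted(hits, key=lambda h: h.score, reverse=True)


def toy_refine(query: str, hits: List[Hit]) -> str:
    # Rewrite the query by swapping in known synonyms.
    return " ".join(SYNONYMS.get(w, w) for w in query.lower().split())


if __name__ == "__main__":
    for hit in agentic_query("canine park", toy_search, toy_refine):
        print(f"{hit.score:.2f}  {hit.text}")
```

In this toy run the first search for "canine park" only partially matches, so the query is rewritten to "dog park" and the second round clears the threshold. Capping the number of rounds is a design choice that keeps a refine loop from running indefinitely when no result is ever good enough.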

Running the Web App

  1. Create and activate a Conda environment with Python 3.12.8:
conda create --name maverick python=3.12.8
conda activate maverick
  2. Install dependencies:
pip install -r requirements.txt
  3. Create the ./data directory for the Django database.
  4. Run database migrations:
python manage.py migrate
  5. Create secrets.json with your Gemini and Hugging Face API keys:
{
    "gemini_api_key": "AI__KEY__",
    "huggingface_api_key": "hf__KEY__"
}
  6. Install ffmpeg:
# (On Ubuntu / Debian-based)
sudo apt install ffmpeg
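
Before starting the app, it can help to confirm that the setup steps above took effect. The script below is a minimal sketch and not part of the repository. It assumes that `secrets.json` and `./data` both live in the project root, which the steps above do not state explicitly, and the function name `check_setup` is hypothetical. It uses only the Python standard library.

```python
import json
import shutil
from pathlib import Path

# Key names taken from the secrets.json example in the setup steps.
REQUIRED_KEYS = ("gemini_api_key", "huggingface_api_key")


def check_setup(root="."):
    """Return a list of problems found; an empty list means all checks passed."""
    root = Path(root)
    problems = []

    # The ./data directory holds the Django database.
    if not (root / "data").is_dir():
        problems.append("missing ./data directory")

    # secrets.json must exist, parse as a JSON object, and define both keys.
    secrets_path = root / "secrets.json"
    if not secrets_path.is_file():
        problems.append("missing secrets.json")
    else:
        try:
            secrets = json.loads(secrets_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"secrets.json is not valid JSON: {exc}")
        else:
            if not isinstance(secrets, dict):
                problems.append("secrets.json must contain a JSON object")
            else:
                for key in REQUIRED_KEYS:
                    if not secrets.get(key):
                        problems.append(f"secrets.json has no value for {key}")

    # ffmpeg must be discoverable on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    return problems


if __name__ == "__main__":
    issues = check_setup()
    if issues:
        for issue in issues:
            print("PROBLEM:", issue)
    else:
        print("Setup looks complete.")
```

Run it from the project root after completing the steps. It only checks that files and tools are present; it does not verify that the API keys themselves are valid.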

Run Web App

Start both services:

./run_ui.sh

Open in browser:
