Project Snapshot – A multi-agent AI system that ingests video content and produces concise summaries. Suited to educational, corporate, and content-creator use cases.
- Why This Project
- Features
- Architecture & Agents
- Quick Start
- Usage
- Configuration & Customization
- Project Roadmap
- Contributing
- License
- Credits
## Why This Project
With so much video content available, viewers need a way to extract the key insights quickly.
This project addresses that need with multiple agents, each specialized in one task (transcription, topic detection, summarization, or delivery), so that long-form videos can be consumed in far less time.
- Automatically transcribe video or audio into text
- Identify structure: chapters, scenes, themes
- Generate human-readable summaries (bullets, paragraphs or key-points)
- Optionally, generate follow-up lists like “Further reading”, “Action Items”, or “Questions to Ask”
- Modular architecture makes adding support for new media types (e.g., podcasts) straightforward
## Features
- 🎥 Video-to-Text: Transcribes spoken words using state-of-the-art speech recognition
- 🧭 Topic & Scene Segmentation: Divides the video into logical segments
- ✍️ Summarization Agent: Produces a concise summary of each segment and the overall video
- 📝 Action Item Generator: Optional output of questions, insights or tasks for the viewer
- 🔧 Plug-and-Play: Build your own agents (e.g., for translation, different languages, custom summary style)
- 🧩 File Format Agnostic: Supports MP4, MKV, WebM, MP3 (with extension)
- 📊 Extensible Reporting: Export summaries as .txt, .md, or JSON for integration with other tools
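
As a rough illustration of the export feature in the list above, the sketch below renders one summary as `.txt`, `.md`, or JSON, picking the format from the output file's extension. The `Summary` dataclass, its fields, and the `render_summary` and `export_summary` helpers are hypothetical names chosen for this example; the project's actual data model and export code may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class Summary:
    """Hypothetical container for one video's summary (not the project's real model)."""
    title: str
    overview: str
    key_points: List[str] = field(default_factory=list)


def render_summary(summary: Summary, fmt: str) -> str:
    """Render a summary as plain text ('txt'), Markdown ('md'), or JSON ('json')."""
    if fmt == "json":
        return json.dumps(asdict(summary), indent=2, ensure_ascii=False)
    if fmt == "md":
        bullets = "\n".join(f"- {point}" for point in summary.key_points)
        return f"# {summary.title}\n\n{summary.overview}\n\n{bullets}\n"
    if fmt == "txt":
        bullets = "\n".join(f"* {point}" for point in summary.key_points)
        return f"{summary.title}\n\n{summary.overview}\n\n{bullets}\n"
    raise ValueError(f"Unsupported export format: {fmt!r}")


def export_summary(summary: Summary, path: str) -> Path:
    """Write the summary to `path`, choosing the format from the file extension."""
    target = Path(path)
    fmt = target.suffix.lstrip(".").lower()
    # Rendering happens first, so an unsupported extension fails before any file is created.
    target.write_text(render_summary(summary, fmt), encoding="utf-8")
    return target
```

Keeping rendering separate from file writing means other tools can consume the JSON string directly without touching the disk.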
---
## 🏗 Architecture & Agents
### Agent Breakdown:
- **Transcription Agent** – handles audio extraction & speech-to-text
- **Segmentation Agent** – chunks the transcript into meaningful parts
- **Summarization Agent** – produces a summary of each segment and of the overall video, and adjusts summary length/style
- **Action-Item Agent** (optional) – produces actionable items or questions from content
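
The hand-off between these agents can be pictured as a simple sequential pipeline. The sketch below is illustrative only: the class names, the `run` method, the `summarize_video` function, and the placeholder logic (a canned transcript, splitting on blank lines, keeping the first sentence) are assumptions made for this example and stand in for the real speech-recognition and language-model calls.

```python
from typing import List


class TranscriptionAgent:
    """Stand-in for audio extraction + speech-to-text (returns canned text here)."""

    def run(self, media_path: str) -> str:
        # A real agent would extract audio with ffmpeg and call a speech model.
        return "Intro to the topic. More intro.\n\nMain argument. Supporting detail."


class SegmentationAgent:
    """Splits the transcript into segments; here, naively on blank lines."""

    def run(self, transcript: str) -> List[str]:
        return [part.strip() for part in transcript.split("\n\n") if part.strip()]


class SummarizationAgent:
    """Summarizes each segment; here, by keeping only its first sentence."""

    def run(self, segments: List[str]) -> List[str]:
        return [segment.split(". ")[0].rstrip(".") + "." for segment in segments]


class ActionItemAgent:
    """Optional agent that turns summaries into follow-up questions."""

    def run(self, summaries: List[str]) -> List[str]:
        return [f"What follows from: {summary}" for summary in summaries]


def summarize_video(media_path: str, with_action_items: bool = False) -> dict:
    """Run the agents in order and collect their outputs."""
    transcript = TranscriptionAgent().run(media_path)
    segments = SegmentationAgent().run(transcript)
    summaries = SummarizationAgent().run(segments)
    result = {"segments": segments, "summaries": summaries, "action_items": None}
    if with_action_items:
        # The action-item step is skipped entirely unless requested.
        result["action_items"] = ActionItemAgent().run(summaries)
    return result
```

Because each agent exposes the same small interface, swapping in a different implementation (for example, a translation step) would only require inserting another `run` call in the chain.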
---
## 🧰 Quick Start
### Prerequisites
- Python 3.8 or higher
- ffmpeg (for audio/video processing)
- Access to a speech-recognition model (set your API keys if required)
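
One way to confirm the first two prerequisites before installing is a small check like the one below. This is a sketch, not part of the repository: `check_prerequisites` is a hypothetical helper name, and it only checks the Python version and whether an `ffmpeg` executable is on the `PATH`. It does not verify API keys.

```python
import shutil
import sys
from typing import List, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 8)) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(n) for n in sys.version_info[:3])
        wanted = ".".join(str(n) for n in min_python)
        problems.append(f"Python {wanted}+ is required, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```

Printing each message returned by `check_prerequisites()` gives a quick checklist of what still needs to be installed.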
### Installation
```bash
git clone https://github.com/akash8190/Video-Summarizer-Using-MultiAgent.git
cd Video-Summarizer-Using-MultiAgent
pip install -r requirements.txt
```
## CLI Options:
- `--input`: path to the video/audio file
- `--output`: path to the summary file
- `--style`: `paragraph`, `bulleted`, or `keypoints`
- `--action-items` (optional): include a follow-up list
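
To make the option semantics concrete, here is one way these flags could be declared with Python's standard `argparse` module. It is a sketch built only from the option list above: the parser description, marking `--input` and `--output` as required, the default style, and treating `--action-items` as a boolean switch are all assumptions, and the project's real entry point may define them differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; required-ness and defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Summarize a video or audio file.")
    parser.add_argument("--input", required=True, help="path to the video/audio file")
    parser.add_argument("--output", required=True, help="path to the summary file")
    parser.add_argument(
        "--style",
        choices=["paragraph", "bulleted", "keypoints"],
        default="paragraph",  # assumed default, not stated in the option list
        help="summary style",
    )
    parser.add_argument(
        "--action-items",
        action="store_true",  # assumed to be an on/off switch
        help="also produce a follow-up list",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `argv`, falling back to sys.argv[1:] when it is None."""
    return build_parser().parse_args(argv)
```

Restricting `--style` with `choices` lets `argparse` reject a misspelled style with a clear error before any processing starts.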