A comprehensive AI-powered meeting intelligence system that converts video/audio meetings into searchable, actionable insights with video clip retrieval.
- Problem: Research papers use visual change detection, but meeting videos are static (talking heads)
- Solution: Audio-aware padding (start-2s to end+2s) prevents mid-word audio cutoff
- Implementation: FFmpeg-based clipping with smart padding heuristics
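The padding heuristic can be sketched as follows. `padded_window` and `clip_segment` are hypothetical names for illustration (the project's actual logic lives in `processing/video/clipper.py`); the sketch assumes a stream-copy FFmpeg invocation, which is why clipping stays fast:

```python
import subprocess

def padded_window(start: float, end: float, pad: float = 2.0) -> tuple[float, float]:
    """Return (clip_start, clip_duration) widened by `pad` seconds on each
    side, clamped at 0 — the start-2s / end+2s heuristic described above."""
    clip_start = max(0.0, start - pad)
    return clip_start, (end + pad) - clip_start

def clip_segment(video_path: str, start: float, end: float, out_path: str,
                 pad: float = 2.0) -> None:
    """Cut a clip without re-encoding, so words aren't cut off mid-syllable."""
    clip_start, duration = padded_window(start, end, pad)
    subprocess.run([
        "ffmpeg", "-y",
        "-ss", f"{clip_start:.3f}",   # input-side seek (fast)
        "-i", video_path,
        "-t", f"{duration:.3f}",      # duration, not an absolute end time
        "-c", "copy",                 # stream-copy: no re-encode
        out_path,
    ], check=True)
```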
- Problem: Standard chunking loses context (e.g., "he said it's too high" without "budget" reference)
- Solution: Prepend metadata context `[Time: 00:01:23-00:02:45 | Speakers: Alice, Bob]` to embeddings
- Result: Queries like "what did they decide about the budget?" now match correctly
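A minimal sketch of the context-prepending step (`contextualize` is a hypothetical helper name; the real chunker is `processing/text/chunking/chunker.py`). The contextualized string, not the bare chunk, is what gets embedded:

```python
def contextualize(chunk_text: str, start: str, end: str, speakers: list[str]) -> str:
    """Prepend time/speaker metadata so the embedding carries context
    the raw sentence lacks (who said it, and when)."""
    header = f"[Time: {start}-{end} | Speakers: {', '.join(speakers)}]"
    return f"{header} {chunk_text}"

# This string is what model.encode(...) sees (sentence-transformers, per the stack):
text = contextualize("He said it's too high.", "00:01:23", "00:02:45", ["Alice", "Bob"])
# → "[Time: 00:01:23-00:02:45 | Speakers: Alice, Bob] He said it's too high."
```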
- Problem: LLMs hallucinate speaker attribution in long contexts
- Solution: Structured prompts with explicit speaker ID requirements
- Result: Action items correctly attributed to speakers
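One way such a structured prompt can look. `attribution_prompt` is an illustrative sketch, not the exact prompt the chat engine uses; the key idea is labeling every transcript line with its diarized speaker ID and forbidding attribution to anything else:

```python
def attribution_prompt(chunks: list[dict], question: str) -> str:
    """Build a prompt where every line carries its diarized speaker ID,
    with an explicit instruction to attribute only to those IDs."""
    transcript = "\n".join(
        f"[{c['start']}] {c['speaker']}: {c['text']}" for c in chunks
    )
    return (
        "Answer using ONLY the transcript below. When attributing a statement "
        "or action item, cite the speaker ID exactly as written "
        "(e.g. SPEAKER_01); never guess a name that does not appear.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}"
    )
```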
- Problem: Plain text responses are hard to scan and distinguish important information
- Solution: LLM uses markdown formatting; frontend renders rich text with highlighting
- Features:
- Bold for important terms, decisions, metrics (e.g., **$500K budget**)
- Headers (`#`, `##`, `###`) for organization and hierarchy
- Bullet lists and numbered lists for clarity
- `Inline code` for technical terms
- Automatic keyword highlighting (decisions, actions, deadlines, etc.)
- Proper spacing and visual hierarchy
- ASR: WhisperX (CTranslate2-accelerated Whisper) with word-level alignment
- Speaker Diarization: pyannote.audio (speaker-diarization-3.1)
- Embeddings: sentence-transformers (all-MiniLM-L6-v2, 384-dim)
- Vector Search: FAISS (IndexFlatIP, cosine similarity)
- LLM: Anthropic Claude / OpenAI GPT / OpenRouter (configurable)
- Orchestration: LangChain + LangGraph
- Framework: FastAPI (async, high performance)
- Server: Uvicorn (ASGI)
- Language: Python 3.11+
- Video Processing: FFmpeg (audio extraction + video clipping)
- Vector DB: FAISS (persisted to disk, lazy-loaded)
- Chat Storage: SQLite (persistent chat history)
- Config: Pydantic Settings (`.env`-driven)
- Real-time: WebSocket (pipeline progress) + SSE (streaming chat)
- UI: React (Single Page Application)
- Styling: Tailwind CSS
- Communication: REST API + SSE streaming + WebSocket
- Components: UploadSection, JobList, JobView, PipelineBar, ChatTab, ClipCard
- Build: Create React App, served by FastAPI static mount
- Package Management: UV (fast, reliable)
- Environment: Virtual environments (.venv)
- Cross-platform: pathlib, subprocess (works on Windows/macOS/Linux)
- Python 3.11+
- FFmpeg (for video processing)
- UV package manager (recommended)
```bash
# Clone repository
git clone https://github.com/Pandharimaske/Meeting_Intelligence_Platform.git
cd Meeting_Intelligence_Platform

# Install dependencies (using UV - recommended)
uv sync

# Or using pip
pip install -e .
```

```bash
# Copy and edit config
cp .env.example .env
# Edit .env with your API keys (optional)
```

```bash
# Option 1: One-click startup (builds frontend + starts server)
bash scripts/start.sh

# Option 2: Manual start
uv run uvicorn backend.routes:app --host 0.0.0.0 --port 8000

# Open in browser
open http://localhost:8000
```

```bash
# Upload a video
curl -X POST "http://localhost:8000/api/v1/upload" \
  -F "file=@meeting.mp4"

# Ask a question and get an AI-powered answer
curl -X POST "http://localhost:8000/api/v1/jobs/{job_id}/chat" \
  -H "Content-Type: application/json" \
  -d '{"question": "What were the key decisions?"}'

# Full API docs available at: http://localhost:8000/docs
```

Clip keywords: show, clip, play, video, segment, watch, see, display, footage, recording, playback, zoom, visual, etc.
Use case: Visual context, speaker emphasis, exact wording verification
UX Benefit: Clips appear inline in the chat flow (not in a separate panel), keeping conversation context in one view
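A sketch of how the keyword heuristic might gate clip retrieval. `wants_clip` is a hypothetical name for illustration; the real check lives in the chat engine:

```python
# Keyword list from the clip-trigger description above.
CLIP_KEYWORDS = {
    "show", "clip", "play", "video", "segment", "watch", "see",
    "display", "footage", "recording", "playback", "zoom", "visual",
}

def wants_clip(question: str) -> bool:
    """Return True when the question contains a clip keyword, signalling
    that matching video segments should be attached to the answer."""
    words = question.lower().replace("?", " ").replace(",", " ").split()
    return any(w in CLIP_KEYWORDS for w in words)
```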
Example response to "What are the decisions and action items?"
## Key Decisions
- **Decision:** Approved **$500K budget** for Q2 marketing [00:15:30]
- **Owner:** Sarah Chen (Marketing Lead)
## Action Items
1. **Owner:** John Smith - Set up customer feedback survey by **Friday 3/15** [00:18:45]
2. **Owner:** Lisa Wong - Schedule engineering review for timeline impact [00:19:20]
3. **Owner:** Ahmed Patel - Update stakeholders on budget allocation [00:21:05]
## Important Notes
- Timeline is **critical path** for Q2 launch
- Customer feedback will drive feature prioritization
- Follow-up meeting scheduled for **Monday 9am**
Formatting Features:
- ✅ Bold highlighting for important terms and metrics
- ✅ Headers for section organization (`##`, `###`, `####`)
- ✅ Bullet lists for key points
- ✅ Numbered lists for sequential action items
- ✅ Automatic keyword highlighting of: decisions, actions, deadlines, owners, critical terms
- ✅ Inline code for technical terms and product names
- ✅ Proper spacing for visual hierarchy and readability
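The automatic keyword highlighting could be as simple as a regex pass over the model's markdown. This is a hypothetical sketch with an assumed keyword list (the project's real highlighter lives in the React frontend):

```python
import re

# Assumed keyword set for illustration; the real list is in the frontend renderer.
HIGHLIGHT = re.compile(
    r"\b(decision|decisions|action item|action items|deadline|deadlines|owner|critical)\b",
    re.IGNORECASE,
)

def highlight_keywords(markdown: str) -> str:
    """Wrap known keywords in ** ** so the renderer bolds them automatically."""
    return HIGHLIGHT.sub(lambda m: f"**{m.group(0)}**", markdown)
```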
```
Meeting_Intelligence_Platform/
├── backend/
│   ├── routes.py                    # FastAPI app, REST/WebSocket/SSE endpoints
│   ├── chat_db.py                   # SQLite chat history persistence
│   └── core/
│       └── settings.py              # Pydantic Settings (.env config)
├── processing/
│   ├── audio/
│   │   ├── extractor.py             # FFmpeg audio extraction
│   │   └── transcription/
│   │       └── converter.py         # WhisperX transcription + diarization
│   ├── text/
│   │   ├── parser.py                # SRT/VTT parser
│   │   ├── chunking/
│   │   │   └── chunker.py           # Semantic chunking with metadata
│   │   └── diarization/
│   │       └── speaker_diarization.py  # Standalone speaker ID
│   ├── vector/
│   │   └── store.py                 # FAISS vector store (build/search/persist)
│   ├── reports/
│   │   ├── mom_generator.py         # Template-based MoM
│   │   └── rag_mom_generator.py     # RAG-based MoM + chat engine
│   └── video/
│       └── clipper.py               # FFmpeg video clipping
├── frontend/
│   ├── src/
│   │   ├── App.js                   # Main React app (tabs, chat, video)
│   │   ├── components/
│   │   │   ├── UploadSection.js     # Drag-and-drop upload
│   │   │   ├── JobList.js           # Sidebar job listing
│   │   │   ├── JobView.js           # Transcript/MoM/Chat tabs
│   │   │   └── ProgressOverlay.js   # Pipeline progress bar
│   │   ├── hooks/
│   │   │   └── useWebSocket.js      # WebSocket progress hook
│   │   └── utils/
│   │       └── api.js               # REST + SSE API client
│   └── build/                       # Production build (served by FastAPI)
├── scripts/
│   ├── run_server.py                # Uvicorn launcher
│   └── start.sh                     # One-click startup script
├── data/                            # Runtime data (not committed)
│   ├── jobs/jobs.json               # Job registry
│   ├── jobs/{id}/                   # Per-job: chunks, MoM, vectors, transcripts
│   ├── video/                       # Uploaded videos
│   ├── audio/                       # Extracted audio
│   └── clips/                       # Generated video clips
├── docs/                            # Documentation
├── pyproject.toml                   # Dependencies & metadata
└── .env                             # API keys & configuration
```
- ✅ End-to-end pipeline: Video → Transcript → MoM → Chat → Clips
- ✅ Research gaps: 2 of 4 fixed in code; the remaining 2 identified for future work
- ✅ Production-ready error handling and logging
- Transcription: ~10x realtime on CPU (WhisperX base model)
- MoM Generation: <30 seconds for 1-hour meeting (RAG per-section)
- Video Clipping: <5 seconds for any segment (FFmpeg stream-copy)
- Search Latency: <100ms for FAISS semantic queries
- Chat Streaming: Token-by-token SSE delivery
- Job Reload: <1 second (persisted to disk)
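For intuition on the sub-100 ms search figure: FAISS's `IndexFlatIP` over L2-normalized embeddings is exact inner-product (i.e., cosine) search. The dependency-free sketch below shows the equivalent operation; `cosine_top_k` is illustrative, not the `store.py` implementation:

```python
import math

def cosine_top_k(query: list[float], vectors: list[list[float]], k: int = 3) -> list[int]:
    """Rank stored vectors by cosine similarity to `query` — equivalent to
    inner-product search (IndexFlatIP) after normalizing both sides."""
    def norm(v: list[float]) -> float:
        return math.sqrt(sum(x * x for x in v)) or 1.0

    def score(v: list[float]) -> float:
        return sum(a * b for a, b in zip(query, v)) / (norm(query) * norm(v))

    # Exact (brute-force) search, just like a flat FAISS index.
    return sorted(range(len(vectors)), key=lambda i: score(vectors[i]), reverse=True)[:k]
```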
- ✅ Intuitive web interface
- ✅ Real-time progress updates
- ✅ Click-to-play video segments
- ✅ Timestamped citations throughout
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Install in development mode
uv sync --dev

# Run tests
pytest

# Format code
black .
isort .
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Research Papers: AMMGS, AutoMeet, CLIP-It! for foundational concepts
- Open Source: Whisper, pyannote.audio, FAISS, sentence-transformers
- Community: FastAPI, UV, and the broader Python ecosystem
Built with ❤️ for making meetings more productive and searchable.
Previously, refreshing the browser or restarting the server would lose all processing state, requiring users to re-upload and re-process meetings every time.
All jobs are now persisted to disk automatically:
```
data/jobs/
├── jobs.json                        # Job database (auto-updated)
└── {job_id}/
    ├── {video|audio|transcript}     # Original file
    ├── chunks.json                  # Semantic chunks
    ├── vector_store/                # FAISS index (persisted)
    │   ├── index.faiss              # Vector index
    │   ├── chunks.pkl               # Chunk metadata
    │   └── meta.json                # Model info
    ├── transcripts/                 # JSON + TXT transcripts
    └── mom.json                     # Minutes of Meeting
```
- Auto-Save After Each Step - Jobs database (jobs.json) is saved after each pipeline step
- Vector Store Persistence - FAISS indexes are serialized to disk when building completes
- On-Demand Loading - When you open a previous job, vector store is loaded from disk automatically
- Zero Manual Action - No cache management needed; works transparently
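The save/reload cycle amounts to little more than this. `save_jobs`/`load_jobs` are hypothetical illustrations of the `jobs.json` handling (the write-then-rename is an assumed safeguard so readers never see a partial file):

```python
import json
from pathlib import Path

def save_jobs(jobs: dict, path: Path) -> None:
    """Persist the job registry; called after every pipeline step."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(jobs, indent=2))
    tmp.replace(path)  # atomic rename: no partially-written jobs.json

def load_jobs(path: Path) -> dict:
    """Reload the registry on startup (empty dict if nothing saved yet)."""
    return json.loads(path.read_text()) if path.exists() else {}
```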
Before:
- Upload video → wait 5 minutes processing → refresh → LOST! Start over

After:
- Upload video → wait 5 minutes → refresh → jobs still in the sidebar
- Click a previous job → loads instantly (from cache, no re-processing)
- ✅ First load of job: Full processing (~5-10 min for 1 hour video)
- ✅ Reload after refresh: Instant (<1 second to load from cache)
- ✅ Switching between jobs: Instant (vector store lazy-loaded from disk)
- ✅ Disk usage: ~100-300 MB per hour of video, including the compressed FAISS index