An advanced AI-powered speech analysis system that provides real-time personality insights and communication feedback using local LLMs, RAG, and multi-agent AI architecture.
- Overview
- Features
- Architecture
- System Requirements
- Quick Start
- Installation
- Usage
- Project Structure
- API Documentation
- Configuration
- Development
- Testing & Evaluation
- Troubleshooting
- Contributing
- License
TEAM-5 is a comprehensive speech analysis pipeline that combines state-of-the-art speech processing, natural language understanding, and multi-agent AI systems to provide detailed personality insights and communication analysis. The system uses local LLMs (via Ollama) and Retrieval-Augmented Generation (RAG) to deliver personalized, actionable feedback.
- Real-time Speech Analysis: Process audio through advanced speech-to-text and acoustic feature extraction
- Multi-Agent AI System: Specialized agents for communication, confidence, and personality analysis
- RAG-Enhanced Reports: Knowledge-augmented insights using vector database retrieval
- Quality Assurance: Built-in evaluation framework using LangChain evaluators
- Full-Stack Solution: FastAPI backend + React frontend for seamless user experience
- Audio Recording & Processing: Multi-format audio support with noise reduction
- Speech-to-Text: High-accuracy transcription using Faster-Whisper
- Acoustic Analysis: Comprehensive feature extraction (pitch, energy, pauses, speech rate)
- Multi-Agent AI: Specialized agents for different analysis aspects
- RAG System: ChromaDB-powered knowledge retrieval
- Guardrails AI: Input/output validation and safety checks
- Evaluation Framework: Quality assessment using LangChain evaluators
- REST API: FastAPI-powered endpoints for frontend integration
- React + TypeScript: Modern, type-safe UI development
- Responsive Design: Works on desktop and mobile devices
- File Upload: Support for various audio formats
- Results Visualization: Interactive display of analysis results
- Real-time Feedback: Instant processing status updates
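Speech rate and pause statistics (two of the acoustic features listed above) can be derived from timestamped transcript segments. The sketch below is an illustration only, not the code in `speech_features.py`: it assumes segments arrive as `(start_seconds, end_seconds, text)` tuples (Faster-Whisper's segments expose `start`, `end`, and `text` attributes), and the helper name `speech_timing_stats` and the 0.5 s pause threshold are hypothetical.

```python
from typing import Dict, List, Tuple

# (start_seconds, end_seconds, text): a simplified, hypothetical segment format
Segment = Tuple[float, float, str]


def speech_timing_stats(segments: List[Segment], pause_threshold: float = 0.5) -> Dict[str, float]:
    """Compute words per minute and pause statistics from timestamped segments.

    The 0.5 s pause threshold is an illustrative assumption.
    """
    if not segments:
        return {"words_per_minute": 0.0, "pause_count": 0, "total_pause_seconds": 0.0}

    total_words = sum(len(text.split()) for _, _, text in segments)
    # Measure speaking time from the first segment's start to the last segment's end.
    duration = segments[-1][1] - segments[0][0]
    words_per_minute = total_words / (duration / 60.0) if duration > 0 else 0.0

    # A pause is a gap between consecutive segments longer than the threshold.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(segments, segments[1:])]
    pauses = [gap for gap in gaps if gap > pause_threshold]

    return {
        "words_per_minute": round(words_per_minute, 1),
        "pause_count": len(pauses),
        "total_pause_seconds": round(sum(pauses), 2),
    }


segments = [
    (0.0, 2.0, "Hello everyone and welcome"),
    (3.0, 5.0, "today we will talk about"),
    (5.2, 6.0, "confidence"),
]
print(speech_timing_stats(segments))
# → {'words_per_minute': 100.0, 'pause_count': 1, 'total_pause_seconds': 1.0}
```

Only the 1.0 s gap counts as a pause here; the 0.2 s gap falls below the threshold.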
┌─────────────────────────────────────────────────────────────┐
│                          Frontend                           │
│                    (React + TypeScript)                     │
│                   Vite Dev Server / Build                   │
└─────────────────────────────┬───────────────────────────────┘
                              │ HTTP/REST API
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                         Backend API                         │
│                     (FastAPI + Uvicorn)                     │
├─────────────────────────────────────────────────────────────┤
│                 Speech Processing Pipeline                  │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐  │
│  │  Audio   │──→│  Speech  │──→│ Feature  │──→│  Multi-  │  │
│  │Recording │   │ to Text  │   │Extraction│   │  Agent   │  │
│  │          │   │          │   │          │   │  System  │  │
│  └──────────┘   └──────────┘   └──────────┘   └────┬─────┘  │
│                                                    │        │
│  ┌─────────────────────────────────────────────────▼─────┐  │
│  │            RAG System (ChromaDB + Ollama)             │  │
│  │  - Communication Knowledge  - Confidence Psychology   │  │
│  │  - Personality Traits       - Improvement Tips        │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                Evaluation & Guardrails                │  │
│  │  - LangChain Evaluators     - Input/Output Validation │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────┬───────────────────────────────┘
                              │
                              ▼
                       ┌──────────────┐
                       │    Ollama    │
                       │  Local LLM   │
                       │  (mistral)   │
                       └──────────────┘
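The data flow in the diagram can be summarized as a sequential pipeline: the transcript and acoustic features go to each specialized agent, and the agents' outputs feed the final report. The sketch below is a simplified illustration, not the code in `link.py` or `agent.py`. Every function here is a hypothetical stub, the feature names `pause_ratio` and `pitch_variability` are assumptions, and the real agents call the local LLM and RAG system rather than applying fixed rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AnalysisResult:
    transcript: str
    audio_features: Dict[str, float]
    agent_outputs: Dict[str, str] = field(default_factory=dict)
    final_report: str = ""


# Hypothetical agent signature: (transcript, features) -> analysis text
Agent = Callable[[str, Dict[str, float]], str]


def communication_agent(transcript: str, features: Dict[str, float]) -> str:
    # Stub: the real agent asks the LLM about clarity, fluency, and structure.
    return f"{len(transcript.split())} words spoken"


def confidence_agent(transcript: str, features: Dict[str, float]) -> str:
    # Stub: treats a high (hypothetical) pause ratio as a sign of hesitation.
    return "hesitant" if features.get("pause_ratio", 0.0) > 0.3 else "steady"


def personality_agent(transcript: str, features: Dict[str, float]) -> str:
    # Stub: the real agent maps communication patterns to personality traits.
    return "expressive" if features.get("pitch_variability", 0.0) > 0.5 else "reserved"


def run_pipeline(transcript: str, features: Dict[str, float],
                 agents: Dict[str, Agent]) -> AnalysisResult:
    result = AnalysisResult(transcript=transcript, audio_features=features)
    # Each specialized agent sees the same transcript and acoustic features.
    for name, agent in agents.items():
        result.agent_outputs[name] = agent(transcript, features)
    # The real system would retrieve RAG context here and ask the LLM for a report;
    # this stub just joins the agent outputs.
    result.final_report = "; ".join(f"{k}: {v}" for k, v in result.agent_outputs.items())
    return result


agents = {
    "communication": communication_agent,
    "confidence": confidence_agent,
    "personality": personality_agent,
}
report = run_pipeline("I think we should ship it",
                      {"pause_ratio": 0.4, "pitch_variability": 0.6}, agents)
print(report.final_report)
# → communication: 6 words spoken; confidence: hesitant; personality: expressive
```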
- Speech Processing: Audio recording, preprocessing, and transcription
- Feature Extraction: Acoustic analysis (openSMILE, PyAnnote)
- Multi-Agent System:
  - Communication Agent: Analyzes clarity, fluency, structure
  - Confidence Agent: Evaluates vocal confidence and emotional tone
  - Personality Agent: Maps communication patterns to personality traits
- RAG System: Retrieves relevant expert knowledge for enhanced insights
- Report Generation: LLM-powered personalized feedback reports
- Evaluation Framework: Quality assessment and validation
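The RAG step ranks stored knowledge snippets by similarity to a query and passes the best matches to the LLM. ChromaDB does this with learned embeddings; the sketch below swaps in a toy bag-of-words cosine similarity so it runs with the standard library alone. It is not the project's `retriever.py`: the documents and helper names are hypothetical, and `top_k` only mirrors the `TOP_K_RESULTS = 3` setting in the configuration section.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def _vectorize(text: str) -> Counter:
    # Toy "embedding": lowercase word counts (stand-in for a real embedding model).
    return Counter(re.findall(r"[a-z']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], top_k: int = 3) -> List[Tuple[float, str]]:
    """Return the top_k (score, document) pairs most similar to the query, best first."""
    q = _vectorize(query)
    scored = [(_cosine(q, _vectorize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


knowledge = [
    "Frequent filler words reduce perceived speaker confidence.",
    "Varied pitch keeps listeners engaged during a talk.",
    "Extraverted speakers tend to talk faster and with fewer pauses.",
    "Structure a talk with a clear opening, body, and conclusion.",
]
for score, doc in retrieve("how do filler words affect confidence", knowledge, top_k=2):
    print(f"{score:.2f}  {doc}")
```

Real embeddings would also match paraphrases ("hesitation" vs. "pauses"), which plain word overlap cannot.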
Minimum requirements:
- OS: Linux, macOS, or Windows 10+
- Python: 3.8 or higher
- RAM: 8GB (16GB recommended)
- Storage: ~5GB for models and dependencies
- Node.js: 18.x or higher (for frontend)
- Microphone: Required for audio recording
Recommended:
- RAM: 16GB or more
- GPU: CUDA-compatible GPU (optional, for faster inference)
- Storage: SSD with 10GB+ free space
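A quick way to check the software prerequisites before installing is a small preflight script. This is a hypothetical helper, not part of the repository; it checks only the Python version and whether `node` and `ollama` are on `PATH`. It does not check the Node.js version, RAM, disk space, or GPU.

```python
import shutil
import sys
from typing import Dict, Tuple


def preflight(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Report which software prerequisites are present."""
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        # shutil.which returns None when an executable is not on PATH.
        "node_found": shutil.which("node") is not None,
        "ollama_found": shutil.which("ollama") is not None,
    }


if __name__ == "__main__":
    for check, passed in preflight().items():
        print(f"{check:<14} {'OK' if passed else 'MISSING'}")
```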
# Clone the repository
git clone https://github.com/GSMPRANEETH/TEAM-5.git
cd TEAM-5
# Backend setup
cd backend
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Install Ollama (https://ollama.ai/download), then pull the model
ollama pull mistral
# Start backend API
uvicorn api:app --reload --port 8000
# In a new terminal - Frontend setup
cd ../frontend
npm install
npm run dev
Backend only (standalone pipeline, no frontend):
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py # Run standalone pipeline
Clone the repository:
git clone https://github.com/GSMPRANEETH/TEAM-5.git
cd TEAM-5
Set up the Python backend:
cd backend
python -m venv venv
# Activate virtual environment
# Linux/macOS:
source venv/bin/activate
# Windows:
venv\Scripts\activate
pip install -r requirements.txt
Install Ollama, which provides local LLM inference:
Linux/macOS:
curl -fsSL https://ollama.ai/install.sh | sh
Windows:
- Download from ollama.ai/download
- Run the installer
- Add Ollama to PATH:
  - Open "Edit environment variables for your account"
  - Add C:\Users\%USERNAME%\AppData\Local\Programs\Ollama to PATH
- Restart the terminal
Verify installation:
ollama --version
Pull the LLM model:
ollama pull mistral
Set up the frontend:
cd ../frontend
npm install
Edit backend configuration files as needed:
- backend/llm1/llm_config.py - LLM settings (model, temperature, max tokens)
- backend/rag/config.py - RAG system configuration
- backend/evals/eval_config.py - Evaluation criteria
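To see how the `llm_config.py` settings (model, temperature, max tokens) could map onto a request to the local model, here is a sketch that builds a call to Ollama's REST API (`POST /api/generate` on the default port 11434) using only the standard library. It is an assumption that the project talks to Ollama this way; `llm1/local_llm.py` may use a client library instead. In Ollama's request options, the response-length limit is called `num_predict`.

```python
import json
import urllib.request

# Values copied from the configuration section; the real llm_config.py may differ.
LLM_MODEL_NAME = "mistral"
TEMPERATURE = 0.3
MAX_TOKENS = 512

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str) -> dict:
    """Build a non-streaming /api/generate request body."""
    return {
        "model": LLM_MODEL_NAME,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {"temperature": TEMPERATURE, "num_predict": MAX_TOKENS},
    }


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With stream=False, the generated text is in the "response" field.
        return json.loads(response.read())["response"]
```

With `ollama serve` running and the model pulled, `generate("Give one tip for speaking more clearly.")` returns the model's answer as a string.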
cd backend
source venv/bin/activate
uvicorn api:app --reload --port 8000
API will be available at: http://localhost:8000
API docs: http://localhost:8000/docs
cd frontend
npm run dev
Frontend will be available at: http://localhost:5173
cd backend
python main.py
This will:
- Record 45 seconds of audio
- Process and analyze speech
- Generate comprehensive report
- Display results in terminal
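If no microphone is available, a synthetic file can stand in for a recording when trying the `/analyze` API. This sketch uses Python's standard `wave` module to write a 16 kHz, mono, 16-bit WAV named `audio.wav`, matching the `SAMPLE_RATE` and `CHANNELS` values in the configuration section. The helper name is hypothetical, and a pure tone contains no speech, so it only exercises the upload and processing path; expect an empty or meaningless transcript.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # matches the recorder's configured rate
CHANNELS = 1         # mono


def write_test_tone(path: str, seconds: float = 2.0, freq_hz: float = 220.0) -> None:
    """Write a sine tone as 16-bit PCM WAV for smoke-testing the pipeline."""
    n_frames = int(SAMPLE_RATE * seconds)
    amplitude = 0.3 * 32767  # leave headroom below the 16-bit maximum
    frames = b"".join(
        struct.pack("<h", int(amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)


write_test_tone("audio.wav")
```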
import requests
# Upload audio file
with open("audio.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/analyze",
        files={"file": f}
    )
response.raise_for_status()  # Fail loudly on HTTP errors
result = response.json()
print(result)
Test individual components:
cd backend
# Test LLM connection
python test_llm_step5.py
# Test RAG system
python test_rag.py
# Run evaluations
python -m evals.test_evals
TEAM-5/
├── backend/                  # Python backend
│   ├── api.py                # FastAPI application
│   ├── main.py               # Standalone pipeline
│   ├── link.py               # Pipeline orchestration
│   ├── requirements.txt      # Python dependencies
│   ├── README.md             # Backend documentation
│   │
│   ├── agents/               # Multi-agent system
│   │   ├── communication_agent.py
│   │   ├── confidence_agent.py
│   │   └── personality_agent.py
│   │
│   ├── llm/                  # LLM wrapper (agents)
│   │   └── local_llm.py
│   │
│   ├── llm1/                 # LLM config & reporting
│   │   ├── llm_config.py
│   │   ├── local_llm.py
│   │   ├── prompt_templates.py
│   │   └── report_generator.py
│   │
│   ├── rag/                  # RAG system
│   │   ├── config.py
│   │   ├── retriever.py
│   │   ├── knowledge_base.py
│   │   ├── rag_pipeline.py
│   │   └── documents/
│   │
│   ├── evals/                # Evaluation framework
│   │   ├── eval_config.py
│   │   ├── eval_runner.py
│   │   ├── eval_refinement.py
│   │   └── test_evals.py
│   │
│   ├── utils/                # Utilities
│   │   ├── parser.py
│   │   └── feature_scoring.py
│   │
│   ├── speech_to_text.py     # Whisper transcription
│   ├── speech_features.py    # Acoustic analysis
│   ├── record_audio.py       # Audio recording
│   ├── preprocess_audio.py   # Audio preprocessing
│   ├── guardrails_config.py  # Safety & validation
│   └── agent.py              # Agent orchestrator
│
├── frontend/                 # React frontend
│   ├── src/                  # Source code
│   │   ├── App.tsx
│   │   └── components/
│   ├── public/               # Static assets
│   ├── package.json
│   ├── vite.config.ts
│   └── README.md
│
└── README.md                 # This file
POST /analyze: Analyze an uploaded audio file.
Request:
POST /analyze HTTP/1.1
Content-Type: multipart/form-data
file: <audio_file>
Response:
{
"transcript": "...",
"audio_features": { ... },
"communication_analysis": { ... },
"confidence_emotion_analysis": { ... },
"personality_analysis": { ... },
"final_report": "..."
}
When the backend is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
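Clients can guard against partial responses by checking for the top-level keys documented for `POST /analyze`. A minimal sketch: the helper name is hypothetical, and it checks only that the keys are present, not the contents of the nested objects (which are elided in the response schema above).

```python
from typing import Any, Dict, List

EXPECTED_KEYS = (
    "transcript",
    "audio_features",
    "communication_analysis",
    "confidence_emotion_analysis",
    "personality_analysis",
    "final_report",
)


def missing_keys(result: Dict[str, Any]) -> List[str]:
    """Return the documented /analyze keys absent from a response body."""
    return [key for key in EXPECTED_KEYS if key not in result]


complete = {key: None for key in EXPECTED_KEYS}
partial = {"transcript": "hello", "final_report": "..."}
print(missing_keys(complete))  # → []
print(missing_keys(partial))
# → ['audio_features', 'communication_analysis', 'confidence_emotion_analysis', 'personality_analysis']
```

In the `requests` example, call `missing_keys(response.json())` before reading individual fields.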
backend/llm1/llm_config.py:
LLM_MODEL_NAME = "mistral" # Change model
TEMPERATURE = 0.3 # Creativity (0.0-1.0)
MAX_TOKENS = 512 # Max response length
backend/rag/config.py:
CHROMA_PERSIST_DIR = "./chroma_db" # Vector DB storage
TOP_K_RESULTS = 3 # Documents to retrieve
Audio recording settings:
DURATION = 45 # Recording duration (seconds)
SAMPLE_RATE = 16000 # 16 kHz, the rate Whisper models expect
CHANNELS = 1 # Mono audio
Backend development:
cd backend
source venv/bin/activate
# Run with auto-reload
uvicorn api:app --reload
# Run tests
python -m pytest
# Format and lint code
black .
flake8 .
Frontend development:
cd frontend
# Development server
npm run dev
# Build for production
npm run build
# Preview production build
npm run preview
# Lint
npm run lint
The system includes a comprehensive evaluation framework:
cd backend
python -m evals.test_evals
Evaluation criteria:
- Helpfulness, Relevance, Coherence
- Actionability, Specificity, Accuracy
- Completeness, Constructiveness
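One simple way to summarize per-criterion judgments for the criteria above is to normalize each score, average them, and flag the weak criteria. The sketch assumes each criterion has already been scored on a 1-5 scale (for example, by an LLM judge); the scale, the 0.6 threshold, and the function name are illustrative assumptions, not the settings in `eval_config.py`.

```python
from typing import Dict, List, Tuple

CRITERIA = [
    "helpfulness", "relevance", "coherence", "actionability",
    "specificity", "accuracy", "completeness", "constructiveness",
]


def summarize_scores(scores: Dict[str, int], scale_max: int = 5,
                     threshold: float = 0.6) -> Tuple[float, List[str]]:
    """Return the mean normalized score and the criteria below the threshold."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    normalized = {c: scores[c] / scale_max for c in CRITERIA}
    overall = sum(normalized.values()) / len(CRITERIA)
    weak = [c for c in CRITERIA if normalized[c] < threshold]
    return round(overall, 3), weak


example = dict.fromkeys(CRITERIA, 4)
example["specificity"] = 2
print(summarize_scores(example))  # → (0.75, ['specificity'])
```

Flagging weak criteria, rather than reporting only the mean, points at what a refinement pass should fix.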
# Test LLM connection
python test_llm_step5.py
# Test RAG retrieval
python test_rag.py
# Test full pipeline
python main.py
Error: Ollama not available
Solution:
- Ensure Ollama is running: ollama serve
- Check that the model is pulled: ollama list
- Pull the model if needed: ollama pull mistral
- Windows: Verify Ollama is in PATH
- Linux/macOS: Run which ollama to confirm it is on PATH
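The same checks can be scripted against Ollama's REST API: `GET /api/tags` on the default port 11434 lists the locally pulled models. A sketch using only the standard library (the helper names are hypothetical). Model names in the listing carry a tag such as `mistral:latest`, so the helper compares only the part before the colon.

```python
import json
import urllib.request
from typing import List

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def installed_models(payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]


def has_model(payload: dict, wanted: str = "mistral") -> bool:
    # "mistral:latest" -> "mistral"
    return any(name.split(":")[0] == wanted for name in installed_models(payload))


def check_ollama(wanted: str = "mistral", timeout: float = 5.0) -> str:
    try:
        with urllib.request.urlopen(OLLAMA_TAGS_URL, timeout=timeout) as response:
            payload = json.loads(response.read())
    except OSError:  # URLError and socket timeouts are both OSError subclasses
        return "Ollama is not reachable; start it with: ollama serve"
    if not has_model(payload, wanted):
        return f"Ollama is running but '{wanted}' is missing; run: ollama pull {wanted}"
    return f"Ollama is running and '{wanted}' is available."
```

Run `print(check_ollama())` from the backend's virtual environment to get a one-line diagnosis.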
Issue: Python import or missing-module errors
Solution:
# Ensure virtual environment is activated
source venv/bin/activate # or venv\Scripts\activate
# Reinstall dependencies
pip install -r requirements.txt
Issue: Out-of-memory errors or slow inference
Solution:
- Use a smaller LLM model
- Reduce MAX_TOKENS in configuration
- Close other applications
- Upgrade to 16GB+ RAM
Issue: ChromaDB errors or stale RAG results
Solution:
# Clear database
rm -rf backend/chroma_db/
# Restart backend
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow PEP 8 for Python code
- Use TypeScript for frontend code
- Add tests for new features
- Update documentation
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments:
- Faster-Whisper: CTranslate2-based reimplementation of OpenAI Whisper
- Ollama: Local LLM runtime
- LangChain: LLM application framework
- ChromaDB: Vector database for AI
- FastAPI: Modern Python web framework
- React: UI library
For issues and questions:
- Open an issue on GitHub
- Contact: [Project Team]
Roadmap:
- Support for additional LLM providers
- Multi-language support
- Real-time streaming analysis
- Advanced visualization dashboards
- Mobile app
- Docker containerization
- Cloud deployment guide
Built with ❤️ by TEAM-5 [chatgpt](https://chatgpt.com/g/g-p-693cf63b3d608191a200ca21f1c5f7e2-tts/project) [perplexity](https://www.perplexity.ai/spaces/tts-1gnsM.HoSV.NHB6HbWS31g#0)