🎙️ TEAM-5 Speech Analysis Pipeline

An advanced AI-powered speech analysis system that provides real-time personality insights and communication feedback using local LLMs, RAG, and multi-agent AI architecture.

📋 Table of Contents

Overview
Features
Architecture
System Requirements
Quick Start
Installation
Usage
Project Structure
API Documentation
Configuration
Development
Testing & Evaluation
Troubleshooting
Contributing
License

🌟 Overview

TEAM-5 is a speech analysis pipeline that combines speech-to-text transcription, acoustic feature extraction, and a multi-agent AI system to produce personality insights and communication feedback. It runs LLMs locally via Ollama and uses Retrieval-Augmented Generation (RAG) over a ChromaDB knowledge base to deliver personalized, actionable reports.

Key Capabilities

Real-time Speech Analysis: Process audio through advanced speech-to-text and acoustic feature extraction
Multi-Agent AI System: Specialized agents for communication, confidence, and personality analysis
RAG-Enhanced Reports: Knowledge-augmented insights using vector database retrieval
Quality Assurance: Built-in evaluation framework using LangChain evaluators
Full-Stack Solution: FastAPI backend + React frontend for seamless user experience

✨ Features

Backend Features

🎤 Audio Recording & Processing: Multi-format audio support with noise reduction
📝 Speech-to-Text: High-accuracy transcription using Faster-Whisper
📊 Acoustic Analysis: Comprehensive feature extraction (pitch, energy, pauses, speech rate)
🧠 Multi-Agent AI: Specialized agents for different analysis aspects
🔍 RAG System: ChromaDB-powered knowledge retrieval
🛡️ Guardrails AI: Input/output validation and safety checks
📈 Evaluation Framework: Quality assessment using LangChain evaluators
🚀 REST API: FastAPI-powered endpoints for frontend integration
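
To make the acoustic features listed above more concrete, here is a minimal sketch of two of them (frame energy and the share of silent frames) in plain Python. It is illustrative only and is not the project's implementation, which lives in speech_features.py. The function names, the 25 ms frame length, and the 0.01 silence threshold are assumptions chosen for this example.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS energy of each full, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies

def pause_ratio(samples, sample_rate=16000, frame_ms=25, threshold=0.01):
    """Fraction of frames whose RMS energy falls below `threshold`."""
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = frame_rms(samples, frame_len)
    if not energies:
        return 0.0
    silent = sum(1 for e in energies if e < threshold)
    return silent / len(energies)

# One second of a 220 Hz tone followed by one second of silence.
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
signal = tone + [0.0] * sr
print(pause_ratio(signal, sample_rate=sr))  # prints 0.5
```

A fixed threshold like this is fragile on noisy recordings; it is used here only to keep the example short.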

Frontend Features

⚡ React + TypeScript: Modern, type-safe UI development
🎨 Responsive Design: Works on desktop and mobile devices
📤 File Upload: Support for various audio formats
📊 Results Visualization: Interactive display of analysis results
⏱️ Real-time Feedback: Instant processing status updates

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                         Frontend                             │
│                    (React + TypeScript)                      │
│                 Vite Dev Server / Build                      │
└────────────────────────┬────────────────────────────────────┘
                         │ HTTP/REST API
                         │
┌────────────────────────▼────────────────────────────────────┐
│                      Backend API                             │
│                     (FastAPI + Uvicorn)                      │
├──────────────────────────────────────────────────────────────┤
│                    Speech Processing Pipeline                │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐  ┌────────────┐ │
│  │  Audio   │→ │  Speech   │→ │  Feature │→ │   Multi-   │ │
│  │ Recording│  │  to Text  │  │Extraction│  │Agent System│ │
│  └──────────┘  └───────────┘  └──────────┘  └─────┬──────┘ │
│                                                      │        │
│  ┌──────────────────────────────────────────────────▼──────┐ │
│  │            RAG System (ChromaDB + Ollama)              │ │
│  │  - Communication Knowledge    - Confidence Psychology  │ │
│  │  - Personality Traits         - Improvement Tips       │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                               │
│  ┌────────────────────────────────────────────────────────┐ │
│  │           Evaluation & Guardrails                      │ │
│  │  - LangChain Evaluators  - Input/Output Validation    │ │
│  └────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘
                         │
                         ▼
                  ┌──────────────┐
                  │    Ollama    │
                  │ Local LLM    │
                  │  (mistral)   │
                  └──────────────┘

Component Breakdown

Speech Processing: Audio recording, preprocessing, and transcription
Feature Extraction: Acoustic analysis (openSMILE, PyAnnote)
Multi-Agent System:
- Communication Agent: Analyzes clarity, fluency, structure
- Confidence Agent: Evaluates vocal confidence and emotional tone
- Personality Agent: Maps communication patterns to personality traits
RAG System: Retrieves relevant expert knowledge for enhanced insights
Report Generation: LLM-powered personalized feedback reports
Evaluation Framework: Quality assessment and validation
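
The flow described in the breakdown above (one transcript and one set of audio features fanned out to several specialized agents, then collected into a single result) can be sketched as follows. This is a hypothetical outline, not the code in agent.py or link.py: the `Pipeline` class, the agent signature, and the stub agents are assumptions, and the real agents call an LLM rather than returning fixed dictionaries. The result keys mirror the fields of the `POST /analyze` response.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical agent signature: (transcript, audio_features) -> analysis dict.
Agent = Callable[[str, dict], dict]

@dataclass
class Pipeline:
    agents: Dict[str, Agent] = field(default_factory=dict)

    def run(self, transcript: str, audio_features: dict) -> dict:
        """Run every agent on the same inputs and collect results by key."""
        result = {"transcript": transcript, "audio_features": audio_features}
        for key, agent in self.agents.items():
            result[key] = agent(transcript, audio_features)
        return result

# Stub agents standing in for the LLM-backed ones.
def communication_agent(transcript, features):
    return {"word_count": len(transcript.split())}

def confidence_agent(transcript, features):
    return {"pause_ratio": features.get("pause_ratio")}

def personality_agent(transcript, features):
    return {"traits": []}

pipeline = Pipeline(agents={
    "communication_analysis": communication_agent,
    "confidence_emotion_analysis": confidence_agent,
    "personality_analysis": personality_agent,
})
report = pipeline.run("hello there everyone", {"pause_ratio": 0.2})
print(sorted(report))
```

Keeping agents behind a common callable signature makes it straightforward to add or swap an agent without touching the orchestration loop.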

💻 System Requirements

Minimum Requirements

OS: Linux, macOS, or Windows 10+
Python: 3.8 or higher
RAM: 8GB (16GB recommended)
Storage: ~5GB for models and dependencies
Node.js: 18.x or higher (for frontend)
Microphone: Required for audio recording

Recommended Requirements

RAM: 16GB or more
GPU: CUDA-compatible GPU (optional, for faster inference)
Storage: SSD with 10GB+ free space

🚀 Quick Start

Option 1: Full Stack Development

# Clone the repository
git clone https://github.com/GSMPRANEETH/TEAM-5.git
cd TEAM-5

# Backend setup
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Pull the LLM model (Ollama must already be installed and running; see Installation)
ollama pull mistral

# Start backend API
uvicorn api:app --reload --port 8000

# In a new terminal - Frontend setup
cd ../frontend
npm install
npm run dev

Option 2: Backend Only

cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py  # Run standalone pipeline

📦 Installation

1. Clone Repository

git clone https://github.com/GSMPRANEETH/TEAM-5.git
cd TEAM-5

2. Backend Setup

Create Virtual Environment

cd backend
python -m venv venv

# Activate virtual environment
# Linux/macOS:
source venv/bin/activate
# Windows:
venv\Scripts\activate

Install Dependencies

pip install -r requirements.txt

3. Install Ollama (Required for LLM)

Ollama provides local LLM inference.

Linux/macOS:

curl -fsSL https://ollama.ai/install.sh | sh

Windows:

Download from ollama.ai/download
Run the installer
Add Ollama to PATH:
- Open "Edit environment variables for your account"
- Add C:\Users\%USERNAME%\AppData\Local\Programs\Ollama to PATH
- Restart terminal

Verify Installation:

ollama --version

Pull LLM Model

ollama pull mistral

4. Frontend Setup (Optional)

cd ../frontend
npm install

5. Configuration

Edit backend configuration files as needed:

backend/llm1/llm_config.py - LLM settings (model, temperature, max tokens)
backend/rag/config.py - RAG system configuration
backend/evals/eval_config.py - Evaluation criteria

🎯 Usage

Running the Full Stack

Start Backend API

cd backend
source venv/bin/activate
uvicorn api:app --reload --port 8000

API will be available at: http://localhost:8000
API docs: http://localhost:8000/docs

Start Frontend

cd frontend
npm run dev

Frontend will be available at: http://localhost:5173

Running Backend Standalone

cd backend
python main.py

This will:

Record 45 seconds of audio
Process and analyze speech
Generate comprehensive report
Display results in terminal

Using the API

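import requests

# Upload an audio file to the running backend for analysis
with open("audio.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/analyze",
        files={"file": f},  # sent as multipart/form-data under the field name "file"
    )

# Raise an exception on 4xx/5xx instead of trying to parse an error body
response.raise_for_status()

result = response.json()
print(result)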

Running Tests

cd backend

# Test LLM connection
python test_llm_step5.py

# Test RAG system
python test_rag.py

# Run evaluations
python -m evals.test_evals

📁 Project Structure

TEAM-5/
├── backend/                      # Python backend
│   ├── api.py                   # FastAPI application
│   ├── main.py                  # Standalone pipeline
│   ├── link.py                  # Pipeline orchestration
│   ├── requirements.txt         # Python dependencies
│   ├── README.md                # Backend documentation
│   │
│   ├── agents/                  # Multi-agent system
│   │   ├── communication_agent.py
│   │   ├── confidence_agent.py
│   │   └── personality_agent.py
│   │
│   ├── llm/                     # LLM wrapper (agents)
│   │   └── local_llm.py
│   │
│   ├── llm1/                    # LLM config & reporting
│   │   ├── llm_config.py
│   │   ├── local_llm.py
│   │   ├── prompt_templates.py
│   │   └── report_generator.py
│   │
│   ├── rag/                     # RAG system
│   │   ├── config.py
│   │   ├── retriever.py
│   │   ├── knowledge_base.py
│   │   ├── rag_pipeline.py
│   │   └── documents/
│   │
│   ├── evals/                   # Evaluation framework
│   │   ├── eval_config.py
│   │   ├── eval_runner.py
│   │   ├── eval_refinement.py
│   │   └── test_evals.py
│   │
│   ├── utils/                   # Utilities
│   │   ├── parser.py
│   │   └── feature_scoring.py
│   │
│   ├── speech_to_text.py        # Whisper transcription
│   ├── speech_features.py       # Acoustic analysis
│   ├── record_audio.py          # Audio recording
│   ├── preprocess_audio.py      # Audio preprocessing
│   ├── guardrails_config.py     # Safety & validation
│   └── agent.py                 # Agent orchestrator
│
├── frontend/                     # React frontend
│   ├── src/                     # Source code
│   │   ├── App.tsx
│   │   └── components/
│   ├── public/                  # Static assets
│   ├── package.json
│   ├── vite.config.ts
│   └── README.md
│
└── README.md                     # This file

📚 API Documentation

Endpoints

`POST /analyze`

Analyze uploaded audio file.

Request:

POST /analyze HTTP/1.1
Content-Type: multipart/form-data

file: <audio_file>

Response:

{
  "transcript": "...",
  "audio_features": { ... },
  "communication_analysis": { ... },
  "confidence_emotion_analysis": { ... },
  "personality_analysis": { ... },
  "final_report": "..."
}
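
On the client side, the response shape above can be captured as a typed structure so that missing fields are caught early. The sketch below is an assumption-laden illustration: the `AnalysisResponse` name and the `missing_keys` helper are hypothetical, and the nested objects are typed loosely as dictionaries because their inner fields are not specified here.

```python
from typing import Any, Dict, List, TypedDict

class AnalysisResponse(TypedDict):
    # Top-level keys as documented for POST /analyze.
    transcript: str
    audio_features: Dict[str, Any]
    communication_analysis: Dict[str, Any]
    confidence_emotion_analysis: Dict[str, Any]
    personality_analysis: Dict[str, Any]
    final_report: str

EXPECTED_KEYS = tuple(AnalysisResponse.__annotations__)

def missing_keys(payload: Dict[str, Any]) -> List[str]:
    """Return the documented top-level keys absent from `payload`."""
    return [key for key in EXPECTED_KEYS if key not in payload]

partial = {"transcript": "hi", "final_report": "..."}
print(missing_keys(partial))
```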

Interactive API Docs

When the backend is running, visit:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

⚙️ Configuration

LLM Configuration (`backend/llm1/llm_config.py`)

LLM_MODEL_NAME = "mistral"  # Change model
TEMPERATURE = 0.3            # Creativity (0.0-1.0)
MAX_TOKENS = 512            # Max response length
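
For orientation, here is one way these three settings could map onto a request to Ollama's REST endpoint `POST /api/generate`. This is a sketch under assumptions: the project may call Ollama through a wrapper library instead, and `build_generate_payload` is a hypothetical helper. In Ollama's API the token cap is passed as the `num_predict` option.

```python
import json

LLM_MODEL_NAME = "mistral"
TEMPERATURE = 0.3
MAX_TOKENS = 512

def build_generate_payload(prompt: str) -> dict:
    """Build a request body for Ollama's POST /api/generate endpoint."""
    return {
        "model": LLM_MODEL_NAME,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {
            "temperature": TEMPERATURE,
            "num_predict": MAX_TOKENS,  # Ollama's name for the max-token cap
        },
    }

print(json.dumps(build_generate_payload("Summarize this transcript."), indent=2))
```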

RAG Configuration (`backend/rag/config.py`)

CHROMA_PERSIST_DIR = "./chroma_db"  # Vector DB storage
TOP_K_RESULTS = 3                    # Documents to retrieve
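
To illustrate what `TOP_K_RESULTS` controls, the sketch below ranks a handful of toy document vectors against a query and keeps the best three. It is a plain-Python stand-in for the vector search that ChromaDB performs, not the project's retriever: the cosine metric, the two-dimensional embeddings, and the document ids are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

TOP_K_RESULTS = 3

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: Sequence[float],
          docs: List[Tuple[str, Sequence[float]]],
          k: int = TOP_K_RESULTS) -> List[str]:
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy (id, embedding) pairs; real embeddings have hundreds of dimensions.
docs = [
    ("pacing", [1.0, 0.0]),
    ("filler_words", [0.9, 0.1]),
    ("eye_contact", [0.0, 1.0]),
    ("posture", [-1.0, 0.0]),
]
print(top_k([1.0, 0.0], docs))
```

A larger k gives the report generator more context at the cost of a longer prompt, which matters when `MAX_TOKENS` and local model memory are limited.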

Recording Configuration (`backend/main.py`)

DURATION = 45        # Recording duration (seconds)
SAMPLE_RATE = 16000  # Required for Whisper
CHANNELS = 1         # Mono audio
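
As a quick sanity check on these values, the sketch below computes how many audio frames a recording of the configured length contains and verifies that a WAV file matches the mono, 16 kHz format. It uses only the standard library; the helper names and the 16-bit sample width are assumptions for illustration, not part of the project's code.

```python
import io
import wave

DURATION = 45
SAMPLE_RATE = 16000
CHANNELS = 1

def expected_frames(duration=DURATION, sample_rate=SAMPLE_RATE):
    """Number of audio frames in a recording of `duration` seconds."""
    return duration * sample_rate

def matches_config(wav_bytes: bytes) -> bool:
    """True if the WAV data is mono at the configured sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return (wav.getnchannels() == CHANNELS
                and wav.getframerate() == SAMPLE_RATE)

# Build one second of 16-bit silence in memory to exercise the check.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(CHANNELS)
    wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM (an assumption)
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(b"\x00\x00" * SAMPLE_RATE)

print(expected_frames())                  # 720000
print(matches_config(buffer.getvalue()))  # True
```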

🛠️ Development

Backend Development

cd backend
source venv/bin/activate

# Run with auto-reload
uvicorn api:app --reload

# Run tests
python -m pytest

# Format and lint code
black .
flake8 .

Frontend Development

cd frontend

# Development server
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

# Lint
npm run lint

🧪 Testing & Evaluation

Built-in Evaluations

The system includes a comprehensive evaluation framework:

cd backend
python -m evals.test_evals

Evaluation Criteria:

Helpfulness, Relevance, Coherence
Actionability, Specificity, Accuracy
Completeness, Constructiveness
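
One simple way to turn per-criterion results into a single verdict is to average them and flag the weakest criteria. The sketch below assumes each criterion is scored on a 0.0 to 1.0 scale and uses a hypothetical passing threshold of 0.7; neither the scale, the threshold, nor the `summarize` helper is taken from the project's evaluation code.

```python
from statistics import mean

CRITERIA = [
    "helpfulness", "relevance", "coherence",
    "actionability", "specificity", "accuracy",
    "completeness", "constructiveness",
]

def summarize(scores: dict, passing: float = 0.7) -> dict:
    """Average the per-criterion scores and list criteria below `passing`."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return {
        "mean": mean(scores[c] for c in CRITERIA),
        "failing": [c for c in CRITERIA if scores[c] < passing],
    }

# Example: every criterion scores 1.0 except specificity.
scores = {c: 1.0 for c in CRITERIA}
scores["specificity"] = 0.5
summary = summarize(scores)
print(summary)
```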

Manual Testing

# Test LLM connection
python test_llm_step5.py

# Test RAG retrieval
python test_rag.py

# Test full pipeline
python main.py

🔧 Troubleshooting

Ollama Connection Issues

Error: Ollama not available

Solution:

Ensure Ollama is running: ollama serve
Check model is pulled: ollama list
Pull model if needed: ollama pull mistral
Windows: Verify Ollama is in PATH
Linux/macOS: Check which ollama
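
The first three checks above can also be scripted. The sketch below queries Ollama's `GET /api/tags` endpoint, which lists locally installed models, on the default port 11434. The helper names are hypothetical, and the script assumes Ollama is listening on localhost with default settings.

```python
import json
import urllib.error
import urllib.request

def parse_model_names(tags_payload: dict) -> list:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]

def installed_models(base_url: str = "http://localhost:11434", timeout: float = 3.0):
    """Return installed model names, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        return None

def has_model(names, wanted: str = "mistral") -> bool:
    """True if `wanted` is present, ignoring the ':tag' suffix."""
    return any(name.split(":")[0] == wanted for name in names)

if __name__ == "__main__":
    models = installed_models()
    if models is None:
        print("Ollama is not reachable; try `ollama serve`.")
    elif not has_model(models):
        print("mistral is not installed; try `ollama pull mistral`.")
    else:
        print("Ollama is running and mistral is available.")
```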

Import Errors

Solution:

# Ensure virtual environment is activated
source venv/bin/activate  # or venv\Scripts\activate

# Reinstall dependencies
pip install -r requirements.txt

Memory Issues

Solution:

Use smaller LLM model
Reduce MAX_TOKENS in configuration
Close other applications
Upgrade to 16GB+ RAM

ChromaDB Issues

Solution:

# Clear database
rm -rf backend/chroma_db/

# Restart backend

🤝 Contributing

Contributions are welcome! Please follow these steps:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Guidelines

Follow PEP 8 for Python code
Use TypeScript for frontend code
Add tests for new features
Update documentation

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Faster-Whisper: CTranslate2-based reimplementation of OpenAI's Whisper
Ollama: Local LLM runtime
LangChain: LLM application framework
ChromaDB: Vector database for AI
FastAPI: Modern Python web framework
React: UI library

📞 Support

For issues and questions:

Open an issue on GitHub
Contact: [Project Team]

🗺️ Roadmap

Built with ❤️ by TEAM-5 [chatgpt](https://chatgpt.com/g/g-p-693cf63b3d608191a200ca21f1c5f7e2-tts/project) [perplexity](https://www.perplexity.ai/spaces/tts-1gnsM.HoSV.NHB6HbWS31g#0)
