dinesh9997/TEAM-05

πŸŽ™οΈ TEAM-5 Speech Analysis Pipeline

Python 3.8+ FastAPI React License

An advanced AI-powered speech analysis system that analyzes recorded or uploaded speech and provides personality insights and communication feedback using local LLMs, RAG, and a multi-agent AI architecture.

🌟 Overview

TEAM-5 is a comprehensive speech analysis pipeline that combines state-of-the-art speech processing, natural language understanding, and multi-agent AI systems to provide detailed personality insights and communication analysis. The system uses local LLMs (via Ollama) and Retrieval-Augmented Generation (RAG) to deliver personalized, actionable feedback.

Key Capabilities

  • Speech Analysis: Process recorded or uploaded audio through speech-to-text and acoustic feature extraction
  • Multi-Agent AI System: Specialized agents for communication, confidence, and personality analysis
  • RAG-Enhanced Reports: Knowledge-augmented insights using vector database retrieval
  • Quality Assurance: Built-in evaluation framework using LangChain evaluators
  • Full-Stack Solution: FastAPI backend + React frontend for seamless user experience

✨ Features

Backend Features

  • 🎤 Audio Recording & Processing: Multi-format audio support with noise reduction
  • 📝 Speech-to-Text: High-accuracy transcription using Faster-Whisper
  • 📊 Acoustic Analysis: Comprehensive feature extraction (pitch, energy, pauses, speech rate)
  • 🧠 Multi-Agent AI: Specialized agents for different analysis aspects
  • 🔍 RAG System: ChromaDB-powered knowledge retrieval
  • 🛡️ Guardrails AI: Input/output validation and safety checks
  • 📈 Evaluation Framework: Quality assessment using LangChain evaluators
  • 🚀 REST API: FastAPI-powered endpoints for frontend integration
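Speech rate and pause features can be derived from timestamped transcript segments. The sketch below is a minimal illustration, not the code in speech_features.py: it assumes segments arrive as (start_seconds, end_seconds, text) tuples, similar to the start/end/text fields on Faster-Whisper segments, and the function name and the 0.5 s pause threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_s, end_s, text)


@dataclass
class TimingFeatures:
    words_per_minute: float
    pause_count: int
    total_pause_s: float
    longest_pause_s: float


def timing_features(segments: List[Segment], min_pause_s: float = 0.5) -> TimingFeatures:
    """Compute speech rate and pause statistics from timestamped segments.

    Hypothetical helper; min_pause_s is an assumed threshold for what counts as a pause.
    """
    if not segments:
        return TimingFeatures(0.0, 0, 0.0, 0.0)
    segments = sorted(segments, key=lambda s: s[0])
    word_count = sum(len(text.split()) for _, _, text in segments)
    # Speaking window: from the first segment's start to the last segment's end
    duration_s = segments[-1][1] - segments[0][0]
    wpm = word_count / (duration_s / 60.0) if duration_s > 0 else 0.0
    # Silent gaps between consecutive segments; only gaps above the threshold count
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(segments, segments[1:])]
    pauses = [g for g in gaps if g >= min_pause_s]
    return TimingFeatures(
        words_per_minute=round(wpm, 1),
        pause_count=len(pauses),
        total_pause_s=round(sum(pauses), 2),
        longest_pause_s=round(max(pauses, default=0.0), 2),
    )
```

For three segments spanning 12 seconds with 15 words and one 1-second gap, this yields 75.0 words per minute and a single pause. Pitch and energy need the raw waveform instead, which is where tools like openSMILE come in.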

Frontend Features

  • ⚡ React + TypeScript: Modern, type-safe UI development
  • 🎨 Responsive Design: Works on desktop and mobile devices
  • 📤 File Upload: Support for various audio formats
  • 📊 Results Visualization: Interactive display of analysis results
  • ⏱️ Real-time Feedback: Instant processing status updates

πŸ—οΈ Architecture

┌──────────────────────────────────────────────────────────────┐
│                         Frontend                             │
│                    (React + TypeScript)                      │
│                 Vite Dev Server / Build                      │
└────────────────────────┬─────────────────────────────────────┘
                         │ HTTP/REST API
                         │
┌────────────────────────▼─────────────────────────────────────┐
│                      Backend API                             │
│                     (FastAPI + Uvicorn)                      │
├──────────────────────────────────────────────────────────────┤
│                    Speech Processing Pipeline                │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐  ┌────────────┐   │
│  │  Audio   │→ │  Speech   │→ │  Feature │→ │   Multi-   │   │
│  │ Recording│  │  to Text  │  │Extraction│  │Agent System│   │
│  └──────────┘  └───────────┘  └──────────┘  └─────┬──────┘   │
│                                                   │          │
│  ┌────────────────────────────────────────────────▼────────┐ │
│  │             RAG System (ChromaDB + Ollama)              │ │
│  │  - Communication Knowledge    - Confidence Psychology   │ │
│  │  - Personality Traits         - Improvement Tips        │ │
│  └─────────────────────────────────────────────────────────┘ │
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │           Evaluation & Guardrails                       │ │
│  │  - LangChain Evaluators       - Input/Output Validation │ │
│  └─────────────────────────────────────────────────────────┘ │
└────────────────────────┬─────────────────────────────────────┘
                         │
                         ▼
                  ┌──────────────┐
                  │    Ollama    │
                  │ Local LLM    │
                  │  (mistral)   │
                  └──────────────┘

Component Breakdown

  1. Speech Processing: Audio recording, preprocessing, and transcription
  2. Feature Extraction: Acoustic analysis (openSMILE, PyAnnote)
  3. Multi-Agent System:
    • Communication Agent: Analyzes clarity, fluency, structure
    • Confidence Agent: Evaluates vocal confidence and emotional tone
    • Personality Agent: Maps communication patterns to personality traits
  4. RAG System: Retrieves relevant expert knowledge for enhanced insights
  5. Report Generation: LLM-powered personalized feedback reports
  6. Evaluation Framework: Quality assessment and validation
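The components above run in sequence. The sketch below illustrates that flow only and is not the code in link.py or agent.py: every stage is a hypothetical stub passed in as a function, and the result keys mirror the POST /analyze response documented in the API section.

```python
from typing import Any, Callable, Dict

# Hypothetical stage signatures; the real implementations live in the backend modules.
Transcriber = Callable[[str], str]
FeatureExtractor = Callable[[str], Dict[str, float]]
Agent = Callable[[str, Dict[str, float]], Dict[str, Any]]
Reporter = Callable[[Dict[str, Any]], str]


def run_pipeline(
    audio_path: str,
    transcribe: Transcriber,
    extract_features: FeatureExtractor,
    agents: Dict[str, Agent],
    generate_report: Reporter,
) -> Dict[str, Any]:
    """Run transcription, feature extraction, each agent, then report generation."""
    transcript = transcribe(audio_path)
    features = extract_features(audio_path)
    result: Dict[str, Any] = {"transcript": transcript, "audio_features": features}
    # Each agent sees the same transcript and features and contributes one section
    for key, agent in agents.items():
        result[key] = agent(transcript, features)
    # The report generator (LLM + RAG in the real system) summarizes all sections so far
    result["final_report"] = generate_report(result)
    return result
```

Passing the stages in as functions keeps the sketch testable with stubs; the real backend presumably wires in Faster-Whisper, the feature extractors, the three agents, and the RAG-backed report generator instead.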

💻 System Requirements

Minimum Requirements

  • OS: Linux, macOS, or Windows 10+
  • Python: 3.8 or higher
  • RAM: 8GB (16GB recommended)
  • Storage: ~5GB for models and dependencies
  • Node.js: 18.x or higher (for frontend)
  • Microphone: Required for audio recording

Recommended Requirements

  • RAM: 16GB or more
  • GPU: CUDA-compatible GPU (optional, for faster inference)
  • Storage: SSD with 10GB+ free space

🚀 Quick Start

Option 1: Full Stack Development

# Clone the repository
git clone https://github.com/GSMPRANEETH/TEAM-5.git
cd TEAM-5

# Backend setup
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Install Ollama first (see Installation, step 3), then pull the model
ollama pull mistral

# Start backend API
uvicorn api:app --reload --port 8000

# In a new terminal, from the repository root: frontend setup
cd frontend
npm install
npm run dev

Option 2: Backend Only

cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py  # Run standalone pipeline

📦 Installation

1. Clone Repository

git clone https://github.com/GSMPRANEETH/TEAM-5.git
cd TEAM-5

2. Backend Setup

Create Virtual Environment

cd backend
python -m venv venv

# Activate virtual environment
# Linux/macOS:
source venv/bin/activate
# Windows:
venv\Scripts\activate

Install Dependencies

pip install -r requirements.txt

3. Install Ollama (Required for LLM)

Ollama provides local LLM inference.

Linux/macOS:

curl -fsSL https://ollama.ai/install.sh | sh

Windows:

  1. Download from ollama.ai/download
  2. Run the installer
  3. Add Ollama to PATH:
    • Open "Edit environment variables for your account"
    • Add C:\Users\%USERNAME%\AppData\Local\Programs\Ollama to PATH
    • Restart terminal

Verify Installation:

ollama --version

Pull LLM Model

ollama pull mistral
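Once the model is pulled, you can confirm from Python that the server is reachable. This is a minimal sketch, assuming Ollama's default address http://localhost:11434 and its /api/tags endpoint, which lists locally pulled models; the helper names are hypothetical and not part of the backend.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # Ollama's default host and port


def model_names(tags_payload: dict) -> List[str]:
    """Extract model names (e.g. "mistral:latest") from an /api/tags response body."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(names: List[str], wanted: str) -> bool:
    """True if `wanted` matches a full model name or a name with its tag stripped."""
    return any(name == wanted or name.split(":", 1)[0] == wanted for name in names)


def check_ollama(model: str = "mistral", base_url: str = OLLAMA_URL) -> bool:
    """Return True if the Ollama server answers and `model` has been pulled."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # URLError, refused connections and timeouts are OSError subclasses;
        # ValueError covers a response body that is not valid JSON
        return False
    return has_model(model_names(payload), model)
```

If check_ollama() returns False, the steps under Troubleshooting (ollama serve, ollama list) apply.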

4. Frontend Setup (Optional)

cd ../frontend
npm install

5. Configuration

Edit backend configuration files as needed:

  • backend/llm1/llm_config.py - LLM settings (model, temperature, max tokens)
  • backend/rag/config.py - RAG system configuration
  • backend/evals/eval_config.py - Evaluation criteria

🎯 Usage

Running the Full Stack

Start Backend API

cd backend
source venv/bin/activate
uvicorn api:app --reload --port 8000

The API will be available at http://localhost:8000, with interactive docs at http://localhost:8000/docs.

Start Frontend

cd frontend
npm run dev

Frontend will be available at: http://localhost:5173

Running Backend Standalone

cd backend
python main.py

This will:

  1. Record 45 seconds of audio from the microphone
  2. Process and analyze the speech
  3. Generate a comprehensive report
  4. Display the results in the terminal

Using the API

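import requests

# Upload an audio file as multipart/form-data under the "file" field
with open("audio.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/analyze",
        files={"file": ("audio.wav", f, "audio/wav")},
        timeout=600,  # local LLM inference can be slow; adjust as needed
    )

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()
print(result["final_report"])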

Running Tests

cd backend

# Test LLM connection
python test_llm_step5.py

# Test RAG system
python test_rag.py

# Run evaluations
python -m evals.test_evals

πŸ“ Project Structure

TEAM-5/
├── backend/                     # Python backend
│   ├── api.py                   # FastAPI application
│   ├── main.py                  # Standalone pipeline
│   ├── link.py                  # Pipeline orchestration
│   ├── requirements.txt         # Python dependencies
│   ├── README.md                # Backend documentation
│   │
│   ├── agents/                  # Multi-agent system
│   │   ├── communication_agent.py
│   │   ├── confidence_agent.py
│   │   └── personality_agent.py
│   │
│   ├── llm/                     # LLM wrapper (agents)
│   │   └── local_llm.py
│   │
│   ├── llm1/                    # LLM config & reporting
│   │   ├── llm_config.py
│   │   ├── local_llm.py
│   │   ├── prompt_templates.py
│   │   └── report_generator.py
│   │
│   ├── rag/                     # RAG system
│   │   ├── config.py
│   │   ├── retriever.py
│   │   ├── knowledge_base.py
│   │   ├── rag_pipeline.py
│   │   └── documents/
│   │
│   ├── evals/                   # Evaluation framework
│   │   ├── eval_config.py
│   │   ├── eval_runner.py
│   │   ├── eval_refinement.py
│   │   └── test_evals.py
│   │
│   ├── utils/                   # Utilities
│   │   ├── parser.py
│   │   └── feature_scoring.py
│   │
│   ├── speech_to_text.py        # Whisper transcription
│   ├── speech_features.py       # Acoustic analysis
│   ├── record_audio.py          # Audio recording
│   ├── preprocess_audio.py      # Audio preprocessing
│   ├── guardrails_config.py     # Safety & validation
│   └── agent.py                 # Agent orchestrator
│
├── frontend/                    # React frontend
│   ├── src/                     # Source code
│   │   ├── App.tsx
│   │   └── components/
│   ├── public/                  # Static assets
│   ├── package.json
│   ├── vite.config.ts
│   └── README.md
│
└── README.md                    # This file

📚 API Documentation

Endpoints

POST /analyze

Analyzes an uploaded audio file and returns the full analysis.

Request:

POST /analyze HTTP/1.1
Content-Type: multipart/form-data

file: <audio_file>

Response:

{
  "transcript": "...",
  "audio_features": { ... },
  "communication_analysis": { ... },
  "confidence_emotion_analysis": { ... },
  "personality_analysis": { ... },
  "final_report": "..."
}
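A client can check a response against these documented fields before using it. The sketch below relies only on the top-level keys listed above; the nested contents shown as { ... } are not specified here, so it does not inspect them, and the helper names are hypothetical.

```python
from typing import Any, Dict, List

# Top-level keys documented for the POST /analyze response
EXPECTED_KEYS = (
    "transcript",
    "audio_features",
    "communication_analysis",
    "confidence_emotion_analysis",
    "personality_analysis",
    "final_report",
)


def missing_keys(result: Dict[str, Any]) -> List[str]:
    """Return the documented keys that are absent from an /analyze response."""
    return [key for key in EXPECTED_KEYS if key not in result]


def summarize(result: Dict[str, Any]) -> str:
    """Build a one-line summary, failing loudly if the response shape changed."""
    missing = missing_keys(result)
    if missing:
        raise ValueError(f"/analyze response missing keys: {missing}")
    words = len(str(result["transcript"]).split())
    return f"{words} words transcribed; report length {len(result['final_report'])} chars"
```

Failing early on a missing key gives a clearer error than a KeyError deep inside frontend or reporting code.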

Interactive API Docs

When the backend is running, visit:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

⚙️ Configuration

LLM Configuration (backend/llm1/llm_config.py)

LLM_MODEL_NAME = "mistral"  # Change model
TEMPERATURE = 0.3            # Creativity (0.0-1.0)
MAX_TOKENS = 512            # Max response length
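To show what these settings control, here is how they would map onto a request to Ollama's /api/generate endpoint, where the generated-token limit is the num_predict option. This is a sketch under an assumption: the backend may pass these values through a wrapper library rather than building the JSON itself, and build_generate_request is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Values mirroring backend/llm1/llm_config.py
LLM_MODEL_NAME = "mistral"
TEMPERATURE = 0.3
MAX_TOKENS = 512


def build_generate_request(
    prompt: str,
    model: str = LLM_MODEL_NAME,
    temperature: float = TEMPERATURE,
    max_tokens: int = MAX_TOKENS,
) -> Dict[str, Any]:
    """Build a JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of streamed chunks
        # Ollama names the maximum number of generated tokens `num_predict`
        "options": {"temperature": temperature, "num_predict": max_tokens},
    }


body = json.dumps(build_generate_request("Summarize the speaker's clarity in two sentences."))
```

POSTing body to http://localhost:11434/api/generate returns a JSON object whose response field holds the generated text when streaming is disabled.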

RAG Configuration (backend/rag/config.py)

CHROMA_PERSIST_DIR = "./chroma_db"  # Vector DB storage
TOP_K_RESULTS = 3                    # Documents to retrieve
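TOP_K_RESULTS sets how many knowledge-base passages are retrieved per query and handed to the LLM as context. ChromaDB ranks stored documents by embedding distance; the self-contained sketch below swaps in plain bag-of-words cosine similarity so it runs without ChromaDB or an embedding model. It illustrates top-k ranking only and is not the retriever in backend/rag/retriever.py.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

TOP_K_RESULTS = 3  # mirrors backend/rag/config.py


def _vector(text: str) -> Counter:
    """Lower-cased word counts, standing in for a learned embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = TOP_K_RESULTS) -> List[Tuple[float, str]]:
    """Return the k documents most similar to the query, best first, with scores."""
    q = _vector(query)
    scored = [(_cosine(q, _vector(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return scored[:k]
```

A larger k gives the LLM more context but also more tokens to read, which interacts with the memory and MAX_TOKENS considerations elsewhere in this README.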

Recording Configuration (backend/main.py)

DURATION = 45        # Recording duration (seconds)
SAMPLE_RATE = 16000  # Whisper models expect 16 kHz audio
CHANNELS = 1         # Mono audio
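At these settings a full recording is 45 × 16,000 = 720,000 samples; stored as 16-bit PCM (an assumption about the sample format), that is 1,440,000 bytes of audio data, about 1.4 MB. The sketch below uses Python's standard wave module to write a silent WAV with the same parameters, which can serve as a placeholder upload when smoke-testing /analyze without a microphone. Expect an empty or near-empty transcript for silence; write_silent_wav is a hypothetical helper.

```python
import wave

# Values mirroring the recording settings in backend/main.py
DURATION = 45
SAMPLE_RATE = 16000
CHANNELS = 1
SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM, an assumption for this sketch


def write_silent_wav(path: str, seconds: float = DURATION) -> int:
    """Write a silent 16-bit PCM WAV with the recording settings; return the frame count."""
    n_frames = int(seconds * SAMPLE_RATE)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE)
        # Zero-valued samples are silence for signed 16-bit PCM
        wav.writeframes(b"\x00" * n_frames * CHANNELS * SAMPLE_WIDTH_BYTES)
    return n_frames
```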

🛠️ Development

Backend Development

cd backend
source venv/bin/activate

# Run with auto-reload
uvicorn api:app --reload

# Run tests
python -m pytest

# Format (black) and lint (flake8) code
black .
flake8 .

Frontend Development

cd frontend

# Development server
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

# Lint
npm run lint

🧪 Testing & Evaluation

Built-in Evaluations

The system includes a comprehensive evaluation framework:

cd backend
python -m evals.test_evals

Evaluation Criteria:

  • Helpfulness, Relevance, Coherence
  • Actionability, Specificity, Accuracy
  • Completeness, Constructiveness
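LangChain's criteria evaluators return, per criterion, a result dict that includes the model's reasoning and a binary score (1 when the criterion is met, 0 otherwise). The sketch below aggregates such results into a pass rate. It does not call LangChain, the 0.75 threshold is a hypothetical choice, and how evals/eval_runner.py actually combines scores is not documented here.

```python
from typing import List, Mapping

PASS_THRESHOLD = 0.75  # hypothetical: share of criteria that must pass

Results = Mapping[str, Mapping[str, object]]  # criterion name -> evaluator result dict


def pass_rate(results: Results) -> float:
    """Fraction of criteria whose evaluator result has score == 1."""
    if not results:
        return 0.0
    passed = sum(1 for r in results.values() if r.get("score") == 1)
    return passed / len(results)


def failing_criteria(results: Results) -> List[str]:
    """Names of criteria that did not pass, sorted for stable output."""
    return sorted(name for name, r in results.items() if r.get("score") != 1)


def report_passes(results: Results) -> bool:
    """True if the share of passing criteria meets the threshold."""
    return pass_rate(results) >= PASS_THRESHOLD
```

Listing the failing criteria, rather than only a pass/fail flag, points directly at what a refinement pass should target.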

Manual Testing

# Test LLM connection
python test_llm_step5.py

# Test RAG retrieval
python test_rag.py

# Test full pipeline
python main.py

πŸ”§ Troubleshooting

Ollama Connection Issues

Error: Ollama not available

Solution:

  1. Ensure Ollama is running: ollama serve
  2. Check model is pulled: ollama list
  3. Pull model if needed: ollama pull mistral
  4. Windows: Verify Ollama is in PATH
  5. Linux/macOS: Confirm the binary is on PATH with which ollama
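The PATH checks in steps 4 and 5 can be scripted in a cross-platform way. This small sketch, which is not part of the repository, uses the standard library's shutil.which to look for the ollama executable and also confirms the Python version meets the documented 3.8 minimum. It cannot tell whether the server is running; ollama serve and ollama list remain the checks for that.

```python
import shutil
import sys
from typing import List, Tuple


def preflight(min_python: Tuple[int, int] = (3, 8), binary: str = "ollama") -> List[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems: List[str] = []
    if sys.version_info[:2] < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    # shutil.which searches PATH like the shell does (and honors PATHEXT on Windows)
    if shutil.which(binary) is None:
        problems.append(f"'{binary}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("Problem:", problem)
```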

Import Errors

Solution:

# Ensure virtual environment is activated
source venv/bin/activate  # or venv\Scripts\activate

# Reinstall dependencies
pip install -r requirements.txt

Memory Issues

Solution:

  • Use a smaller LLM model (set LLM_MODEL_NAME in backend/llm1/llm_config.py)
  • Reduce MAX_TOKENS in configuration
  • Close other applications
  • Upgrade to 16GB+ RAM

ChromaDB Issues

Solution:

# Clear database
rm -rf backend/chroma_db/

# Restart backend

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow PEP 8 for Python code
  • Use TypeScript for frontend code
  • Add tests for new features
  • Update documentation

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Faster-Whisper: CTranslate2-based reimplementation of OpenAI Whisper
  • Ollama: Local LLM runtime
  • LangChain: LLM application framework
  • ChromaDB: Vector database for AI
  • FastAPI: Modern Python web framework
  • React: UI library

📞 Support

For issues and questions:

  • Open an issue on GitHub
  • Contact: [Project Team]

🗺️ Roadmap

  • Support for additional LLM providers
  • Multi-language support
  • Real-time streaming analysis
  • Advanced visualization dashboards
  • Mobile app
  • Docker containerization
  • Cloud deployment guide

Built with ❤️ by TEAM-5 · ChatGPT project: https://chatgpt.com/g/g-p-693cf63b3d608191a200ca21f1c5f7e2-tts/project · Perplexity space: https://www.perplexity.ai/spaces/tts-1gnsM.HoSV.NHB6HbWS31g#0

About

Hackathon for JNTU Vijayanagaram - December 13, 2025
