An end-to-end AI-powered customer support backend built with FastAPI, combining offline ingestion, vector search (FAISS), intent detection, human feature extraction, LLM reasoning, and response strategy selection.
The system follows a clean, modular architecture that separates offline processing from online query execution for scalability and maintainability.
- 🔹 Offline document ingestion & preprocessing
- 🔹 Chunking + embeddings using Sentence Transformers
- 🔹 FAISS-based vector similarity search
- 🔹 Intent & emotion detection
- 🔹 Human behavior feature extraction
- 🔹 Context-aware retrieval routing
- 🔹 LLM-powered response generation
- 🔹 Answer validation to reduce hallucinations
- 🔹 Strategy-based response selection
- 🔹 FastAPI REST interface
- Load raw text data
- Clean & preprocess documents
- Chunk large documents
- Generate embeddings
- Enrich metadata
- Store vectors in FAISS index
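The chunking and metadata-enrichment steps above can be sketched in plain Python. The function names (`chunk_text`, `enrich_metadata`) are illustrative and not the actual API of the `ingestion/` modules; real embeddings come from Sentence Transformers:

```python
import hashlib

def chunk_text(text, chunk_size=100, overlap=20):
    """Split a document into overlapping word-level chunks so that
    context is not lost at chunk boundaries."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(words), 1), step):
        piece = words[start:start + chunk_size]
        if piece:
            chunks.append(" ".join(piece))
        if start + chunk_size >= len(words):
            break
    return chunks

def enrich_metadata(chunks, source):
    """Attach a stable id and source file to each chunk before indexing."""
    return [{"id": hashlib.md5(c.encode()).hexdigest()[:8],
             "source": source,
             "text": c} for c in chunks]
```

Each enriched record is then embedded and written to the FAISS index alongside its metadata.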
- Preprocess user query
- Extract human behavior features
- Detect intent & emotion
- Route retrieval strategy
- Retrieve relevant chunks
- Assemble contextual prompt
- Generate response using LLM
- Validate answer confidence
- Return final response
```
backend/
├── app/
│   ├── main.py                  # Application entry point
│   ├── __init__.py
│   │
│   ├── ingestion/               # Offline ingestion pipeline
│   │   ├── data_load.py
│   │   ├── preprocessing.py
│   │   ├── embedding.py
│   │   ├── metadata_enricher.py
│   │   ├── ingestion_manager.py
│   │   ├── run_preprocessing.py
│   │   └── __init__.py
│   │
│   ├── intent_detection/        # Intent & emotion detection
│   │   ├── intent_classifier.py
│   │   ├── intent_features.py
│   │   └── __init__.py
│   │
│   ├── query_pipeline/          # Online query processing
│   │   ├── query_preprocess.py
│   │   ├── human_features.py
│   │   ├── query_embed.py
│   │   ├── context_assembler.py
│   │   ├── retrieval_router.py
│   │   └── __init__.py
│   │
│   ├── vector_store/            # Vector storage layer
│   │   ├── faiss_index.py
│   │   └── __init__.py
│   │
│   ├── reasoning/               # LLM reasoning
│   │   ├── llm_reasoner.py
│   │   ├── response_generator.py
│   │   └── __init__.py
│   │
│   ├── response_strategy/       # Response style selection
│   │   ├── response_router.py
│   │   ├── response_strategy.py
│   │   └── __init__.py
│   │
│   ├── validation/              # Answer validation
│   │   ├── answer_validator.py
│   │   └── __init__.py
│   │
│   └── data/
│       └── training_data.txt    # Knowledge base
```
Handles offline data preparation:
- Reads large text files
- Cleans & chunks content
- Generates embeddings
- Enriches metadata
- Prepares data for vector storage
Detects:
- User intent (greeting, question, complaint, etc.)
- Emotional tone (angry, neutral, urgent)
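As a minimal illustration of this stage (the real `intent_classifier.py` may use a trained model; the keyword tables below are invented for the sketch):

```python
# Hypothetical keyword lookup tables -- a stand-in for the actual classifier.
INTENT_KEYWORDS = {
    "greeting": {"hello", "hi", "hey"},
    "complaint": {"broken", "refund", "terrible"},
    "question": {"how", "what", "why", "when", "where"},
}
EMOTION_KEYWORDS = {
    "angry": {"angry", "terrible", "worst"},
    "urgent": {"urgent", "asap", "immediately"},
}

def detect(query):
    """Return (intent, emotion) via first-match keyword lookup."""
    tokens = set(query.lower().split())
    intent = next((name for name, kws in INTENT_KEYWORDS.items()
                   if tokens & kws), "other")
    emotion = next((name for name, kws in EMOTION_KEYWORDS.items()
                    if tokens & kws), "neutral")
    return intent, emotion
```

The detected pair is passed downstream to both the retrieval router and the response-strategy layer.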
Online query execution:
- Cleans user input
- Extracts human behavioral features
- Embeds queries
- Retrieves relevant context
- FAISS-based similarity search
- Efficient nearest-neighbor lookup
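FAISS performs this lookup at scale with optimized index structures; the brute-force version below shows the underlying idea in plain Python:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, index, k=2):
    """index: list of (chunk_id, vector) pairs; returns the k closest ids."""
    scored = [(cosine(query_vec, vec), cid) for cid, vec in index]
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:k]]
```

In the real service, `faiss_index.py` replaces this linear scan with a FAISS index over the ingested embeddings.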
- Uses LLM to generate answers from retrieved context
- Applies system prompts dynamically
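A minimal sketch of the dynamic prompt assembly, assuming a simple numbered-context template (the actual template in `context_assembler.py` / `llm_reasoner.py` may differ):

```python
def build_prompt(question, chunks, tone="neutral"):
    """Assemble a system prompt plus numbered context for the LLM."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    system = (f"You are a customer support assistant. Answer in a {tone} tone "
              "using ONLY the context below. If the context is insufficient, say so.")
    return f"{system}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The assembled string is what gets sent to TinyLlama for generation.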
- Chooses response tone (polite, empathetic, concise, etc.)
- Adjusts based on intent & emotion
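At its core, strategy selection is a mapping from the detected (intent, emotion) pair to a tone, with a safe default. The table entries here are illustrative, not the project's actual rules:

```python
# Hypothetical routing table; real rules live in response_router.py.
STRATEGY = {
    ("complaint", "angry"): "empathetic",
    ("complaint", "neutral"): "polite",
    ("question", "urgent"): "concise",
}

def pick_strategy(intent, emotion):
    """Fall back to a polite tone when no specific rule matches."""
    return STRATEGY.get((intent, emotion), "polite")
```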
- Ensures answers are grounded in context
- Reduces hallucinations via confidence scoring
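One simple way to score grounding, sketched here as an assumption about how such a validator could work (the actual `answer_validator.py` logic is not shown): combine token overlap between the answer and the retrieved context with the best retrieval similarity.

```python
def grounding_confidence(answer, chunks, sim_scores, threshold=0.5):
    """Score = (fraction of answer tokens found in context) * best retrieval
    similarity; answers below the threshold are flagged as ungrounded."""
    ctx_tokens = set(" ".join(chunks).lower().split())
    ans_tokens = set(answer.lower().split())
    overlap = len(ans_tokens & ctx_tokens) / max(len(ans_tokens), 1)
    score = overlap * max(sim_scores, default=0.0)
    return score, score >= threshold
```

Low-confidence answers can then be replaced with a fallback response instead of being returned to the user.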
- Backend Framework: FastAPI
- Embeddings: Sentence Transformers (`all-MiniLM-L6-v2`)
- Vector Search: FAISS
- LLM: TinyLlama 1.1B Chat
- Data Processing: NumPy
- API Schema: Pydantic
Install dependencies:

```bash
pip install -r requirements.txt
```

Update the knowledge-base path in app/main.py:

```python
DATA_PATH = "/path/to/training_data.txt"
```

Run the server:

```bash
uvicorn app.main:app --reload
```

- Health Check

```
GET /
```

- Query Chatbot

```
POST /query
Content-Type: application/json

{
  "user_query": "How do I reset my password?"
}
```

- User sends a query
- Intent + emotion detected
- Context retrieved from FAISS
- LLM generates response
- Validator checks confidence
- Final answer returned
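The flow above can be tied together in a toy orchestration. Here `embed` and `generate` are injected stand-ins for the real SentenceTransformer and TinyLlama calls, and the ranking is a plain dot product instead of FAISS:

```python
def answer_query(query, index, embed, generate, k=2):
    """End-to-end sketch: embed, retrieve, prompt, generate, validate."""
    qvec = embed(query)
    # Rank stored (chunk, vector) pairs by dot product (stand-in for FAISS).
    ranked = sorted(index, key=lambda cv: -sum(a * b for a, b in zip(qvec, cv[1])))
    context = [chunk for chunk, _ in ranked[:k]]
    prompt = "Context:\n" + "\n".join(context) + f"\nQuestion: {query}"
    draft = generate(prompt)
    # Crude validation: require at least one answer token to appear in context.
    grounded = any(tok in " ".join(context).lower() for tok in draft.lower().split())
    return draft if grounded else "I'm not sure; let me connect you with a human."
```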
- Streaming responses
- Multi-language support
- Persistent session memory
- Redis-based caching
- Async ingestion
- Hybrid search (BM25 + vectors)
Jenish Shekhada
AI Engineer | GenAI | RAG Systems | FastAPI