FLIQ — AI Banking Assistant for Indian Customers

FLIQ is an AI-powered banking assistant built for Indian customers. It answers banking questions in multiple Indian languages, lets users upload their financial documents (bank statements, PAN cards, loan agreements) and ask questions about them, and supports both text and voice interaction.

Features

Multi-language support — responds in Hindi (Devanagari), Hinglish (auto-detected), English, and 9 other Indian languages
Document RAG — upload a PDF or image of a financial document and ask questions about it; answers are grounded strictly in the document
Banking knowledge base — built-in SBI knowledge base (accounts, loans, UPI, NEFT, RTGS, KYC, etc.) for questions without a document
Scanned document OCR — PyMuPDF + Tesseract extract text from scanned PDFs and image files (PAN cards, bank statements)
Voice input + output — record audio questions; get spoken responses via ElevenLabs TTS or browser Speech Synthesis fallback
Sentiment-aware responses — detects frustrated, confused, or satisfied tone and adjusts explanation depth accordingly
Scope enforcement — declines to answer questions unrelated to Indian banking; rejects non-banking documents
Automatic SBI knowledge refresh — APScheduler ingests fresh SBI website content, Wikipedia articles, RSS news, and PDFs nightly

Architecture

User (Browser)
    │
    ▼
Next.js Frontend (Port 3000)
    ├── /api/chat         → proxy → FastAPI /api/chat
    ├── /api/upload       → proxy → FastAPI /api/upload
    ├── /api/voice-chat   → proxy → FastAPI /api/speech/voice-chat
    └── /api/synthesize   → proxy → FastAPI /api/speech/synthesize
    │
    ▼
FastAPI Backend (Port 7000)
    │
    ├── Chat Endpoint
    │     ├── document_id present → RAG pipeline
    │     │     ├── Detect language (Unicode + Hinglish keywords)
    │     │     ├── Generate query embedding (Azure text-embedding-3-large)
    │     │     ├── Semantic search in Qdrant (BANKING_RAG_DOCUMENTS, filtered by document_id)
    │     │     └── LLM generates response grounded in document chunks
    │     │
    │     └── no document_id → General banking pipeline
    │           ├── Detect language
    │           ├── Search SBI_BANK_DATA collection in Qdrant
    │           └── LLM generates response from SBI knowledge base
    │
    ├── Upload Endpoint
    │     ├── PDF with selectable text → PyPDF2 extraction → chunk → embed → Qdrant
    │     └── Scanned PDF / image → PyMuPDF render → Tesseract OCR → chunk → embed → Qdrant
    │
    └── Speech Endpoints
          ├── STT: SpeechRecognition (Google API) → transcribed text
          ├── Chat: same as above (RAG or general)
          └── TTS: ElevenLabs multilingual v2 → Azure Blob Storage URL
    │
    ▼
External Services
    ├── Azure OpenAI    — chat completions (GPT-4o) + embeddings (text-embedding-3-large)
    ├── Qdrant          — vector database (BANKING_RAG_DOCUMENTS + SBI_BANK_DATA collections)
    ├── Azure Blob      — document and audio file storage
    ├── MongoDB Atlas   — document metadata (optional; degraded gracefully)
    └── ElevenLabs      — text-to-speech (browser SpeechSynthesis fallback if unavailable)

Tech Stack

Layer	Technology
Frontend	Next.js 15, React 19, TypeScript, Tailwind CSS 4
Backend	FastAPI, Python 3.9, Pydantic v2, APScheduler
LLM	Azure OpenAI GPT-4o (chat), text-embedding-3-large (3072-dim embeddings)
Vector DB	Qdrant (self-hosted or Qdrant Cloud)
OCR	PyMuPDF (fitz), Tesseract OCR (eng+hin+ben+tam+tel)
PDF parsing	PyPDF2, PyMuPDF
STT	SpeechRecognition (Google Speech API)
TTS	ElevenLabs multilingual v2 / browser `speechSynthesis` fallback
Storage	Azure Blob Storage
Database	MongoDB Atlas (optional metadata store)
Translation	Bhashini (MeitY — configured but optional)

Prerequisites

Python 3.9+
Node.js 18+
Tesseract OCR installed on the server with Hindi and other Indian language data packs
Qdrant instance (cloud or local Docker: docker run -p 6333:6333 qdrant/qdrant)
Azure subscription with:
- Azure OpenAI resource (GPT-4o deployment + text-embedding-3-large deployment)
- Azure Blob Storage account
ElevenLabs account (optional — browser TTS is the fallback)
MongoDB Atlas cluster (optional)

Setup

Backend Setup

cd backend/src

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate          # macOS/Linux
# venv\Scripts\activate           # Windows

# Install dependencies
pip install -r ../../requirements.txt

# Copy and fill in environment variables
cp .env.example .env
# Edit .env with your credentials (see Environment Variables section)

# Start the server
uvicorn main:app --host 0.0.0.0 --port 7000 --reload

The API will be available at http://localhost:7000.
Interactive docs: http://localhost:7000/api/vectoriser/openapi.json

Frontend Setup

cd frontend

# Install dependencies
npm install

# Copy and fill in environment variables
cp .env.example .env
# Set NEXT_PUBLIC_BACKEND_URL=http://localhost:7000

# Start the development server
npm run dev

The app will be available at http://localhost:3000.

Environment Variables

Backend (`backend/src/.env`)

Variable	Description	Required
`QDRANT_HOST_URL`	Qdrant server URL (e.g. `http://localhost:6333`)	Yes
`QDRANT_API_KEY`	Qdrant API key (leave blank for local)	No
`AZURE_OPENAI_ENDPOINT`	Azure OpenAI endpoint for embeddings	Yes
`AZURE_OPENAI_API_KEY`	Azure OpenAI API key for embeddings	Yes
`AZURE_OPENAI_CHAT_ENDPOINT`	Azure OpenAI endpoint for chat (can be same)	Yes
`AZURE_OPENAI_CHAT_API_KEY`	Azure OpenAI API key for chat	Yes
`AZURE_OPENAI_CHAT_DEPLOYMENT`	Chat model deployment name (e.g. `gpt-4o-2`)	Yes
`AZURE_STORAGE_CONNECTION_STRING`	Azure Blob Storage connection string	Yes
`AZURE_CONTAINER_NAME`	Blob container for documents (default: `banking-documents`)	No
`DOCUMENT_DB_CONNECTION_STRING`	MongoDB Atlas connection string	No
`ELEVENLABS_API_KEY`	ElevenLabs API key for TTS	No
`ELEVENLABS_VOICE_ID`	ElevenLabs voice ID	No
`BHASHINI_USER_ID`	Bhashini (MeitY) user ID for translation	No
`BHASHINI_API_KEY`	Bhashini API key	No

Frontend (`frontend/.env`)

Variable	Description	Required
`NEXT_PUBLIC_BACKEND_URL`	Backend base URL (e.g. `http://192.168.10.50:7000`)	Yes

API Reference

Chat

POST /api/chat
Content-Type: application/json

{
  "query": "SBI mein savings account kaise kholein?",
  "document_id": "optional-uuid-from-upload"
}

Response:

{
  "response": "SBI में सेविंग्स अकाउंट खोलना बहुत आसान है...",
  "mode": "rag | general",
  "detected_language": "hi",
  "language_name": "Hindi",
  "document_id": "uuid | null",
  "chunks_used": 30,
  "sentiment": "neutral | frustrated | confused | satisfied",
  "sentiment_confidence": 0.87
}

Upload Document

POST /api/upload/
Content-Type: multipart/form-data

file: <PDF, JPG, JPEG, or PNG>  (max 10 MB)

Response:

{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "filename": "pan_card.pdf",
  "total_chunks": 4,
  "total_pages": 1,
  "status": "processing_complete",
  "processing_type": "ocr | pdf",
  "message": "Document processed successfully with 4 chunks using Enhanced OCR processing"
}

Voice Chat

POST /api/speech/voice-chat
Content-Type: multipart/form-data

file: <WebM/MP3/WAV audio>
document_id: optional-uuid
include_audio_response: true | false

Response:

{
  "transcribed_text": "SBI mein account kaise kholein",
  "chat_response": "SBI में अकाउंट खोलने के लिए...",
  "mode": "rag | general",
  "detected_language": "hi",
  "audio_url": "https://storage.blob.core.windows.net/.../response.mp3",
  "document_id": null,
  "errors": null
}

Text-to-Speech

POST /api/speech/synthesize
Content-Type: multipart/form-data

text: "Your banking question answer here"
language: en | hi | bn | ...

Response:

{ "audio_url": "https://storage.blob.core.windows.net/..." }

Health Check

GET /api/health-check

SBI Knowledge Base Admin

POST /api/bank-admin/trigger-ingestion   — manually trigger SBI data refresh
GET  /api/bank-admin/ingestion-status    — get last ingestion status and stats

How It Works

Document Q&A (RAG mode)

User uploads a financial document (PDF or image)
Backend extracts text:
- Selectable PDF — PyPDF2 direct extraction
- Scanned PDF / image — PyMuPDF renders each page at 2× zoom → Tesseract OCR
Text is chunked (512-token chunks, 150 overlap) and embedded with text-embedding-3-large
Embeddings are stored in Qdrant (BANKING_RAG_DOCUMENTS collection) tagged with the document_id
When the user asks a question, the query is embedded and searched against only that document's chunks
The LLM generates a response grounded strictly in the retrieved chunks
Non-banking documents are rejected — the assistant will not reveal their content

General Banking Q&A

User asks a banking question without uploading a document
Query is embedded and searched against the SBI_BANK_DATA Qdrant collection
LLM generates a response using retrieved SBI knowledge base context
Off-topic questions (science, technology, cooking, etc.) are declined politely

Language Handling

Language is detected via Unicode script range analysis
Hinglish (Hindi words in Roman script like "main", "kaise", "kya") is detected via a keyword set and treated as Hindi
If the user writes in Hindi or Hinglish, the LLM responds entirely in Hindi Devanagari script
Bhashini translation (MeitY API) is used when available for other Indian languages

Voice Pipeline

User audio → STT (Google Speech API) → transcribed text
→ Language detection → [RAG or general] LLM response
→ ElevenLabs TTS → Azure Blob → audio URL sent to browser
→ (fallback: browser speechSynthesis if ElevenLabs unavailable)

Supported Languages

Language	Code	Script
Hindi (+ Hinglish)	hi	Devanagari / Roman
English	en	Roman
Bengali	bn	Bengali
Tamil	ta	Tamil
Telugu	te	Telugu
Marathi	mr	Devanagari
Gujarati	gu	Gujarati
Kannada	kn	Kannada
Malayalam	ml	Malayalam
Punjabi	pa	Gurmukhi
Urdu	ur	Nastaliq

Project Structure

Kyndryl/
├── backend/
│   └── src/
│       ├── main.py                     # FastAPI app factory + CORS + lifespan
│       ├── scheduler.py                # APScheduler — nightly SBI data ingestion
│       ├── api/
│       │   ├── routes.py               # Router aggregator
│       │   └── endpoints/
│       │       ├── chat.py             # POST /api/chat
│       │       ├── upload.py           # POST /api/upload/
│       │       ├── speech.py           # /api/speech/* (STT, voice-chat, TTS)
│       │       ├── bank_admin.py       # /api/bank-admin/* (manual ingestion trigger)
│       │       └── endpoints.py        # Health check
│       ├── services/
│       │   ├── rag_service.py          # RAG pipeline orchestration
│       │   ├── llm_service.py          # Azure OpenAI chat completions
│       │   ├── embedding_service.py    # Azure OpenAI text-embedding-3-large
│       │   ├── qdrant_service.py       # Qdrant CRUD + semantic search
│       │   ├── azure_storage_service.py
│       │   ├── mongodb_service.py      # Optional metadata store
│       │   ├── speech_recognition_service.py
│       │   ├── elevenlabs_service.py   # TTS + markdown stripping
│       │   ├── sentiment_service.py    # Customer sentiment analysis
│       │   ├── bhashini_service.py     # MeitY translation API
│       │   ├── ocr/
│       │   │   ├── enhanced_ocr_service.py  # Main OCR orchestrator
│       │   │   ├── pdf_processor.py         # PyPDF2 text extraction
│       │   │   ├── image_processor.py       # Tesseract OCR
│       │   │   └── chunking_service.py      # Text chunking
│       │   └── bank_ingestion/
│       │       ├── bank_ingestion_service.py
│       │       ├── sbi_web_scraper.py
│       │       ├── sbi_pdf_downloader.py
│       │       ├── sbi_news_rss_service.py
│       │       ├── sbi_sitemap_service.py
│       │       └── sbi_wikipedia_service.py
│       ├── utils/
│       │   ├── language_detector.py    # Unicode + Hinglish keyword detection
│       │   ├── file_handler.py         # Upload validation
│       │   └── text_processor.py
│       └── configs/
│           └── config.py               # Pydantic settings (all env vars)
│
└── frontend/
    ├── app/
    │   ├── page.tsx                    # Home page (first message + file upload)
    │   ├── chat/[sessionId]/page.tsx   # Chat session page
    │   └── api/
    │       ├── chat/route.ts           # Proxy → /api/chat
    │       ├── upload/route.ts         # Proxy → /api/upload/
    │       ├── voice-chat/route.ts     # Proxy → /api/speech/voice-chat
    │       └── synthesize/route.ts     # Proxy → /api/speech/synthesize
    ├── components/
    │   ├── ChatInput.tsx               # Input bar (text, file, mic, TTS toggle)
    │   └── Sidebar.tsx
    ├── hooks/
    │   └── useChatMessage/index.ts     # Chat state, upload, TTS, voice logic
    ├── services/
    │   └── useChatService/index.ts     # API call wrappers
    └── types/
        └── chat.ts

Notes

Qdrant collections: BANKING_RAG_DOCUMENTS stores user-uploaded document chunks; SBI_BANK_DATA stores the SBI knowledge base. Both are created automatically on first startup.
MongoDB is optional: If MongoDB Atlas is unreachable, document metadata storage is skipped gracefully and the system continues to operate via Qdrant.
ElevenLabs is optional: If the API key is not configured or quota is exhausted, the frontend falls back to the browser's built-in speechSynthesis API automatically.
Tesseract is required for image/scanned-PDF processing: Without it, only selectable-text PDFs can be uploaded.
Document scope: The assistant only processes banking and finance documents. Research papers, technical manuals, or other non-banking uploads are rejected with a clear message.
Data privacy: Document embeddings are stored per document_id and are never cross-queried across different users' documents (each query filters strictly by the requester's document_id).

Name		Name	Last commit message	Last commit date
Latest commit History 85 Commits
backend		backend
frontend		frontend
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FLIQ — AI Banking Assistant for Indian Customers

Table of Contents

Features

Architecture

Tech Stack

Prerequisites

Setup

Backend Setup

Frontend Setup

Environment Variables

Backend (`backend/src/.env`)

Frontend (`frontend/.env`)

API Reference

Chat

Upload Document

Voice Chat

Text-to-Speech

Health Check

SBI Knowledge Base Admin

How It Works

Document Q&A (RAG mode)

General Banking Q&A

Language Handling

Voice Pipeline

Supported Languages

Project Structure

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

FLIQ — AI Banking Assistant for Indian Customers

Table of Contents

Features

Architecture

Tech Stack

Prerequisites

Setup

Backend Setup

Frontend Setup

Environment Variables

Backend (backend/src/.env)

Frontend (frontend/.env)

API Reference

Chat

Upload Document

Voice Chat

Text-to-Speech

Health Check

SBI Knowledge Base Admin

How It Works

Document Q&A (RAG mode)

General Banking Q&A

Language Handling

Voice Pipeline

Supported Languages

Project Structure

Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Backend (`backend/src/.env`)

Frontend (`frontend/.env`)

Packages