# Introduction to LLMOps

A basic LLMOps application using FastAPI, LangChain, ChromaDB, and Ollama for a simple insurance chatbot.

## Features
- FastAPI web API with chat endpoints
- Ollama integration for local LLM inference
- ChromaDB for vector storage and retrieval
- Document loading and indexing
- RAG (Retrieval-Augmented Generation) capabilities
- Health checks and monitoring
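To make the RAG flow concrete, here is a minimal sketch of how these pieces fit together at query time, using the packages listed under Dependencies below (the actual `app/main.py` may differ):

```python
from langchain_chroma import Chroma
from langchain_ollama import ChatOllama, OllamaEmbeddings

# Vector store over previously indexed documents (see load_documents.py)
store = Chroma(
    persist_directory="./data/vector_store",
    embedding_function=OllamaEmbeddings(model="gemma2:2b"),
)
llm = ChatOllama(model="gemma2:2b", base_url="http://localhost:11434")

question = "How do I file an auto insurance claim?"

# Retrieve the most relevant chunks and stuff them into the prompt
context = "\n\n".join(d.page_content for d in store.similarity_search(question, k=3))
answer = llm.invoke(f"Answer using this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```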
## Prerequisites

- Python 3.12
- Ollama installed and running
- macOS (tested) or Linux
## Quick Start

1. Clone and set up:

   ```bash
   git clone <your-repo>
   cd Introduction-to-LLMOps
   ./setup.sh
   ```
2. Add documents:

   ```bash
   # Add your documents to data/documents/
   # Supports .txt and .md files
   ```
3. Index the documents:

   ```bash
   python load_documents.py
   ```
4. Start the API:

   ```bash
   uvicorn app.main:app --reload
   ```
5. Test the API:
   - Visit: http://localhost:8000/docs
   - Health check: http://localhost:8000/health
   - Chat: POST to http://localhost:8000/chat (see the example below)
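For example, with the server running, you can exercise the chat endpoint from the command line (the request body matches the schema shown under API Endpoints below):

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I file an auto insurance claim?", "use_context": true}'
```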
## API Endpoints

- `GET /` - Root endpoint
- `GET /health` - Health check
- `GET /info` - System information
- `POST /chat` - Chat with the bot

Example request:

```json
{
  "message": "How do I file an auto insurance claim?",
  "use_context": true
}
```

Example response:

```json
{
"response": "To file an auto insurance claim, you should...",
"sources": ["data/documents/insurance_faq.md"]
}Edit .env file to configure:
# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=gemma2:2b
# Vector Store
CHROMA_PERSIST_DIRECTORY=./data/vector_store
# App Settings
API_TITLE=Simple Insurance Chatbot
API_VERSION=1.0.0
```

## Project Structure

```
├── app/
│   └── main.py            # FastAPI application
├── data/
│   ├── documents/         # Place your documents here
│   └── vector_store/      # ChromaDB storage
├── load_documents.py      # Document indexing script
├── setup.sh               # Setup script
├── run.sh                 # Test script
├── requirements.txt       # Python dependencies
└── .env                   # Configuration
```
## Dependencies

Core packages:
- `fastapi` - Web framework
- `langchain` - LLM framework
- `langchain_ollama` - Ollama integration
- `langchain_chroma` - ChromaDB integration
- `chromadb` - Vector database
- `python-dotenv` - Environment management
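As an illustration of how the configuration could be read with `python-dotenv` (the real `app/main.py` isn't shown in this README, so treat this as a sketch):

```python
import os

from dotenv import load_dotenv  # python-dotenv

# Pull variables from .env into the process environment
load_dotenv()

# The settings documented in the Configuration section, with the same defaults
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "gemma2:2b")
CHROMA_PERSIST_DIRECTORY = os.getenv("CHROMA_PERSIST_DIRECTORY", "./data/vector_store")
API_TITLE = os.getenv("API_TITLE", "Simple Insurance Chatbot")
API_VERSION = os.getenv("API_VERSION", "1.0.0")
```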
## Development

- Test the environment: `./run.sh`
- Start the development server: `uvicorn app.main:app --reload`
- Add new documents: add files to `data/documents/` and run `python load_documents.py` (a sketch of such a script follows)
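The repository's `load_documents.py` is not reproduced in this README; a minimal indexing script, assuming the packages listed above plus `langchain-community` and `langchain-text-splitters` for loading and chunking, might look like:

```python
from langchain_chroma import Chroma
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every .txt and .md file under data/documents/
docs = []
for pattern in ("**/*.txt", "**/*.md"):
    loader = DirectoryLoader("data/documents", glob=pattern, loader_cls=TextLoader)
    docs.extend(loader.load())

# Split documents into overlapping chunks for retrieval
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed the chunks with Ollama and persist them in ChromaDB
Chroma.from_documents(
    chunks,
    embedding=OllamaEmbeddings(model="gemma2:2b"),
    persist_directory="./data/vector_store",
)
```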
## Troubleshooting

- Ollama not running:

  ```bash
  ollama serve
  ```

- Model not available:

  ```bash
  ollama pull gemma2:2b
  ```

- Dependency issues:

  ```bash
  pip install -r requirements.txt
  ```

- Check health:

  ```bash
  curl http://localhost:8000/health
  ```
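Since `app/main.py` isn't reproduced here, the following is only a sketch of what a health endpoint that also pings Ollama might look like; it uses Ollama's `GET /api/tags` model-listing route, and `httpx` is an assumption, as it's not in the dependency list above:

```python
import httpx
from fastapi import FastAPI

app = FastAPI(title="Simple Insurance Chatbot", version="1.0.0")

OLLAMA_BASE_URL = "http://localhost:11434"

@app.get("/health")
def health() -> dict:
    """Report API liveness and whether Ollama is reachable."""
    try:
        # /api/tags lists installed models; a 200 means Ollama is up
        resp = httpx.get(f"{OLLAMA_BASE_URL}/api/tags", timeout=2.0)
        ollama_ok = resp.status_code == 200
    except httpx.HTTPError:
        ollama_ok = False
    return {"status": "ok" if ollama_ok else "degraded", "ollama": ollama_ok}
```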
## Extending

This is a basic setup. You can extend it by:

- Adding more document types
- Implementing user authentication
- Adding conversation memory
- Implementing evaluation frameworks
- Adding a web UI
- Deploying to production
## License

MIT License