A Retrieval-Augmented Generation (RAG) assistant that answers questions about an Operating Systems lab manual, using local LLM inference with Ollama (Phi-3) and ChromaDB for semantic retrieval.
This project implements a context-aware AI assistant that retrieves relevant sections from an OS lab manual and generates precise, grounded responses.
Unlike naive LLM chat systems, this assistant:
- Retrieves relevant context chunks from a vector database
- Feeds them into a local LLM (Phi-3 via Ollama)
- Produces factually grounded answers
The FastAPI backend:
- Handles user queries via a REST API
- Manages the RAG pipeline
- Integrates retrieval and generation
ChromaDB, the vector store:
- Stores embeddings of the OS lab manual
- Performs semantic similarity search
- Returns the top-k relevant chunks
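To make the similarity search concrete, here is a minimal plain-Python sketch of top-k retrieval by cosine similarity. It is an illustration only, not the project's code: ChromaDB performs this lookup internally with its own index and a configurable distance metric. The names `cosine_similarity` and `top_k_chunks` and the toy 3-dimensional vectors are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, store, k=3):
    """Return the k chunk texts whose embeddings are closest to query_vec.

    `store` is a list of (chunk_text, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy vectors standing in for real embeddings (hypothetical data).
store = [
    ("Deadlock requires mutual exclusion and hold-and-wait.", [0.9, 0.1, 0.0]),
    ("fork() creates a child process.", [0.1, 0.9, 0.0]),
    ("Banker's algorithm avoids unsafe states.", [0.8, 0.2, 0.1]),
]

print(top_k_chunks([1.0, 0.0, 0.0], store, k=2))
```

With this toy data the two deadlock-related chunks rank above the `fork()` chunk, which is the behaviour the retrieval step relies on.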
Phi-3, served by Ollama:
- Runs as a local LLM for inference
- Generates answers using the retrieved context
- Keeps queries and manual content on the local machine (no external API calls)
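As a rough sketch of local inference, the snippet below posts a prompt to Ollama's `/api/generate` endpoint using only the standard library. It assumes an Ollama server is running on its default port (11434) with the `phi3` model pulled. The helper names `build_generate_request` and `generate` are hypothetical and not taken from this repository.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="phi3"):
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="phi3", timeout=120):
    """Send the prompt to a running Ollama server and return the answer text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the reply is one JSON object with a "response" field.
    return body["response"]
```

Calling `generate("Define a semaphore in one sentence.")` should return the model's text once the server is up. Because the request goes to `localhost`, nothing leaves the machine.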
The data source:
- Operating Systems lab manual
- Preprocessed into text chunks
- Embedded and stored in ChromaDB
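The preprocessing step can be pictured with a small chunking helper. This is a sketch under assumptions: the character-based splitting, the `chunk_size` and `overlap` defaults, and the function name `chunk_text` are illustrative choices, not necessarily what this project uses.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text cut at a
    boundary keeps some of its surrounding context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Each resulting chunk would then be embedded and written to the vector store. Splitting on sentence or section boundaries instead of raw characters is a common refinement.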
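A query flows through the system as follows:
- The user sends a query to the FastAPI backend
- The query is embedded and matched against the stored embeddings in ChromaDB
- The top-k relevant chunks are retrieved
- The retrieved context is passed to Phi-3 via Ollama
- The LLM generates a grounded response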
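The flow above can be sketched as two small functions: one that folds the retrieved chunks into a prompt, and one that chains retrieval and generation. The prompt wording and the names `build_prompt` and `answer_query` are assumptions for illustration. The retriever and generator are passed in as plain callables, so the sketch runs here with stubs instead of ChromaDB and Ollama.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and the user question into one grounded prompt."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_query(question, retrieve, generate, k=3):
    """Run retrieval, then generation.

    retrieve(question, k) -> list of chunk strings (e.g. a vector-store lookup)
    generate(prompt)      -> answer string (e.g. a call to a local LLM)
    """
    chunks = retrieve(question, k)
    prompt = build_prompt(question, chunks)
    return generate(prompt)


# Stub components so the sketch runs without ChromaDB or Ollama.
def fake_retrieve(question, k):
    return ["Deadlock needs four conditions to hold simultaneously."][:k]


def fake_generate(prompt):
    return f"(model output for a prompt of {len(prompt)} characters)"


print(answer_query("What is deadlock?", fake_retrieve, fake_generate))
```

Keeping retrieval and generation behind simple callables makes it easy to swap in a real vector-store lookup and a real model call without touching the prompt logic.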
Key features:
- Retrieval-Augmented Generation (RAG) pipeline
- Semantic search using vector embeddings
- Fully local LLM (Phi-3 via Ollama)
- FastAPI-based REST interaction
- Domain-specific QA (OS lab manual)
Tech stack:
- Backend: FastAPI
- LLM: Phi-3 (via Ollama)
- Vector DB: ChromaDB
- Embeddings: Sentence Transformers / Ollama embeddings
- Language: Python
Clone the repository:
```bash
git clone https://github.com/yourusername/rag-assistant.git
cd rag-assistant
```
Install the dependencies:
```bash
pip install -r requirements.txt
```
Start Phi-3 with Ollama:
```bash
ollama run phi3
```
Start the API server:
```bash
uvicorn main:app --reload
```
Open the interactive API docs:
```
http://localhost:8000/docs
```
Example query:
```
"What is deadlock and how can it be prevented?"
```
The system retrieves relevant sections and generates a context-aware answer.