Retrieval-Augmented Generation (RAG) with Llama 3 and Wikipedia | Local Open-Source LLM Chatbot
Llama3 RAG Wiki is a local, open-source Retrieval-Augmented Generation (RAG) chatbot built using Llama 3, Ollama, and Wikipedia.
It demonstrates how to:
- Combine LLMs + semantic search
- Reduce hallucinations using external knowledge retrieval
- Build a fully local RAG pipeline
- Implement bare-bones-style RAG in Python
This project is ideal for LLM engineers, AI researchers, students, and open-source contributors looking to understand or build RAG systems from scratch.
- 🧠 Local Llama 3 (8B) inference via Ollama
- 📚 Real-time Wikipedia-based knowledge retrieval
- 🔍 Semantic search using Sentence Transformers
- 🧩 Modular RAG architecture
- 📓 Step-by-step Jupyter Notebook tutorial
- 🖥️ Standalone Python CLI application
- 🔓 100% open-source and offline-friendly
This project follows a standard Retrieval-Augmented Generation pipeline:
- User submits a query
- Relevant Wikipedia articles are retrieved
- Text is chunked and embedded
- Semantic similarity search selects top context
- Context is injected into the LLM prompt
- Llama 3 generates a grounded response
graph LR
A[User Query] --> B[Wikipedia API]
B --> C[Wikipedia Articles]
C --> D[Text Chunking]
D --> E[Embedding Model<br/>gte-base-en-v1.5]
E --> F[Vector Similarity Search]
F --> G[Top-K Relevant Chunks]
G --> H[Prompt Augmentation]
H --> I[Llama 3 LLM<br/>via Ollama]
I --> J[Final Answer]
📖 LinkedIn Article: A beginner-friendly explanation of LLMs and RAG architecture:
👉 Explain LLM + RAG Like I’m 5
The repository includes two implementations:
- Step-by-step explanation of RAG internals
- Ideal for learning and experimentation
- End-to-end local RAG chatbot
- Suitable for real-world usage and demos
| Component | Model |
|---|---|
| LLM | Llama 3 (8B) |
| Embeddings | Alibaba-NLP/gte-base-en-v1.5 |
- ollama – v0.2.1
- sentence-transformers – v3.0.1
- numpy – v1.26.4
- Wikipedia-API – v0.6.0
- Python 3.9+
- Ollama installed locally
ollama pull llama3
ollama pull llama3.1python Llama3_RAG_Wiki.py- 🧪 Learning Retrieval-Augmented Generation
- 🤖 Building local AI chatbots
- 📚 Question answering over external knowledge
- 🛠️ LLM system design experimentation
- 🎓 AI education & workshops
- Vector database integration (FAISS / Chroma)
- Multi-document retrieval
- Query rewriting and reranking
- Streaming responses
- Web-based UI
- Demonstrates real-world RAG implementation
- Uses state-of-the-art open-source LLMs
- Runs entirely on your local machine
- Beginner-friendly yet production-aligned
If this project helped you, please consider giving it a ⭐!