This project was implemented for LLM Zoomcamp - a free course about LLMs and RAG.
The Challenge: Godot is a powerful open-source game engine, but its extensive documentation can be overwhelming for newcomers. Developers, especially those new to game engines, often struggle to find relevant information quickly when they encounter specific problems or need to implement particular features. Traditional documentation search is often inadequate - users may not know the exact terminology to search for, or they might need information scattered across multiple sections.
The Solution: This RAG (Retrieval-Augmented Generation) system transforms how developers interact with Godot documentation. Instead of manually searching through hundreds of pages, users can ask natural language questions and receive contextual answers along with direct links to the most relevant documentation sections. The system combines semantic search with AI-powered response generation to provide accurate, helpful answers that accelerate learning and development workflow.
Key Benefits:
- Faster Problem Resolution: Get instant answers without browsing through extensive documentation
- Contextual Learning: Receive targeted information relevant to your specific use case
- Documentation Discovery: Find relevant sections you might not have discovered through traditional search
- Beginner-Friendly: Ask questions in plain English without needing to know exact technical terminology
The dataset used in this project contains comprehensive information from the official Godot Engine documentation, including:
Document Structure: Each chunk contains structured information from Godot's documentation hierarchy (an illustrative example follows this list):
- Title: The main title of the documentation page (e.g., "RigidBody2D", "Creating Your First Scene")
- File Path: The source documentation file location (e.g., "classes/class_rigidbody2d.rst.txt")
- Section: The major section within the documentation (e.g., "Methods", "Properties", "Tutorials")
- Subsection: Specific subsections for detailed organization (e.g., "Virtual Methods", "Constants")
- Content: The actual documentation text containing explanations, code examples, and instructions
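For illustration, a chunk following this structure might look like the example below; the keys and values are invented to match the fields above, not actual entries from data/chunked/chunks.json:

```python
# Hypothetical chunk matching the structure above (values are invented).
chunk = {
    "title": "RigidBody2D",
    "file_path": "classes/class_rigidbody2d.rst.txt",
    "section": "Methods",
    "subsection": "Virtual Methods",
    "content": "void _integrate_forces(state) virtual: Called during physics "
               "processing, allowing you to read and safely modify the "
               "simulation state for the object...",
}
```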
Content Categories: The documentation covers all aspects of Godot game development:
- Class References: Detailed API documentation for all Godot classes and methods
- Tutorials: Step-by-step guides for common game development tasks
- Manual Pages: Comprehensive explanations of Godot's features and concepts
- Code Examples: GDScript code snippets with explanations
- Best Practices: Recommended approaches and patterns for game development
Processing Pipeline:
- Download: Official Godot documentation is downloaded from the nightly builds
- Extraction: Documentation is extracted and converted from .rst.txt files to structured text
- Chunking: Large documents are intelligently split into smaller, semantically meaningful chunks
- Embedding: Each chunk is converted into a 384-dimensional vector using the all-MiniLM-L6-v2 model (see the sketch after this list)
- Storage: Vectors and metadata are stored in Qdrant vector database for fast semantic search
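To make the last two steps concrete, here is a minimal sketch using sentence-transformers and qdrant-client against a local Qdrant instance on its default port; the collection name godot_docs and the sample chunk are illustrative assumptions, not necessarily what the setup scripts use:

```python
# Sketch: embed documentation chunks and store them in Qdrant.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384D vectors
client = QdrantClient(url="http://localhost:6333")

# Cosine distance matches the similarity metric used at query time.
client.create_collection(
    collection_name="godot_docs",  # illustrative name
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# In practice the chunks come from data/chunked/chunks.json.
chunks = [{"title": "RigidBody2D", "content": "A 2D physics body that..."}]
vectors = embedder.encode([c["content"] for c in chunks])

client.upsert(
    collection_name="godot_docs",
    points=[
        PointStruct(id=i, vector=vec.tolist(), payload=chunk)
        for i, (vec, chunk) in enumerate(zip(vectors, chunks))
    ],
)
```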
The processed dataset contains thousands of documentation chunks optimized for retrieval-augmented generation. Each chunk maintains its connection to the original Godot documentation through metadata, enabling users to trace answers back to their official sources.
The processed data is in data/chunked/chunks.json; the raw documentation can be downloaded from https://nightly.link/godotengine/godot-docs/workflows/build_offline_docs/master/godot-docs-epub-stable.zip.
Demo video: https://www.youtube.com/embed/-F6iT-kqeKw?si=ReaCcVhH0xVhkLIo
Have the following packages and tools installed:
- Python 3.12 or later
- Docker
- Docker Compose
Optional but recommended:
- 32GB+ RAM
- NVIDIA GPU (for faster embedding computation and LLM inference; otherwise the CPU will be used)
Choose one of the setup options below to get started. Both scripts will automatically configure Docker containers for the LLM, Qdrant database, and monitoring stack, as well as populate the vector database with embedded chunked documentation data.
Prerequisites: Ensure Docker is running before executing these commands.
Uses a pre-processed dataset from 8/17/2025 for faster setup. Choose this option for quick deployment with stable documentation.

```bash
./setup_pre_chunked.sh
```

Downloads and processes the latest Godot documentation, then embeds it into the vector database. Only use this option if you specifically need the most current Godot documentation.

```bash
./setup_scratch.sh
```

The monitoring dashboard tracks:
- Query Rate (QPS) - Real-time query volume
- Average Response Time - Response latency gauge
- LLM Evaluation Scores - AI quality metrics (Relevance, Accuracy, Completeness, Clarity, Faithfulness)
- Vector Database Metrics - Qdrant performance and usage
- Top Query Categories - Most common query types
- Grafana Dashboard: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- Metrics: http://localhost:8000/metrics
View Complete Retrieval Evaluation Analysis
Summary: After comprehensive testing of multiple retrieval methods against our actual Qdrant database, Aggressive MMR emerged as the optimal approach, providing the best balance of relevance, diversity, and performance. The system has been updated with optimized MMR parameters for improved results.
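For context, MMR (maximal marginal relevance) re-ranks candidates by trading off relevance to the query against redundancy with results already selected; an "aggressive" setting weights diversity more heavily. The following is a generic sketch with an assumed lambda value, not the repository's exact implementation or tuned parameters:

```python
import numpy as np

def mmr(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5, lam: float = 0.5):
    """Pick k docs maximizing lam * relevance - (1 - lam) * redundancy.

    Assumes L2-normalized vectors, so dot products equal cosine similarity.
    lam here is a placeholder; lower values are more 'aggressive' about diversity.
    """
    relevance = doc_vecs @ query_vec  # cosine similarity to the query
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        if selected:
            # Highest similarity of each candidate to anything already chosen.
            redundancy = (doc_vecs[candidates] @ doc_vecs[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lam * relevance[candidates] - (1 - lam) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected
```

With lam = 1.0 this reduces to plain top-k cosine ranking; lowering lam trades a little raw relevance for coverage of distinct documentation sections.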
- Embedding Model: all-MiniLM-L6-v2 (384-dimensional embeddings)
- Similarity Metric: Cosine distance in vector space

Process (a query-time sketch follows the score ranges below):
- Query Embedding: Your question → a 384D vector
- Document Embeddings: 384D vectors for all Godot docs
- Similarity Computation: Cosine similarity between the query and all documents
- Retrieval: Top 5 most similar documents returned

Similarity score ranges:
- 0.7-1.0: High relevance (excellent matches)
- 0.4-0.7: Medium relevance (related concepts)
- <0.4: Low relevance (weakly related)
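Here is the query-time side under the same assumptions (local Qdrant, illustrative godot_docs collection); each hit.score is the cosine similarity, interpreted with the ranges above:

```python
# Sketch: embed a question and retrieve the top 5 chunks by cosine similarity.
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(url="http://localhost:6333")

query_vec = embedder.encode("How do I detect collisions on a RigidBody2D?")
hits = client.search(
    collection_name="godot_docs",  # illustrative name
    query_vector=query_vec.tolist(),
    limit=5,
)
for hit in hits:
    # e.g. "0.82  RigidBody2D" -- high relevance per the ranges above
    print(f"{hit.score:.2f}  {hit.payload.get('title')}")
```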
The LLM (Large Language Model) is used as a judge to evaluate the quality of the answers generated by the RAG system. This involves using the LLM to assess various aspects of the answers, such as relevance, accuracy, completeness, clarity, and faithfulness.
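A minimal sketch of this pattern, assuming the local llama3.2:1b model served through Ollama; the prompt wording is illustrative, and the five criteria mirror the list used by this project:

```python
# Sketch: LLM-as-a-Judge scoring an answer on five criteria.
# A robust version would validate the model's JSON output.
import json
from langchain_ollama import ChatOllama

judge = ChatOllama(model="llama3.2:1b", temperature=0)

def evaluate_answer(question: str, context: str, answer: str) -> dict:
    prompt = (
        "Rate the ANSWER to the QUESTION given the CONTEXT, each on a 1-5 "
        "scale: relevance, accuracy, completeness, clarity, faithfulness. "
        'Reply with JSON only, e.g. {"relevance": 4, ...}.\n\n'
        f"QUESTION: {question}\n\nCONTEXT: {context}\n\nANSWER: {answer}"
    )
    return json.loads(judge.invoke(prompt).content)
```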
- Docker & Docker Compose: Container orchestration for services
- Python 3.12+: Main programming language
- Bash Scripts: Automated setup and deployment scripts
Vector Search & Embeddings:
- qdrant-client: Vector database client for semantic search
- sentence-transformers: Pre-trained embedding models
- langchain_huggingface: HuggingFace integration for embeddings
- transformers: Deep learning models for NLP
- torch: PyTorch for GPU-accelerated embedding computation

RAG & LLM:
- langchain: RAG pipeline framework
- langchain-core: Core LangChain functionality
- langchain-community: Community integrations
- langchain_qdrant: Qdrant vector store integration
- langchain_ollama: Ollama LLM integration
- Ollama: Local LLM inference server (llama3.2:1b model)
- tiktoken: Token counting and text processing

Web Interface:
- streamlit: Interactive web application framework

Data Processing:
- requests: HTTP client for downloading documentation
- bs4 (BeautifulSoup): HTML parsing and web scraping
- tqdm: Progress bars for data processing

Monitoring & Metrics:
- prometheus_client: Python client for Prometheus metrics (see the sketch after this list)
- fastapi: High-performance API framework for the metrics endpoint
- uvicorn: ASGI server for FastAPI applications
- python-multipart: Multipart form data support
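A minimal sketch of such a metrics endpoint; the metric names and the /ask route are illustrative placeholders, not the project's actual API:

```python
# Sketch: expose Prometheus metrics from a FastAPI app.
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

QUERIES = Counter("rag_queries_total", "Total RAG queries served")
LATENCY = Histogram("rag_response_seconds", "End-to-end response time")

app = FastAPI()
app.mount("/metrics", make_asgi_app())  # the path Prometheus scrapes

@app.get("/ask")
def ask(q: str) -> dict:
    QUERIES.inc()
    with LATENCY.time():  # records the elapsed time on exit
        answer = f"(answer to {q!r})"  # stand-in for the RAG pipeline
    return {"answer": answer}

# Run with: uvicorn module_name:app --port 8000
# Metrics then appear at http://localhost:8000/metrics
```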
- Prometheus: Time-series metrics collection and storage
- Grafana: Metrics visualization and dashboards
- Qdrant: High-performance vector database
- Shell Scripts:
  - setup_pre_chunked.sh: Quick setup with pre-processed data
  - setup_scratch.sh: Full pipeline from raw documentation
  - start_monitoring.sh: Launch monitoring stack
  - import_dashboard.sh: Configure Grafana dashboards
- Embedding Model: all-MiniLM-L6-v2 (384-dimensional vectors)
- LLM: llama3.2:1b (1 billion parameter model)
- Similarity Search: Cosine similarity in vector space
- Evaluation: LLM-as-a-Judge for answer quality assessment
If the dashboard doesn't appear automatically:
- Make sure you've logged into the Grafana web interface first (admin/admin)
- Run ./import_dashboard.sh manually
- Check that all services are running with docker compose ps
- Verify Grafana is accessible at http://localhost:3000
- Compare different LLM models (e.g. GPT-4, Llama 2, Falcon, Mistral)
- Compare different chunking methods (e.g. overlapping chunks, hierarchical chunking)
- Compare different databases (e.g. Pinecone, Weaviate, Milvus)
- Explore additional data sources for improving knowledge base (e.g. Wikipedia, GitHub, Stack Overflow)
- Use user feedback for continuous improvement and fine-tuning
- Deploy on cloud platforms for scalability and availability




