This RAG system fetches information from your private Confluence as a CSV file, vectorizes and stores the embeddings in ChromaDB, and then uses it via Streamlit or as a Slack bot, interpreting the result with Llama 4 Scout or Qwen3.
GraphRAG mode adds a Neo4j knowledge graph for richer, relationship-aware retrieval — surfacing related pages, shared entities, and topic communities that flat vector search would miss.
flowchart LR
classDef confluence fill:#fdfd96,stroke:#333,stroke-width:2px;
classDef chroma fill:#e8f4f8,stroke:#333,stroke-width:2px;
classDef neo4j fill:#f0d9ff,stroke:#333,stroke-width:2px;
classDef chat fill:#e6f7e6,stroke:#333,stroke-width:2px;
subgraph Phase1 ["① Extraction — app_confluence.py"]
direction TB
A1[(Confluence)] --> A2[Fetch Pages & Content]
A2 --> A3[Filter 'internal_only' labels]
A3 --> A4[Extract Links & Hierarchy]
A4 --> A5[(kb.csv)]
A4 --> A6[(page_hierarchy.csv)]
A4 --> A7[(page_links.csv)]
end
class Phase1 confluence;
subgraph Phase2 ["② Vector Indexing — index_generator.py"]
direction TB
B1[Clean & Convert Schema] --> B2[Embed with bge-m3]
B2 --> B3[(ChromaDB)]
end
class Phase2 chroma;
subgraph Phase3 ["③ Graph Building — graph_builder.py"]
direction TB
C1[Page Nodes] --> C2[CHILD_OF edges]
C2 --> C3[LINKS_TO edges]
C3 --> C4[Llama4\nEntities & Relations]
C4 --> C5[Community Detection]
C5 --> C6[(Neo4j)]
end
class Phase3 neo4j;
subgraph Phase4 ["④ Hybrid Retrieval & Chat — chat.py / graph_retriever.py"]
direction TB
D1([User Question]) --> D2[Embed Question]
D2 --> D3[1· Vector Search\nChromaDB top-K]
D3 --> D4[2· Graph Expansion\nNeo4j 1-hop + Communities]
D4 --> D5[3· Merge Docs]
D5 --> D6{Reranker?}
D6 -- Yes --> D7[BGE Cross-Encoder]
D6 -- No --> D8[Top-N Context]
D7 --> D8
D8 --> D9[Build Prompt]
D9 --> D10((Llama4))
D10 --> D11([Response])
end
class Phase4 chat;
%% Inter-phase edges
A5 --> Phase2
A5 -.-> C1
A6 -.-> C2
A7 -.-> C3
B3 -.-> D3
C6 -.-> D4
- Python 3.8+
- Python packages listed in
requirements.txt .envfile (see.env.example)- Docker (for Neo4j, if using GraphRAG)
- If you are using a fully local installation, install qwen3:14b (it should require a good GPU in your system)
Docker bundles the Streamlit chat interface, all Python dependencies, and Neo4j into a single command — no Python installation required.
- Docker Desktop installed and running.
- A Groq API key (optional): console.groq.com/keys
Copy the example env file and fill in your credentials:
cp .env.example .envOpen .env and set at minimum:
GROQ_API_KEY="gsk_..." # optional — your free Groq key
NEO4J_PASSWORD="changeme" # optional — change if you want a custom passwordNote: You do not need to change
NEO4J_URI— Docker sets it automatically to reach the database container.
docker compose up --buildThe first run will download images and install Python packages (~5 minutes). Subsequent starts are fast.
You'll see the app is ready when you see:
baymax-streamlit | You can now view your Streamlit app in your browser.
baymax-streamlit | URL: http://0.0.0.0:8501
http://localhost:8501 — Baymax chat interface
http://localhost:7474 — Neo4j Browser (optional, for graph exploration)
# Stop all services (data is preserved in volumes)
docker compose down
# Start again without rebuilding (also re-reads .env)
docker compose up -d
# Rebuild the image (e.g. after updating source code or requirements.txt)
docker compose up --build -dChanged something in
.env? Usedocker compose up -d— it recreates the container and re-reads the.envfile.docker compose restartdoes not re-read.env; it reuses the environment frozen at container creation.
Your embeddings, source data, and backups are mounted from your local filesystem into the container:
| Local folder | What it stores |
|---|---|
./chroma_db/ |
Vector embeddings (ChromaDB) |
./data/ |
Fetched CSV files from Confluence / GitHub |
./backups/ |
Export/import zip archives |
Deleting the container does not delete these folders. Use the Data Manager in the Streamlit sidebar to export and import a full backup zip (including Neo4j graph data).
If you have a pre-built backup zip to share with a teammate:
- Place the
.zipfile inside thebackups/folder. - Open http://localhost:8501 and use the Data Manager sidebar → "Existing backups" → select and restore.
- The page will prompt you to restart — run
docker compose restart streamlit.
- CPU-only mode (default): The Docker image uses CPU-only PyTorch for broad compatibility. Inference is slightly slower than GPU, but fully functional for most teams.
- Ollama / local LLM: If you want to use a local Ollama model instead of Groq, Ollama must run on your host machine. Set
OLLAMA_BASE_URL=http://host.docker.internal:11434in your.envand leaveGROQ_API_KEYempty.
If you have an NVIDIA GPU and want faster embedding inference, use the CUDA override:
Host prerequisites (one-time setup):
- NVIDIA drivers installed
- nvidia-container-toolkit installed and Docker restarted
Build and run with GPU support:
docker compose -f docker-compose.yml -f docker-compose.cuda.yml up --buildThis installs the CUDA 12.8 PyTorch build and passes your GPU into the container. chat.py automatically detects CUDA via torch.cuda.is_available() — no code changes needed.
Clone the repository.
git clone git@github.com:ikarius6/baymax-rag-system.gitCreate your local enviroment
python -m venv venv
source venv/bin/activateInstall PyTorch with CUDA support (required for GPU acceleration, especially RTX 50-series Blackwell GPUs):
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128Install the remaining packages:
pip install -r requirements.txtCreate a .env file in the root directory with the required environment variables (see .env.example).
Start Neo4j with Docker:
docker compose up -dThis starts Neo4j Community 5 with the Graph Data Science plugin. Access the browser at http://localhost:7474.
Default credentials: neo4j / changeme (change NEO4J_PASSWORD in your .env).
Install Ollama in your system (https://github.com/ollama/ollama)
curl -fsSL https://ollama.com/install.sh | shDownload qwen3:14b and start the ollama server
ollama pull qwen3:14b
ollama serveNote: qwen3:14b is better than llama4 for local because llama4 is massive and requires a lot of VRAM
To use a remote version of Llama4, enable the Grop API by getting your own GROQ_API_KEY
- Go to https://console.groq.com/keys
- Generate a new token
- Ad it to
GROQ_API_KEYin your.env
To use Copilot instead, enable it in your .env file by setting the following variables:
COPILOT_API_KEY- Your GitHub Copilot API keyCOPILOT_BASE_URL- The base URL for the Copilot API (default: https://api.githubcopilot.com)COPILOT_MODEL- The model to use (default: gpt-4o)
To get your own CONFLUENCE_TOKEN
- Go to https://yourconfluence.com/plugins/personalaccesstokens/usertokens.action
- Generate a new token
- Add it to
CONFLUENCE_TOKENin your.env
To get your own GITHUB_TOKEN
- Go to https://github.twdcgrid.net/settings/tokens/new
- Generate a new token with 'public_repo' scope
- Add it to GITHUB_TOKEN in your .env
Import the slack_manifest.yml to your Slack App, then get your access tokens for your .env file.
For SLACK_SIGNING_SECRET go to Basic Information > App Credentials > Signing Secret
For SLACK_APP_TOKEN go to Basic Information > App-Level Tokens > Generate Token
For SLACK_BOT_TOKEN go to OAuth & Permissions > OAuth Tokens > Bot User OAuth Token
Make sure you have the cookie.txt file with the session to avoid SSO issues. The cookie can be extracted for any request in the confluence page.
python app_confluence.pyThis creates data/kb.csv, data/page_hierarchy.csv, and data/page_links.csv.
Require: Github Token
Run app_github.py to fetch readme data from Github and save it as a CSV file, you can change the GITHUB_ORG_NAME from CLU to another one in the .env file, the process could take a few minutes:
python app_github.pyThis process going to create data/github.csv file with all the necessary data for the next step.
Subsequent runs are incremental — only new/modified pages are fetched:
python app_confluence.py # Incremental (default)
python app_confluence.py --full # Force full re-download
python app_github.py # Incremental (default)
python app_github.py --full # Force full re-downloadUses BAAI/bge-m3 (configurable via EMBED_MODEL in .env):
python index_generator.py # Only embeds new documents
python index_generator.py --full # Re-embed everythingExtracts entities using Llama 4 Scout and builds the graph in Neo4j:
python graph_builder.py # Incremental (skips already-processed pages)
python graph_builder.py --full # Wipe graph and rebuild from scratchOpen the Neo4j Browser to explore:
MATCH (p:Page)-[r]->(q) RETURN p, r, q LIMIT 50Use data_manager.py to export/import all collected data (data/, chroma_db/, and Neo4j graph) as a single zip file. This lets other users skip the heavy scraping, embedding, and graph-building steps entirely.
python data_manager.py export # -> backups/baymax_backup_<timestamp>.zip
python data_manager.py export my_backup # -> backups/my_backup.zippython data_manager.py import backups/my_backup.zippython data_manager.py listYou can also export and import directly from the Streamlit sidebar (Data Manager section).
The recipient only needs to:
docker compose up -d
python data_manager.py import backups/<backup_name>.zip
streamlit run streamlit.py -- --use-graphstreamlit run streamlit.pyUSE_GRAPH=true streamlit run streamlit.pyOr pass the flag:
streamlit run streamlit.py -- --use-graphpython slack.pyBaymax also ships as a VS Code extension that brings the RAG chat directly into your editor — no browser tab required. Ask questions about your Confluence knowledge base, get source references, and check API health without leaving your IDE.
Key highlights:
- Activity Bar panel — open the chat from the sidebar like any built-in VS Code view.
- Inline source references — every answer links back to the original Confluence pages.
- Health indicator — a live status dot shows whether the Baymax API is reachable.
- Configurable — point it at any Baymax API instance via the
baymax.apiUrlsetting.
- Start the Baymax API server:
.\venv\Scripts\uvicorn api:app --host 127.0.0.1 --port 8888 --reload
- Open the
vscode-extension/folder in VS Code, compile, and press F5 2.5. (Optional) Or install the extension from the.vsixfile if you want to test it outside the development environment.
For full setup instructions, configuration options, and packaging details see the VS Code Extension README.
Fetches pages from Confluence, extracts content, page hierarchy, and cross-page links.
Fetches readme files from Github, extracts README content.
Generates embeddings using FlagEmbedding (bge-multilingual-gemma2) with batch processing and upserts to ChromaDB.
Builds a Neo4j knowledge graph: page nodes, hierarchy/link relationships, entity extraction via Llama 4 Scout, and community detection.
Hybrid retriever combining vector search + graph expansion + community context + optional BGE reranking.
Query logic with --use-graph toggle for switching between standard and GraphRAG retrieval.
Export/import all collected data (data/, chroma_db/, Neo4j graph) as a single zip file. CLI and Streamlit sidebar support.
Streamlit user interface with Data Manager sidebar for export/import.
Slack integration.
Helper methods for ChromaDB, embeddings, and response processing.
Docker Compose configuration for Neo4j and the Streamlit app. Run with docker compose up --build.
