ragx is a minimal and hackable Retrieval-Augmented Generation (RAG) CLI tool designed for the terminal. It embeds your local files (or stdin), retrieves relevant chunks with KNN search, and queries OpenAI-compatible LLMs (local or remote) via a CLI/TUI workflow.
- Local-first or API-backed – run fully offline with Ollama, or connect to OpenAI/ChatGPT APIs.
- Minimal stack – Go + SQLite-vec + Ollama/OpenAI.
- Terminal native – query via CLI or lightweight TUI.
- Configurable – tweak system/user prompts and RAG parameters (chunk size, overlap, model, etc.).
Binaries are built for:
| OS | Architectures | Tested on |
|---|---|---|
| Linux | amd64, arm64 | ✅ Fedora 43, Debian 13 |
| macOS | amd64, arm64 | ❌ not tested |
| Windows | amd64, arm64 | ❌ not tested |
Important
Only Linux has been tested so far. Other platforms are built but unverified; feedback is welcome.
```sh
go install github.com/ladzaretti/ragx-cli/cmd/ragx@latest
```

Alternatively, use the install script:

```sh
curl -sSL https://raw.githubusercontent.com/ladzaretti/ragx-cli/main/install.sh | bash
```

This will auto-detect your OS/arch, download the latest release, and install ragx to /usr/local/bin.
Visit the Releases page for a list of available downloads.
- OpenAI API `v1` compatible: point `ragx` at any compatible base URL (local Ollama or remote).
- Per-provider/per-model overrides: control temperature and context length.
- TUI chat: a lightweight Bubble Tea interface for iterative querying.
- Terminal first: pipe text in, embed directories/files, and print results.
- Local knowledge bases: notes, READMEs, docs.
- Quick “ask my files” workflows.
```mermaid
flowchart TD
    subgraph Ingest
        A["Files / stdin"] --> B["Chunker"]
        B --> C["Embedder"]
        C --> D["Vector Database"]
    end
    subgraph Query
        Q["User Query"] --> QE["Embed Query"]
        QE --> D
        D --> K["Top-K Chunks"]
        K --> P["Prompt Builder (system + template + context)"]
        P --> M["LLM"]
        M --> R["Answer"]
    end
```
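To make the Query path above concrete, here is a minimal, self-contained Go sketch of KNN retrieval over embedded chunks (cosine similarity over toy vectors). This is illustrative only, not ragx's actual implementation; ragx stores vectors in SQLite-vec and gets embeddings from the configured embedding model.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// chunk pairs a text span with its embedding vector.
// In ragx the vectors come from the embedding model; here they are toy values.
type chunk struct {
	text string
	vec  []float64
}

// cosine returns the cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK ranks all chunks against the query vector and keeps the k best.
func topK(query []float64, chunks []chunk, k int) []chunk {
	sort.Slice(chunks, func(i, j int) bool {
		return cosine(query, chunks[i].vec) > cosine(query, chunks[j].vec)
	})
	if k > len(chunks) {
		k = len(chunks)
	}
	return chunks[:k]
}

func main() {
	chunks := []chunk{
		{"chunking is character based", []float64{0.9, 0.1, 0.0}},
		{"the vector database is ephemeral", []float64{0.1, 0.9, 0.0}},
		{"tune chunk_size and overlap", []float64{0.8, 0.2, 0.1}},
	}
	queryVec := []float64{0.85, 0.15, 0.05} // stand-in for an embedded user query
	for _, c := range topK(queryVec, chunks, 2) {
		fmt.Println(c.text) // these chunks would be spliced into the prompt
	}
}
```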
```sh
$ ragx --help
{{USAGE}}
```

The optional configuration file can be generated using the `ragx config generate` command:

```
{{CONFIG}}
```

Configuration values are resolved from the following sources (highest precedence first):

- CLI flags
- Environment variables (if supported)
  - OpenAI environment variables are auto-detected: `OPENAI_API_BASE`, `OPENAI_API_KEY`
- Config file
- Defaults
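For example, to point ragx at a local Ollama endpoint through the auto-detected variables (the key value below is a placeholder; local endpoints typically ignore it):

```sh
# point ragx at a local OpenAI-compatible endpoint via environment variables
export OPENAI_API_BASE=http://localhost:11434/v1
export OPENAI_API_KEY=placeholder # assumption: local Ollama ignores the key value

ragx query readme.md -q "<query>"
```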
```sh
$ ragx list
http://localhost:11434/v1
jina/jina-embeddings-v2-base-en:latest
gpt-oss:20b
qwen3:8b-fast
nomic-embed-text:latest
mxbai-embed-large:latest
llama3.1:8b
qwen2.5-coder:14b
deepseek-r1:8b
qwen3:8b
nomic-embed-text:v1.5
hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL
```

```sh
$ ragx query readme.md \
    --model qwen3:8b \
    --embedding-model jina/jina-embeddings-v2-base-en:latest \
    "how do i tune chunk_size and overlap for large docs?"
```
- Tune `chunk_size` (chars per chunk) and `overlap` (chars overlapped between chunks) via config or CLI flags. For large documents, increase `chunk_size` (e.g., 2000+ chars) but keep `overlap` < `chunk_size` (e.g., 200). Adjust based on your content type and retrieval needs. [1]
Sources:
[1] (chunk 2) /home/gbi/GitHub/Gabriel-Ladzaretti/ragx-cli/readme.md

These are minimal examples to get you started.
For detailed usage and more examples, run each subcommand with `--help`.
Note
These examples assume you already have a valid config file with at least one provider, a default chat model, and an embedding model set.
Tip
Generate a starter config with: `ragx config generate > ~/.ragx.toml`.
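For orientation only, a config might look like the sketch below. The key names here are illustrative assumptions, not the authoritative schema; run `ragx config generate` for the real template.

```toml
# hypothetical sketch of ~/.ragx.toml; key names are assumptions,
# use `ragx config generate` for the authoritative template.

[provider]
base_url = "http://localhost:11434/v1" # any OpenAI-compatible endpoint

[models]
chat = "qwen3:8b"
embedding = "jina/jina-embeddings-v2-base-en:latest"

[rag]
chunk_size = 2000 # characters per chunk
overlap = 200     # characters shared between adjacent chunks
```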
```sh
# embed all .go files in current dir and query via --query/-q
ragx query . -M '\.go$' -q "<query>"

# embed a single file and provide query after flag terminator --
ragx query readme.md -- "<query>"

# embed stdin and provide query as the last positional argument
cat readme.md | ragx query "<query>"

# embed multiple paths with filter
ragx query docs src -M '(?i)\.(md|txt)$' -q "<query>"
```
```sh
# embed all .go files in current dir and start the TUI
ragx chat . -M '\.go$'

# embed multiple paths (markdown and txt) and start the TUI
ragx chat ./docs ./src -M '(?i)\.(md|txt)$'

# embed stdin and start the TUI
cat readme.md | ragx chat
```

- Chunking is currently character based; adjust `chunk_size`/`overlap` for your content and use case (see the sketch below).
- The vector database is ephemeral: created fresh per session and not saved to disk.
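As a rough illustration of what character-based chunking with overlap means (a sketch under the semantics described above, not ragx's actual code):

```go
package main

import "fmt"

// chunkText splits text into fixed-size character chunks, where each chunk
// repeats the last `overlap` characters of the previous one. It assumes
// overlap < chunkSize, matching the guidance above.
func chunkText(text string, chunkSize, overlap int) []string {
	runes := []rune(text) // operate on runes so multi-byte characters stay intact
	step := chunkSize - overlap
	if step <= 0 {
		step = chunkSize // guard: overlap must stay smaller than chunkSize
	}
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + chunkSize
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	// chunk_size=4, overlap=1 -> "abcd", "defg", "ghij"
	for i, c := range chunkText("abcdefghij", 4, 1) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```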
