QueryNest is a terminal-first, Python-based Retrieval Augmented Generation (RAG) application that allows users to ask natural language questions against external knowledge sources directly from the command line.
It is designed to be developer-friendly, fully self-hostable, and incrementally extensible, with a strong focus on local execution and minimal external dependencies.
- Installation
- CLI Usage
- Features
- Supported Data Sources
- Key Features In-Depth
- High-Level Architecture
- Technical Stack
- Memory Design
- Local Storage Structure
- Session Management
- Prompt Construction Strategy
- Roadmap
- Distribution
- Security Principles
- Engineering Principles
- License
- Status
QueryNest can be used either as a Python CLI (via PyPI) or as a Docker-based CLI.
QueryNest is distributed as a Python package and can be installed directly from PyPI.
- Python 3.10 or higher
- pip installed and available in PATH
- Internet access for first-time dependency installation
pip install querynest-cli==2.0.0

This installs the querynest CLI in your environment.

querynest --help

If installed correctly, you should see the available CLI commands.
Official PyPI release: https://pypi.org/project/querynest-cli/2.0.0/
QueryNest is also available as a Docker image, allowing you to use the CLI without installing Python or dependencies locally.
docker pull divyansh1552005/querynest:latest

docker run --rm divyansh1552005/querynest --help

docker run --rm \
  -e GEMINI_API_KEY=YOUR_API_KEY \
  divyansh1552005/querynest chat --web "https://example.com"

docker run -it --rm \
  -e GEMINI_API_KEY=YOUR_API_KEY \
  divyansh1552005/querynest chat

Docker Scout may report OS-level CVEs inherited from the base image. QueryNest does not expose network services and is safe for CLI usage.
The CLI supports:
- Chatting with a single web page or a PDF (or folder of PDFs)
- Automatic session creation and resume
- Session inspection, search, rename, and deletion
- Viewing chat history
- Configuration management (API keys and LLM model selection)
After installation (editable or normal), the CLI is exposed as:
querynest

Internally, this maps to:

querynest.cli.main:main

On startup, the CLI:
- Runs the bootstrap process (ensures config and API key exist)
- Registers all subcommands
- Dispatches to the appropriate command handler
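As a rough illustration of this startup flow, a minimal argparse-based sketch might look like the following; bootstrap and run_chat are hypothetical stand-ins, not the actual implementation:

```python
import argparse

def bootstrap() -> None:
    """Stub: ensure ~/.querynest/config.json and API keys exist."""

def run_chat(args: argparse.Namespace) -> None:
    """Hypothetical chat handler."""
    print(f"chat: web={args.web} pdf={args.pdf} force={args.force}")

def main() -> None:
    bootstrap()  # config/API-key check runs before any command

    parser = argparse.ArgumentParser(prog="querynest")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # Each subcommand registers its own flags and handler in isolation.
    chat = subcommands.add_parser("chat", help="Chat with a web page or PDF")
    chat.add_argument("--web")
    chat.add_argument("--pdf")
    chat.add_argument("--force", action="store_true")
    chat.set_defaults(handler=run_chat)

    args = parser.parse_args()
    args.handler(args)  # dispatch to the matching command handler

if __name__ == "__main__":
    main()
```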
querynest
├── chat # Core chat functionality
├── config # Configuration management
├── history # View chat history
└── sessions # Session management
Each top-level command is isolated and does not share side effects with others.
The chat command is the primary entry point for QueryNest. It allows you to start or resume a conversational session with a single knowledge source.
- One web page URL
- One PDF file
- One folder containing multiple PDFs
Only one source is allowed per session.
# Start chat with a web page
querynest chat --web "https://example.com"
# Start chat with a single PDF
querynest chat --pdf "/path/to/file.pdf"
# Start chat with multiple PDFs in a folder
querynest chat --pdf "/path/to/folder/"
# Force rebuild the vector index (useful if the source has been updated)
querynest chat --web "https://example.com" --force
querynest chat --pdf "/path/to/file.pdf" --force

When a chat session starts:

- A deterministic session ID is generated from the source
- If a session already exists for the source, it is resumed automatically
- If not, a new session is created with rich progress feedback
- On first creation, the user is prompted for a session name
- Documents are loaded (with progress bars), split into chunks, embedded, and indexed using FAISS
- A conversational chat loop is started with real-time streaming responses
- Model used is shown on startup and determined by your current config (defaults to Gemini)
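The chunk-embed-index step described above could be sketched roughly as follows, assuming FAISS with a flat L2 index; embed() is a placeholder for the Gemini text-embedding-004 call, and fixed-size slicing stands in for the real chunking strategy:

```python
import faiss
import numpy as np

def embed(chunks: list[str]) -> np.ndarray:
    # Placeholder: real code would batch-call the Gemini embedding API here.
    rng = np.random.default_rng(0)
    return rng.random((len(chunks), 768), dtype=np.float32)

def build_index(text: str, chunk_size: int = 500):
    # Naive fixed-size chunking, for illustration only.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    vectors = embed(chunks)
    index = faiss.IndexFlatL2(vectors.shape[1])  # exact L2 search over embeddings
    index.add(vectors)
    return index, chunks
```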
querynest chat --web "https://example.com" --force

Forces a complete rebuild of the vector index even if a session already exists for the source. Use this when:
- The web page content has been updated
- The PDF has been modified
- You want a fresh index without resuming the old session
This clears the existing chat history and vector index for that source and starts fresh.
- Interactive REPL-style chat with streaming token-by-token responses
- Plain text responses with structured formatting (headings, lists) — no markdown symbols
- Sliding window memory for efficient conversation context
- Automatic persistence of chat and vectors
- Rich progress feedback during document processing
- Multi-model support — Use any LLM through LiteLLM
- Graceful handling of Ctrl+C and EOF
Type either of the following to end the chat:
exit
quit
Manage QueryNest configuration — API keys and LLM model selection.
querynest config set-gemini-key

- Prompts securely for a new Gemini API key
- Used exclusively for embeddings (text-embedding-004)
- Updates the local configuration file
- Takes effect immediately
querynest config set-llm

- Shows a curated menu of supported LLM providers and models
- Also supports entering a custom model string (e.g. groq/llama-3.1-8b-instant)
- Prompts for the provider API key (skipped if Gemini is selected as the LLM)
- Available options:
1. Gemini 2.5 Flash (default)
2. OpenAI - GPT-4o
3. OpenAI - GPT-4o Mini
4. Anthropic - Claude Sonnet
5. Groq - Llama 3.3 70B
6. Mistral - Large
7. Enter custom model string
querynest config set-llm-key

- Updates only the API key for the currently configured LLM provider
- Useful when rotating API keys without switching models
- If the current LLM is Gemini, redirects to set-gemini-key
querynest config show-models

- Displays the currently configured embedding model and LLM
- Example output:
Current Configuration:
Embeddings : Google Gemini (text-embedding-004)
LLM : groq/llama-3.3-70b-versatile
View the chat history associated with a session.
History can be accessed in three mutually exclusive ways:
querynest history show --session-id <SESSION_ID>
querynest history show --web "https://example.com"
querynest history show --pdf "/path/to/file.pdf"

- Exactly one of --session-id, --web, or --pdf must be provided
- History is read-only
- Messages are shown in chronological order
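One plausible way to enforce the "exactly one" rule is an argparse mutually exclusive group; this sketch is illustrative, not the actual implementation:

```python
import argparse

parser = argparse.ArgumentParser(prog="querynest history show")
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--session-id")
group.add_argument("--web")
group.add_argument("--pdf")

# Passing two selectors at once raises a usage error automatically.
args = parser.parse_args(["--web", "https://example.com"])
print(args.session_id, args.web, args.pdf)  # only --web is set
```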
Each message is displayed with its role:
USER: ...
ASSISTANT: ...
The sessions command provides full control and visibility over stored sessions.
querynest sessions list

Displays:
- Session ID
- Session name
- Source type (WEB / PDF)
querynest sessions list --all

Displays all metadata fields for every session.
Sorting flags are mutually exclusive:
querynest sessions list --recent # Sort by last_used_at (descending)
querynest sessions list --oldest # Sort by created_at (ascending)
querynest sessions list --name    # Sort alphabetically by name

The --all flag may be combined with any single sorting flag.
querynest sessions info <SESSION_ID>

Displays detailed metadata for the specified session.
querynest sessions rename <SESSION_ID> "New Session Name"

- Updates only the session metadata
- Does not affect vectors or chat history
querynest sessions delete <SESSION_ID>

- Requires confirmation
- Permanently removes:
  - Vector index
  - Chat history
  - Metadata
Search across stored sessions using metadata fields.
querynest sessions search "query"
querynest sessions search "example.com" --source
querynest sessions search "pdf" --type
querynest sessions search "http" --all

Search is:
- Case-insensitive
- Partial match
- Metadata-only (no vector loading)
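A minimal sketch of such a metadata-only search, assuming the ~/.querynest/sessions/<session_id>/meta.json layout described under Local Storage Structure (the metadata field names are assumptions):

```python
import json
from pathlib import Path

def search_sessions(query: str, field: str = "name") -> list[dict]:
    """Case-insensitive partial match on one metadata field; vectors are never loaded."""
    query = query.lower()
    matches = []
    for meta_path in Path.home().glob(".querynest/sessions/*/meta.json"):
        meta = json.loads(meta_path.read_text())
        if query in str(meta.get(field, "")).lower():
            matches.append(meta)
    return matches
```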
- One session corresponds to exactly one source
- Sessions are resumed automatically
- Multiple PDFs are supported only via a single folder
- JavaScript-rendered web pages are not supported
- Image-only documents are not supported
- Embedding model is fixed (Google Gemini) — changing it would invalidate existing indexes
- Terminal-based conversational interface with streaming responses for real-time feedback
- Multi-model LLM support — Seamlessly switch between Gemini, OpenAI, Claude, Groq, Mistral and 100+ providers via LiteLLM
- Rich progress bars for PDF loading, chunking, and embedding operations
- Streaming responses — Responses stream token-by-token in real-time
- Force re-indexing — Rebuild the vector index on demand with --force
- Support for multiple data sources:
- Website URLs (cleaned page content)
- PDF documents (local files or folders)
- Retrieval Augmented Generation (RAG) pipeline
- Conversational context awareness (sliding window memory)
- Deterministic session creation and automatic session resume
- Fully local storage of data and configuration
- Bring-your-own API key model
- No frontend, browser, or GUI dependency
- Accepts a website URL
- Fetches and cleans main page content
- Allows semantic querying over web pages
Limitations:
- JavaScript-rendered pages are NOT supported
- Image-only pages are NOT supported
- Login / paywall pages are NOT supported
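A rough sketch of this fetch-and-clean step, using the requests, readability-lxml, and beautifulsoup4 libraries listed in the technical stack (the exact cleaning steps are an assumption):

```python
import requests
from bs4 import BeautifulSoup
from readability import Document

def load_web_page(url: str) -> str:
    html = requests.get(url, timeout=30).text   # static HTML only; no JS rendering
    main_html = Document(html).summary()        # isolate the main article content
    text = BeautifulSoup(main_html, "lxml").get_text(separator="\n")
    # Drop blank lines left over from stripped markup.
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())
```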
- Accepts a local PDF file path or folder of PDFs
- Extracts document text with rich progress feedback
- Enables question answering over document content
QueryNest supports 100+ LLM models through LiteLLM integration. Embeddings always use Google Gemini (text-embedding-004) for consistency across sessions. The LLM is fully configurable:
# Default: Gemini
querynest chat --pdf "document.pdf"
# Switch to Groq (fast + free tier)
querynest config set-llm # select option 5
# Switch to OpenAI
querynest config set-llm # select option 2
# Check what's currently configured
querynest config show-models

Configuration is stored in ~/.querynest/config.json and persists across sessions.
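As an illustration of how the stored configuration might drive LiteLLM, a sketch along these lines (field names follow the config.json example shown under Local Storage Structure; error handling omitted):

```python
import json
from pathlib import Path

import litellm

config = json.loads((Path.home() / ".querynest" / "config.json").read_text())

response = litellm.completion(
    model=config["llm_model"],      # e.g. "groq/llama-3.3-70b-versatile"
    messages=[{"role": "user", "content": "Hello"}],
    api_key=config["llm_api_key"],  # bring-your-own key, stored locally
)
print(response.choices[0].message.content)
```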
Visual feedback during document processing:
- PDF Loading: Shows file processing status with filename and progress
- Embedding: Live progress bar for vector embedding operations (batched, 50 chunks at a time)
Example output:
Using Embeddings: Google Gemini (text-embedding-004)
Using LLM: groq/llama-3.3-70b-versatile
Loading documents...
⠸ Embedding chunks... ━━━━━━━━━━━━━━━ 45% 45/100 chunks
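The batched progress display could be reproduced with Rich roughly like this; embed_batch() is a placeholder for the real embedding call:

```python
from rich.progress import Progress

def embed_batch(batch: list[str]) -> None:
    """Placeholder: the real code calls the Gemini embedding API here."""

chunks = [f"chunk {i}" for i in range(100)]

with Progress() as progress:
    task = progress.add_task("Embedding chunks...", total=len(chunks))
    for i in range(0, len(chunks), 50):  # batched, 50 chunks at a time
        batch = chunks[i:i + 50]
        embed_batch(batch)
        progress.update(task, advance=len(batch))
```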
LLM responses stream token-by-token in real-time with clean formatted output:
You: What is machine learning?
Thinking...
Assistant
Machine learning is a subset of artificial intelligence that enables
systems to learn and improve from experience without being explicitly
programmed...
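Token-by-token streaming maps naturally onto LiteLLM's OpenAI-style streaming interface; a minimal sketch (the model string is illustrative and would come from the user's config in practice):

```python
import litellm

stream = litellm.completion(
    model="gemini/gemini-2.5-flash",  # illustrative model string
    messages=[{"role": "user", "content": "What is machine learning?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)  # render each token as it arrives
print()
```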
User (Terminal)
↓
QueryNest CLI
↓
Source Loader (Web / PDF)
↓
Text Cleaning & Normalization
↓
Text Chunking
↓
Embeddings (Google Gemini — fixed)
↓
Vector Store (FAISS)
↓
Similarity Search
↓
LLM (Configurable via LiteLLM)
↓
Terminal Response (Streamed)
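The similarity-search stage of this pipeline, condensed into a sketch that assumes an index built as in the chunking example earlier (embed_query() is again a placeholder for the Gemini call):

```python
import faiss
import numpy as np

def embed_query(text: str) -> np.ndarray:
    rng = np.random.default_rng(1)  # placeholder for the Gemini embedding call
    return rng.random((1, 768), dtype=np.float32)

def retrieve(index: faiss.IndexFlatL2, chunks: list[str], query: str, k: int = 4) -> list[str]:
    _, ids = index.search(embed_query(query), k)  # k nearest chunks by L2 distance
    return [chunks[i] for i in ids[0] if i != -1]
```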
- Python 3.10+
- LLM (via LiteLLM): Google Gemini (default), OpenAI, Anthropic, Groq, Mistral, and 100+ more
- Embeddings: Google Gemini text-embedding-004 (fixed — ensures index consistency)
- FAISS (CPU-based, default)
- Chroma (planned)
- Websites: requests, beautifulsoup4, readability-lxml
- PDFs: pypdf
- Rich: Terminal formatting, live progress bars
- LiteLLM: Multi-model LLM abstraction layer
- tqdm: Progress bars for directory PDF loading
QueryNest separates memory into two independent systems:
- Stores embeddings of source content
- Used only for semantic retrieval
- Implemented using FAISS
- Stores user–assistant messages
- Maintains conversational continuity
- Sliding window of recent messages (last 4 exchanges)
- Stored as local JSON files
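A minimal sketch of this sliding window, assuming "last 4 exchanges" means 8 user/assistant messages (persistence to chat.json omitted):

```python
from collections import deque

window: deque[dict] = deque(maxlen=8)  # last 4 user/assistant exchanges

def remember(role: str, content: str) -> None:
    window.append({"role": role, "content": content})

remember("user", "What is FAISS?")
remember("assistant", "A library for efficient vector similarity search.")
# Once full, the oldest messages fall off automatically.
```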
All persistent data is stored locally on the user's machine.
~/.querynest/
├── config.json
└── sessions/
└── <session_id>/
├── meta.json
├── chat.json
└── vectors.faiss
{
"gemini_api_key": "...",
"llm_model": "groq/llama-3.3-70b-versatile",
"llm_api_key": "..."
}

API keys are never bundled in distributed artifacts.
- Sessions are deterministically generated using a SHA-256 hash of the input source
- Same source results in the same session and memory
- Enables automatic session resume without manual configuration
- Use --force to bypass resume and rebuild from scratch
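In sketch form (the normalization step is an assumption; the text above confirms only that a SHA-256 hash of the source is used):

```python
import hashlib

def session_id(source: str) -> str:
    normalized = source.strip().lower()  # assumed normalization
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# The same source always maps to the same session, enabling automatic resume.
assert session_id("https://example.com") == session_id("https://example.com")
```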
Each LLM request includes:
- Retrieved context chunks from the vector store
- Recent conversation history (sliding window)
- Current user query
The LLM is explicitly instructed to:
- Answer only from the provided context
- Use plain text formatting (no markdown symbols)
- Respond with "I don't know" if the answer cannot be inferred
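Putting those three components together might look roughly like this; the instruction wording and message layout are assumptions, not the actual prompt:

```python
def build_messages(context: list[str], history: list[dict], query: str) -> list[dict]:
    system = (
        "Answer only from the provided context. Use plain text formatting "
        "without markdown symbols. If the answer cannot be inferred from the "
        'context, respond with "I don\'t know".\n\n'
        "Context:\n" + "\n---\n".join(context)
    )
    # System instructions + sliding-window history + the current query.
    return [{"role": "system", "content": system}, *history,
            {"role": "user", "content": query}]
```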
- Basic terminal-based interaction using input/output
- Support for Website and PDF sources
- Gemini embeddings and LLM integration
- FAISS (in-memory)
- No persistence
- Professional command-based CLI interface
- Local persistence (sessions, chat history, vectors)
- Improved prompt handling and error management
- Dockerfile and Docker Compose support
- Volume-mounted persistent storage
- Same CLI experience inside containers
- LiteLLM integration for 100+ LLM providers
- Curated model selection menu with custom model support
- Per-provider API key management
- Rich progress bars for embedding pipeline
- Streaming responses
- Force re-indexing with --force
Distribution formats:
- Docker Image — primary self-host method
- pip package
- Windows executable — .exe via PyInstaller
- Linux packages — .rpm and .deb
- AppImage — packaging format research and build pipeline
- Tarball
Introduction website (TypeScript):
- Home — project intro, tagline, quick feature highlights
- About — what QueryNest is, how it works, the tech behind it
- Download — all distribution options listed clearly (pip, Docker, .exe, .rpm, .deb, AppImage, Tarball)
- Documentation — full usage guide, CLI reference, configuration options, and examples
- Build an optional Terminal UI using Textual (Python)
- Panels for chat history, input box, session sidebar, and model info
- All existing CLI logic reusable as-is — Textual acts as a presentation layer only
- CLI commands continue to work as-is — TUI is an alternative, not a replacement
- Run via: querynest tui
QueryNest is distributed through multiple formats:
- Docker image (divyansh1552005/querynest:latest)
- pip package (querynest-cli on PyPI)
- Windows executable (.exe via PyInstaller) — planned
- Linux packages (.rpm, .deb) — planned
Secrets and API keys are never bundled in distributed artifacts.
- All data stored locally by default
- No telemetry or external logging
- No data shared externally except with the configured LLM provider
- Clear separation of concerns
- Incremental complexity
- No premature optimization
- Storage and memory abstractions for easy migration
QueryNest is licensed under the GNU General Public License v3 (GPL-3.0).
QueryNest is under active development. APIs, CLI commands, and internal architecture may evolve across releases.