A local-first desktop knowledge workspace with AI-powered RAG (Retrieval-Augmented Generation). Everything runs on your machine — no cloud, no telemetry, no accounts. Write notes, store materials, and ask questions entirely offline.
Built with PySide6 (Qt6) and QML for the interface, LangGraph for the AI query pipeline, and sqlite-vec for vector search. Supports multiple AI providers (Ollama, OpenAI, Anthropic, Google) and optional DuckDuckGo web search.
- Create, open, delete named workspaces ("vessels") — each vessel is an isolated directory with its own notes, files, and RAG database.
- Multi-vessel registry — persistent history of all your vessels stored at `~/.config/vessel/vessels_history.json` (or `%LOCALAPPDATA%/vessel` on Windows).
- Per-vessel isolation — each vessel has its own `Droplets/`, `Materials/`, and `AI/` directories and a `.vessel/` metadata folder containing the RAG database, chat history, and calendar events.
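
To make the per-vessel layout above concrete, here is a minimal sketch that creates the four directories for a new vessel. The helper name `create_vessel` and the decision to fail when the directory already exists are assumptions for illustration, not the application's actual code.

```python
from pathlib import Path

# Subdirectories named in the feature list above.
VESSEL_SUBDIRS = ("Droplets", "Materials", "AI", ".vessel")


def create_vessel(parent: Path, name: str) -> Path:
    """Create an isolated vessel directory (hypothetical helper)."""
    root = Path(parent) / name
    root.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing vessel
    for sub in VESSEL_SUBDIRS:
        (root / sub).mkdir()
    return root
```

Calling `create_vessel(Path.home() / "Vessels", "physics")` would produce a `physics/` directory containing the four subdirectories.
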
- Live Markdown editor with split-pane render mode — write in Markdown, preview formatted output side-by-side.
- File tree browser — navigate your droplet directory structure with folder nesting, create new files/folders, rename, and delete.
- Auto-save — unsaved changes are persisted to disk on window close via a guaranteed shutdown handler.
- Multi-format — supports `.md`, `.html`, and `.txt` files within the Droplets directory.
- Drag-and-drop file upload — copy files into the vessel's `Materials/` directory with a single click. Files are automatically indexed for search.
- Broad format support:
  - Documents: PDF, DOC, DOCX, PPT, PPTX, HTML, TXT, CSV, JSON
  - Images: PNG, JPG, JPEG, GIF, BMP, WebP, SVG
  - Video: MP4, AVI, MOV, MKV, WebM
- Content preview — inline viewing for PDFs (via Qt's PDF engine), HTML, Markdown, TXT, CSV, and JSON. Other file types open with your system's default application.
- Automatic text extraction — uploaded files go through a LangGraph ingestion pipeline that extracts text, generates tags, and creates vector embeddings.
When a file is uploaded to Materials, a LangGraph pipeline processes it automatically:
- Format conversion — PPT/DOC files are converted to PDF via LibreOffice.
- Text extraction:
  - PDF — extracted via `pdftotext` (poppler-utils) with PyMuPDF fallback.
  - HTML — tag stripping via Python's stdlib `HTMLParser`.
  - Images — OCR via Tesseract.
  - Video — single-frame grab via ffmpeg + Tesseract OCR.
- Tag generation — 10-strategy aggressive tag extraction (file identity, file-name parts, word frequency, title-case phrases, ALL-CAPS acronyms, CamelCase identifiers, numbers, hyphenated compounds, TF signal boosting, short-jargon boost). Up to 30 tags per document.
- Vector embedding — 768-dimensional embeddings via Ollama (`nomic-embed-text`) with sentence-transformers (`all-MiniLM-L6-v2`) fallback, stored in sqlite-vec.
- Text persistence — extracted text is saved as `.txt` in the vessel's `AI/content/` directory.
- FTS5 auto-sync — full-text search index is automatically kept in sync via SQLite triggers.
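
The trigger-based FTS5 sync mentioned above can be sketched with the standard library alone. This follows the external-content pattern from the SQLite FTS5 documentation; the table and column names match the schema this README describes, but the trigger names and the `keyword_search` helper are assumptions, and the project's real triggers may differ. It also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, text_chunk TEXT);

-- External-content FTS5 index over the documents table.
CREATE VIRTUAL TABLE documents_fts USING fts5(
    title, text_chunk, content='documents', content_rowid='id'
);

-- Triggers keep the index in step with inserts, deletes, and updates.
CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
    INSERT INTO documents_fts(rowid, title, text_chunk)
    VALUES (new.id, new.title, new.text_chunk);
END;
CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
    INSERT INTO documents_fts(documents_fts, rowid, title, text_chunk)
    VALUES ('delete', old.id, old.title, old.text_chunk);
END;
CREATE TRIGGER documents_au AFTER UPDATE ON documents BEGIN
    INSERT INTO documents_fts(documents_fts, rowid, title, text_chunk)
    VALUES ('delete', old.id, old.title, old.text_chunk);
    INSERT INTO documents_fts(rowid, title, text_chunk)
    VALUES (new.id, new.title, new.text_chunk);
END;
""")


def keyword_search(query: str, limit: int = 5) -> list[str]:
    """Return titles ranked by BM25 (lower bm25() means a better match)."""
    rows = conn.execute(
        "SELECT title FROM documents_fts WHERE documents_fts MATCH ? "
        "ORDER BY bm25(documents_fts) LIMIT ?",
        (query, limit),
    )
    return [r[0] for r in rows]
```

With the triggers in place, application code only writes to `documents`; the search index follows along without any explicit reindexing step.
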
- Conversational RAG — ask questions about your notes and materials, get answers grounded in your own data.
- Multi-channel retrieval — queries are processed through a LangGraph state machine that runs retrievers in parallel:
  - Vector search — semantic similarity via sqlite-vec (cosine distance).
  - BM25 keyword search — SQLite FTS5 with ranked results.
  - Tag search — inverted-index lookup via junction tables.
  - Web search — optional DuckDuckGo integration for online augmentation.
- Query classification — automatically detects whether a query is a question (answer from context) or a summarize/generate request (produce study material, flashcards, key points). For "all documents" requests, every document is fetched; for specific topics, tag + vector search on the topic keyword is used.
- Code generation — queries requiring calculation or data processing trigger a code generation node that produces and executes Python scripts (stdlib only) in a sandboxed subprocess.
- Answer refinement loop — generated answers are scored on completeness, accuracy, and clarity (0–10). Low-scoring answers are automatically refined up to 3 times with specific feedback.
- Chat history — per-vessel persistent conversations stored as JSON files in `.vessel/chats/`. Conversations are auto-titled from the first user message.
- Multi-turn context — conversational context is threaded through the pipeline for follow-up questions.
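
The multi-channel retrieval described above runs its retrievers in parallel inside a LangGraph state machine. As a rough stand-in that avoids the LangGraph dependency, the sketch below uses a thread pool instead. The three retriever functions are stubs returning made-up document IDs, and the merge rule (first occurrence wins) is an assumption; the README does not say how the application combines results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Retriever = Callable[[str], list[str]]


# Stand-in retrievers; real ones would query sqlite-vec, FTS5, and the tag tables.
def vector_search(query: str) -> list[str]:
    return ["doc-1", "doc-2"]


def keyword_search(query: str) -> list[str]:
    return ["doc-2", "doc-3"]


def tag_search(query: str) -> list[str]:
    return ["doc-3"]


def retrieve_all(query: str, retrievers: list[Retriever]) -> list[str]:
    """Run every retriever concurrently and merge results, dropping duplicates."""
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        result_lists = list(pool.map(lambda r: r(query), retrievers))
    merged: list[str] = []
    for results in result_lists:  # pool.map preserves retriever order
        for doc_id in results:
            if doc_id not in merged:
                merged.append(doc_id)
    return merged
```
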
- Ollama (default) — local LLMs running on your machine. Configurable model name.
- OpenAI — GPT models via API key.
- Anthropic — Claude models via API key.
- Google — Gemini models via API key.
- Seamless switching — change providers in the settings at any time. All providers share the same RAG pipeline.
- Embeddings — Ollama (`nomic-embed-text`) for local setups; sentence-transformers fallback for non-Ollama providers.
- Automatic tagging — when files are ingested, the upload pipeline extracts tags using 10 strategies:
  - File type classification
  - File extension tags
  - File-name word segmentation
  - Lowercase word frequency
  - Title-case terms (proper nouns)
  - Multi-word title-case phrases
  - ALL-CAPS acronyms
  - CamelCase identifiers
  - Numeric values
  - Hyphenated compounds
- Inverted-index search — junction table (`document_tags`) linking documents to tags enables fast tag-based retrieval.
- False-positive tolerant — designed to over-generate tags; recall is preferred over precision for RAG retrieval.
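
A few of the tagging strategies listed above can be approximated with regular expressions. The sketch below covers only four of the ten (acronyms, CamelCase identifiers, hyphenated compounds, and lowercase word frequency); the patterns, the four-letter minimum, and the `extract_tags` name are illustrative assumptions rather than the project's actual rules.

```python
import re
from collections import Counter

ACRONYM = re.compile(r"\b[A-Z]{2,}\b")
CAMEL_CASE = re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")
HYPHENATED = re.compile(r"\b[A-Za-z]+(?:-[A-Za-z]+)+\b")
WORD = re.compile(r"\b[a-z]{4,}\b")


def extract_tags(text: str, max_tags: int = 30) -> list[str]:
    """Over-generate candidate tags; recall matters more than precision here."""
    candidates: list[str] = []
    candidates += ACRONYM.findall(text)
    candidates += CAMEL_CASE.findall(text)
    candidates += HYPHENATED.findall(text)
    # Most frequent lowercase words (length >= 4) as a crude frequency signal.
    candidates += [w for w, _ in Counter(WORD.findall(text)).most_common(10)]
    # Normalise to lowercase and de-duplicate while keeping first-seen order.
    seen = dict.fromkeys(tag.lower() for tag in candidates)
    return list(seen)[:max_tags]
```

Because the strategies overlap on purpose, the same term can surface more than once (for example `sqlite-vec` and `sqlite`), which is consistent with the recall-first design described above.
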
- Per-vessel event tracking — lightweight JSON-based calendar stored in `.vessel/events.json`.
- Create and delete events — each event has a title, date, and auto-generated ID.
- Upcoming events panel — sorted list of events from today onward.
- Built-in PDF rendering — uses Qt's PDF engine (`QtQuick.Pdf`) for native in-app viewing.
- Full color customization — 8 configurable color properties: background (dark, card, panel), border, text (primary, secondary), accent, and danger.
- Persistent theme config — saved to `~/.config/vessel/theme_config.json`.
- Real-time updates — theme changes apply immediately to the UI via Qt property bindings.
- 100% offline capable — everything runs on your machine. No cloud dependency, no data leaves your computer unless you enable web search.
- No telemetry, no accounts, no sign-ups.
- Your data, your files — vessels are standard directories on your filesystem. Documents are stored as plain text files in `AI/content/`. The RAG database is a standard SQLite file.
Important: For a fully local setup, Ollama is required, and `ollama serve` must be running in the background. Install Ollama from here: Ollama Install
Pre-built binaries are available on the Releases page. Download the archive for your platform, extract it, and run the executable.
Prerequisites: Python 3.14.5 (nearby versions 3.12–3.13 may work but are untested). Optional dependencies for full file format support: LibreOffice, poppler-utils, Tesseract OCR, ffmpeg.
# Clone the repo
git clone https://github.com/iamsurjog/Vessel.git
cd Vessel
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Run
python main.py

> [!note] Python version
> Targets Python 3.14.5. Nearby versions (3.12, 3.13) may work, but have not been tested.
You can use PyInstaller to build binaries for your system.

For Linux:
pyinstaller --onefile --windowed --add-data "main.qml:." main.py

For Windows:
pyinstaller --onefile --windowed --add-data "main.qml;." main.py

Make sure the `main.qml` file is included via `--add-data`; note that the path separator is `:` on Linux and `;` on Windows.
- Launch the application — you'll see the Launcher screen.
- Create a new vessel — enter a name and choose a parent directory. A new vessel directory is created with `Droplets/`, `Materials/`, `AI/`, and `.vessel/` subdirectories.
- Open an existing vessel — select from the vessel registry or open any vessel directory.
- Delete a vessel — removes the vessel from the registry and optionally deletes the directory.
- Navigate the file tree in the sidebar to browse your Markdown notes.
- Click a file to open it in the editor. The editor supports live Markdown rendering.
- Right-click (or use the toolbar) to create new files/folders, rename, or delete.
- Files are saved automatically when you close the workspace.
- Click "Upload" to copy a file from your system into the current vessel's Materials directory.
- Files are automatically processed (text extraction, tagging, vector embedding — see the Upload Pipeline above).
- Click a file to preview it. PDFs open in the built-in PDF viewer; HTML/Markdown/TXT/CSV/JSON show their text content; other files open with your system default.
- Open a vessel, then navigate to the Chat panel.
- Ask questions about your notes and materials — the AI searches your documents and returns grounded answers.
- Toggle web search to also include DuckDuckGo results in the response.
- Conversation history is preserved per vessel. Start a new conversation or pick up where you left off.
- Add events with a title and date.
- View upcoming events — events from today onward are shown in a sorted panel.
- Delete events as needed.
File Upload
|
v
[filetype_router] ──▶ PPT/DOC ──▶ [convert_to_pdf] (LibreOffice)
| |
| PDF
| |
├──▶ PDF/TXT/HTML ──▶ [extract_text] ──┤
| (pdftotext / |
| HTMLParser / |
| direct read) |
| │
└──▶ Image/Video ──▶ [generate_description]
(Tesseract OCR / |
ffmpeg+OCR) |
v
┌────────────────────┐
│ Parallel Storage │
│ ┌──────────────┐ │
│ │ Vector Embed │ │ sqlite-vec (768d)
│ ├──────────────┤ │
│ │ Store as TXT │ │ AI/content/*.txt
│ ├──────────────┤ │
│ │ Tags (×30) │ │ tags + document_tags
│ └──────────────┘ │
└────────────────────┘
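
The routing step at the top of the upload diagram can be sketched as a plain function that maps a file extension to the first pipeline node. The node names come from the diagram; treating `.pptx`/`.docx` like `.ppt`/`.doc`, the `"unsupported"` fallback, and the `route_file` name are assumptions. CSV and JSON are left out because the diagram does not show which branch they take.

```python
from pathlib import Path

# Extension groups based on the supported-format list; branch names follow the diagram.
CONVERT = {".ppt", ".pptx", ".doc", ".docx"}  # assumption: the -x variants use the same route
EXTRACT = {".pdf", ".txt", ".html"}
DESCRIBE = {
    ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg",  # images
    ".mp4", ".avi", ".mov", ".mkv", ".webm",                   # video
}


def route_file(path: str) -> str:
    """Return the name of the first pipeline node for a file (illustrative)."""
    ext = Path(path).suffix.lower()
    if ext in CONVERT:
        return "convert_to_pdf"
    if ext in EXTRACT:
        return "extract_text"
    if ext in DESCRIBE:
        return "generate_description"
    return "unsupported"
```
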
User Query
|
v
[classify_node] ──▶ answer_q ──▶ [parallel search]
| | ├── vector_search (sqlite-vec)
| | ├── keyword_search (FTS5 BM25)
| | └── web_search (DuckDuckGo, optional)
| |
| ├── [generate_py_script] (if calculation needed)
| |
| └── [generate_answer] ──▶ [quality_check]
| |
| (score < 7?) ──▶ [refine_answer] ──loop
| |
| (score ≥ 7) ──▶ END
|
└──▶ summarize ──▶ [parse_summarize_query]
|
┌──────┴──────┐
v v
[get_all_docs] [search_by_topic]
(tag + vector)
| |
└──────┬──────┘
v
[check_if_relevant]
|
(relevant?) ──▶ [generate_answer] → [quality_check] → END
|
(irrelevant) ──▶ END
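
The quality-check loop in the diagram above (refine while the score is below 7, at most 3 times) can be written as an ordinary loop. In this sketch the `generate`, `score`, and `refine` callables are hypothetical placeholders for the LLM-backed nodes; only the threshold and the refinement limit come from this README.

```python
from typing import Callable

PASS_SCORE = 7        # threshold shown in the diagram
MAX_REFINEMENTS = 3   # limit stated in the feature list


def answer_with_refinement(
    question: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], int],
    refine: Callable[[str, str], str],
) -> tuple[str, int]:
    """Generate an answer, then refine it until it scores well or the budget runs out."""
    answer = generate(question)
    refinements = 0
    while score(question, answer) < PASS_SCORE and refinements < MAX_REFINEMENTS:
        answer = refine(question, answer)
        refinements += 1
    return answer, refinements
```

Capping the loop guarantees termination even when the scorer never reaches the threshold, at the cost of occasionally returning an answer that still scores below 7.
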
Each vessel has a `.vessel/vessel_rag.db` SQLite database:
- `documents` — primary document store: `id`, `title`, `text_chunk`
- `tags` — unique tag names: `id`, `name`
- `document_tags` — many-to-many junction: `doc_id`, `tag_id`
- `v_document_embeddings` — sqlite-vec virtual table, 768-dimensional float vectors
- `documents_fts` — FTS5 virtual table for BM25 full-text search (auto-synced via triggers)
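
As an illustration of how the three relational tables fit together, the sketch below builds them in an in-memory database and performs a tag lookup through the junction table. Column names follow the list above; the column types, constraints, and both helper functions are assumptions, and the sqlite-vec and FTS5 virtual tables are omitted to keep the example dependency-free.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, text_chunk TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE document_tags (
    doc_id INTEGER REFERENCES documents(id),
    tag_id INTEGER REFERENCES tags(id),
    PRIMARY KEY (doc_id, tag_id)
);
""")


def add_document(title: str, text: str, tag_names: list[str]) -> int:
    """Insert a document and link it to its tags, creating tags as needed."""
    doc_id = conn.execute(
        "INSERT INTO documents (title, text_chunk) VALUES (?, ?)", (title, text)
    ).lastrowid
    for name in tag_names:
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        tag_id = conn.execute("SELECT id FROM tags WHERE name = ?", (name,)).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO document_tags (doc_id, tag_id) VALUES (?, ?)",
            (doc_id, tag_id),
        )
    return doc_id


def documents_with_tag(name: str) -> list[str]:
    """Inverted-index lookup: tag name -> titles of documents carrying it."""
    rows = conn.execute(
        "SELECT d.title FROM documents d "
        "JOIN document_tags dt ON dt.doc_id = d.id "
        "JOIN tags t ON t.id = dt.tag_id "
        "WHERE t.name = ? ORDER BY d.id",
        (name,),
    )
    return [r[0] for r in rows]
```
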
| Provider | Default Model | Requires | Embeddings |
|---|---|---|---|
| Ollama | `tinyllama:1.1b` | Ollama server running locally | `nomic-embed-text` (local) |
| OpenAI | `gpt-4o-mini` | API key | sentence-transformers |
| Anthropic | `claude-3-haiku-20240307` | API key | sentence-transformers |
| Google | `gemini-2.0-flash` | API key | sentence-transformers |
Configure providers in Settings → AI Provider.
Configuration files are stored at:
- Linux: `~/.config/vessel/`
- macOS: `~/.config/vessel/`
- Windows: `%LOCALAPPDATA%/vessel/`
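
The platform-dependent location above could be resolved along these lines. This is a sketch, not the application's code: the `config_dir` name and the fallback to the home directory when `LOCALAPPDATA` is unset are assumptions.

```python
import os
import sys
from pathlib import Path


def config_dir() -> Path:
    """Return the Vessel configuration directory for the current platform."""
    if sys.platform == "win32":
        # Fall back to the home directory if LOCALAPPDATA is unset (assumption).
        base = os.environ.get("LOCALAPPDATA", str(Path.home()))
        return Path(base) / "vessel"
    return Path.home() / ".config" / "vessel"  # Linux and macOS
```
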
| File | Purpose |
|---|---|
| `vessels_history.json` | Vessel registry (list of known vessels) |
| `provider_config.json` | AI provider settings (provider, model, API keys) |
| `theme_config.json` | UI theme colors |
| Variable | Value | Effect |
|---|---|---|
| `VESSEL_LOG_LEVEL` | `0` (default) | Quiet — no extra output |
| `VESSEL_LOG_LEVEL` | `1` | General — API errors, status codes, missing keys |
| `VESSEL_LOG_LEVEL` | `2` | Highly specific — request/response bodies, pipeline timing |
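
For readers wondering how such a variable is typically consumed, here is a small sketch of reading `VESSEL_LOG_LEVEL` from the environment. Treating invalid values as 0 and clamping to the range 0 to 2 are assumptions; the README does not specify how the application handles out-of-range input.

```python
import os


def log_level() -> int:
    """Read VESSEL_LOG_LEVEL, treating missing or invalid values as 0 (quiet)."""
    raw = os.environ.get("VESSEL_LOG_LEVEL", "0")
    try:
        level = int(raw)
    except ValueError:
        return 0
    return min(max(level, 0), 2)  # clamp to the documented range 0..2
```
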
Set `VESSEL_LOG_LEVEL` before running:

VESSEL_LOG_LEVEL=2 python main.py

- UI Framework: PySide6 (Qt 6.11) + QML (QtQuick Controls)
- AI Pipeline: LangGraph (1.2.x) — state machine orchestration
- Vector Search: sqlite-vec (0.1.x) — 768-dim embeddings
- Full-Text Search: SQLite FTS5 with BM25 ranking
- Embedding Models: nomic-embed-text (Ollama) / all-MiniLM-L6-v2 (sentence-transformers)
- LLM Providers: Ollama, OpenAI, Anthropic, Google Gemini
- Web Search: DuckDuckGo (ddgs/duckduckgo_search)
- Document Processing: pdftotext, LibreOffice, Tesseract OCR, ffmpeg
- Python: 3.14+
See CONTRIBUTING.md for guidelines.