- Supabase client with configurable table prefix
- Ollama embedding integration
- Semantic search via RPC functions
- Hybrid search (semantic + keyword)
- Fallback client-side vector search
- Stats and health endpoints
- Chunker: text splitting with overlap
  - Paragraph/sentence/word break points
  - Markdown-aware with header preservation
- Web Crawler: URL ingestion
  - Single page or recursive
  - Configurable depth (0-3)
  - Rate limiting
  - Sitemap support
- Document Parser: file processing
  - Native: TXT, MD, RST, JSON, YAML, CSV, TSV
  - Docling: PDF, DOCX, PPTX, XLSX, HTML
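The chunker splits text into overlapping windows and prefers to end each chunk at a paragraph break, then a sentence end, then a word boundary. The sketch below illustrates that idea under stated assumptions; it is not the project's implementation. `chunk_text`, `chunk_size`, and `overlap` are hypothetical names and defaults, and Markdown header preservation is left out.

```python
def _find_break(text: str, start: int, end: int) -> int:
    """Pick a cut position inside text[start:end].

    Prefers a paragraph break, then a sentence end, then a space; falls back
    to a hard cut at `end`. Only breaks in the second half of the window are
    accepted so chunks do not become tiny.
    """
    window = text[start:end]
    min_pos = len(window) // 2
    for sep in ("\n\n", ". ", " "):
        pos = window.rfind(sep)
        if pos >= min_pos:
            return start + pos + len(sep)
    return end


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Each new chunk starts `overlap` characters before the previous one ended
    (not snapped to a word boundary in this simple sketch).
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            end = _find_break(text, start, end)
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        # Step back by `overlap`, but always make forward progress.
        start = max(end - overlap, start + 1)
    return chunks
```

With `chunk_size=40`, a short two-paragraph text is first cut at the blank line between paragraphs, and later cuts fall back to sentence or word boundaries.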
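The list above does not say how hybrid search merges the semantic and keyword result lists. One common technique is reciprocal rank fusion (RRF); the sketch below uses it as an assumption, not as the project's confirmed method. The function name and the constant `k = 60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    semantic_ids: list[str], keyword_ids: list[str], k: int = 60
) -> list[tuple[str, float]]:
    """Merge two ranked lists of chunk IDs (best first) with RRF.

    Every list adds 1 / (k + rank) to each ID it contains (rank starts at 1),
    so IDs returned by both searches rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in (semantic_ids, keyword_ids):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fused = reciprocal_rank_fusion(["a", "b", "c"], ["c", "a", "d"])
print([chunk_id for chunk_id, _ in fused])  # ['a', 'c', 'b', 'd']
```

RRF only needs ranks, so it avoids normalizing cosine similarities against full-text scores, which live on different scales.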

| Command | Description |
|---|---|
| `kb status` | Check connections |
| `kb find` | Search with options |
| `kb sources` | List sources |
| `kb embed` | Generate embeddings |
| `kb serve` | Start web UI |

- Dashboard with stats
- Live search with HTMX
- Sources list with delete
- Add Source modal (crawl/upload)
- Settings page
- Glassmorphism design (Archon-inspired)
- `SKILL.md` with usage instructions
- Search workflow examples
- Source refresh/re-crawl action
- Progress bar for long operations (shown in a toast)
- Chunk preview in search results (with expand)
- Query term highlighting
- Tags for sources (add/remove/filter)
- Search filters (similarity threshold)
- Type filter (All/Web/Documents)
- Export search results (JSON/Markdown/CSV)
- Source details page (with all chunks)
- Better error messages in UI
- Batch delete/operations
- Multiple embedding models
- Re-embed with different model
- Custom chunk sizes per source
- Scheduled re-crawls
- Webhooks for new content
- Proper pgvector search function
- Index optimization
- Caching for frequent queries
- Async embedding generation
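Query term highlighting in search results can be done by wrapping each matched query word in a `<mark>` tag. A minimal sketch, assuming the results are rendered as HTML fragments for HTMX; `highlight` is a hypothetical helper. Matching runs on the raw text and every segment is escaped separately, so chunk content cannot inject markup and terms never match inside HTML entities.

```python
import html
import re


def highlight(text: str, query: str) -> str:
    """Return HTML-escaped text with case-insensitive query terms in <mark>."""
    # Longest terms first so a longer term wins over its own prefix.
    terms = sorted({t for t in query.split() if t}, key=len, reverse=True)
    if not terms:
        return html.escape(text)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    parts: list[str] = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f"<mark>{html.escape(match.group(0))}</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight("Vector search with pgvector", "vector"))
# <mark>Vector</mark> search with pg<mark>vector</mark>
```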
| Component | Choice | Reason |
|---|---|---|
| Embeddings | Ollama + nomic-embed-text | Local, free, 768-dim |
| Vector DB | Supabase + pgvector | Self-hosted, SQL |
| Backend | FastAPI | Async, fast, typed |
| Frontend | HTMX + Tailwind | No build step |
| Interactivity | Alpine.js | Lightweight |
| Doc Parsing | Docling | From IBM; handles PDF, DOCX, PPTX, XLSX, HTML |
| Web Crawling | BeautifulSoup + html2text | Standard, reliable |
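Embeddings come from a local Ollama server running `nomic-embed-text`, which produces 768-dimensional vectors. The sketch below calls Ollama's `/api/embed` endpoint with only the standard library; the default address `http://localhost:11434` and the helper names are assumptions, and the project may use a different endpoint or client.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumed local install)
EMBED_MODEL = "nomic-embed-text"
EMBED_DIM = 768  # must match the pgvector column dimension


def build_embed_request(texts: list[str], base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embed endpoint."""
    body = json.dumps({"model": EMBED_MODEL, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: list[str], base_url: str = OLLAMA_URL) -> list[list[float]]:
    """Return one embedding per input text; needs a running Ollama server."""
    with urllib.request.urlopen(build_embed_request(texts, base_url), timeout=60) as resp:
        vectors = json.load(resp)["embeddings"]
    for vec in vectors:
        if len(vec) != EMBED_DIM:
            raise ValueError(f"expected {EMBED_DIM}-dim vector, got {len(vec)}")
    return vectors
```

Checking the dimension before inserting catches a model mismatch early, instead of letting pgvector reject the row.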
The codebase supports different Supabase schemas:
- Using the provided `schema.sql`: full feature support.
- Using Archon tables:
  - Different column names (`chunk_index` vs. `chunk_number`)
  - No `url`/`title` columns in chunks (stored in metadata instead)
  - Search goes through RPC functions
  - Falls back to client-side search if needed
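When the RPC search function is unavailable, the fallback compares the query embedding with stored chunk embeddings in Python. A minimal sketch using cosine similarity; the row shape (`id`, `embedding`), the `limit`/`threshold` defaults, and the function names are assumptions rather than the project's actual code. It also handles pgvector values returned as text such as `"[0.1,0.2]"`.

```python
import json
import math
from typing import Any


def _as_vector(value: Any) -> list[float]:
    """pgvector values may arrive as text like "[0.1,0.2]"; parse them if so."""
    return json.loads(value) if isinstance(value, str) else list(value)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def client_side_search(
    query_vec: list[float],
    rows: list[dict[str, Any]],
    limit: int = 5,
    threshold: float = 0.5,
) -> list[tuple[float, dict[str, Any]]]:
    """Score every row against the query; return the best matches above threshold."""
    scored = []
    for row in rows:
        score = cosine_similarity(query_vec, _as_vector(row["embedding"]))
        if score >= threshold:
            scored.append((score, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

This scans every row, so it only suits small knowledge bases; the pgvector RPC path scales better because the database can use an index.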
Set `TABLE_PREFIX=jarvis` in `.env` to use the Archon tables.