
OpenClaw Knowledgebase - Development Plan

Status: v0.1.0 Released ✅

Completed Features

1. Core Infrastructure ✅

Supabase client with configurable table prefix
Ollama embedding integration
Semantic search via RPC functions
Hybrid search (semantic + keyword)
Fallback client-side vector search
Stats and health endpoints
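The client-side fallback comes down to brute-force similarity ranking. Below is a minimal sketch. It assumes chunks are fetched as dicts that carry an `embedding` list. The `client_side_search` name, the `similarity` key, and the `top_k`/`threshold` parameters are illustrative, not the project's actual API.

```python
import heapq
import math
from typing import Any


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def client_side_search(
    query_embedding: list[float],
    chunks: list[dict[str, Any]],
    top_k: int = 5,
    threshold: float = 0.0,
) -> list[dict[str, Any]]:
    """Score every chunk against the query and keep the best top_k at or above threshold.

    Hypothetical fallback: each chunk dict is assumed to have an "embedding" key.
    """
    scored = []
    for chunk in chunks:
        score = cosine_similarity(query_embedding, chunk["embedding"])
        if score >= threshold:
            scored.append({**chunk, "similarity": score})
    # nlargest avoids sorting the whole list when top_k is small
    return heapq.nlargest(top_k, scored, key=lambda c: c["similarity"])
```

Each query costs O(n·d) over every fetched chunk. That is why this only makes sense as a fallback behind the RPC search path.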

2. Ingestion Pipeline ✅

Chunker - Text splitting with overlap
  - Paragraph/sentence/word break points
  - Markdown-aware with header preservation
Web Crawler - URL ingestion
  - Single page or recursive
  - Configurable depth (0-3)
  - Rate limiting
  - Sitemap support
Document Parser - File processing
  - Native: TXT, MD, RST, JSON, YAML, CSV, TSV
  - Docling: PDF, DOCX, PPTX, XLSX, HTML
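The chunker's break-point strategy can be sketched as a character-based splitter. It tries paragraph breaks first, then sentence ends, then word boundaries, and carries an overlap into the next chunk. This is a minimal version under assumed defaults: `chunk_size=1000` and `overlap=200` are illustrative, not the project's settings. It also omits the Markdown header preservation.

```python
BREAK_TIERS: tuple[tuple[str, ...], ...] = (
    ("\n\n",),           # paragraph break
    (". ", "! ", "? "),  # sentence end
    (" ", "\n"),         # word or line break
)


def _last_break(window: str, seps: tuple[str, ...]) -> int:
    """Index just past the last occurrence of any separator in window, or -1."""
    best = -1
    for sep in seps:
        idx = window.rfind(sep)
        if idx != -1:
            best = max(best, idx + len(sep))
    return best


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks, cutting at the best available break point."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    chunks: list[str] = []
    n = len(text)
    start = 0
    while start < n:
        end = min(start + chunk_size, n)
        if end < n:
            window = text[start:end]
            min_cut = chunk_size // 2  # reject breaks that would leave a tiny chunk
            for tier in BREAK_TIERS:
                cut = _last_break(window, tier)
                if cut >= min_cut:
                    end = start + cut
                    break
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        if end >= n:
            break
        # Step back by `overlap`, but always advance at least one character
        next_start = max(end - overlap, start + 1)
        # Snap forward to a word boundary so the overlap does not start mid-word
        space = text.find(" ", next_start, end)
        if space != -1:
            next_start = space + 1
        start = next_start
    return chunks
```

The `min_cut` guard trades a slightly less natural cut for fewer tiny chunks. If no break lands in the second half of the window, the chunk is cut hard at `chunk_size`.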

3. CLI ✅

4. Web UI ✅

5. OpenClaw Skill ✅

SKILL.md with usage instructions
Search workflow examples

Future Improvements

High Priority ✅

Source refresh/re-crawl action
Progress bar for long operations (shown in a toast)
Chunk preview in search results (expandable)
Query term highlighting
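Query term highlighting can be done safely by matching on the raw text and HTML-escaping each piece separately. This way a search for something like "amp" never lands inside an `&amp;` entity. The sketch below assumes highlighting happens server-side before an HTMX partial is rendered. `highlight_terms` and the `<mark>` tag are illustrative choices.

```python
import html
import re


def highlight_terms(text: str, query: str, tag: str = "mark") -> str:
    """Return HTML-escaped text with case-insensitive query-term matches wrapped in <tag>."""
    # Longer terms first so the longest alternative wins at a given position
    terms = sorted({t for t in query.split() if t}, key=len, reverse=True)
    if not terms:
        return html.escape(text)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    out: list[str] = []
    last = 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[last:m.start()]))  # plain text before the match
        out.append(f"<{tag}>{html.escape(m.group(0))}</{tag}>")
        last = m.end()
    out.append(html.escape(text[last:]))  # trailing plain text
    return "".join(out)
```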

Medium Priority ✅

Tags for sources (add/remove/filter)
Search filters (similarity threshold)
Type filter (All/Web/Documents)
Export search results (JSON/Markdown/CSV)
Source details page (with all chunks)
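Exporting search results to the three listed formats needs only the standard library. In this sketch, the result fields (`title`, `url`, `similarity`, `content`) and the `export_results` function are hypothetical. The project's actual result shape may differ.

```python
import csv
import io
import json
from typing import Any

FIELDS = ["title", "url", "similarity", "content"]  # hypothetical result fields


def export_results(results: list[dict[str, Any]], fmt: str) -> str:
    """Serialize search results as "json", "csv", or "markdown" text."""
    rows = [{f: r.get(f, "") for f in FIELDS} for r in results]
    if fmt == "json":
        return json.dumps(rows, indent=2, ensure_ascii=False)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)  # csv handles quoting of commas and newlines
        return buf.getvalue()
    if fmt == "markdown":
        parts = []
        for i, r in enumerate(rows, start=1):
            parts.append(
                f"## {i}. {r['title'] or 'Untitled'}\n\n"
                f"Source: {r['url']}  \n"
                f"Similarity: {r['similarity']}\n\n"
                f"{r['content']}\n"
            )
        return "\n".join(parts)
    raise ValueError(f"unsupported format: {fmt!r}")
```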

Still TODO

Better error messages in UI
Batch delete/operations

Low Priority

Performance

Proper pgvector search function
Index optimization
Caching for frequent queries
Async embedding generation
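For the caching item, one low-risk option is to memoize query embeddings rather than search results. An embedding depends only on the query text and the model, while results change whenever sources are added, refreshed, or deleted. The sketch below uses `functools.lru_cache`. `make_cached_embedder` and the injected `embed_fn` are hypothetical names, not existing project code.

```python
import functools
from typing import Callable, Sequence


def _normalize(query: str) -> str:
    """Collapse runs of whitespace so trivially different queries share an entry."""
    return " ".join(query.split())


def make_cached_embedder(
    embed_fn: Callable[[str], Sequence[float]], maxsize: int = 256
) -> Callable[[str], tuple[float, ...]]:
    """Wrap a (hypothetical) embedding function with an LRU cache keyed on query text."""

    @functools.lru_cache(maxsize=maxsize)
    def _cached(normalized: str) -> tuple[float, ...]:
        # Tuples are immutable, so callers cannot mutate a cached vector
        return tuple(embed_fn(normalized))

    def embed_query(query: str) -> tuple[float, ...]:
        return _cached(_normalize(query))

    # Expose hit/miss statistics for monitoring
    embed_query.cache_info = _cached.cache_info  # type: ignore[attr-defined]
    return embed_query
```

Whitespace is collapsed but case is kept, because lowercasing would change what the embedding model sees.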

Tech Stack

| Component | Choice | Reason |
|---|---|---|
| Embeddings | Ollama + nomic-embed-text | Local, free, 768-dim |
| Vector DB | Supabase + pgvector | Self-hosted, SQL |
| Backend | FastAPI | Async, fast, typed |
| Frontend | HTMX + Tailwind | No build step |
| Interactivity | Alpine.js | Lightweight |
| Doc Parsing | Docling | IBM, comprehensive |
| Web Crawling | BeautifulSoup + html2text | Standard, reliable |
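The embedding side of the stack can be sketched against Ollama's HTTP API. This assumes a local Ollama server on its default port 11434 with `nomic-embed-text` pulled. It uses the `/api/embed` endpoint; older Ollama versions expose `/api/embeddings` with a `prompt` field instead. The function names are hypothetical, and this is not necessarily how the project's client is written.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumption: local server)
MODEL = "nomic-embed-text"
EXPECTED_DIM = 768  # nomic-embed-text output size, matching the table above


def build_embed_request(texts: list[str], base_url: str = OLLAMA_URL, model: str = MODEL) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embed_response(body: bytes) -> list[list[float]]:
    """Extract the vectors and check their dimensionality against the pgvector column."""
    vectors = json.loads(body)["embeddings"]
    for v in vectors:
        if len(v) != EXPECTED_DIM:
            raise ValueError(f"expected {EXPECTED_DIM}-dim vector, got {len(v)}")
    return vectors


def embed(texts: list[str], timeout: float = 30.0) -> list[list[float]]:
    """Embed a batch of texts with a running Ollama server."""
    req = build_embed_request(texts)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_embed_response(resp.read())
```

Checking the dimension up front catches a swapped model before a mismatched vector reaches the 768-dim pgvector column.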

Schema Compatibility

The codebase supports different Supabase schemas:

New Schema (`kb_*`)

Uses the provided schema.sql
Full feature support

Archon Schema (`jarvis_*`)

Different column names (chunk_index vs chunk_number)
No url/title columns in chunks (stored in metadata instead)
Uses RPC functions for search
Falls back to client-side search if needed

Set `TABLE_PREFIX=jarvis` in `.env` to use Archon tables.
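Supporting both schemas amounts to resolving table names from the prefix and normalizing chunk rows into one shape. The sketch below makes several assumptions. It assumes `.env` has already been loaded into the process environment and that the default prefix is `kb`. The table suffix (`chunks`), the metadata keys (`url`, `title`), and both function names are hypothetical. Rather than guess which schema uses which chunk-number column, the sketch accepts either `chunk_index` or `chunk_number`.

```python
import os
from typing import Any, Mapping

DEFAULT_PREFIX = "kb"  # assumption: matches the kb_* tables from schema.sql


def table_name(suffix: str, env: Mapping[str, str] = os.environ) -> str:
    """Build a prefixed table name, e.g. "chunks" -> "kb_chunks" or "jarvis_chunks"."""
    prefix = env.get("TABLE_PREFIX", DEFAULT_PREFIX).strip().rstrip("_")
    return f"{prefix}_{suffix}"


def normalize_chunk(row: Mapping[str, Any]) -> dict[str, Any]:
    """Map a chunk row from either schema onto a single shape.

    Accepts chunk_index or chunk_number, and falls back to metadata for
    url/title when the row has no top-level columns for them.
    """
    metadata = row.get("metadata") or {}
    # .get with a default (not `or`) so a chunk index of 0 is preserved
    index = row.get("chunk_index", row.get("chunk_number"))
    return {
        "content": row.get("content", ""),
        "chunk_index": index,
        "url": row.get("url") or metadata.get("url"),
        "title": row.get("title") or metadata.get("title"),
        "metadata": metadata,
    }
```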