OpenClaw Knowledgebase - Development Plan

Status: v0.1.0 Released ✅

Completed Features

1. Core Infrastructure ✅

  • Supabase client with configurable table prefix
  • Ollama embedding integration
  • Semantic search via RPC functions
  • Hybrid search (semantic + keyword)
  • Fallback client-side vector search
  • Stats and health endpoints
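
The fallback client-side vector search can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: it assumes rows have already been fetched as dictionaries with an `embedding` list, and the names `cosine_similarity` and `client_side_search` are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def client_side_search(
    query_embedding: Sequence[float],
    rows: List[Dict[str, Any]],
    top_k: int = 5,
    threshold: float = 0.0,
) -> List[Dict[str, Any]]:
    """Rank rows by similarity to the query, keeping those at or above the threshold."""
    scored = []
    for row in rows:
        score = cosine_similarity(query_embedding, row["embedding"])
        if score >= threshold:
            # Copy the row so the caller's data is not mutated.
            scored.append({**row, "similarity": score})
    scored.sort(key=lambda r: r["similarity"], reverse=True)
    return scored[:top_k]
```

Scoring every row in Python is linear in the number of chunks, so a path like this is only sensible as a fallback when the RPC search function is unavailable.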

2. Ingestion Pipeline ✅

  • Chunker - Text splitting with overlap
    • Paragraph/sentence/word break points
    • Markdown-aware with header preservation
  • Web Crawler - URL ingestion
    • Single page or recursive
    • Configurable depth (0-3)
    • Rate limiting
    • Sitemap support
  • Document Parser - File processing
    • Native: TXT, MD, RST, JSON, YAML, CSV, TSV
    • Docling: PDF, DOCX, PPTX, XLSX, HTML
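
As an illustration of text splitting with overlap and preferred break points, here is a minimal sketch. It is an assumption about how such a chunker could work, not the project's implementation: the function name `chunk_text`, the character-based sizes, and the defaults are all hypothetical, and Markdown header preservation is left out.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Each cut prefers a paragraph break, then a sentence end, then a space,
    searched in the second half of the current window. Consecutive chunks
    share up to `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    chunks: List[str] = []
    start, n = 0, len(text)
    while start < n:
        end = min(start + chunk_size, n)
        if end < n:
            window_min = start + chunk_size // 2
            for sep in ("\n\n", ". ", " "):
                cut = text.rfind(sep, window_min, end)
                if cut != -1:
                    end = cut + len(sep)  # keep the separator with this chunk
                    break
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= n:
            break
        # Step back by `overlap`, but always move forward at least one character.
        start = max(end - overlap, start + 1)
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=2)` yields `["abcd", "cdef", "efgh", "ghij"]`: no break point exists, so the window is cut at the size limit and each chunk repeats the last two characters of the previous one.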

3. CLI ✅

  • kb status - Check connections
  • kb find - Search with options
  • kb sources - List sources
  • kb embed - Generate embeddings
  • kb serve - Start web UI

4. Web UI ✅

  • Dashboard with stats
  • Live search with HTMX
  • Sources list with delete
  • Add Source modal (crawl/upload)
  • Settings page
  • Glassmorphism design (Archon-inspired)

5. OpenClaw Skill ✅

  • SKILL.md with usage instructions
  • Search workflow examples

Future Improvements

High Priority ✅

  • Source refresh/re-crawl action
  • Progress bar for long operations (Toast)
  • Chunk preview in search results (with expand)
  • Query term highlighting
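
Query term highlighting could be done server-side along these lines. This is a sketch under assumptions: the helper name `highlight_terms`, whitespace tokenization of the query, and the use of `<mark>` tags are illustrative choices, not details taken from the project.

```python
import html
import re


def highlight_terms(text: str, query: str) -> str:
    """Return HTML-escaped text with case-insensitive query terms wrapped in <mark>."""
    terms = [t for t in query.split() if t]
    if not terms:
        return html.escape(text)

    # Longest terms first so "embedding" wins over "embed" at the same position.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text and the matched text separately,
        # so the <mark> tags are the only raw HTML in the output.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f"<mark>{html.escape(match.group(0))}</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Escaping before wrapping matters here because chunk text comes from crawled pages and uploaded files, which may contain markup.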

Medium Priority ✅

  • Tags for sources (add/remove/filter)
  • Search filters (threshold)
  • Type filter (All/Web/Documents)
  • Export search results (JSON/Markdown/CSV)
  • Source details page (with all chunks)

Still TODO

  • Better error messages in UI
  • Batch delete/operations

Low Priority

  • Multiple embedding models
  • Re-embed with different model
  • Custom chunk sizes per source
  • Scheduled re-crawls
  • Webhooks for new content

Performance

  • Proper pgvector search function
  • Index optimization
  • Caching for frequent queries
  • Async embedding generation
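
One possible shape for the query cache is a small in-memory store with a time-to-live and least-recently-used eviction. This is only a sketch of the idea: the class `TTLCache`, its defaults, and the `get_or_compute` method are hypothetical and do not exist in the codebase.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Tuple


class TTLCache:
    """In-memory cache with per-entry expiry and least-recently-used eviction."""

    def __init__(
        self,
        max_size: int = 128,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_size = max_size
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data: "OrderedDict[Hashable, Tuple[float, Any]]" = OrderedDict()

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, or call compute() and store the result."""
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            self._data.move_to_end(key)  # mark as recently used
            return entry[1]
        value = compute()
        self._data[key] = (now, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value
```

A search handler could key the cache on the normalized query string plus its filters, so that repeated live-search requests skip both the embedding call and the database round trip until the entry expires.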

Tech Stack

| Component     | Choice                    | Reason               |
|---------------|---------------------------|----------------------|
| Embeddings    | Ollama + nomic-embed-text | Local, free, 768-dim |
| Vector DB     | Supabase + pgvector       | Self-hosted, SQL     |
| Backend       | FastAPI                   | Async, fast, typed   |
| Frontend      | HTMX + Tailwind           | No build step        |
| Interactivity | Alpine.js                 | Lightweight          |
| Doc Parsing   | Docling                   | IBM, comprehensive   |
| Web Crawling  | BeautifulSoup + html2text | Standard, reliable   |

Schema Compatibility

The codebase supports two Supabase schemas, selected by table prefix:

New Schema (kb_*)

  • Uses schema.sql provided
  • Full feature support

Archon Schema (jarvis_*)

  • Different column names (chunk_index vs chunk_number)
  • No url/title in chunks (stored in metadata)
  • Uses RPC functions for search
  • Fallback to client-side search if needed

Set TABLE_PREFIX=jarvis in .env to use Archon tables.
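
The prefix lookup might resolve table names roughly like this. It is a sketch under stated assumptions: the helper `table_name`, the `kb` default (inferred from the `kb_*` schema name), and the `chunks` and `sources` suffixes are illustrative, not confirmed names from the codebase.

```python
import os
from typing import Mapping, Optional


def table_name(suffix: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build a prefixed table name such as 'kb_chunks' or 'jarvis_chunks'.

    Reads TABLE_PREFIX from the given mapping (os.environ by default).
    The 'kb' fallback is an assumption based on the kb_* schema name.
    """
    source = os.environ if env is None else env
    prefix = source.get("TABLE_PREFIX", "kb").strip().rstrip("_")
    if not prefix:
        raise ValueError("TABLE_PREFIX must not be empty")
    return f"{prefix}_{suffix}"


if __name__ == "__main__":
    # With TABLE_PREFIX=jarvis in the environment this prints 'jarvis_chunks'.
    print(table_name("chunks"))
```

Passing the environment as a parameter keeps the helper easy to test and avoids reading `.env` state at import time.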