Your second brain, wired for AI.
Textrawl is a personal knowledge server with persistent memory, searchable documents, and proactive insights. Import your emails, PDFs, and notes — then ask questions, recall past conversations, and discover connections you missed. Access it all through MCP, the web dashboard, or the REST API.
```
┌──────────────┐         ┌───────────────────────────────────────────┐
│              │         │             Your Second Brain             │
│  MCP Client  │◄───────►│                                           │
│  (Claude,    │   MCP   │  Documents    Memory     Conversations    │
│   ChatGPT)   │         │  ┌────────┐  ┌───────┐  ┌─────────────┐   │
└──────────────┘         │  │ Emails │  │ Facts │  │Past sessions│   │
                         │  │ PDFs   │  │People │  │ Summaries   │   │
┌──────────────┐         │  │ Notes  │  │ Links │  │ Context     │   │
│  Dashboard   │◄───────►│  └───┬────┘  └──┬────┘  └─────┬───────┘   │
│  (Web UI)    │  REST   │      └──────────┼─────────────┘           │
└──────────────┘         │                 ▼                         │
                         │   ┌───────────────────────────┐           │
                         │   │  Hybrid Search + Fusion   │           │
                         │   └───────────────────────────┘           │
                         │                 │                         │
                         │                 ▼                         │
                         │     Insights · Daily Briefing             │
                         └───────────────────────────────────────────┘
                                               ▲
                                               │
                                    ┌──────────┴──────────┐
                                    │                     │
                               Desktop App            CLI Tools
                              (drag & drop)         (batch import)
```
Beyond keyword search. Most search tools only match exact words. Textrawl combines semantic understanding (finds "automobile" when you search "car") with traditional keyword matching — so you get relevant results without missing exact phrases.
Your data, your choice. Use OpenAI's embeddings for best accuracy, Google AI for multimodal support, or run locally with Ollama and local Postgres to keep document text and embeddings on your machine.
Import everything. Emails from Gmail exports, PDFs from your research, saved web pages, images, audio files, Google Takeout archives — Textrawl converts them into searchable knowledge where the relevant converter/provider is configured.
| Feature | Description |
|---|---|
| Hybrid Search | Vector similarity + full-text search with Reciprocal Rank Fusion |
| Persistent Memory | Remember facts about people, projects, and concepts across sessions |
| Conversation Recall | Save and query past conversation context across sessions |
| Proactive Insights | Automatically discover connections, patterns, and outliers in your knowledge |
| Daily Briefing | Summary of recent additions, new insights, and resurfaced knowledge |
| Unified RAG | ask tool searches documents, memory, and conversations in one query |
| Web Dashboard | Command center with knowledge explorer, timeline, agent orchestration, and applets |
| Multimodal | Process images (Claude vision) and audio (Whisper transcription) alongside documents |
| Desktop App | Drag-and-drop file conversion and upload (macOS, Windows, Linux) |
| Multi-Format | PDF, DOCX, XLSX, PPTX, HTML, MBOX/EML emails, Google Takeout |
| MCP + REST + WebSocket | MCP tools, REST API, and real-time WebSocket events |
| Agent Discovery | A2A protocol at /.well-known/agent.json for agent-to-agent interaction |
| Flexible Embeddings | OpenAI, Google AI, or Ollama (free, local) |
| Smart Chunking | Paragraph-aware splitting with overlap for context |
| CLI Tools | Batch processing for large archives |
| Cloud Ready | Deploy to Docker, Cloud Run, or any container platform; large uploads require GCS/Cloud Tasks configuration |
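The Smart Chunking row above describes paragraph-aware splitting with overlap. A minimal sketch of that idea follows; `chunkByParagraph`, `maxChars`, and `overlapChars` are hypothetical names chosen for illustration, and Textrawl's actual chunker may size chunks differently (for example by tokens rather than characters).

```typescript
// Illustrative only: not Textrawl's actual chunker.
interface ChunkOptions {
  maxChars: number; // assumed size budget per chunk, in characters
  overlapChars: number; // assumed tail of the previous chunk carried forward
}

function chunkByParagraph(text: string, opts: ChunkOptions): string[] {
  // Split on blank lines so a chunk boundary never falls inside a paragraph.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const para of paragraphs) {
    const candidate = current ? `${current}\n\n${para}` : para;
    if (current === "" || candidate.length <= opts.maxChars) {
      // Still within budget (or a single oversized paragraph, kept whole).
      current = candidate;
      continue;
    }
    chunks.push(current);
    // Seed the next chunk with the end of the previous one for context.
    const tail = opts.overlapChars > 0 ? current.slice(-opts.overlapChars) : "";
    current = tail ? `${tail}\n\n${para}` : para;
  }
  if (current) chunks.push(current);
  return chunks;
}

const demoChunks = chunkByParagraph("aaaa\n\nbbbb\n\ncccc", {
  maxChars: 10,
  overlapChars: 2,
});
console.log(demoChunks);
```

The overlap means the start of each chunk repeats a little of the previous one, so a sentence that straddles a boundary is still retrievable from at least one chunk.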
Textrawl is self-hosted, but data leaves your machine when you configure cloud services. Document text, chunks, embeddings, extracted memories, conversation summaries, images, or audio may be sent to providers such as OpenAI/Google embeddings, Anthropic/OpenAI/Google extraction, Neon/Supabase/RDS, Cloud Run, or GCS. For sensitive data, prefer Ollama/local Postgres and disable cloud LLM extraction/insights.
```bash
git clone https://github.com/jeffgreendesign/textrawl.git
cd textrawl
pnpm install
pnpm setup   # Interactive setup for credentials
pnpm dev     # Start the server
```

- Create a free project at neon.tech
- Copy the pooled connection string from the Neon dashboard into `DATABASE_URL` in `.env`
- Run the base schema against your database:

  ```bash
  psql $DATABASE_URL -f scripts/setup-db.sql
  ```

  (Use `setup-db-ollama.sql` for Ollama, `setup-db-ollama-v2.sql` for `nomic-embed-text-v2-moe`, or `setup-db-google.sql` for Google AI)
- (Optional) For memory tools:

  ```bash
  psql $DATABASE_URL -f scripts/setup-db-memory.sql
  ```

- (Optional) For conversation tools:

  ```bash
  psql $DATABASE_URL -f scripts/setup-db-conversation.sql
  ```

- (Optional) For Row Level Security hardening:

  ```bash
  psql $DATABASE_URL -f scripts/security-rls.sql
  ```
Add to your Claude config (~/Library/Application Support/Claude/claude_desktop_config.json). Create this file if it doesn't exist:
```json
{
  "mcpServers": {
    "textrawl": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://localhost:3000/mcp",
        "--header",
        "Accept: application/json, text/event-stream"
      ]
    }
  }
}
```

Note: Requires Node.js 22+. If using nvm, ensure your default is set: `nvm alias default 22`
If you've set `API_BEARER_TOKEN` in `.env`, add the auth header to the end of the `args` array:

```json
"--header",
"Authorization: Bearer <your-token>"
```

Restart Claude Desktop - you'll now see Textrawl's tools available.
ChatGPT Desktop supports MCP servers natively (Pro/Plus required):
- Open Settings → Connectors → Advanced → Developer mode
- Add a new connector with your server URL: `http://localhost:3000/mcp`
- If using auth, add the `Authorization: Bearer <your-token>` header
See OpenAI MCP documentation for details.
Imported documents, extracted memories, and conversation summaries are stored in your configured database/storage until deleted. Treat a Textrawl server as single-tenant unless you have added your own user isolation. Set API_BEARER_TOKEN, restrict CORS with ALLOWED_ORIGINS, and avoid importing third-party or private data without consent. Use forget_entity and delete_conversation to remove memory/conversation data, and list_documents/update_document to audit imported documents.
Option A: Desktop App (easiest)

```bash
pnpm desktop:dev
```

Drag files onto the window to convert and upload.

Option B: CLI (for batch imports)

```bash
pnpm convert -- mbox ~/Mail/archive.mbox
pnpm upload -- ./converted/
```

| Guide | Description |
|---|---|
| Database Sizing | Vector dimensions, index counts, and storage estimates by embedding provider |
| CLI Tools | Batch conversion and upload from command line |
| Security | Row Level Security and access controls |
| Variable | Required | Description |
|---|---|---|
| `DATABASE_URL` | Yes | Neon (or any PostgreSQL) pooled connection string |
| `DATABASE_URL_UNPOOLED` | No | Direct connection for schema migrations (optional) |
| `EMBEDDING_PROVIDER` | No | `openai` (default), `ollama`, or `google` |
| `OPENAI_API_KEY` | If OpenAI | For `text-embedding-3-small` (1536d) |
| `OLLAMA_BASE_URL` | If Ollama | Default: `http://localhost:11434` |
| `OLLAMA_MODEL` | If Ollama | Default: `nomic-embed-text` |
| `GOOGLE_AI_API_KEY` | If Google | For `gemini-embedding-2-preview` (3072d) |
| `GOOGLE_EMBEDDING_MODEL` | If Google | Default: `gemini-embedding-2-preview` |
| `API_BEARER_TOKEN` | Prod only | Min 32 chars (`openssl rand -hex 32`) |
| `PORT` | No | Default: `3000` |
| `LOG_LEVEL` | No | `debug`, `info`, `warn`, `error` |
| `ALLOWED_ORIGINS` | No | Comma-separated CORS origins |
| `ENABLE_MEMORY` | No | Enable memory tools (default: `true`); requires `setup-db-memory.sql` |
| `ENABLE_CONVERSATIONS` | No | Enable conversation memory tools (default: `true`); requires `setup-db-conversation.sql` |
| `ENABLE_INSIGHTS` | No | Enable proactive insight tools (default: `true`) |
| `ENABLE_MEMORY_EXTRACTION` | No | Enable LLM-based memory extraction (default: `false`) |
| `ANTHROPIC_API_KEY` | If extraction | Required for `extract_memories` tool |
| `EXTRACTION_MODEL` | No | Model for extraction (default: `claude-haiku-4-5-20251001`) |
| `INSIGHT_MODEL` | No | Model for insight synthesis (default: `claude-sonnet-4-6`) |
| `COMPACT_RESPONSES` | No | Token-efficient responses (default: `true`) |
| `CHUNKING_MODE` | No | `fixed` (default) or `semantic` (embedding-based splits) |
| `SEMANTIC_SIMILARITY_THRESHOLD` | No | Semantic split sensitivity 0–1 (default: `0.5`) |
| `REDIS_URL` | No | Redis URL for shared rate limiting across instances |
| `GOOGLE_CLIENT_ID` | OAuth | OAuth 2.0 client ID (all four OAuth vars required together) |
| `GOOGLE_CLIENT_SECRET` | OAuth | OAuth 2.0 client secret |
| `OAUTH_JWT_SECRET` | OAuth | Min 32-char secret for JWT signing |
| `OAUTH_ALLOWED_EMAILS` | OAuth | Comma-separated email allowlist (optional) |
| `OAUTH_SERVER_URL` | OAuth | Public server URL for OAuth redirect |
| `PG_REPORT_DIR` | No | `pg_analyze` report directory (default: `./reports/pg-analysis`) |
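Several of the variables above are only conditionally required, which is easy to get wrong when writing `.env` by hand. The following is a rough pre-flight check built from the rules in the table; `validateEnv` is a hypothetical helper, not part of Textrawl. It assumes booleans are written as the literal string `true`, and it skips the OAuth group because the table does not say which variables make up the required four.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical pre-flight check; the rules mirror the table above.
function validateEnv(env: Env): string[] {
  const problems: string[] = [];
  const provider = env.EMBEDDING_PROVIDER ?? "openai"; // documented default

  if (!env.DATABASE_URL) problems.push("DATABASE_URL is required");

  switch (provider) {
    case "openai":
      if (!env.OPENAI_API_KEY) {
        problems.push("OPENAI_API_KEY is required when EMBEDDING_PROVIDER=openai");
      }
      break;
    case "google":
      if (!env.GOOGLE_AI_API_KEY) {
        problems.push("GOOGLE_AI_API_KEY is required when EMBEDDING_PROVIDER=google");
      }
      break;
    case "ollama":
      break; // OLLAMA_BASE_URL and OLLAMA_MODEL have documented defaults
    default:
      problems.push(`EMBEDDING_PROVIDER must be openai, ollama, or google (got "${provider}")`);
  }

  // The token is only required in production, but if set it must be long enough.
  const token = env.API_BEARER_TOKEN;
  if (token !== undefined && token.length < 32) {
    problems.push("API_BEARER_TOKEN must be at least 32 characters");
  }

  // Assumption: extraction counts as enabled when the flag is exactly "true".
  if (env.ENABLE_MEMORY_EXTRACTION === "true" && !env.ANTHROPIC_API_KEY) {
    problems.push("ANTHROPIC_API_KEY is required when ENABLE_MEMORY_EXTRACTION=true");
  }
  return problems;
}

console.log(validateEnv({ EMBEDDING_PROVIDER: "google", API_BEARER_TOKEN: "short" }));
```

In a real script you would pass `process.env` instead of a literal object and exit non-zero when the returned list is not empty.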
Read-only tools (search, get_document, list_documents, query_memory, query_conversations, get_stats, health_check) include outputSchema and return structuredContent for programmatic consumption alongside the text content response.
| Tool | Description |
|---|---|
| `search` | Hybrid semantic + full-text search. Set `includeMemories`/`includeConversations` for cross-source fusion. |
| `get_document` | Retrieve document by ID |
| `list_documents` | List with pagination and filtering |
| `update_document` | Update title and/or tags |
| `add_note` | Add markdown note to knowledge base |
Enable with ENABLE_MEMORY=true (default). Requires scripts/setup-db-memory.sql or setup-db-memory-ollama.sql.
| Tool | Description |
|---|---|
| `remember_fact` | Store facts about entities (people, projects, concepts) |
| `build_knowledge` | Store multiple facts and relations in a single batch call |
| `query_memory` | Query the memory graph (mode: `'search'` \| `'entity'` \| `'list'`) |
| `relate_entities` | Create relationships between entities |
| `forget_entity` | Delete an entity and all its memories |
| `extract_memories` | Extract entities and facts from text using LLM |
Enable with ENABLE_CONVERSATIONS=true (default). Requires running one of the conversation schema scripts:
- `scripts/setup-db-conversation.sql` (OpenAI embeddings, 1536d)
- `scripts/setup-db-conversation-ollama.sql` (Ollama v1 - nomic-embed-text, 1024d)
- `scripts/setup-db-conversation-ollama-v2.sql` (Ollama v2 - nomic-embed-text-v2-moe, 768d)
- `scripts/setup-db-conversation-google.sql` (Google AI - gemini-embedding-2-preview, 3072d)

| Tool | Description |
|---|---|
| `save_conversation_context` | Save conversation summary and turns for recall |
| `query_conversations` | Query past conversations (mode: `'search'` \| `'get'` \| `'list'`) |
| `delete_conversation` | Delete a conversation session |
Enable with ENABLE_INSIGHTS=true (default).
| Tool | Description |
|---|---|
| `get_insights` | View discovered cross-source connections and patterns |
| `discover_connections` | Trigger an insight scan across the knowledge base |
| `dismiss_insight` | Dismiss an insight from the queue |

| Tool | Description |
|---|---|
| `get_stats` | Statistics across all features (scope: `'all'` \| `'knowledge'` \| `'memory'` \| `'conversations'` \| `'insights'`) |
| `health_check` | Check the health of all subsystems; returns pass/fail per component with overall status (`verbose` for latency and counts) |

| Tool | Description |
|---|---|
| `ask` | Unified RAG search across all knowledge sources (documents, memory, conversations, insights) |
| `daily_briefing` | Generate a daily briefing with recent additions, new insights, and resurfaced knowledge |
| `save_url` | Fetch a URL, convert HTML to markdown, extract metadata, and save as a document |
| `timeline` | Browse knowledge chronologically within a date range, optionally filtered by topic |
Enabled when DATABASE_URL is configured. Connects directly to Postgres.
| Tool | Description |
|---|---|
| `pg_analyze` | Run comprehensive Postgres health analysis with table stats, index health, vacuum status, bloat estimates, and Textrawl-specific checks |
| `pg_recommendations` | Get actionable optimization recommendations filtered by severity |
| `pg_report_history` | View past analysis reports and compare trends over time |

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | required | Search query |
| `limit` | number | 10 | Max results (1-50) |
| `fullTextWeight` | number | 1.0 | Keyword weight (0-2) |
| `semanticWeight` | number | 1.0 | Semantic weight (0-2) |
| `minScore` | number | 0 | Min relevance threshold (0-1) |
| `tags` | string[] | - | Filter by tags (AND logic) |
| `sourceType` | string | - | `note`, `file`, or `url` |
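To give a feel for how `fullTextWeight` and `semanticWeight` could interact, here is a minimal sketch of weighted Reciprocal Rank Fusion. It assumes each weight multiplies that list's `1 / (k + rank)` contribution and uses the conventional constant `k = 60`. The function name `fuse` and both assumptions are illustrative; Textrawl's actual scoring, including how scores relate to the `minScore` range, may differ.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// Weighted RRF: score(d) = sum over result lists of weight / (k + rank of d).
function fuse(
  fullText: string[], // document ids, best keyword match first
  semantic: string[], // document ids, best vector match first
  fullTextWeight = 1.0,
  semanticWeight = 1.0,
  k = 60, // conventional RRF constant; an assumption here
): FusedResult[] {
  const scores = new Map<string, number>();

  const add = (ids: string[], weight: number): void => {
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  };

  add(fullText, fullTextWeight);
  add(semantic, semanticWeight);

  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

const keywordHits = ["a", "b", "c"];
const vectorHits = ["c", "a", "d"];
console.log(fuse(keywordHits, vectorHits).map((r) => r.id));
console.log(fuse(keywordHits, vectorHits, 1.0, 2.0).map((r) => r.id));
```

Documents that appear in both lists accumulate score from each, which is why they tend to outrank documents found by only one method. Raising one weight shifts ties toward that method's ordering.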
```bash
curl -X POST http://localhost:3000/api/upload \
  -H "Authorization: Bearer <your-token>" \
  -F "file=@document.pdf" \
  -F "title=Optional Title" \
  -F "tags=tag1,tag2"
```

Limits: 10MB max file size, 10 uploads/min

Formats: .pdf, .docx, .txt, .md

Response:

```json
{
  "success": true,
  "documentId": "uuid",
  "title": "Document Title",
  "tags": ["tag1", "tag2"],
  "chunksCreated": 12
}
```

- `GET /health` - Basic health
- `GET /health/ready` - Readiness probe (checks DB)
- `GET /health/live` - Liveness probe
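The same upload can be made from a script. The sketch below mirrors the curl example using the `fetch`, `FormData`, and `Blob` globals available in Node.js 22. `buildUploadForm` and `uploadDocument` are hypothetical helpers. The client-side size check assumes the 10MB limit means 10 × 1024 × 1024 bytes, and the response type simply restates the example response above.

```typescript
interface UploadResponse {
  success: boolean;
  documentId: string;
  title: string;
  tags: string[];
  chunksCreated: number;
}

const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // assumed reading of "10MB"

// Builds the multipart body; field names come from the curl example.
function buildUploadForm(
  file: Blob,
  filename: string,
  title?: string,
  tags: string[] = [],
): FormData {
  if (file.size > MAX_UPLOAD_BYTES) {
    throw new Error(`File is ${file.size} bytes; the documented limit is 10MB`);
  }
  const form = new FormData();
  form.append("file", file, filename);
  if (title) form.append("title", title);
  if (tags.length > 0) form.append("tags", tags.join(","));
  return form;
}

async function uploadDocument(
  baseUrl: string,
  token: string,
  form: FormData,
): Promise<UploadResponse> {
  const res = await fetch(`${baseUrl}/api/upload`, {
    method: "POST",
    // No Content-Type header: fetch sets the multipart boundary itself.
    headers: { Authorization: `Bearer ${token}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Upload failed: HTTP ${res.status}`);
  return (await res.json()) as UploadResponse;
}

const demoForm = buildUploadForm(
  new Blob(["# Hello\n\nA short note."]),
  "note.md",
  "Optional Title",
  ["tag1", "tag2"],
);
console.log(demoForm.get("tags"));
```

To send it, call `uploadDocument("http://localhost:3000", "<your-token>", demoForm)` against a running server.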
```bash
docker-compose up -d
docker-compose logs -f
```

```bash
# Create secrets in Secret Manager first
export GCP_PROJECT_ID=your-project-id
./scripts/deploy.sh
```

```bash
pnpm dev          # Watch mode
pnpm build        # Production build
pnpm start        # Run production
pnpm typecheck    # Type check
pnpm lint         # Biome lint check
pnpm quality      # Lint + typecheck combined
pnpm inspector    # MCP Inspector
pnpm setup        # Generate .env with secure token
pnpm desktop:dev  # Run desktop app
pnpm docs:dev     # Run docs site
```

Run PostgreSQL + pgvector locally:
```bash
# Start local Postgres with pgvector
docker-compose -f docker-compose.local.yml up -d

# Initialize the database schema
docker exec -i textrawl-postgres psql -U postgres -d textrawl < scripts/setup-db.sql

# Optional: Start pgAdmin at http://localhost:5050
docker-compose -f docker-compose.local.yml --profile tools up -d
```

Run embeddings locally with Ollama instead of OpenAI:
```bash
# Start Postgres + Ollama
docker-compose -f docker-compose.local.yml --profile ollama up -d

# Pull the embedding model (~274MB)
docker exec textrawl-ollama ollama pull nomic-embed-text

# Use the Ollama-specific schema (1024 dimensions)
docker exec -i textrawl-postgres psql -U postgres -d textrawl < scripts/setup-db-ollama.sql
```

Set in `.env`:
```
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=nomic-embed-text
```

Supported Ollama models: `nomic-embed-text` (1024d), `nomic-embed-text-v2-moe` (768d, recommended for new installs), `mxbai-embed-large` (1024d)
Note: Each provider uses different embedding dimensions: OpenAI 1536d, Ollama 1024d (or 768d for v2-moe), Google AI 3072d. Use the matching schema: `setup-db.sql` (OpenAI), `setup-db-ollama.sql` (Ollama 1024d), `setup-db-ollama-v2.sql` (Ollama 768d), or `setup-db-google.sql` (Google AI). You cannot mix providers without re-embedding all documents.
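The provider-to-schema pairing in the note can be written down as a small lookup, which is handy in setup scripts that need to pick the right SQL file. `schemaFor` is a hypothetical helper. The dimensions and script names are copied from this README, and treating every Ollama model other than `nomic-embed-text-v2-moe` as 1024d is an assumption based on the supported-model list.

```typescript
interface SchemaChoice {
  script: string;
  dimensions: number;
}

// Maps EMBEDDING_PROVIDER (and OLLAMA_MODEL) to the schema script named above.
function schemaFor(provider: string, ollamaModel = "nomic-embed-text"): SchemaChoice {
  switch (provider) {
    case "openai":
      return { script: "scripts/setup-db.sql", dimensions: 1536 };
    case "google":
      return { script: "scripts/setup-db-google.sql", dimensions: 3072 };
    case "ollama":
      // Assumption: only the v2-moe model uses the 768d schema.
      return ollamaModel === "nomic-embed-text-v2-moe"
        ? { script: "scripts/setup-db-ollama-v2.sql", dimensions: 768 }
        : { script: "scripts/setup-db-ollama.sql", dimensions: 1024 };
    default:
      throw new Error(`Unknown EMBEDDING_PROVIDER: ${provider}`);
  }
}

// Guards against writing a vector into a column of the wrong width.
function assertDimensions(vector: number[], choice: SchemaChoice): void {
  if (vector.length !== choice.dimensions) {
    throw new Error(
      `Embedding has ${vector.length} dimensions but ${choice.script} expects ${choice.dimensions}`,
    );
  }
}

console.log(schemaFor("ollama", "nomic-embed-text-v2-moe"));
```

A length check like `assertDimensions` surfaces a provider/schema mismatch immediately instead of as a database error on insert.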
| Issue | Solution |
|---|---|
| Can't connect to database | Check DATABASE_URL is set to your Neon pooled connection string |
| No search results | Check chunks table has embeddings; lower minScore |
| MCP tools not in Claude | Restart Claude Desktop; check curl http://localhost:3000/health |
| Rate limit exceeded | API: 100/min, Upload: 10/min |
| CodeQL Analyze job fails with SARIF/default setup error | This repo uses advanced CodeQL workflow (.github/workflows/codeql.yml); keep GitHub Code Scanning Default Setup disabled in repo settings |
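For the "Rate limit exceeded" row, batch scripts can stay under the documented limits (100 API calls/min, 10 uploads/min) by pacing themselves on the client. The sketch below is a simple sliding-window limiter. `createLimiter` is a hypothetical helper, and whether the server counts requests in a sliding or fixed window is not stated here, so treat it as a conservative approximation.

```typescript
interface Limiter {
  delayFor(now: number): number; // 0 if a request may go now, else ms to wait
  record(now: number): void; // call once per request actually sent
}

function createLimiter(limit: number, windowMs = 60000): Limiter {
  const sent: number[] = []; // timestamps of recent requests, oldest first

  return {
    delayFor(now: number): number {
      // Drop timestamps that have aged out of the window.
      while (sent.length > 0) {
        const oldest = sent[0];
        if (oldest === undefined || now - oldest < windowMs) break;
        sent.shift();
      }
      if (sent.length < limit) return 0;
      const oldest = sent[0];
      return oldest === undefined ? 0 : oldest + windowMs - now;
    },
    record(now: number): void {
      sent.push(now);
    },
  };
}

// Example with the documented upload limit of 10 per minute.
const uploadLimiter = createLimiter(10);
for (let i = 0; i < 10; i++) uploadLimiter.record(i * 100);
console.log(uploadLimiter.delayFor(1000)); // ms until the oldest request ages out
```

Before each upload, call `delayFor(Date.now())`, sleep for the returned number of milliseconds if it is non-zero, then send the request and call `record(Date.now())`.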
Contributions welcome! See CONTRIBUTING.md for guidelines.
MIT - see LICENSE