Pre-Prod Task List — SOLVRO MCP (MVP Blockers/Improvements) #17
Open
0 of 7 issues completed
Labels: MCP (MCP related task)
Description
Reference: Priority order for MVP launch
1. Persistent User & Session Storage
- Replace in-memory SessionManager with a real database (PostgreSQL or MongoDB as a new `compose.stack.yml` service)
- Migrate `ConversationSession` and `Message` models to SQLAlchemy's async ORM (PostgreSQL) or the Motor async driver (MongoDB)
- Store: user_id, session_id, messages[], created_at, is_active, metadata
- Add user registration/auth (JWT or API key) — currently any user_id string is accepted with zero validation
- Migrate `session_manager.py` to async DB calls; drop the threading.Lock
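
A minimal sketch of what an async, restart-safe session store could look like. It uses the standard-library `sqlite3` module run through `asyncio.to_thread` purely as a stand-in so the example runs without extra services; the task itself calls for PostgreSQL or MongoDB. The class name `SessionStore`, its methods, and the table layout are assumptions for illustration, not the project's actual code.

```python
import asyncio
import json
import sqlite3
import time
import uuid

# Hypothetical schema: one row per session, one row per message.
SESSIONS_DDL = """
CREATE TABLE IF NOT EXISTS sessions (
    session_id TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL,
    created_at REAL NOT NULL,
    is_active  INTEGER NOT NULL,
    metadata   TEXT NOT NULL
)
"""
MESSAGES_DDL = """
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    role       TEXT NOT NULL,
    content    TEXT NOT NULL
)
"""


class SessionStore:  # hypothetical name, not the existing SessionManager
    def __init__(self, db_path: str) -> None:
        self.db_path = db_path

    def _execute(self, sql: str, params: tuple = ()) -> list:
        # One short-lived connection per call: no shared state, so no lock.
        conn = sqlite3.connect(self.db_path)
        try:
            with conn:  # commits on success, rolls back on error
                return conn.execute(sql, params).fetchall()
        finally:
            conn.close()

    async def init(self) -> None:
        await asyncio.to_thread(self._execute, SESSIONS_DDL)
        await asyncio.to_thread(self._execute, MESSAGES_DDL)

    async def create_session(self, user_id: str) -> str:
        session_id = uuid.uuid4().hex
        await asyncio.to_thread(
            self._execute,
            "INSERT INTO sessions VALUES (?, ?, ?, ?, ?)",
            (session_id, user_id, time.time(), 1, json.dumps({})),
        )
        return session_id

    async def add_message(self, session_id: str, role: str, content: str) -> None:
        await asyncio.to_thread(
            self._execute,
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )

    async def get_messages(self, session_id: str) -> list:
        # Returns (role, content) tuples in insertion order.
        return await asyncio.to_thread(
            self._execute,
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        )
```

Because every call opens its own connection and each write is a single INSERT, nothing in this sketch needs a `threading.Lock`; a real async driver would replace the `asyncio.to_thread` calls.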
2. Multi-threaded / Concurrent Data Pipeline
- Make the Prefect pipeline process documents in parallel
- Replace sequential `for page in pages:` loop in `pipeline.py` with `asyncio.gather()` or Prefect's `task.submit()` with a thread/process pool
- Each page: extract → generate Cypher → populate runs concurrently (configurable concurrency limit)
- Add idempotency: track processed documents (hash → DB)
- Write integration test: mock 10 pages and verify concurrent write without schema reflection race
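
The concurrency and idempotency items above can be sketched with plain `asyncio`, assuming the per-page work can be expressed as a coroutine. `process_page` is a placeholder for the real extract, generate Cypher, populate steps, and the in-memory `seen_hashes` set stands in for the hash table that would live in the database; both are assumptions, not the contents of `pipeline.py`.

```python
import asyncio
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint used to recognise already-processed documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


async def process_page(page: str) -> str:
    # Placeholder for: extract -> generate Cypher -> populate.
    await asyncio.sleep(0.01)
    return f"processed:{page}"


async def run_pipeline(pages, seen_hashes: set, max_concurrency: int = 4) -> list:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(page: str) -> str:
        # At most max_concurrency pages are in flight at any time.
        async with semaphore:
            return await process_page(page)

    # Keep only unseen content; the dict also drops duplicates within the batch.
    pending = {}
    for page in pages:
        digest = content_hash(page)
        if digest not in seen_hashes:
            pending[digest] = page

    results = await asyncio.gather(*(guarded(p) for p in pending.values()))
    # Record hashes only after the whole batch succeeded.
    seen_hashes.update(pending)
    return list(results)
```

Running the same batch twice with the same `seen_hashes` set returns an empty list the second time, which is the behaviour an integration test with 10 mocked pages could assert on.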
3. Google Drive as Data Source
- Replace Azure Blob with Google Drive in `data_acquisition.py`
- Authenticate via Google Drive API (service account JSON, secret-managed)
- List files from configured Drive folder (support PDF, DOCX, TXT)
- Download to temp dir, pass paths to OCR extraction
- Add `GOOGLE_DRIVE_FOLDER_ID` and `GOOGLE_SERVICE_ACCOUNT_JSON` to `.env.example`
- Handle pagination for large folders
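
A rough sketch of the listing and pagination steps, assuming `service` is a Drive v3 client such as the one returned by `googleapiclient.discovery.build("drive", "v3", credentials=...)`. The function name `list_drive_files` and the default page size are illustrative choices; authentication and download are left out.

```python
# MIME types for the formats named above: PDF, DOCX, TXT.
SUPPORTED_MIME_TYPES = (
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "text/plain",
)


def list_drive_files(service, folder_id: str, page_size: int = 100) -> list:
    """Return metadata for all supported files directly inside folder_id."""
    mime_filter = " or ".join(f"mimeType = '{m}'" for m in SUPPORTED_MIME_TYPES)
    query = f"'{folder_id}' in parents and trashed = false and ({mime_filter})"

    files = []
    page_token = None
    while True:
        response = service.files().list(
            q=query,
            pageSize=page_size,
            pageToken=page_token,
            fields="nextPageToken, files(id, name, mimeType)",
        ).execute()
        files.extend(response.get("files", []))
        # Drive omits nextPageToken on the last page.
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return files
```

Taking the client as a parameter keeps the function easy to test with a fake `service` object, so the pagination loop can be covered without network access.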
4. Neo4j Graph State Snapshot (Data Dump)
- Export full graph after pipeline via `CALL apoc.export.cypher.all()` to `.cypher` dump (stored in cloud)
- On pipeline startup, check for dump and import if present, skipping LLM extraction
- Add `just dump-graph` and `just restore-graph` recipes
- Track pipeline_run metadata in Neo4j, only process new/changed files
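
One way the export call and the startup check might be wired up, as a sketch. The helper names `export_statement` and `startup_action` are hypothetical. The export assumes APOC is installed with `apoc.export.file.enabled=true`, and that the dump has already been copied from cloud storage to a local path before the check runs.

```python
from pathlib import Path


def export_statement(file_name: str):
    """Build the Cypher call and parameters for a full-graph export."""
    # apoc.export.cypher.all(file, config) writes the whole graph as Cypher.
    query = "CALL apoc.export.cypher.all($file, {format: 'cypher-shell'})"
    return query, {"file": file_name}


def startup_action(dump_path: str) -> str:
    """Decide whether to restore from a dump or run LLM extraction."""
    path = Path(dump_path)
    if path.is_file() and path.stat().st_size > 0:
        return "restore"
    return "extract"
```

The query and parameters returned by `export_statement` could be passed to whatever Neo4j session the pipeline already uses; an empty or missing dump falls back to full extraction.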
5. Frontend Improvements
- Conversation naming (LLM-powered title after first reply; store in `session.metadata["title"]`)
- Conversation list sidebar (titles, timestamp, rename/delete)
- Message streaming (SSE for `/api/chat`)
- Empty and error states (suggested questions, API error surfacing)
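
For the streaming item, the Server-Sent Events wire format is simple enough to sketch without a web framework. The payload shape (`{"token": ...}`) and the closing `done` event are assumptions about what the frontend might expect, not an existing contract of `/api/chat`.

```python
import asyncio
import json


def sse_event(data: dict, event=None) -> str:
    """Serialize one Server-Sent Event frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits no raw newlines by default, so one data line suffices.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


async def stream_reply(tokens):
    """Wrap an async token stream in SSE frames, ending with a 'done' event."""
    async for token in tokens:
        yield sse_event({"token": token})
    yield sse_event({}, event="done")
```

An async generator like `stream_reply` can typically be handed to a framework's streaming response with the `text/event-stream` media type.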
6. Component & Integration Tests
- Core logic test coverage is near zero; see recommended coverage targets for:
- Guardrails, schema caching, cypher generation, pipeline execution
- Full API/graph integration tests with real or mocked backends
7. (Extra) Query Caching & Rate Limiting
- Add semantic query cache (Redis + pgvector); check for near-duplicates before running LLM/graph
- Rate limiting on `/api/chat` (slowapi middleware), default 20 req/min per user
- Both prod features toggleable in `.env`
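
To make the per-user limit and the `.env` toggle concrete, here is a standard-library sliding-window limiter. It is a stand-in for illustration, not slowapi's API; the class name and the `RATE_LIMIT_ENABLED` variable are assumptions.

```python
import os
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user within `window_seconds`."""

    def __init__(self, limit: int = 20, window_seconds: float = 60.0,
                 clock=time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # user_id -> timestamps of recent hits

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def rate_limit_enabled(env=os.environ) -> bool:
    # Hypothetical toggle name; enabled unless explicitly switched off.
    return env.get("RATE_LIMIT_ENABLED", "true").lower() in ("1", "true", "yes")
```

The state here is per process, so a deployment with several workers would need a shared store such as Redis behind the limiter.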
Blocking/effort tags:
- Graph dump/restore: Low effort, blocks prod (MVP must-have)
- Google Drive source: Medium effort, blocks prod
- Tests: Medium effort, doesn't block prod
- Persistent storage: High effort, blocks prod (sessions lost on restart!)
- Concurrent pipeline: Medium effort, not a blocker
- Frontend UX: Medium effort, not a blocker
- Cache/rate limit: Low effort, not a blocker
See full task list above for details. Prioritize by block/effort.