Pre-Prod Task List — SOLVRO MCP (MVP Blockers/Improvements) #17

**Reference: Priority order for MVP launch**

---

### 1. Persistent User & Session Storage
- Replace in-memory SessionManager with a real database (PostgreSQL or MongoDB as a new `compose.stack.yml` service)
- Migrate `ConversationSession` and `Message` models to SQLAlchemy (async ORM, if PostgreSQL) or Motor (async driver, if MongoDB)
- Store: user_id, session_id, messages[], created_at, is_active, metadata
- Add user registration/auth (JWT or API key) — currently any user_id string is accepted with zero validation
- Migrate `session_manager.py` to async DB calls; drop the threading.Lock
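
To make the stored fields concrete, here is a minimal sketch of the two tables the bullets above imply. It uses the standard-library `sqlite3` module purely as a stand-in for PostgreSQL so that it runs anywhere; the table names, the `role` column, and the helper functions are assumptions for illustration, not the project's actual models.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical schema: one row per session, one row per message.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    session_id TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL,
    created_at TEXT NOT NULL,
    is_active  INTEGER NOT NULL DEFAULT 1,
    metadata   TEXT NOT NULL DEFAULT '{}'
);
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL REFERENCES sessions(session_id),
    role       TEXT NOT NULL,
    content    TEXT NOT NULL,
    created_at TEXT NOT NULL
);
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def create_session(conn: sqlite3.Connection, user_id: str, metadata=None) -> str:
    """Insert a new active session and return its generated id."""
    session_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO sessions (session_id, user_id, created_at, metadata) "
        "VALUES (?, ?, ?, ?)",
        (session_id, user_id, _now(), json.dumps(metadata or {})),
    )
    conn.commit()
    return session_id


def add_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Append one message to a session."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content, created_at) "
        "VALUES (?, ?, ?, ?)",
        (session_id, role, content, _now()),
    )
    conn.commit()


def load_messages(conn: sqlite3.Connection, session_id: str) -> list:
    """Return (role, content) pairs in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return [(role, content) for role, content in rows]
```

Because the data lives in the database rather than in process memory, sessions survive a restart. With a real async driver the same statements would be awaited instead of guarded by a `threading.Lock`.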

### 2. Multi-threaded / Concurrent Data Pipeline
- Make the Prefect pipeline process documents in parallel
- Replace sequential `for page in pages:` loop in `pipeline.py` with `asyncio.gather()` or Prefect's `task.submit()` with thread/process pool
- Each page: extract → generate Cypher → populate runs concurrently (configurable concurrency limit)
- Add idempotency: track processed documents (hash → DB)
- Write integration test: mock 10 pages and verify concurrent write without schema reflection race
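
The `asyncio.gather()` option with a configurable concurrency limit could look roughly like the sketch below. `process_page` is a hypothetical placeholder for the extract → generate Cypher → populate step, and `max_concurrency` is an assumed parameter name; the real pipeline functions are not shown in this issue.

```python
import asyncio


async def process_page(page: str) -> str:
    """Hypothetical stand-in for extract -> generate Cypher -> populate."""
    await asyncio.sleep(0.01)  # simulates LLM and Neo4j I/O latency
    return f"processed:{page}"


async def process_pages(pages: list, max_concurrency: int = 4) -> list:
    """Process all pages concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(page: str) -> str:
        # The semaphore caps how many pages are in flight at once.
        async with semaphore:
            return await process_page(page)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in pages))
```

The same shape works for the integration test mentioned above: feed in 10 mocked pages and assert on the collected results.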

### 3. Google Drive as Data Source
- Replace Azure Blob with Google Drive in `data_acquisition.py`
- Authenticate via Google Drive API (service account JSON, secret-managed)
- List files from configured Drive folder (support PDF, DOCX, TXT)
- Download to temp dir, pass paths to OCR extraction
- Add `GOOGLE_DRIVE_FOLDER_ID` and `GOOGLE_SERVICE_ACCOUNT_JSON` to `.env.example`
- Handle pagination for large folders
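
Pagination for large folders can be handled with the `nextPageToken` loop sketched below. It assumes `service` is a Drive v3 client (for example one built with `googleapiclient.discovery.build("drive", "v3", credentials=...)`); the function name `iter_folder_files` and the `page_size` default are illustrative choices, not existing project code.

```python
from typing import Any, Iterator

# MIME types for the formats listed above: PDF, DOCX, TXT.
SUPPORTED_MIME_TYPES = (
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "text/plain",
)


def iter_folder_files(service: Any, folder_id: str, page_size: int = 100) -> Iterator[dict]:
    """Yield file metadata dicts for every supported file in a Drive folder."""
    mime_filter = " or ".join(f"mimeType = '{m}'" for m in SUPPORTED_MIME_TYPES)
    query = f"'{folder_id}' in parents and trashed = false and ({mime_filter})"
    page_token = None
    while True:
        response = service.files().list(
            q=query,
            fields="nextPageToken, files(id, name, mimeType)",
            pageSize=page_size,
            pageToken=page_token,
        ).execute()
        yield from response.get("files", [])
        # Drive omits nextPageToken on the last page.
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

Passing the client in as an argument keeps the listing logic testable with a fake service, without touching real credentials.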

### 4. Neo4j Graph State Snapshot (Data Dump)
- Export full graph after pipeline via `CALL apoc.export.cypher.all()` to `.cypher` dump (stored in cloud)
- On pipeline startup, check for dump and import if present, skipping LLM extraction
- Add `just dump-graph` and `just restore-graph` recipes
- Track pipeline_run metadata in Neo4j, only process new/changed files
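
Processing only new or changed files comes down to comparing a content hash against what an earlier run recorded. The sketch below assumes the recorded hashes have already been loaded into a plain dict (in the plan above they would come from the pipeline_run metadata in Neo4j); `content_hash` and `select_changed` are hypothetical helper names.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def select_changed(files: dict, processed: dict) -> list:
    """Return names of files that are new or whose content hash differs.

    files:     file name -> current file bytes
    processed: file name -> hash recorded by a previous pipeline run
    """
    return sorted(
        name
        for name, data in files.items()
        if processed.get(name) != content_hash(data)
    )
```

For example, with `a.pdf` unchanged, `b.pdf` modified, and `c.pdf` newly added, only `b.pdf` and `c.pdf` would be sent to LLM extraction.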

### 5. Frontend Improvements
- Conversation naming (LLM-powered title after first reply; store in `session.metadata["title"]`)
- Conversation list sidebar (titles, timestamp, rename/delete)
- Message streaming (SSE for `/api/chat`)
- Empty and error states (suggested questions, API error surfacing)
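
For message streaming, the server side of SSE is mostly a matter of framing each chunk as a `data:` line followed by a blank line. The generator below is a framework-independent sketch; the JSON payload shape and the `done` event name are assumptions, and the response would need to be served with the `text/event-stream` content type.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Frame each token as one Server-Sent Event, then signal completion."""
    for token in tokens:
        # Each event is "data: <payload>" terminated by a blank line.
        yield f"data: {json.dumps({'token': token})}\n\n"
    # Hypothetical terminal event so the client knows the reply is complete.
    yield "event: done\ndata: {}\n\n"
```

JSON-encoding the payload keeps newlines inside a token from breaking the event framing.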

### 6. Component & Integration Tests
- Core logic test coverage is near zero; add tests covering:
  - Guardrails, schema caching, cypher generation, pipeline execution
  - Full API/graph integration tests with real or mocked backends

### 7. (Extra) Query Caching & Rate Limiting
- Add semantic query cache (Redis + pgvector); check for near-duplicates before running LLM/graph
- Rate limiting on `/api/chat` (slowapi middleware), default 20 req/min per user
- Both prod features toggleable in `.env`
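
The per-user limit of 20 requests per minute can be illustrated with a small sliding-window limiter. This is a standard-library sketch of the idea, not slowapi's API; the class name and the injectable `now` parameter (there to make it testable) are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user within `window_seconds`."""

    def __init__(self, limit: int = 20, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # user_id -> timestamps of accepted requests

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

An in-process limiter like this only works for a single worker; with multiple replicas the counters would need shared storage such as Redis.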

---
**Blocking/effort tags:**
- Graph dump/restore — Low effort, blocks prod (MVP must-have)
- Google Drive source — Medium effort, blocks prod
- Tests — Medium effort, doesn't block prod
- Persistent storage — High effort, blocks prod (sessions lost on restart!)
- Concurrent pipeline — Medium, not a blocker
- Frontend UX — Medium, not a blocker
- Cache/rate limit — Low, not a blocker

---
**See full task list above for details. Prioritize by block/effort.**
