Deal-intelligence system for tracking lead investment signals from target VC funds, extracting structured funding data, and serving that data to a frontend dashboard.
- Backend ingestion + extraction pipeline for funding/deal signals.
- Structured persistence with migration support.
- Frontend dashboard for review and drill-down.
- Test suite around extraction/storage flows.
Raw funding content is inconsistent across sources. This project standardizes extraction and tracks lead-signal quality so downstream analysis is reliable.
- Ingestion/scraping scripts for source collection.
- Extraction layer for structured deal parsing.
- Storage layer (`src/archivist`) for normalized persistence.
- API + frontend (`frontend`) for visibility.
- Structured extraction with strict schemas: Reduces bad data drift; tradeoff is extra handling for ambiguous articles.
- Source-specific heuristics + shared normalization: Improves accuracy on noisy inputs; tradeoff is ongoing heuristic maintenance.
- Clear separation of backend/frontend workspaces: Keeps deployment and debugging cleaner; tradeoff is more coordination across stacks.
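To illustrate the strict-schema tradeoff (field names and the round list are hypothetical, not the project's actual schema), a record type can refuse ambiguous values outright instead of persisting them:

```python
from dataclasses import dataclass

# Hypothetical whitelist; the real schema may differ.
VALID_ROUNDS = {"pre-seed", "seed", "series-a", "series-b", "series-c"}

@dataclass(frozen=True)
class DealRecord:
    company: str
    round_type: str
    amount_usd: int  # normalized to whole dollars

    def __post_init__(self):
        # Strictness: fail fast at construction time rather than
        # letting drifted data reach storage.
        if not self.company.strip():
            raise ValueError("company is required")
        if self.round_type not in VALID_ROUNDS:
            raise ValueError(f"unknown round type: {self.round_type!r}")
        if self.amount_usd <= 0:
            raise ValueError("amount must be positive")

DealRecord("Acme", "seed", 2_000_000)   # ok
# DealRecord("Acme", "Series A?", 0)    # raises ValueError
```

This is where the "extra handling for ambiguous articles" cost shows up: anything the validator rejects needs an explicit review path upstream.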
Prerequisites: Python 3.11 or newer.
```
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Prerequisites: Node.js 18+ and npm dependencies installed.
```
cd frontend
npm install
npm run dev
```

Prerequisites: Python 3.11+ and project dependencies installed.
```
python3 -m pytest tests -q
```

- Empty dashboard: verify backend data ingest path and API base config.
- Extraction quality drops: review source-specific parsing and confidence thresholds.
- Migration issues: check the `alembic` revision state before rerunning ingestion.
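When extraction quality drops, one simple lever is the confidence cutoff. A minimal sketch, assuming records carry a `confidence` score (the field name and 0.8 default are illustrative, not the project's actual values):

```python
def filter_by_confidence(records, threshold=0.8):
    """Drop low-confidence extractions before they reach the dashboard.

    The threshold is worth tuning per source: noisy sources may need a
    higher bar, clean sources a lower one.
    """
    return [r for r in records if r.get("confidence", 0.0) >= threshold]

rows = [
    {"company": "Acme", "confidence": 0.93},
    {"company": "Beta", "confidence": 0.41},
]
filter_by_confidence(rows)  # keeps only the Acme row
```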
- How I chose schema strictness vs extraction flexibility.
- How I debugged lead-signal misclassification.
- Why I split pipeline modules instead of one extraction script.
- DECISIONS.md
- BUILD_LOG.md
- KNOWN_LIMITATIONS.md
- DEMO.md
- SECURITY.md