Women's health M&A diligence stack — verified deals, genomics governance, cited analytics
Curated dataset · n=58 verified deals · Not live market data · Scores are descriptive, not forecasts.
The live app reads src/data/dataset.verified.json by default. Server-side LLM calls use Vercel AI Gateway only. TensorFlow code is quarantined (not in the app). See MODEL_CARD.md before citing any score.
- Overview
- What is Lacuna?
- Live Demo
- Core Features
- Descriptive analytics
- Health Equity context
- Clinical Trials
- Genomics variant store
- Academic frameworks
- Typography
- Technology Stack
- Quick Start
- Data Curation
- Documentation
- License
Lacuna is an investment research stack — a diligence infrastructure prototype with a curated, source-linked snapshot of women's health M&A (58 verified deals), rendered as D3 network views and descriptive analytics with published methodology.
| Claim | Reality |
|---|---|
| Deal data | Static dataset.verified.json (manual verification from SEC, press, filings) |
| Scores & "predictors" | Deterministic rules and small-n statistics — MODEL_CARD.md |
| "ML" / TensorFlow | Quarantined under src/lib/ml/_quarantine/ — not imported by the app |
| Server LLM | INFERENCE.md — Vercel AI Gateway (+ OpenAI fallback for local dev) |
| Clinical trials panel | Live ClinicalTrials.gov search; M&A panels still use the curated dataset |
| Production intelligence | No — not PitchBook, not a data SLA, not investment advice |
Open source under BSL 1.1 for corporate VC diligence workflows, portfolio review, and self-hosted exploration. Commercial competitive products need a separate license; contact mps5cy@virginia.edu.
Portfolio project by Mae Kass.
Deployment: The analytics product runs on Vercel (this repo). A separate Framer site is for brand and narrative only, with one primary CTA into the live demo — see SITE_ARCHITECTURE.md and the framer/ build kit.
An open-source diligence prototype for corporate VC and healthcare investors exploring verified women's health / FemTech M&A:
- D3.js force-directed acquirer–target graphs
- Deal flow & valuation charts (curated counts and disclosed values)
- Descriptive scoring (factor weights, cosine similarity, k-means — no trained forecast models in the UI)
- ClinicalTrials.gov lookup (live API; separate from deal JSON)
- Health-equity context with cited disparity statistics (descriptive, not allocation advice)
Every analytical panel in the app shows the provenance line above.
| Resource | Link |
|---|---|
| Application | lacuna-maekass.vercel.app |
| Repository | github.com/maekass/Lacuna |
| Methodology | docs/MODEL_CARD.md |
| License | BSL 1.1 → Apache 2.0 May 2030 |
- 58 verified acquisitions (fertility, oncology, diagnostics, menopause, pelvic health, precision medicine)
- Acquirers include Hologic, KKR, Pfizer, Gilead, Boston Scientific, and others named in sources
- Dataset v5 · updated per `provenance.lastUpdated` in JSON
- Sources: SEC EDGAR, press releases, investor relations (see DATA_CURATION_CHECKLIST.md)
D3 force-directed graph: sector colors, deal-type edges, valuation-scaled nodes. Methodology: NETWORK_ANALYSIS_METHODOLOGY.md.
Year-over-year counts from verified announcedDate — animated bars, no
synthetic deal generator.
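A minimal sketch of how year-over-year counts could be derived from `announcedDate`. The `DealRecord` shape and the function name `countDealsByYear` are assumptions for illustration, not the repo's actual code; the date is assumed to be an ISO `YYYY-MM-DD` string.

```typescript
// Hypothetical minimal shape; the real dataset schema may carry more fields.
interface DealRecord {
  id: string;
  announcedDate: string; // assumed ISO format, e.g. "2021-03-15"
}

interface YearCount {
  year: string;
  count: number;
  delta: number | null; // change vs. the previous year present in the data
}

function countDealsByYear(deals: DealRecord[]): YearCount[] {
  const counts: Record<string, number> = {};
  for (const deal of deals) {
    // Skip records whose date does not look like YYYY-MM-DD rather than guessing.
    if (!/^\d{4}-\d{2}-\d{2}/.test(deal.announcedDate)) continue;
    const year = deal.announcedDate.slice(0, 4);
    counts[year] = (counts[year] ?? 0) + 1;
  }
  const rows: YearCount[] = [];
  let previous: number | null = null;
  for (const year of Object.keys(counts).sort()) {
    const count = counts[year] ?? 0;
    rows.push({ year, count, delta: previous === null ? null : count - previous });
    previous = count;
  }
  return rows;
}
```

Years with no deals are not zero-filled in this sketch, so `delta` compares each year with the previous year that actually appears in the data.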
Sector × stage heatmap using disclosed values only; cells show company counts and averages.
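The cell aggregation can be sketched as below, assuming a hypothetical `CompanyRow` shape with an optional `disclosedValueUsd` field. Companies without a disclosed value still count toward the cell but are left out of the average, which matches the "disclosed values only" rule described above.

```typescript
// Hypothetical input shape; field names are illustrative only.
interface CompanyRow {
  sector: string;
  stage: string;
  disclosedValueUsd?: number; // absent when no value was disclosed
}

interface HeatmapCell {
  sector: string;
  stage: string;
  companyCount: number;
  disclosedCount: number;
  averageDisclosedUsd: number | null; // null when nothing was disclosed
}

function buildHeatmapCells(rows: CompanyRow[]): HeatmapCell[] {
  const sums: Record<string, number> = {};
  const index: Record<string, HeatmapCell> = {};
  const cells: HeatmapCell[] = [];
  for (const row of rows) {
    const key = row.sector + "|" + row.stage;
    let cell = index[key];
    if (!cell) {
      cell = {
        sector: row.sector,
        stage: row.stage,
        companyCount: 0,
        disclosedCount: 0,
        averageDisclosedUsd: null,
      };
      index[key] = cell;
      cells.push(cell);
    }
    cell.companyCount += 1;
    const value = row.disclosedValueUsd;
    // Only finite, disclosed values enter the average; nothing is imputed.
    if (typeof value === "number" && isFinite(value)) {
      const total = (sums[key] ?? 0) + value;
      sums[key] = total;
      cell.disclosedCount += 1;
      cell.averageDisclosedUsd = total / cell.disclosedCount;
    }
  }
  return cells;
}
```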
New: Heuristic valuation and exit-likelihood section with:
- ValuationEngine — bounded comparable multiples (EV/Revenue, EV/EBITDA) with uncertainty disclosures
- AcquisitionPredictor — sector-stage acquisition probability estimates (15/75 coverage noted)
- HealthImpactModeler — lives-saved modeling with Cohen's d bounds (not a rate)
- PortfolioOptimizer — stage-varying risk-adjusted ROI optimizer
- Verified-fields-only adapter: `adaptQuantCompany` uses only validated dataset fields; absent inputs remain undefined per provenance rules
See MODEL_CARD.md for methodology and caveats.
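To illustrate what a bounded comparable multiple with undefined-on-missing-input could look like, here is a hedged sketch. It is not the `ValuationEngine` implementation: `boundedValuationRange`, `MultipleBounds`, and the bounds passed in are all hypothetical, and real bounds would come from the methodology in MODEL_CARD.md.

```typescript
// Hypothetical bounds on an EV/Revenue multiple; values are supplied by the caller.
interface MultipleBounds {
  low: number;
  high: number;
}

interface ValuationRange {
  lowUsd: number;
  highUsd: number;
  comparableCount: number; // disclosed alongside the range as an uncertainty cue
}

function clampMultiple(multiple: number, bounds: MultipleBounds): number {
  return Math.min(Math.max(multiple, bounds.low), bounds.high);
}

function boundedValuationRange(
  revenueUsd: number | undefined,
  comparableMultiples: number[],
  bounds: MultipleBounds
): ValuationRange | undefined {
  // Absent inputs stay undefined: no revenue or no comparables means no estimate.
  if (revenueUsd === undefined || comparableMultiples.length === 0) return undefined;
  let low = Infinity;
  let high = -Infinity;
  for (const multiple of comparableMultiples) {
    const clamped = clampMultiple(multiple, bounds);
    if (clamped < low) low = clamped;
    if (clamped > high) high = clamped;
  }
  return {
    lowUsd: revenueUsd * low,
    highUsd: revenueUsd * high,
    comparableCount: comparableMultiples.length,
  };
}
```

Returning a range plus the comparable count, rather than a single point estimate, keeps the small-n caveat visible wherever the number is displayed.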
Curated dataset · n=58 verified deals · Not live market data · Scores are descriptive, not forecasts.
Transparent factor scoring for non-acquired companies in the verified set. Fixed weights, full disclosure in UI and MODEL_CARD.md. Not a predictive model; no TensorFlow.
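A fixed-weight score of this kind can be sketched as a weighted sum that also returns each factor's contribution, so the UI can disclose it. The factor names and weights below are hypothetical placeholders; the actual factors and weights are the ones documented in MODEL_CARD.md.

```typescript
// Hypothetical factor names and weights, for illustration only.
type FactorName = "fundingMomentum" | "sectorActivity" | "stageMaturity";

const FACTOR_NAMES: FactorName[] = ["fundingMomentum", "sectorActivity", "stageMaturity"];

const FACTOR_WEIGHTS: Record<FactorName, number> = {
  fundingMomentum: 0.4,
  sectorActivity: 0.35,
  stageMaturity: 0.25,
};

interface FactorScore {
  score: number; // weighted sum in [0, 1]
  contributions: Record<FactorName, number>; // weight * factor, per factor
}

function scoreCompany(factors: Record<FactorName, number>): FactorScore {
  const contributions: Record<FactorName, number> = {
    fundingMomentum: 0,
    sectorActivity: 0,
    stageMaturity: 0,
  };
  let score = 0;
  for (const name of FACTOR_NAMES) {
    // Factors are assumed to be pre-normalized; clamp to [0, 1] defensively.
    const clamped = Math.min(Math.max(factors[name], 0), 1);
    const contribution = FACTOR_WEIGHTS[name] * clamped;
    contributions[name] = contribution;
    score += contribution;
  }
  return { score, contributions };
}
```

The function is deterministic: the same inputs always give the same score, and nothing is fitted to outcomes.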
8-D feature vectors, inline cosine similarity — "companies like this" for exploration.
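For reference, cosine similarity over fixed-length feature vectors can be written inline in a few lines. This is a sketch: `CompanyVector` and `mostSimilar` are hypothetical names, and the meaning of the eight features is not specified here.

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // Cosine is undefined for a zero vector; this sketch reports 0 (no similarity).
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical container: a company name plus its 8 feature values.
interface CompanyVector {
  name: string;
  features: number[];
}

// Rank the other companies by similarity to the target, highest first.
function mostSimilar(
  target: CompanyVector,
  others: CompanyVector[],
  topN: number
): { name: string; similarity: number }[] {
  return others
    .filter((other) => other.name !== target.name)
    .map((other) => ({
      name: other.name,
      similarity: cosineSimilarity(target.features, other.features),
    }))
    .sort((left, right) => right.similarity - left.similarity)
    .slice(0, topN);
}
```

Cosine similarity ignores vector magnitude, so two companies with proportional feature values score as identical; whether that is desirable depends on how the features are scaled.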
k-means on valuation × employees — descriptive segments (Emerging / Growth / Late-stage labels).
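A compact sketch of k-means (Lloyd's algorithm) on two features follows. Two choices here are assumptions of the sketch rather than documented behavior of the app: each feature is z-scored first so that valuation does not dominate employee count, and the initial centroids are picked deterministically from the input order so results are reproducible.

```typescript
type Point = [number, number]; // [valuation, employees]; units are up to the caller

function zScores(values: number[]): number[] {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance = values.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / values.length;
  const std = Math.sqrt(variance) || 1; // constant column: avoid dividing by zero
  return values.map((v) => (v - mean) / std);
}

function nearestCentroid(point: Point, centroids: Point[]): number {
  let best = 0;
  let bestDistance = Infinity;
  centroids.forEach((centroid, index) => {
    const dx = point[0] - centroid[0];
    const dy = point[1] - centroid[1];
    const distance = dx * dx + dy * dy; // squared Euclidean distance
    if (distance < bestDistance) {
      bestDistance = distance;
      best = index;
    }
  });
  return best;
}

// Returns one cluster label (0..k-1) per input point.
function kMeans(points: Point[], k: number, maxIterations = 50): number[] {
  if (k < 1 || k > points.length) throw new Error("k must be between 1 and the number of points");
  const xs = zScores(points.map((p) => p[0]));
  const ys = zScores(points.map((p) => p[1]));
  const data: Point[] = xs.map((x, i): Point => [x, ys[i] ?? 0]);

  // Deterministic init: evenly spaced picks from the input order.
  let centroids: Point[] = [];
  for (let c = 0; c < k; c++) {
    const seed = data[Math.floor((c * data.length) / k)];
    if (seed) centroids.push([seed[0], seed[1]]);
  }

  let labels: number[] = data.map(() => -1);
  for (let iteration = 0; iteration < maxIterations; iteration++) {
    const next = data.map((p) => nearestCentroid(p, centroids));
    const changed = next.some((label, i) => label !== labels[i]);
    labels = next;
    if (!changed) break; // assignments are stable: converged
    centroids = centroids.map((old, c): Point => {
      const members = data.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // keep an empty cluster's centroid
      const meanX = members.reduce((sum, p) => sum + p[0], 0) / members.length;
      const meanY = members.reduce((sum, p) => sum + p[1], 0) / members.length;
      return [meanX, meanY];
    });
  }
  return labels;
}
```

The labels are arbitrary integers. Mapping them to names such as Emerging, Growth, or Late-stage would be a separate step, for example by ordering clusters by mean valuation.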
Optional server narratives (INFERENCE.md)
- UI blurbs via `POST /api/ai/insights` → Vercel AI Gateway (`anthropic/claude-sonnet-4` slug).
- Exploratory copy only: heuristic scores on the curated dataset remain authoritative.
Descriptive context on disease areas with documented disparities and public market-size estimates — for learning, not buy/sell recommendations:
| Disease | Disparity (cited in docs) | Public market-size estimate |
|---|---|---|
| Maternal Health | Higher mortality disparity | $12B |
| Uterine Fibroids | High prevalence | $34B |
| Lupus | Higher prevalence | $8B |
| Sickle Cell Disease | Population concentration | $5B |
| Cardiovascular Disease | Higher mortality | $15B |
See OAIS_METHODOLOGY.md for scoring limits.
- Live: `/api/clinical-trials` → ClinicalTrials.gov API v2 (search, batch lookup)
- Curated M&A: unchanged, still `dataset.verified.json`
Do not conflate live trial search volume with verified deal coverage.
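As a rough illustration of the live lookup, the sketch below builds a ClinicalTrials.gov API v2 search URL and pulls a few identifying fields out of the JSON response. The helper names (`buildTrialSearchUrl`, `extractTrialSummaries`), the page-size cap, and the `source` tag are assumptions of this sketch; how the app's `/api/clinical-trials` route is actually implemented is not shown in this README.

```typescript
const CTGOV_STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies";

// Build a search URL using the v2 `query.term` and `pageSize` parameters.
function buildTrialSearchUrl(term: string, pageSize = 10): string {
  const size = Math.min(Math.max(Math.floor(pageSize), 1), 100); // cap chosen for this sketch
  return CTGOV_STUDIES_URL + "?query.term=" + encodeURIComponent(term.trim()) + "&pageSize=" + size;
}

interface TrialSummary {
  nctId: string;
  briefTitle: string | null;
  source: "clinicaltrials.gov-live"; // keeps live results distinguishable from curated deals
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Defensive extraction: any study missing an NCT ID is skipped, never guessed.
function extractTrialSummaries(payload: unknown): TrialSummary[] {
  if (!isRecord(payload)) return [];
  const studies = payload["studies"];
  if (!Array.isArray(studies)) return [];
  const summaries: TrialSummary[] = [];
  for (const study of studies as unknown[]) {
    const protocol = isRecord(study) ? study["protocolSection"] : undefined;
    const ident = isRecord(protocol) ? protocol["identificationModule"] : undefined;
    if (!isRecord(ident)) continue;
    const nctId = ident["nctId"];
    const briefTitle = ident["briefTitle"];
    if (typeof nctId !== "string") continue;
    summaries.push({
      nctId,
      briefTitle: typeof briefTitle === "string" ? briefTitle : null,
      source: "clinicaltrials.gov-live",
    });
  }
  return summaries;
}
```

In a server route, the URL would be passed to `fetch` and the parsed JSON handed to `extractTrialSummaries`. Tagging each row with its source is one way to keep trial counts from being mistaken for verified deal coverage.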
Large VCF/gVCF call sets use a two-tier layout (off by default on Vercel):
| Tier | Technology | Contents |
|---|---|---|
| Object storage | Local data/variants/ or S3 | Multi-GB raw VCF blobs |
| Variant catalog | ClickHouse | Callset metadata + queryable variant summaries |
- Dashboard: `VariantCallsetBrowser` (browse callsets, filter by gene, presigned S3 download when configured)
- APIs: `/api/genomics/callsets`, `/api/genomics/variants`, `/api/genomics/callsets/{id}/object`
- Ingest: `npm run clickhouse:ingest-vcf` (stream parser → object storage → batch INSERT)
- Docs: GENOMICS_VARIANT_STORE.md

```bash
docker compose up -d clickhouse
# .env.local: LACUNA_VARIANT_STORE=clickhouse, CLICKHOUSE_URL=http://lacuna:lacuna@localhost:8123
npm run clickhouse:migrate && npm run clickhouse:seed
npm run dev
```

Not clinical-grade genomics infrastructure: this is an infrastructure demo with honest provenance labels.
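The parse-then-batch part of the ingest step can be sketched as follows. This is not the repo's ingest script: `parseVcfDataLine`, `VariantRow`, and `toBatches` are hypothetical names, only the first five fixed VCF columns (CHROM, POS, ID, REF, ALT) are read, and the object-storage upload and the ClickHouse INSERT itself are left out.

```typescript
// Hypothetical row shape for a variant summary; the real catalog schema may differ.
interface VariantRow {
  chrom: string;
  pos: number;
  id: string | null; // "." in VCF means no identifier
  ref: string;
  alt: string[]; // ALT alleles are comma-separated in VCF
}

// Parse one tab-separated VCF data line; header and malformed lines return null.
function parseVcfDataLine(line: string): VariantRow | null {
  if (line === "" || line.charAt(0) === "#") return null; // "##" meta and "#CHROM" header lines
  const [chrom, posText, id, ref, alt] = line.split("\t");
  if (
    chrom === undefined ||
    posText === undefined ||
    id === undefined ||
    ref === undefined ||
    alt === undefined
  ) {
    return null; // fewer than five columns
  }
  if (!/^\d+$/.test(posText)) return null; // POS must be a non-negative integer
  return {
    chrom,
    pos: Number(posText),
    id: id === "." ? null : id,
    ref,
    alt: alt === "." ? [] : alt.split(","),
  };
}

// Group rows into fixed-size batches so each INSERT stays bounded in size.
function toBatches<T>(rows: T[], batchSize: number): T[][] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < rows.length; start += batchSize) {
    batches.push(rows.slice(start, start + batchSize));
  }
  return batches;
}
```

For multi-GB files, a real ingest would read lines from a stream and flush each batch as it fills instead of holding all rows in memory; `toBatches` only shows the grouping logic.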
Six frameworks with explicit small-n limits documented in docs/ (causal
DAG, fairness audit, network concentration, etc.). We state what cannot be
claimed with n≈58 deals — see methodology files linked from the app.
The live app loads Playfair Display (Didone serif) via next/font/google
and applies it app-wide — body copy, headings, and font-mono utilities share
the same family for a high-contrast editorial look.
GitHub does not load custom web fonts. This README uses a Didone fallback
stack (Didot, Bodoni MT, Georgia) so the page reads closer to the product on
github.com. Only the live demo renders true
Playfair Display.
| Layer | Used in production UI |
|---|---|
| Playfair Display (next/font/google) | App-wide Didone serif typography |
| Next.js 16, React 19, Tailwind v4 | App shell |
| D3.js v7, Framer Motion | Visualization |
| simple-statistics | Descriptive stats / similarity / quant engine |
| Verified JSON (getVerifiedDataset()) | Default data path; static import for Vercel serverless |
| PostgreSQL | Optional LACUNA_DATA_MODE=db |
| ClickHouse + S3/local object storage | Optional variant call-set catalog (LACUNA_VARIANT_STORE=clickhouse) |
| Vercel AI Gateway + AI SDK | Optional narratives + SEC classification (INFERENCE.md) |
| TensorFlow.js | Quarantined — devDependency for Vitest only |
| Deno (CI) | deno fmt and deno lint in GitHub Actions |
CI Status: deno fmt, deno lint, eslint, vitest (297 tests),
next build + tsc all green on main.

```bash
git clone https://github.com/maekass/Lacuna.git
cd Lacuna
npm install
npm run dev
npm run validate:dataset
npm run infra:check
npm test
```

Open http://localhost:3000. Data loads from src/data/dataset.verified.json unless LACUNA_DATA_MODE=db is set and Postgres is provisioned.
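The data-path rule above amounts to a small decision function. The sketch below is an illustration only: `resolveDataMode` is a hypothetical name, and using a `DATABASE_URL` variable as the signal that Postgres is provisioned is an assumption, not something this README states.

```typescript
type DataMode = "json" | "db";

// Decide which data path to use from environment-style key/value pairs.
function resolveDataMode(env: Record<string, string | undefined>): DataMode {
  const requested = (env["LACUNA_DATA_MODE"] ?? "").trim().toLowerCase();
  // Assumption: a non-empty DATABASE_URL stands in for "Postgres is provisioned".
  const hasDatabase = (env["DATABASE_URL"] ?? "").trim() !== "";
  // The verified JSON file stays the default unless both conditions hold.
  return requested === "db" && hasDatabase ? "db" : "json";
}
```

In a Node.js process this would be called as `resolveDataMode(process.env)`. Falling back to the JSON file when the database is requested but not configured is a design choice of this sketch; failing loudly instead is an equally reasonable option.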
Optional local Postgres: `docker compose up -d` → copy `.env.example` to `.env.local` → `npm run db:migrate && npm run db:import`. See INFRASTRUCTURE.md.
Optional variant store: `docker compose up -d clickhouse` → set `LACUNA_VARIANT_STORE=clickhouse` → `npm run clickhouse:migrate && npm run clickhouse:seed`. See GENOMICS_VARIANT_STORE.md.
Manual verification only; no synthetic maDeals. Workflow: DATA_CURATION_CHECKLIST.md, `npm run validate:dataset`, optional `npm run sec:scan`.
| Doc | Purpose |
|---|---|
| MODEL_CARD.md | Start here — what each score is and is not |
| INFERENCE.md | Server-side LLM (AI Gateway) |
| DATA_CURATION_CHECKLIST.md | Schema, validation, staging |
| NETWORK_ANALYSIS_METHODOLOGY.md | Graph metrics, small-n |
| OAIS_METHODOLOGY.md | Health impact scoring limits |
| INFRASTRUCTURE.md | CI, Vercel, Postgres, cron, /api/health |
| PERFORMANCE.md | Bundle, caching, probe split, fan-out limits |
| GENOMICS_VARIANT_STORE.md | ClickHouse + object storage for large VCF catalogs |
| MONITORING.md | Uptime URL: /api/health only (not /ready) |
| PRODUCTION_SETUP.md | Vercel env vars and migrations |
| SEC_INGESTION.md | SEC EDGAR cron pipeline |
| SITE_ARCHITECTURE.md | Vercel product vs Framer marketing (no analytics in Framer) |
| framer/BUILD_GUIDE.md | Framer marketing site — copy, tokens, HTML prototype |
| AGENTS.md | Contributor conventions |
BSL 1.1: research and education production use is allowed; Competitive Offerings (commercial women's-health M&A intelligence products) require a separate agreement. The license converts to Apache 2.0 in May 2030.
Contact mps5cy@virginia.edu for commercial licensing.
Mae Kass — open investment-research tools for women's health data literacy and honest analytics.