Real-time financial news sentiment pipeline — an open-source replica of the Bloomberg BQuant Textual Analytics workflow.
Fine-tuned FinBERT · SEC EDGAR + RSS ingestor · Kafka · TimescaleDB · FastAPI · React dashboard
Evaluated on 349 held-out examples from FinancialPhraseBank + FiQA-SA:
| Model | Precision | Recall | F1 | Notes |
|---|---|---|---|---|
| VADER | 0.52 | 0.48 | 0.51 | General-purpose lexicon; no financial domain knowledge |
| Loughran-McDonald | 0.64 | 0.47 | 0.47 | Finance lexicon; misses inter-word context |
| FinBERT zero-shot | 0.79 | 0.75 | 0.76 | ProsusAI/finbert — no fine-tuning |
| FinBERT fine-tuned | 0.92 | 0.92 | 0.92 | +16.5 pp F1 over zero-shot |
Per-class breakdown (fine-tuned model, 349 test examples):
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| negative | 0.83 | 0.86 | 0.85 | 74 |
| neutral | 0.99 | 0.98 | 0.99 | 140 |
| positive | 0.90 | 0.90 | 0.90 | 135 |
Confusion matrices and the full classification report: docs/benchmarks/
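The per-class rows above can be reproduced from raw predictions by building a confusion matrix and reading precision (column), recall (row) and F1 off it. The sketch below does this in plain Python as an illustration. The repo's scripts/evaluate.py may compute it differently (for example with scikit-learn), and the README does not say whether the headline table uses macro or support-weighted averages, so both are shown.

```python
LABELS = ("negative", "neutral", "positive")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_report(matrix, labels=LABELS):
    """Precision, recall, F1 and support per class (0.0 when a ratio is undefined)."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum: everything predicted as this class
        support = sum(matrix[i])                   # row sum: everything truly in this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": support}
    return report


def macro_average(report, metric):
    """Unweighted mean across classes: every class counts equally."""
    return sum(r[metric] for r in report.values()) / len(report)


def weighted_average(report, metric):
    """Support-weighted mean: larger classes count more."""
    total = sum(r["support"] for r in report.values())
    return sum(r[metric] * r["support"] for r in report.values()) / total
```

With an imbalanced test set like this one (74 negative vs 140 neutral), macro and weighted averages can differ noticeably, so it helps to state which one a headline F1 uses.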
VADER assigns sentiment from a general-purpose word list. On financial text it fails in two systematic ways:
1. Missing domain negatives. "Goldman fell sharply after guidance cut" contains no words VADER flags as negative, so it scores 0.0 (neutral). Fine-tuned FinBERT: −0.74 (negative). ✓
2. Context blindness. "Revenues beat estimates despite margin compression": VADER fires on "beat" and ignores the qualifier. Fine-tuned FinBERT reads the full sentence and scores it +0.18 (cautiously positive). ✓
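Signed scores such as −0.74 and +0.18 suggest the three class probabilities are collapsed into one number in [−1, 1]. The README does not give the formula; a common choice, assumed in this sketch, is a softmax over the model's logits followed by P(positive) − P(negative). The label order below is also an assumption: read it from the checkpoint's id2label config rather than hard-coding it.

```python
import math

# Assumed label order (hypothetical); read the real one from the model config's id2label.
LABELS = ("positive", "negative", "neutral")


def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def signed_score(logits, labels=LABELS):
    """Collapse 3-class logits into one score in [-1, 1]: P(positive) - P(negative)."""
    probs = dict(zip(labels, softmax(logits)))
    return probs["positive"] - probs["negative"]
```

A score near 0 can mean either a confident neutral call or an even positive/negative split, so keeping the full probability vector alongside the score is useful.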
The Loughran-McDonald dictionary handles domain vocabulary better, but as a word list it still cannot capture inter-word context: its recall is 0.92 on neutral but only 0.23 on negative. Fine-tuning on FinancialPhraseBank + FiQA-SA closes that gap, with recall of 0.86 or higher on all three classes.
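To see why a word-list method misses context, here is a bag-of-words scorer in the Loughran-McDonald style. The two tiny word sets are hypothetical stand-ins, not entries copied from the real dictionary; the point is that the score is only a count of hits, so word order and qualifiers such as "despite" have no effect.

```python
import re

# Hypothetical mini word lists for illustration only; the real
# Loughran-McDonald dictionary has far larger category lists.
POSITIVE = {"beat", "beats", "gain", "gains", "strong", "improved"}
NEGATIVE = {"loss", "losses", "decline", "cut", "weak", "compression"}


def tokenize(text):
    """Lowercase alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_score(text):
    """(positive hits - negative hits) / total hits, or 0.0 when nothing matches."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    hits = pos + neg
    return (pos - neg) / hits if hits else 0.0


def lexicon_label(text):
    """Map the score's sign to a label; no hits falls through to neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because the score only counts hits, shuffling the words leaves it unchanged, and a sentence with no listed words defaults to neutral. That behaviour is consistent with (though not proof of) a lexicon's high neutral recall and low negative recall.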
SEC EDGAR ─┐
Yahoo RSS  ├─→ sentiment-ingestor ─→ raw.headlines (Kafka, 4 partitions)
Reuters   ─┘       │ Redis dedup (48h)
                   ▼
               sentiment-classifier (FinBERT batch 64, spaCy NER)
                   │ p50 latency: 38s end-to-end
                   ▼
               enriched.sentiment (Kafka, 12 partitions, keyed by ticker)
                   │
                   ▼
               sentiment-persistence ─→ TimescaleDB hypertable (4 ticker shards)
                   │                 └→ Redis HASH + Pub/Sub + drift z-score
                   ▼
               sentiment-api (FastAPI :8081) ─→ REST + WebSocket
                   │
                   ▼
               sentiment-dashboard (React :3001) ─→ Ticker tape · Chart · Heatmap · Alerts
Full diagram with component details: docs/architecture.md
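The persistence stage's "drift z-score" is not specified further in this README. One plausible reading, sketched here under that assumption, is a rolling z-score per ticker: compare each new sentiment value with the mean and sample standard deviation of a trailing window and raise an alert when |z| crosses a threshold. The class name, window size, threshold and minimum-sample guard are all hypothetical, and the real service keeps its state in Redis rather than in process memory.

```python
import math
from collections import defaultdict, deque


class RollingDrift:
    """Hypothetical per-ticker rolling z-score over a fixed-size trailing window."""

    def __init__(self, window=50, threshold=3.0, min_samples=10):
        self.threshold = threshold
        # Sample variance needs at least two points.
        self.min_samples = max(2, min_samples)
        self.history = defaultdict(lambda: deque(maxlen=window))

    def update(self, ticker, score):
        """Return (z, is_alert) for `score` against the ticker's prior window, then record it."""
        past = self.history[ticker]
        z = 0.0
        if len(past) >= self.min_samples:
            mean = sum(past) / len(past)
            var = sum((x - mean) ** 2 for x in past) / (len(past) - 1)
            std = math.sqrt(var)
            if std > 0:  # a flat history gives no meaningful z
                z = (score - mean) / std
        past.append(score)
        return z, abs(z) >= self.threshold
```

Scoring against the window before appending the new value keeps a sudden spike from diluting its own baseline.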
- Docker Desktop ≥ 4.x
- (Optional) GNEWS_API_KEY in .env
git clone https://github.com/eomaxl/SentimentPulse
cd SentimentPulse
cp .env.example .env # add GNEWS_API_KEY if you have one
docker-compose up -d
All services start in dependency order. The dashboard is at http://localhost:3001 once sentiment-api is healthy (~30 s on first run).
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e ".[model,pipeline]"
python -m scripts.data_prep # download FinancialPhraseBank + FiQA-SA
python -m scripts.train # fine-tune FinBERT (~5 min on Apple MPS)
python -m scripts.evaluate # benchmark all 4 models
python classify.py --text "Apple beats Q1 earnings estimates by 12%"
cd sentiment_dashboard
npm install
npm run dev # http://localhost:3001 (proxies /v1 → localhost:8081)
4-hour steady-state run on Apple M-series:
| Metric | Result |
|---|---|
| Headlines ingested / hour | ~420 |
| End-to-end latency (p50) | 38 s |
| End-to-end latency (p95) | 98 s |
| Kafka consumer lag | 2–6 messages |
| DLQ rate | 0.4 % |
Full results: benchmarks/sentiment_throughput.md
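The p50 and p95 rows are percentiles of per-headline end-to-end latency. The sketch computes them from raw samples with the nearest-rank method; the benchmark's own tooling may interpolate instead, which can shift results slightly on small samples. Which timestamps mark the start and end of "end-to-end" is not stated in this README, so the `start_ts`/`end_ts` inputs are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(start_ts, end_ts):
    """Per-headline latency (end - start, in seconds) summarised as p50/p95."""
    latencies = [end - start for start, end in zip(start_ts, end_ts)]
    return {"p50": percentile(latencies, 50), "p95": percentile(latencies, 95)}
```

A p95 well above p50 (98 s vs 38 s in the table) indicates a long tail, for example headlines that wait for a full inference batch or a slow poll cycle.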
SentimentPulse/
├── classify.py # CLI: python classify.py --text "..."
├── docker-compose.yml
├── pyproject.toml
│
├── scripts/
│ ├── data_prep.py # download + split FinancialPhraseBank + FiQA-SA
│ ├── train.py # fine-tune FinBERT (WeightedTrainer)
│ └── evaluate.py # benchmark all 4 models
│
├── models/finbert-finetuned/ # saved fine-tuned weights
├── data/
│ ├── sp500_tickers.csv
│ └── sp500_patterns.jsonl # spaCy EntityRuler patterns (207 entries)
│
├── sentiment_ingestor/ # Phase 2: SEC EDGAR + RSS polling
├── sentiment_classifier/ # Phase 3: FinBERT inference + spaCy NER
├── sentiment_persistence/ # Phase 4: TimescaleDB + Redis writer + drift
├── sentiment_api/ # Phase 4: FastAPI REST + WebSocket
├── sentiment_dashboard/ # Phase 5: React + recharts + Tailwind
│
├── db/migrations/ # TimescaleDB DDL + hypertable + continuous agg
├── benchmarks/ # throughput results
├── docs/
│ ├── architecture.md # full ASCII architecture diagram
│ └── benchmarks/ # confusion matrix PNGs + classification report
└── notebooks/
└── model_evaluation.ipynb
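scripts/train.py is described as using a WeightedTrainer. A common reason for one is class imbalance, handled by weighting the cross-entropy loss per class. The sketch below shows inverse-frequency ("balanced") weights and the weighted loss for a single example in plain Python. The repo's actual weighting scheme is not documented here, so treat this as an assumption; in a Hugging Face Trainer subclass, such weights are typically passed as a tensor to torch.nn.CrossEntropyLoss(weight=...).

```python
import math


def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    Same formula as scikit-learn's class_weight="balanced"; rarer classes get larger weights.
    """
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * c) for label, c in counts.items()}


def weighted_cross_entropy(probs, true_label, weights):
    """Loss for one example: -w[y] * log p(y)."""
    return -weights[true_label] * math.log(probs[true_label])
```

With these weights each class contributes the same total weight to the loss (count × weight = n_samples / n_classes), which keeps a majority class such as neutral from dominating training.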
- ProsusAI/finbert — base model
- FinancialPhraseBank — training data
- Bloomberg BQuant Textual Analytics — the professional product this replicates
Sourav Snigdha Mansingh · March 2026