
Eomaxl/SentimentPulse


SentimentPulse

Real-time financial news sentiment pipeline — an open-source replica of the Bloomberg BQuant Textual Analytics workflow.

Fine-tuned FinBERT · SEC EDGAR + RSS ingestor · Kafka · TimescaleDB · FastAPI · React dashboard

Python 3.11 TypeScript


Model Benchmark

Evaluated on 349 held-out examples from FinancialPhraseBank + FiQA-SA:

Model               Precision  Recall  F1    Notes
VADER               0.52       0.48    0.51  General-purpose lexicon; no financial domain knowledge
Loughran-McDonald   0.64       0.47    0.47  Finance lexicon; misses inter-word context
FinBERT zero-shot   0.79       0.75    0.76  ProsusAI/finbert, no fine-tuning
FinBERT fine-tuned  0.92       0.92    0.92  +16.5 pp over zero-shot

Per-class breakdown (fine-tuned model, 349 test examples):

Class     Precision  Recall  F1    Support
negative  0.83       0.86    0.85  74
neutral   0.99       0.98    0.99  140
positive  0.90       0.90    0.90  135

Confusion matrices and the full classification report: docs/benchmarks/
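The fine-tuned model's headline row can be roughly reproduced from the per-class rows. The sketch below assumes the headline figures are support-weighted averages (what scikit-learn's classification report labels "weighted avg"); the README does not say which averaging it uses. Per-class F1 is recomputed from precision and recall because the F1 column is already rounded.

```python
# Per-class rows from the table above: (precision, recall, support).
# Assumption: the headline metrics are support-weighted averages.
PER_CLASS = {
    "negative": (0.83, 0.86, 74),
    "neutral": (0.99, 0.98, 140),
    "positive": (0.90, 0.90, 135),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_average(values: dict[str, float], supports: dict[str, int]) -> float:
    """Average per-class values, weighting each class by its number of test examples."""
    total = sum(supports.values())
    return sum(values[c] * supports[c] for c in values) / total


supports = {c: s for c, (_, _, s) in PER_CLASS.items()}
precision = weighted_average({c: p for c, (p, _, _) in PER_CLASS.items()}, supports)
recall = weighted_average({c: r for c, (_, r, _) in PER_CLASS.items()}, supports)
f1_score = weighted_average({c: f1(p, r) for c, (p, r, _) in PER_CLASS.items()}, supports)

print(f"weighted P={precision:.3f} R={recall:.3f} F1={f1_score:.3f}")
# weighted P=0.921 R=0.924 F1=0.922
```

All three land within half a point of the reported 0.92, which is consistent with support-weighted averaging but does not prove it.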


Why FinBERT beats VADER on financial text

VADER assigns sentiment from a general-purpose word list. On financial text it fails in two systematic ways:

1. Missing domain negatives. "Goldman fell sharply after guidance cut" contains no words VADER flags as negative, so it scores 0.0 (neutral). Fine-tuned FinBERT: −0.74 (negative). ✓

2. Context blindness. "Revenues beat estimates despite margin compression" — VADER fires on "beat" and ignores the qualifier. FinBERT reads the full sentence: +0.18 (cautiously positive). ✓

The Loughran-McDonald dictionary handles domain vocabulary better but still can't capture inter-word context — it achieves 0.92 recall on neutral but only 0.23 on negative. Fine-tuning on FinancialPhraseBank + FiQA-SA resolves this across all three classes.
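Both failure modes follow from how lexicon scoring works: each word contributes a fixed polarity regardless of its neighbors. The toy scorer below makes that concrete. Its word lists are hypothetical and far smaller than VADER's or Loughran-McDonald's, so it illustrates the mechanism only, not either lexicon's actual entries or scores.

```python
import re

# Hypothetical toy lexicons, NOT the real VADER or Loughran-McDonald entries.
GENERAL_LEXICON = {"beat": 1.0, "good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}
FINANCE_LEXICON = {**GENERAL_LEXICON, "fell": -1.0, "cut": -1.0, "compression": -1.0}


def lexicon_score(text: str, lexicon: dict[str, float]) -> float:
    """Average polarity of tokens found in the lexicon; 0.0 when nothing matches.

    Every token is scored in isolation, so word order and qualifiers such as
    "despite" have no effect on the result.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


headline = "Goldman fell sharply after guidance cut"
mixed = "Revenues beat estimates despite margin compression"

print(lexicon_score(headline, GENERAL_LEXICON))  # 0.0: no domain negatives in the list
print(lexicon_score(headline, FINANCE_LEXICON))  # -1.0: "fell" and "cut" now count
print(lexicon_score(mixed, GENERAL_LEXICON))     # 1.0: only "beat" matches
print(lexicon_score(mixed, FINANCE_LEXICON))     # 0.0: "beat" and "compression" cancel
```

Adding finance terms fixes the first failure but not the second: the mixed headline scores 0.0 however its words are ordered, which is the inter-word context a fine-tuned transformer can use and a word list cannot.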


Architecture

SEC EDGAR ─┐
Yahoo RSS  ├─→ sentiment-ingestor ─→ raw.headlines (Kafka, 4 partitions)
Reuters    ┘         │ Redis dedup (48h)
                     ▼
          sentiment-classifier (FinBERT batch 64, spaCy NER)
                     │  p50 latency: 38s end-to-end
                     ▼
          enriched.sentiment (Kafka, 12 partitions, keyed by ticker)
                     │
                     ▼
          sentiment-persistence ─→ TimescaleDB hypertable (4 ticker shards)
                     │          └→ Redis HASH + Pub/Sub + drift z-score
                     ▼
          sentiment-api (FastAPI :8081) ─→ REST + WebSocket
                     │
                     ▼
          sentiment-dashboard (React :3001) ─→ Ticker tape · Chart · Heatmap · Alerts
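A common way to build a time-windowed dedup like the ingestor's 48h Redis step is an atomic set-if-absent with expiry (Redis SET key value NX EX seconds). The sketch below assumes the key is a hash of the normalized headline and uses a small in-memory class with the same semantics so it runs without a Redis server; the key scheme and normalization are hypothetical, not taken from sentiment_ingestor.

```python
import hashlib
import time

DEDUP_TTL_SECONDS = 48 * 3600  # 48h window from the architecture diagram


class InMemoryNxStore:
    """Stand-in for Redis SET ... NX EX: write a key only if it is absent or expired.

    Expired entries are overwritten on the next write but never purged; fine for a demo.
    """

    def __init__(self, clock=time.monotonic):
        self._expires_at: dict[str, float] = {}
        self._clock = clock

    def set_nx_ex(self, key: str, ttl_seconds: float) -> bool:
        now = self._clock()
        expires_at = self._expires_at.get(key)
        if expires_at is not None and expires_at > now:
            return False  # key is still live, so this headline is a duplicate
        self._expires_at[key] = now + ttl_seconds
        return True


def dedup_key(headline: str) -> str:
    """Hypothetical key scheme: SHA-256 of the case- and whitespace-normalized headline."""
    normalized = " ".join(headline.lower().split())
    return "dedup:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_new(store: InMemoryNxStore, headline: str) -> bool:
    """True the first time a headline is seen within the TTL window."""
    return store.set_nx_ex(dedup_key(headline), DEDUP_TTL_SECONDS)


store = InMemoryNxStore()
print(is_new(store, "Apple beats Q1 earnings estimates by 12%"))   # True
print(is_new(store, "apple beats  Q1 earnings estimates by 12%"))  # False: same after normalizing
```

Against a real server, set_nx_ex maps to redis-py's client.set(key, "1", nx=True, ex=ttl_seconds), which returns True when the key was written and None when it already existed, so the check and the write happen in one atomic round trip.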

Full diagram with component details: docs/architecture.md
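The diagram lists a drift z-score in sentiment-persistence but the README does not define it. One plausible reading is how far a ticker's newest score sits from its own recent history, z = (x − μ) / σ over a rolling window. The sketch below assumes that definition, a population standard deviation, and hypothetical window and minimum-history sizes; the real parameters in sentiment_persistence may differ.

```python
import math
from collections import defaultdict, deque
from typing import Optional


class DriftTracker:
    """Per-ticker rolling z-score of sentiment scores (hypothetical window sizes)."""

    def __init__(self, window: int = 50, min_history: int = 10):
        self.min_history = min_history
        # One bounded deque per ticker; the oldest score drops off once `window` is reached.
        self._history = defaultdict(lambda: deque(maxlen=window))

    def update(self, ticker: str, score: float) -> Optional[float]:
        """Score `score` against the ticker's previous window, then add it to the window.

        Returns None until `min_history` scores exist or if the window has zero variance.
        """
        history = self._history[ticker]
        z = None
        if len(history) >= self.min_history:
            mean = sum(history) / len(history)
            std = math.sqrt(sum((x - mean) ** 2 for x in history) / len(history))
            if std > 0:
                z = (score - mean) / std
        history.append(score)
        return z


tracker = DriftTracker(window=20, min_history=5)
for s in [0.10, 0.15, 0.05, 0.12, 0.08, 0.11]:
    tracker.update("AAPL", s)
z = tracker.update("AAPL", -0.70)  # sudden negative headline
print(f"AAPL drift z-score: {z:.2f}")
```

Scoring the new value against the window before appending it keeps a single outlier from diluting its own z-score.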


Quick Start

Prerequisites

  • Docker Desktop ≥ 4.x
  • (Optional) GNEWS_API_KEY in .env

git clone https://github.com/eomaxl/SentimentPulse
cd SentimentPulse
cp .env.example .env        # add GNEWS_API_KEY if you have one

docker-compose up -d

All services start in dependency order. The dashboard is available at http://localhost:3001 once sentiment-api is healthy (~30 s on first run).

Local development (Python)

python3.11 -m venv .venv && source .venv/bin/activate
pip install -e ".[model,pipeline]"

python -m scripts.data_prep    # download FinancialPhraseBank + FiQA-SA
python -m scripts.train        # fine-tune FinBERT (~5 min on Apple MPS)
python -m scripts.evaluate     # benchmark all 4 models

python classify.py --text "Apple beats Q1 earnings estimates by 12%"
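The VADER comparison quotes FinBERT output as signed scores (−0.74, +0.18), although FinBERT is a three-class classifier. The README does not say how that score is derived or what classify.py prints; one common convention is P(positive) − P(negative) after a softmax over the logits. The sketch below assumes that convention and a hypothetical label order; the logits are made-up, not real model output.

```python
import math

# Assumed label order; check the checkpoint's config.id2label before relying on it.
LABELS = ("positive", "negative", "neutral")


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def signed_score(logits: list[float]) -> float:
    """Collapse three-class logits into one score in [-1, 1] as P(positive) - P(negative)."""
    probs = dict(zip(LABELS, softmax(logits)))
    return probs["positive"] - probs["negative"]


# Made-up logits for illustration, not real model output.
print(f"{signed_score([-1.0, 2.5, 0.3]):+.2f}")  # negative logit dominates
print(f"{signed_score([0.4, 0.4, 3.0]):+.2f}")   # neutral dominates; positive and negative cancel
```

A score near zero can mean either a confident neutral or an even positive/negative split, so keeping the neutral probability alongside the signed value may be useful downstream.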

Local development (React dashboard)

cd sentiment_dashboard
npm install
npm run dev      # http://localhost:3001 (proxies /v1 → localhost:8081)

Pipeline Throughput

4-hour steady-state run on Apple M-series:

Metric                      Result
Headlines ingested / hour   ~420
End-to-end latency (p50)    38 s
End-to-end latency (p95)    98 s
Kafka consumer lag          2–6 messages
DLQ rate                    0.4 %

Full results: benchmarks/sentiment_throughput.md
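The latency percentiles and DLQ rate above are summary statistics over per-message measurements. The sketch below shows one way to compute them with the nearest-rank percentile method; the sample latencies are made-up, and the harness behind benchmarks/sentiment_throughput.md may use a different percentile definition (for example, linear interpolation).

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def dlq_rate_percent(dead_lettered: int, total: int) -> float:
    """Share of messages routed to the dead-letter queue, in percent."""
    return 100.0 * dead_lettered / total if total else 0.0


# Made-up per-headline latencies in seconds, not the benchmark's data.
latencies = [5, 9, 12, 14, 20, 21, 30, 45, 60, 90]
print("p50:", percentile_nearest_rank(latencies, 50), "s")  # p50: 20 s
print("p95:", percentile_nearest_rank(latencies, 95), "s")  # p95: 90 s
print(f"DLQ rate: {dlq_rate_percent(2, 1000):.1f} %")       # DLQ rate: 0.2 %
```

With only ten samples the nearest-rank p95 is simply the maximum; stable tail percentiles need far more samples, which is one reason to measure over a multi-hour run.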


Repository Structure

sentimentpulse/
├── classify.py                    # CLI: python classify.py --text "..."
├── docker-compose.yml
├── pyproject.toml
│
├── scripts/
│   ├── data_prep.py               # download + split FinancialPhraseBank + FiQA-SA
│   ├── train.py                   # fine-tune FinBERT (WeightedTrainer)
│   └── evaluate.py                # benchmark all 4 models
│
├── models/finbert-finetuned/      # saved fine-tuned weights
├── data/
│   ├── sp500_tickers.csv
│   └── sp500_patterns.jsonl       # spaCy EntityRuler patterns (207 entries)
│
├── sentiment_ingestor/            # Phase 2: SEC EDGAR + RSS polling
├── sentiment_classifier/          # Phase 3: FinBERT inference + spaCy NER
├── sentiment_persistence/         # Phase 4: TimescaleDB + Redis writer + drift
├── sentiment_api/                 # Phase 4: FastAPI REST + WebSocket
├── sentiment_dashboard/           # Phase 5: React + recharts + Tailwind
│
├── db/migrations/                 # TimescaleDB DDL + hypertable + continuous agg
├── benchmarks/                    # throughput results
├── docs/
│   ├── architecture.md            # full ASCII architecture diagram
│   └── benchmarks/                # confusion matrix PNGs + classification report
└── notebooks/
    └── model_evaluation.ipynb
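scripts/train.py fine-tunes FinBERT with a WeightedTrainer, which suggests a class-weighted loss to offset label imbalance (negative is the smallest class in the test split, 74 of 349). The sketch below shows the idea in plain Python: "balanced" weights n_samples / (n_classes × count), the same formula as scikit-learn's compute_class_weight("balanced", ...), plugged into a weighted cross-entropy. The training-split counts are not listed in this README, so the example reuses the test-split supports as stand-ins; whether train.py uses exactly this weighting is an assumption.

```python
import math


def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """n_samples / (n_classes * count): rarer classes get proportionally larger weights."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def weighted_cross_entropy(
    batch: list[tuple[dict[str, float], str]], weights: dict[str, float]
) -> float:
    """Weighted mean of -log p(true label) over (probabilities, true label) pairs.

    Divides by the summed weights of the true labels, the same normalization
    PyTorch's CrossEntropyLoss(weight=..., reduction="mean") uses.
    """
    total = sum(weights[label] * -math.log(probs[label]) for probs, label in batch)
    return total / sum(weights[label] for _, label in batch)


# Stand-in counts: the test-split supports from the per-class table, not the training split.
weights = balanced_class_weights({"negative": 74, "neutral": 140, "positive": 135})
print({label: round(w, 2) for label, w in weights.items()})
# {'negative': 1.57, 'neutral': 0.83, 'positive': 0.86}

batch = [
    ({"negative": 0.6, "neutral": 0.3, "positive": 0.1}, "negative"),
    ({"negative": 0.1, "neutral": 0.8, "positive": 0.1}, "neutral"),
]
print(round(weighted_cross_entropy(batch, weights), 3))
```

Up-weighting the rarer negative class makes each misclassified negative cost more, which targets the class where the lexicon baselines were weakest.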



Sourav Snigdha Mansingh · March 2026

About

Real-time financial news sentiment analysis pipeline using fine-tuned FinBERT. Open-source replica of Bloomberg BQuant Textual Analytics with 92% F1 score on financial text classification.
