SentimentPulse

Real-time financial news sentiment pipeline — an open-source replica of the Bloomberg BQuant Textual Analytics workflow.

Fine-tuned FinBERT · SEC EDGAR + RSS ingestor · Kafka · TimescaleDB · FastAPI · React dashboard

Model Benchmark

Evaluated on 349 held-out examples from FinancialPhraseBank + FiQA-SA:

Model	Precision	Recall	F1	Notes
VADER	0.52	0.48	0.51	General-purpose lexicon; no financial domain knowledge
Loughran-McDonald	0.64	0.47	0.47	Finance lexicon; misses inter-word context
FinBERT zero-shot	0.79	0.75	0.76	`ProsusAI/finbert` — no fine-tuning
FinBERT fine-tuned	0.92	0.92	0.92	+16.5 pp over zero-shot

Per-class breakdown (fine-tuned model, 349 test examples):

Class	Precision	Recall	F1	Support
negative	0.83	0.86	0.85	74
neutral	0.99	0.98	0.99	140
positive	0.90	0.90	0.90	135

Confusion matrices and the full classification report: docs/benchmarks/
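
As a quick cross-check, the per-class rows can be averaged back into a single figure. The sketch below recomputes macro and support-weighted averages from the rounded per-class numbers above. This README does not say which averaging the headline row uses, so both are shown, and because the inputs are already rounded the results are only approximate.

```python
# Per-class figures copied from the table above (already rounded to 2 decimals).
PER_CLASS = {
    "negative": {"precision": 0.83, "recall": 0.86, "f1": 0.85, "support": 74},
    "neutral":  {"precision": 0.99, "recall": 0.98, "f1": 0.99, "support": 140},
    "positive": {"precision": 0.90, "recall": 0.90, "f1": 0.90, "support": 135},
}


def macro_average(rows, metric):
    """Unweighted mean of a metric across classes."""
    return sum(row[metric] for row in rows.values()) / len(rows)


def weighted_average(rows, metric):
    """Mean of a metric across classes, weighted by each class's support."""
    total = sum(row["support"] for row in rows.values())
    return sum(row[metric] * row["support"] for row in rows.values()) / total


for metric in ("precision", "recall", "f1"):
    print(
        f"{metric:9s} macro={macro_average(PER_CLASS, metric):.3f} "
        f"weighted={weighted_average(PER_CLASS, metric):.3f}"
    )
```

The negative class has the smallest support (74 of 349), so its lower scores pull the macro average down more than the weighted one.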

Why FinBERT beats VADER on financial text

VADER assigns sentiment from a general-purpose word list. On financial text it fails in two systematic ways:

1. Missing domain negatives. "Goldman fell sharply after guidance cut" contains no words that VADER flags as negative, so it scores 0.0 (neutral). Fine-tuned FinBERT: −0.74 (negative). ✓

2. Context blindness. "Revenues beat estimates despite margin compression" — VADER fires on "beat" and ignores the qualifier. FinBERT reads the full sentence: +0.18 (cautiously positive). ✓

The Loughran-McDonald dictionary handles domain vocabulary better but still cannot capture inter-word context: it achieves 0.92 recall on neutral but only 0.23 on negative. Fine-tuning on FinancialPhraseBank + FiQA-SA closes this gap, with per-class recall of 0.86 (negative), 0.98 (neutral), and 0.90 (positive).
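
The two failure modes above can be reproduced with a toy word-list scorer. This is a minimal sketch, not VADER or Loughran-McDonald: `TOY_LEXICON` is a hypothetical four-word list, and real VADER adds heuristics for negation and intensifiers. The `signed_score` helper shows one plausible way a three-class model could yield a signed score like the ones quoted above (positive probability minus negative probability). That mapping is an assumption, since this README does not say how the scores are derived.

```python
import math
import re

# Hypothetical toy lexicon for illustration only; not a real sentiment word list.
TOY_LEXICON = {"beat": 1.0, "gain": 1.0, "loss": -1.0, "bad": -1.0}


def lexicon_score(text):
    """Average the valence of known words; unknown words contribute nothing."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [TOY_LEXICON[token] for token in tokens if token in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def signed_score(logits):
    """Assumed mapping: softmax over (negative, neutral, positive) logits,
    then P(positive) - P(negative), giving a value in [-1, 1]."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    p_neg, _p_neu, p_pos = (e / total for e in exps)
    return p_pos - p_neg


# No word in the first headline is in the lexicon, so it looks neutral.
print(lexicon_score("Goldman fell sharply after guidance cut"))
# The scorer fires on "beat" and cannot see the "despite ..." qualifier.
print(lexicon_score("Revenues beat estimates despite margin compression"))
```

A word-list scorer only sees which words are present, which is why the qualifier in the second headline has no effect on its score.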

Architecture

SEC EDGAR ─┐
Yahoo RSS  ├─→ sentiment-ingestor ─→ raw.headlines (Kafka, 4 partitions)
Reuters    ┘         │ Redis dedup (48h)
                     ▼
          sentiment-classifier (FinBERT batch 64, spaCy NER)
                     │  p50 latency: 38s end-to-end
                     ▼
          enriched.sentiment (Kafka, 12 partitions, keyed by ticker)
                     │
                     ▼
          sentiment-persistence ─→ TimescaleDB hypertable (4 ticker shards)
                     │          └→ Redis HASH + Pub/Sub + drift z-score
                     ▼
          sentiment-api (FastAPI :8081) ─→ REST + WebSocket
                     │
                     ▼
          sentiment-dashboard (React :3001) ─→ Ticker tape · Chart · Heatmap · Alerts

Full diagram with component details: docs/architecture.md
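
Two steps in the diagram are easy to illustrate in isolation: the 48-hour dedup window and routing by ticker key across the 12 `enriched.sentiment` partitions. The sketch below is a stand-in under stated assumptions, not the project's code. `InMemoryDedup` and `partition_for` are hypothetical names, a dict replaces Redis (where the same effect is commonly achieved with `SET key value NX EX 172800`), and SHA-256 replaces the murmur2 hash used by Kafka's default partitioner, so the partition numbers will not match a real cluster.

```python
import hashlib
import time

DEDUP_TTL_SECONDS = 48 * 3600  # 48h window, from the diagram
ENRICHED_PARTITIONS = 12       # enriched.sentiment partition count, from the diagram


class InMemoryDedup:
    """Stand-in for the Redis dedup step: remembers a headline hash for a TTL."""

    def __init__(self, ttl_seconds=DEDUP_TTL_SECONDS, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}

    def is_new(self, headline):
        """Return True the first time a headline is seen within the TTL."""
        now = self._clock()
        normalized = headline.strip().lower()
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # seen recently: drop as duplicate
        self._expires_at[key] = now + self._ttl
        return True


def partition_for(ticker, num_partitions=ENRICHED_PARTITIONS):
    """Map a ticker to a partition with a stable hash modulo the partition count."""
    digest = hashlib.sha256(ticker.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Keying by ticker means every message for a given ticker lands on the same partition, so a single consumer sees that ticker's messages in order.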

Quick Start

Prerequisites

Docker Desktop ≥ 4.x
(Optional) GNEWS_API_KEY in .env

git clone https://github.com/eomaxl/SentimentPulse
cd SentimentPulse
cp .env.example .env        # add GNEWS_API_KEY if you have one

docker-compose up -d

All services start in dependency order. The dashboard is available at http://localhost:3001 once sentiment-api is healthy (~30 s on first run).
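
If a script needs to wait for the stack before doing anything else, a small polling loop is usually enough. This is a minimal sketch: `http_probe` and `wait_until_healthy` are hypothetical helpers, and the health endpoint path is not documented in this README, so any URL passed in is an assumption.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call probe() until it returns True.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        sleep(delay)
    return None
```

For example, `wait_until_healthy(lambda: http_probe("http://localhost:8081/health"))` would poll the API port from the architecture diagram, where `/health` is a guessed path that should be replaced with whatever route sentiment-api actually exposes.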

Local development (Python)

python3.11 -m venv .venv && source .venv/bin/activate
pip install -e ".[model,pipeline]"

python -m scripts.data_prep    # download FinancialPhraseBank + FiQA-SA
python -m scripts.train        # fine-tune FinBERT (~5 min on Apple MPS)
python -m scripts.evaluate     # benchmark all 4 models

python classify.py --text "Apple beats Q1 earnings estimates by 12%"

Local development (React dashboard)

cd sentiment_dashboard
npm install
npm run dev      # http://localhost:3001 (proxies /v1 → localhost:8081)

Pipeline Throughput

4-hour steady-state run on Apple M-series:

Metric	Result
Headlines ingested / hour	~420
End-to-end latency (p50)	38 s
End-to-end latency (p95)	98 s
Kafka consumer lag	2–6 messages
DLQ rate	0.4 %

Full results: benchmarks/sentiment_throughput.md
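
For readers reproducing these figures from their own logs, the sketch below shows a nearest-rank percentile and the arithmetic behind the table. It is an illustration only: the README does not state which percentile method was used, and the DLQ estimate assumes the rate is a fraction of ingested headlines.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    the data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


# Figures from the table above.
HEADLINES_PER_HOUR = 420
RUN_HOURS = 4
DLQ_RATE = 0.004  # 0.4 %

total_headlines = HEADLINES_PER_HOUR * RUN_HOURS
# Assumes the DLQ rate is measured against ingested headlines.
expected_dlq_messages = total_headlines * DLQ_RATE

print(f"headlines over the run: ~{total_headlines}")
print(f"expected DLQ messages:  ~{expected_dlq_messages:.1f}")
```

Calling `percentile_nearest_rank(latencies, 50)` and `percentile_nearest_rank(latencies, 95)` on a list of per-headline latencies in seconds gives values comparable to the p50 and p95 rows.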

Repository Structure

sentimentpulse/
├── classify.py                    # CLI: python classify.py --text "..."
├── docker-compose.yml
├── pyproject.toml
│
├── scripts/
│   ├── data_prep.py               # download + split FinancialPhraseBank + FiQA-SA
│   ├── train.py                   # fine-tune FinBERT (WeightedTrainer)
│   └── evaluate.py                # benchmark all 4 models
│
├── models/finbert-finetuned/      # saved fine-tuned weights
├── data/
│   ├── sp500_tickers.csv
│   └── sp500_patterns.jsonl       # spaCy EntityRuler patterns (207 entries)
│
├── sentiment_ingestor/            # Phase 2: SEC EDGAR + RSS polling
├── sentiment_classifier/          # Phase 3: FinBERT inference + spaCy NER
├── sentiment_persistence/         # Phase 4: TimescaleDB + Redis writer + drift
├── sentiment_api/                 # Phase 4: FastAPI REST + WebSocket
├── sentiment_dashboard/           # Phase 5: React + recharts + Tailwind
│
├── db/migrations/                 # TimescaleDB DDL + hypertable + continuous agg
├── benchmarks/                    # throughput results
├── docs/
│   ├── architecture.md            # full ASCII architecture diagram
│   └── benchmarks/                # confusion matrix PNGs + classification report
└── notebooks/
    └── model_evaluation.ipynb
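
The drift z-score handled by sentiment_persistence/ is not specified further in this README. One common reading is a rolling z-score of each new sentiment value against recent history for the same ticker. The sketch below follows that reading: `RollingZScore`, its window size, and its minimum-sample threshold are all hypothetical choices, not the project's settings.

```python
import statistics
from collections import deque


class RollingZScore:
    """Score each new value against a fixed-size window of earlier values."""

    def __init__(self, window=50, min_samples=5):
        self._history = deque(maxlen=window)
        self._min_samples = min_samples

    def update(self, value):
        """Return the z-score of value versus the history seen so far, then
        add value to the history.

        Returns None while there is too little history or when the history
        has zero variance.
        """
        z_score = None
        if len(self._history) >= self._min_samples:
            mean = statistics.fmean(self._history)
            stdev = statistics.pstdev(self._history)
            if stdev > 0:
                z_score = (value - mean) / stdev
        self._history.append(value)
        return z_score
```

Scoring the new value before appending it keeps an outlier from diluting its own z-score. An alert rule might then flag any update whose absolute z-score exceeds a chosen threshold.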
