A full-stack financial intelligence dashboard that aggregates breaking news from major outlets, classifies articles by type, performs real-time NLP sentiment analysis, tracks correlations with market indicators, and collects structured data for future supervised learning models.
Live Demo: news-sentiment.imshaon.com
Article: Building a Real-Time Financial Sentiment Intelligence Dashboard (Medium — covers the initial build; the project has since added news type classification, correlation matrix, daily data collection, and automatic pipeline scheduling)
This project combines financial news aggregation, NLP-driven sentiment analysis, live market data, and a data collection pipeline designed for future machine learning.
Articles are pulled from 14 RSS feeds spanning Bloomberg, CNBC, CNN, BBC, ABC News, Reuters, Yahoo Finance, Dow Jones, and Trump's Truth Social. Each article is filtered for financial and political relevance, classified into one of 12 news types (Financial, Political, China, Europe, War/Military, Tariff/Trade, etc.), scored using VADER NLP on the headline+summary concatenation, and persisted to a CSV archive with its type label.
Every pipeline run also snapshots the overall and per-source sentiment alongside live market closes (S&P 500, Gold, VIX, BTC) and Fear & Greed indices, building a growing time-series dataset for correlation analysis and future supervised model training.
- 14 curated RSS feeds — Bloomberg, CNBC, CNN (Top/Politics/Business), BBC (News/Business), ABC News (Top/Politics), Reuters, Yahoo Finance, Dow Jones, Trump Truth Social
- Smart filtering — Only financial and political news passes through; preferred sources ranked higher; Jim Cramer and Trump content boosted
- Truth Social filter — Trump posts filtered to only market-moving topics: tariffs, war, geopolitics, China, India, Iran, Europe, UK, Greenland, Epstein, midterms, Fed, crypto, rare earth
- 12 news type labels — Each article classified as Financial, Political, Trump Post, War/Military, China, Europe, India, Middle East, Tariff/Trade, Crypto, Fed/Monetary, or US Local, with General as the fallback when no category matches
- Null headline filtering — Articles without headlines are excluded
- VADER NLP pipeline — Compound scoring on headline + summary concatenation for better accuracy on terse financial titles
- Percentage breakdown — Per-run positive / negative / neutral article distribution
- Trend detection — Half-split algorithm classifies sentiment trajectory as improving, declining, or stable over the last 24 snapshots
- Persistent history — Rolling 500-snapshot JSON sentiment history
- Candlestick charts — Financial-grade interactive charts for S&P 500, Gold, and VIX with 90-day OHLC data
- BTC tracker — Real-time Bitcoin price with 24h change and 90-day OHLC history
- Crypto Fear & Greed Index — Live crypto market sentiment via Alternative.me (free, no API key)
- Wall Street Fear & Greed Index — CNN stock market sentiment via RapidAPI (optional `RAPIDAPI_KEY`)
- Sentiment vs market correlation — Heatmap showing Pearson correlation between news sentiment (overall and per source) and market indicators
- 8 time periods — 1 Day, 3 Days, 7 Days, 15 Days, 1 Month, 1 Quarter, 6 Months, 1 Year
- Per-source breakdown — Separate correlations for Bloomberg, CNBC, CNN, BBC, ABC News, Trump Truth Social, etc.
- Data collection start date displayed in the dashboard
- Three-tier auto-refresh — All run on the same web service (no extra cost). Set `DISABLE_SCHEDULED_PIPELINE=true` to turn off.
  - Every 15 min — Market trackers (S&P 500, Gold, VIX, BTC) + Fear & Greed indices (Crypto & Wall Street) refresh
  - Every 1 hour — News pipeline fetches, filters, analyzes, and saves fresh articles + sentiment
  - Daily after market close (21:30 UTC / 4:30 PM ET) — Snapshot saved to `daily_tracker.csv` for ML/correlation training
- Daily tracker (`data/daily_tracker.csv`) — After each market close: date, sentiment score/label/percentages, article count, Crypto F&G, Wall Street F&G, S&P 500 close, Gold close, VIX close, BTC close
- Source sentiment tracker (`data/source_sentiment.csv`) — Per-source-category average sentiment with corresponding market data for correlation computation
- News archive (`data/news_archive.csv`) — All articles with sentiment scores, labels, source, URL, and `news_type` classification — ready for supervised learning
- Multi-label classification available — `classify_news_types_multi()` returns all matching types per article for multi-label ML tasks
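The half-split trend detection and the rolling 500-snapshot history from the feature list can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the name `detect_trend`, the `threshold` dead band of 0.05, the minimum of 4 points, and the use of `collections.deque` are all hypothetical choices.

```python
from collections import deque
from statistics import mean

MAX_SNAPSHOTS = 500   # rolling history cap, from the feature list
TREND_WINDOW = 24     # number of most recent snapshots compared


def detect_trend(scores, window=TREND_WINDOW, threshold=0.05):
    """Compare the mean of the older half with the mean of the newer half.

    `threshold` is a hypothetical dead band; the real value is not documented.
    """
    recent = list(scores)[-window:]
    if len(recent) < 4:              # too few points to split meaningfully
        return "stable"
    mid = len(recent) // 2
    delta = mean(recent[mid:]) - mean(recent[:mid])
    if delta > threshold:
        return "improving"
    if delta < -threshold:
        return "declining"
    return "stable"


# A deque with maxlen silently drops the oldest snapshot once the cap is hit.
history = deque(maxlen=MAX_SNAPSHOTS)
for _ in range(600):
    history.append(0.1)              # synthetic, flat sentiment scores

print(len(history), detect_trend(history))
```

Splitting the window in half keeps the signal cheap to compute and insensitive to a single outlier snapshot, at the cost of reacting slowly to sudden shifts.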
| Layer | Technology |
|---|---|
| Backend | Python 3.10+, Flask 3.0 |
| NLP | VADER (vaderSentiment) |
| News Classification | Regex-based multi-pattern classifier (12 types) |
| Market Data | yfinance (S&P 500, Gold, VIX, BTC) |
| Crypto Fear & Greed | Alternative.me (stdlib urllib) |
| Wall Street Fear & Greed | RapidAPI (optional) |
| News Ingestion | feedparser (14 RSS feeds), NewsAPI (optional) |
| Frontend | React 18.3, Vite 5.4 |
| Charts | lightweight-charts 4.2 |
| Containerisation | Docker |
| Deployment | Render (render.yaml blueprint) |
┌──────────────────────────────────────────────────────────────┐
│ React Frontend │
│ App.jsx → SentimentGauge · CorrelationMatrix · NewsList │
│ Sp500Chart · GoldChart · VixChart · FearGreedCard · BTC │
│ Vite 5 build → served by Flask in production │
└────────────────────────┬─────────────────────────────────────┘
│ HTTP REST
┌────────────────────────▼─────────────────────────────────────┐
│ Flask REST API │
│ api/app.py │
│ 17 routes · CORS · Cache-Control · Static serving │
└──┬──────────┬──────────┬─────────────────────────────────────┘
│ │ │
┌──▼──────┐ ┌▼────────┐ ┌▼─────────────────────────────────────┐
│ src/ │ │ data/ │ │ External Data Sources │
│ │ │ │ │ │
│ news_ │ │ news_ │ │ RSS: Bloomberg, CNBC, CNN, BBC, │
│ extrac │ │ archive │ │ ABC, Reuters, Yahoo, Dow Jones,│
│ tor.py │ │ .csv │ │ Trump Truth Social │
│ │ │ │ │ │
│ news_ │ │ sentim. │ │ Market: yfinance (sequential fetch) │
│ filter │ │ history │ │ Crypto F&G: Alternative.me │
│ .py │ │ .json │ │ WS F&G: RapidAPI (optional) │
│ │ │ │ │ NewsAPI: newsapi.org (optional) │
│ sentim. │ │ daily_ │ └──────────────────────────────────────┘
│ analyz │ │ tracker │
│ er.py │ │ .csv │
│ │ │ │
│ sentim. │ │ source_ │
│ track │ │ sentim. │
│ er.py │ │ .csv │
│ │ └──────────┘
│ daily_ │
│ track │
│ er.py │
│ │
│ market_ │
│ data.py │
│ │
│ fear_ │
│ greed │
│ .py │
│ │
│ wall_st │
│ _fg.py │
│ │
│ trump_ │
│ tweets │
│ .py │
└─────────┘
breaking_news_market_sentiment/
│
├── api/
│ └── app.py # Flask REST API — 17 endpoints, pipeline orchestration, static serving
│
├── frontend/
│ ├── src/
│ │ ├── App.jsx # Dashboard: gauges, correlation matrix, news table, charts
│ │ ├── App.css # Dark theme, responsive layout, type badges, heatmap styles
│ │ ├── Sp500Chart.jsx # S&P 500 candlestick chart (lightweight-charts)
│ │ ├── GoldChart.jsx # Gold candlestick chart
│ │ └── VixChart.jsx # VIX volatility chart
│ └── package.json # React 18.3, Vite 5.4, lightweight-charts 4.2
│
├── src/
│ ├── news_extractor.py # 14 RSS feeds + optional NewsAPI; Trump Truth Social via RSS
│ ├── news_filter.py # Financial/political filter, Truth Social topic filter,
│ │ # news_type classifier (12 types), multi-label support
│ ├── trump_tweets.py # Optional Trump X tweets (paid API); Truth Social RSS is free
│ ├── sentiment_analyzer.py # VADER NLP; headline+summary concat; percentage breakdown
│ ├── sentiment_tracker.py # CSV archive; JSON history (500-cap); trend detection
│ ├── daily_tracker.py # Daily snapshots: sentiment + all market indicators + per-source
│ │ # breakdown; correlation matrix computation (8 time periods)
│ ├── market_data.py # yfinance OHLC with threading lock; Pearson correlation engine
│ ├── fear_greed.py # Crypto Fear & Greed (Alternative.me, stdlib)
│ └── wall_street_fear_greed.py # Wall Street Fear & Greed (RapidAPI, optional)
│
├── data/ # Auto-created; gitignored
│ ├── news_archive.csv # All articles with sentiment + news_type labels
│ ├── sentiment_history.json # Rolling 500-snapshot sentiment time series
│ ├── daily_tracker.csv # Daily sentiment vs market indicators
│ └── source_sentiment.csv # Per-source sentiment vs market data (for correlations)
│
├── screenshots/
│ ├── dashboard.png
│ └── dashboard-hero.png
│
├── Dockerfile # Python 3.12 + Node.js; gunicorn with 120s timeout
├── render.yaml # Render blueprint (one-click deploy)
├── requirements.txt
├── start.sh # Dev: launches Flask + Vite concurrently
├── capture_screenshots.py # Playwright-based screenshot capture
├── .env.example
└── .gitignore
Every article is labelled with a news_type for structured analysis and future ML training:
| Type | Keywords / Triggers |
|---|---|
| Trump Post | Source is Truth Social or X |
| China | China, Beijing, Xi Jinping, Taiwan, Hong Kong, Huawei, TikTok |
| Middle East | Iran, Israel, Gaza, Saudi, OPEC, Iraq, Syria, Yemen |
| India | India, Modi, Mumbai, Nifty, Sensex, Rupee |
| Europe | EU, ECB, Germany, France, UK, Britain, eurozone, Bank of England |
| Tariff / Trade | Tariffs, trade war, sanctions, duties, customs |
| War / Military | War, attack, military, missile, NATO, conflict, nuclear |
| Crypto | Bitcoin, Ethereum, blockchain, DeFi, digital assets |
| Fed / Monetary | Federal Reserve, interest rates, Powell, FOMC, inflation, CPI |
| Political | Congress, Senate, elections, midterm, Epstein, executive order |
| Financial | Stocks, earnings, Wall Street, IPO, recession, S&P, Dow |
| US Local | Immigration, border, student loans, housing, FEMA |
| General | Articles not matching any specific category |
Region-specific types take priority over broad types (e.g. "Iran nuclear talks" classifies as Middle East, not War/Military).
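The priority rule can be illustrated with an ordered list of regex patterns where the first match wins. The sketch below is an assumption about how such a classifier might look, not the code in `news_filter.py`: the function names `classify_news_type` and `classify_all_types` are hypothetical, and the keyword lists are abbreviated from the table above.

```python
import re

# Ordered from region-specific to broad; the first matching entry wins.
# Only a subset of the types and keywords from the table is shown.
TYPE_PATTERNS = [
    ("China", r"\b(china|beijing|xi jinping|taiwan|hong kong|huawei|tiktok)\b"),
    ("Middle East", r"\b(iran|israel|gaza|saudi|opec|iraq|syria|yemen)\b"),
    ("Tariff/Trade", r"\b(tariffs?|trade war|sanctions|duties|customs)\b"),
    ("War/Military", r"\b(war|attack|military|missile|nato|conflict|nuclear)\b"),
    ("Fed/Monetary", r"\b(federal reserve|interest rates?|powell|fomc|inflation|cpi)\b"),
    ("Financial", r"\b(stocks?|earnings|wall street|ipo|recession|dow)\b"),
]
COMPILED = [(label, re.compile(pattern, re.IGNORECASE)) for label, pattern in TYPE_PATTERNS]


def classify_news_type(text):
    """Return the single highest-priority label, or 'General' if none match."""
    for label, pattern in COMPILED:
        if pattern.search(text):
            return label
    return "General"


def classify_all_types(text):
    """Return every matching label, in priority order (multi-label variant)."""
    return [label for label, pattern in COMPILED if pattern.search(text)] or ["General"]


print(classify_news_type("Iran nuclear talks resume"))
print(classify_all_types("Iran nuclear talks resume"))
```

The word boundaries (`\b`) matter here: without them, "award" would match the `war` keyword.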
| Endpoint | Method | Description |
|---|---|---|
| `/api/health` | GET | Health check |
| `/api/news` | GET | Latest 100 articles with sentiment scores, labels, and `news_type` |
| `/api/sentiment-summary` | GET | Latest sentiment snapshot with trend signal and percentage breakdown |
| `/api/sentiment-history` | GET | Full sentiment time series for charting |
| `/api/pipeline/run` | POST | Trigger fresh news ingestion, classification, analysis, and data collection |
| Endpoint | Method | Description |
|---|---|---|
| `/api/markets` | GET | S&P 500, Gold, VIX, and BTC OHLC (fetched sequentially) |
| `/api/markets/sp500` | GET | S&P 500 OHLC only |
| `/api/markets/gold` | GET | Gold OHLC only |
| `/api/markets/vix` | GET | VIX data only |
| `/api/fear-greed` | GET | Crypto Fear & Greed Index |
| `/api/wall-street-fear-greed` | GET | Wall Street (CNN) Fear & Greed Index |
| Endpoint | Method | Description |
|---|---|---|
| `/api/correlation-matrix` | GET | Sentiment vs market correlation matrix for all 8 time periods |
| `/api/correlation-matrix?period=7d` | GET | Correlation for a specific period (`1d`, `3d`, `7d`, `15d`, `1m`, `1q`, `6m`, `1y`) |
| `/api/daily-tracker` | GET | Raw daily sentiment + market indicator snapshots |
All market and fear-greed endpoints return `Cache-Control: no-store, no-cache, must-revalidate` to prevent stale financial data from CDN or browser caching.
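One way to apply that header uniformly is a small WSGI middleware that wraps the application and rewrites `Cache-Control` for the live-data paths. The sketch below uses only the standard library; `NoCacheMiddleware`, `demo_app`, and `response_headers` are hypothetical names, and how `api/app.py` actually sets the header is not shown in this README.

```python
from wsgiref.util import setup_testing_defaults

NO_CACHE = "no-store, no-cache, must-revalidate"
# Path prefixes taken from the endpoint tables above.
LIVE_PREFIXES = ("/api/markets", "/api/fear-greed", "/api/wall-street-fear-greed")


class NoCacheMiddleware:
    """Force a no-cache header on responses for live market paths."""

    def __init__(self, app, prefixes=LIVE_PREFIXES):
        self.app = app
        self.prefixes = tuple(prefixes)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if not path.startswith(self.prefixes):
            return self.app(environ, start_response)

        def patched_start(status, headers, exc_info=None):
            # Drop any existing Cache-Control value, then add ours.
            headers = [(k, v) for k, v in headers if k.lower() != "cache-control"]
            headers.append(("Cache-Control", NO_CACHE))
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start)


def demo_app(environ, start_response):
    """Stand-in WSGI app used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b"{}"]


def response_headers(app, path):
    """Call a WSGI app for `path` and return its response headers as a dict."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    list(app(environ, start_response))
    return captured


wrapped = NoCacheMiddleware(demo_app)
print(response_headers(wrapped, "/api/markets/gold"))
print(response_headers(wrapped, "/api/news"))
```

Because Flask applications are WSGI callables, a middleware like this could be attached with `app.wsgi_app = NoCacheMiddleware(app.wsgi_app)`.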
- Python 3.10+
- Node.js 18+
- Git
git clone https://github.com/ShaonINT/breaking_news_market_sentiment.git
cd breaking_news_market_sentiment

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

cd frontend
npm install
cd ..

cp .env.example .env
# NEWSAPI_KEY=your_key — additional news sources
# RAPIDAPI_KEY=your_key — Wall Street Fear & Greed Index

Development (two servers, hot reload):

./start.sh
# Frontend: http://localhost:3000
# API: http://localhost:5001

Production build (single server):

cd frontend && npm run build && cd ..
python api/app.py
# Open http://localhost:5001

Click Fetch Latest News to run the pipeline, or POST /api/pipeline/run.
- Push to GitHub
- Render Dashboard → New → Blueprint
- Connect your repository — Render auto-detects `render.yaml`
- Add a Persistent Disk (1 GB, $0.25/mo) mounted at `/app/data` to preserve collected data across deploys
- Set environment variables in the dashboard (`NEWSAPI_KEY`, `RAPIDAPI_KEY` — both optional)
Correlations not showing on deployment? The matrix needs 3+ pipeline runs. Click "Fetch Latest News" 3 times on your deployed app (or wait a few hours for the hourly auto-runs). Ensure the disk is mounted at /app/data so data persists.
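The minimum-runs requirement makes sense once the computation is written out: Pearson correlation over one or two paired points is either undefined or trivially ±1. The sketch below is a plain-Python illustration of that calculation with a sample-size guard. The name `pearson_or_none` and the rounding are assumptions; the project's own correlation engine lives in `market_data.py` and may differ.

```python
from math import sqrt

MIN_POINTS = 3  # mirrors the "3+ pipeline runs" note above


def pearson_or_none(sentiment, market):
    """Pearson r between two aligned series, or None when it is undefined."""
    # Keep only rows where both values are present.
    pairs = [(s, m) for s, m in zip(sentiment, market) if s is not None and m is not None]
    if len(pairs) < MIN_POINTS:
        return None
    xs, ys = zip(*pairs)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None              # a constant series has no defined correlation
    return round(cov / sqrt(var_x * var_y), 3)


print(pearson_or_none([0.1, 0.2], [5000, 5010]))
print(pearson_or_none([0.1, 0.2, 0.3], [5000, 5010, 5020]))
```

Returning `None` rather than `0.0` lets a heatmap distinguish "not enough data yet" from "no correlation".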
docker build -t news-sentiment .
docker run -p 5001:5001 -v ./data:/app/data news-sentiment

Mount ./data to persist the news archive, sentiment history, and daily tracker across container restarts.
The system is designed to accumulate structured data for future supervised learning:
| File | Contents | Use Case |
|---|---|---|
| `news_archive.csv` | All articles: title, summary, source, URL, sentiment scores, `news_type` | Training data for text classification and sentiment prediction |
| `daily_tracker.csv` | Daily snapshots: sentiment, article count, S&P 500, Gold, VIX, BTC, Crypto F&G, WS F&G | Time-series regression: predict market moves from sentiment |
| `source_sentiment.csv` | Per-source avg sentiment + market data per day | Feature engineering: which sources predict which markets |
| `sentiment_history.json` | Rolling 500 sentiment snapshots with timestamps | Trend analysis and temporal patterns |
After collecting 30+ days of data:
- Load `daily_tracker.csv` — each row is a training sample
- Features: `sentiment_score`, `positive_pct`, `negative_pct`, `crypto_fear_greed_value`, `vix_close`
- Target: Next-day S&P 500 return (compute from `sp500_close` shifted by 1 day)
- news_type breakdown: Aggregate per-type sentiment from `source_sentiment.csv` as additional features
- Train a model (XGBoost, Random Forest, or LSTM for sequence data)
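The feature and target construction in the steps above might look like the following. It is a minimal sketch using only the standard library. The feature column names come from the list above, while the `date` column name, the ISO date format, and the sample values are assumptions (the numbers are synthetic, not collected data).

```python
import csv
import io

FEATURES = ["sentiment_score", "positive_pct", "negative_pct",
            "crypto_fear_greed_value", "vix_close"]


def build_samples(rows):
    """Pair each day's features with the next day's S&P 500 return."""
    rows = sorted(rows, key=lambda r: r["date"])   # assumes ISO dates (YYYY-MM-DD)
    samples = []
    for today, tomorrow in zip(rows, rows[1:]):
        close_today = float(today["sp500_close"])
        close_next = float(tomorrow["sp500_close"])
        target = (close_next - close_today) / close_today
        features = [float(today[name]) for name in FEATURES]
        samples.append((features, target))
    return samples                                  # the last day has no target yet


# Synthetic stand-in for data/daily_tracker.csv.
SAMPLE_CSV = """date,sentiment_score,positive_pct,negative_pct,crypto_fear_greed_value,vix_close,sp500_close
2025-01-02,0.10,40,30,55,18.0,5000.0
2025-01-03,-0.05,30,45,50,19.5,5050.0
2025-01-06,0.20,50,20,60,17.0,4999.5
"""

samples = build_samples(list(csv.DictReader(io.StringIO(SAMPLE_CSV))))
for features, target in samples:
    print(features, round(target, 4))
```

With real data, the same function could be fed `csv.DictReader(open("data/daily_tracker.csv", newline=""))`, and the resulting feature lists and targets passed to whichever model is chosen. The target is deliberately taken from the following row so that no same-day price information leaks into the features.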
The classify_news_types_multi() function in news_filter.py returns all matching types per article for multi-label classification tasks.
| Variable | Required | Description |
|---|---|---|
| `PORT` | No | Server port (default: 5001) |
| `NEWSAPI_KEY` | No | Free key from newsapi.org for expanded news coverage |
| `RAPIDAPI_KEY` | No | RapidAPI key for Wall Street (CNN) Fear & Greed; subscribe at RapidAPI |
| `TWITTER_BEARER_TOKEN` | No | X API v2 bearer token for Trump tweets (paid). Truth Social RSS works without this |
| `FLASK_DEBUG` | No | Set to `true` for auto-reload during development |
| `DISABLE_SCHEDULED_PIPELINE` | No | Set to `true` to disable all automatic scheduled jobs |
| `MARKET_REFRESH_MINUTES` | No | Market tracker + F&G refresh interval (default: 15) |
| `NEWS_PIPELINE_MINUTES` | No | News pipeline interval (default: 60) |
| `DAILY_SNAPSHOT_HOUR` | No | Hour (UTC) for daily ML snapshot (default: 21 = 9:30 PM UTC / 4:30 PM ET) |
| `DAILY_SNAPSHOT_MINUTE` | No | Minute for daily ML snapshot (default: 30) |
| `DATA_DIR` | No | Override data directory (default: project/data). On Render: mount disk at `/app/data` |
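As an illustration of how these variables and their defaults fit together, the sketch below reads them into a single settings dictionary. The variable names and defaults come from the table above. The function name `load_settings`, the dictionary keys, and the set of strings treated as true are assumptions.

```python
import os


def _as_bool(value):
    """Treat 'true', '1' and 'yes' (any case) as enabled; an assumption."""
    return str(value).strip().lower() in {"1", "true", "yes"}


def load_settings(env=None):
    """Read the documented environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "port": int(env.get("PORT", 5001)),
        "scheduler_enabled": not _as_bool(env.get("DISABLE_SCHEDULED_PIPELINE", "false")),
        "market_refresh_minutes": int(env.get("MARKET_REFRESH_MINUTES", 15)),
        "news_pipeline_minutes": int(env.get("NEWS_PIPELINE_MINUTES", 60)),
        "daily_snapshot_utc": (int(env.get("DAILY_SNAPSHOT_HOUR", 21)),
                               int(env.get("DAILY_SNAPSHOT_MINUTE", 30))),
        # Optional keys: an empty string is treated the same as unset.
        "newsapi_key": env.get("NEWSAPI_KEY") or None,
        "rapidapi_key": env.get("RAPIDAPI_KEY") or None,
    }


print(load_settings({}))
print(load_settings({"PORT": "8080", "DISABLE_SCHEDULED_PIPELINE": "true"}))
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the real environment.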
Why RSS over a commercial news API? RSS is free, zero-dependency, and updates in near-real-time. Every major financial publisher maintains active feeds. No registration or API key required — the system works immediately in any environment.
Why VADER over FinBERT? VADER needs no GPU and no model download, and adds negligible inference latency. For a real-time pipeline processing dozens of articles per cycle, that speed advantage was judged worth the trade-off. The modular architecture allows a drop-in NLP engine replacement without changing the API contract or the frontend.
Why headline + summary concatenation? Financial headlines are often terse and ambiguous in isolation ("Fed acts" could be bullish or bearish). Appending the article summary significantly improves classification accuracy without additional model complexity.
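The concatenation step and the labelling that follows it can be sketched without the NLP dependency itself. In the snippet below, the helper names, the `". "` joiner, and the ±0.05 cut-offs are assumptions: the cut-offs are the conventional thresholds suggested in the VADER documentation, and the README does not state which values the project uses. In the real pipeline the compound score would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package.

```python
def scoring_text(headline, summary=""):
    """Join headline and summary into the single string that gets scored."""
    parts = [p.strip() for p in (headline, summary) if p and p.strip()]
    return ". ".join(parts)


def label_from_compound(compound):
    """Map a VADER compound score in [-1, 1] to a label (assumed cut-offs)."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def percentage_breakdown(labels):
    """Per-run share of positive / negative / neutral articles, in percent."""
    names = ("positive", "negative", "neutral")
    total = len(labels)
    if total == 0:
        return {name: 0.0 for name in names}
    return {name: round(100 * labels.count(name) / total, 1) for name in names}


text = scoring_text("Fed acts", "Policymakers cut rates by half a point")
print(text)

# Hypothetical compound scores standing in for real VADER output.
labels = [label_from_compound(score) for score in (0.42, -0.31, 0.0, 0.07)]
print(percentage_breakdown(labels))
```

Keeping the text assembly separate from the scorer is also what would make an engine swap straightforward: only the function producing the compound score changes.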
Why regex-based news type classification? For the labelling task, regex patterns are deterministic, fast, and require no training data. They produce clean, consistent labels that serve as ground-truth for training a more sophisticated ML classifier later.
Why sequential yfinance fetching? Concurrent requests to yfinance introduce a caching bug where multiple assets return the same data. The /api/markets endpoint fetches each asset sequentially behind a threading lock — slower, but reliably correct.
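The lock-plus-loop pattern described here can be sketched as follows. The fetch function is injected so the example runs without network access. `fetch_all`, `fetch_one`, and the ticker mapping are hypothetical; the symbols shown are Yahoo Finance symbols, but which ones the project actually requests is an assumption.

```python
import threading

# One process-wide lock: only one caller talks to the data source at a time.
_FETCH_LOCK = threading.Lock()

# Assumed symbol mapping, for illustration only.
TICKERS = {"sp500": "^GSPC", "gold": "GC=F", "vix": "^VIX", "btc": "BTC-USD"}


def fetch_all(fetch_one, tickers=TICKERS):
    """Fetch every asset one after another while holding the lock.

    `fetch_one(symbol)` is a hypothetical callable; in the project it would
    wrap a yfinance request for that symbol.
    """
    results = {}
    with _FETCH_LOCK:
        for name, symbol in tickers.items():
            try:
                results[name] = fetch_one(symbol)
            except Exception as exc:      # one failing asset should not sink the rest
                results[name] = {"error": str(exc)}
    return results


def fake_fetch(symbol):
    """Stand-in for a real market data call."""
    return {"symbol": symbol, "close": None}


print(fetch_all(fake_fetch))
```

Holding the lock around the whole loop, rather than around each request, also prevents two overlapping calls to the endpoint from interleaving their requests.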
Why filter Truth Social posts? Trump posts 20-30 times daily. Without filtering, they overwhelm the feed with birthday wishes and rally thanks. Only posts matching specific market-moving topics (tariffs, geopolitics, Fed, crypto, etc.) pass through.
- Upgrade sentiment engine to FinBERT (transformer-based financial NLP)
- Train supervised model on accumulated daily_tracker data (XGBoost / LSTM)
- Per-news-type sentiment correlation (e.g. "China news sentiment vs S&P 500")
- Statistical significance testing on correlations (Spearman rank + p-values)
- WebSocket support for real-time sentiment push updates
- Alert system — email / Slack notifications on extreme sentiment shifts
- Sentiment backtesting against historical price data
- Auto-scheduled pipeline runs (cron-based, not just manual): already implemented as the three-tier auto-refresh
Contributions are welcome. Please open an issue to discuss changes before submitting a pull request.
MIT © Shaon Biswas
Built with Python, Flask, React, and a passion for making financial intelligence accessible.
