Volume, News & Sentiment Analytics for FAANG Stocks
Capstone project correlating stock trading volume with news headlines, headline sentiment, and Wikipedia pageviews using Python, SQLite, pandas, and public data sources.
Built as my final project for the Python for Everybody specialization and as my first "serious" data project in my portfolio.
The goal of this project is to answer a simple question:
"When a company gets more attention in the news and on Wikipedia, does that show up in trading activity?"
To explore this, the project:
- Collects historical prices & volume for a small set of tickers (e.g. AAPL, AMZN).
- Collects news headlines from RSS feeds.
- Collects Wikipedia pageviews for the same companies.
- Computes simple sentiment scores on headlines.
- Stores everything in a normalized SQLite database.
- Uses SQL + pandas + Matplotlib to analyze and visualize relationships over time.
The project is designed in two phases:
- Phase 1 – Sample / downloaded data (offline files, easy to reproduce).
- Phase 2 – Live APIs (yfinance, Wikipedia Pageviews API, live RSS feeds) using the same schema and scripts.
Status: In active development (Week 1). Screenshots and visualizations will be added as features are completed.
[Visualization Preview]
This chart will show:
✓ Daily trading volume (blue line)
✓ News mention count (orange bars)
✓ Wikipedia pageviews (green line)
✓ Clear correlation patterns
✓ Interactive hover tooltips (Plotly version)
Status: In development - available after Phase 1 completion
Normalized SQLite database with 5 core tables:
- companies: Ticker metadata
- trading_volumes: Daily OHLCV data
- news_mentions: Scraped headlines
- wiki_pageviews: Daily attention metrics
- sentiment_scores: NLP analysis results
[Schema diagram coming soon]
$ python src/analyze_sql.py --ticker AAPL --days 30
Loading data for AAPL (last 30 days)...
✓ Found 30 trading days
✓ Found 127 news mentions
✓ Found 30 Wikipedia datapoints
Top 5 High-Attention Days:
────────────────────────────────────────────────
Date | Volume | News | Views | Sentiment
-----------|---------|------|-------|----------
2024-01-15 | 95.2M | 12 | 1.2M | +0.65
[... more rows ...]

In the meantime, see the Repository Structure and Data Model sections for technical details.
Research Question:
Does public attention (news + Wikipedia) predict trading activity?
Personal Motivation:
With 6 years of trading experience, I've always wondered if "buzz" translates to volume. This project combines my trading background with new data skills to explore that question quantitatively.
What Makes It Unique:
- Combines three attention signals (news, Wikipedia, sentiment)
- Two-phase design (reproducible sample → live APIs)
- Production-quality code structure from a learning project
- Demonstrates the full data pipeline: collection → storage → analysis → visualization
- Multiple data formats
  - CSV → prices & volume
  - XML → RSS news headlines
  - JSON → Wikipedia pageviews
  - HTML (optional) → article parsing with BeautifulSoup
- Central SQLite database
  - `companies`, `trading_volumes`, `news_mentions`, `wiki_pageviews`, `sentiment_scores`
- Text sentiment
  - Basic positive/negative word counting on news headlines
  - Per-headline and per-day sentiment scores
- Analysis & visualization
  - Daily aggregation by ticker (volume, news count, sentiment, views)
  - Time series charts: volume vs news mentions vs pageviews
  - Keyword exploration (top words in headlines per ticker)
- Educational focus
  - Heavy use of core Python concepts from Python for Everybody:
    - files, loops, lists, dictionaries
    - CSV / JSON / XML parsing
    - SQLite with `sqlite3`, plus pandas for higher-level analysis
- Language: Python 3
- Data storage: SQLite (`sqlite3`)
- Data analysis: pandas
- Visualization: Matplotlib
- Parsing / scraping: `csv`, `json`, `xml.etree.ElementTree` (or `feedparser` for RSS), `beautifulsoup4` (optional HTML parsing)
- Market data (Phase 2): `yfinance`
- Attention data (Phase 2): Wikipedia Pageviews API
Planned structure (evolves as project grows):
trading_attention_tracker/
├── src/
│   ├── __init__.py
│   ├── db_init.py              # create tables, get_connection()
│   ├── load_sample_prices.py   # Phase 1: CSV → SQLite
│   ├── load_sample_news.py     # Phase 1: XML (RSS) → SQLite
│   ├── load_sample_wiki.py     # Phase 1: JSON → SQLite
│   ├── fetch_live_prices.py    # Phase 2: yfinance → SQLite
│   ├── fetch_live_news.py      # Phase 2: live RSS → SQLite
│   ├── fetch_live_wiki.py      # Phase 2: Wikipedia API → SQLite
│   ├── compute_sentiment.py    # headline sentiment scoring
│   ├── analyze_sql.py          # pure-SQL style analysis (Py4E-style)
│   └── analyze_pandas.py       # pandas-based analysis + plots
├── data/
│   ├── sample/                 # CSV / XML / JSON sample files
│   └── live/                   # cached API responses (optional)
├── db/
│   └── market_news.sqlite      # SQLite database
├── docs/
│   ├── diagrams.md             # notes & architecture diagrams
│   └── charts/                 # exported charts for README/portfolio
├── .gitignore
├── README.md
└── requirements.txt
The core tables:
`companies` – tickers and company metadata
`trading_volumes` – daily close price & volume per company
`news_mentions` – news headlines, dates, sources, URLs
`wiki_pageviews` – daily pageviews per company
`sentiment_scores` – sentiment metrics per headline
Simplified diagram:
companies (1) ────< trading_volumes
    │
    ├────< news_mentions (1) ────< sentiment_scores
    │
    └────< wiki_pageviews
Don't want to set up locally? View the project:
- Sample Analysis Notebook - Pre-run results
- Example Charts - Key visualizations
- Database Schema - Table structure
Want to run it? Follow the full setup below (5 minutes):
git clone https://github.com/manuel-reyes-ml/trading_attention_tracker.git
cd trading_attention_tracker
python3 -m venv .venv
# macOS / Linux
source .venv/bin/activate
# Windows (PowerShell)
# .venv\Scripts\Activate.ps1
pip install -r requirements.txt
In Phase 1 you only need the core libraries: `pandas`, `matplotlib`, `beautifulsoup4`.
In Phase 2 you'll add `yfinance` and any HTTP / RSS helpers.
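For reference, a minimal Phase 1 `requirements.txt` could be as small as this (unpinned on purpose; add version pins once the project stabilizes):

```text
pandas
matplotlib
beautifulsoup4
```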
python src/db_init.py
This creates db/market_news.sqlite and all tables (companies, trading_volumes, etc.).
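For orientation, here's a minimal sketch of what `db_init.py` could look like. The table names come from this README; the column definitions are illustrative assumptions, not the project's final schema:

```python
# db_init.py sketch -- table names from this README; columns are assumptions.
import sqlite3

DB_PATH = "db/market_news.sqlite"

def get_connection(path=DB_PATH):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the links between tables
    return conn

SCHEMA = """
CREATE TABLE IF NOT EXISTS companies (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    ticker TEXT UNIQUE NOT NULL,
    name   TEXT
);
CREATE TABLE IF NOT EXISTS trading_volumes (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    company_id INTEGER NOT NULL REFERENCES companies(id),
    date       TEXT NOT NULL,          -- YYYY-MM-DD
    close      REAL,
    volume     INTEGER,
    UNIQUE (company_id, date)          -- one row per company per day
);
CREATE TABLE IF NOT EXISTS news_mentions (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    company_id INTEGER NOT NULL REFERENCES companies(id),
    date       TEXT NOT NULL,
    title      TEXT,
    source     TEXT,
    url        TEXT
);
-- wiki_pageviews and sentiment_scores follow the same pattern.
"""

def init_db(path=DB_PATH):
    """Create the SQLite database (default db/market_news.sqlite) with all tables."""
    conn = get_connection(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn
```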
Phase 1 is designed to be completely reproducible without calling any external API.
Place files like prices_AAPL.csv, prices_AMZN.csv into data/sample/, then:
python src/load_sample_prices.py
This script:
- Reads each CSV with `csv.DictReader`.
- Inserts rows into `companies` and `trading_volumes`.
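A minimal sketch of that loading logic, assuming yfinance-style CSV headers (`Date`, `Close`, `Volume`) and the schema described in the Data Model section; adjust to the actual sample files:

```python
# load_sample_prices.py sketch -- CSV header names are assumptions based on a
# typical yfinance-style export.
import csv

def load_prices(conn, ticker, csv_path):
    """Read one prices CSV with csv.DictReader and insert rows into SQLite."""
    cur = conn.cursor()
    cur.execute("INSERT OR IGNORE INTO companies (ticker) VALUES (?)", (ticker,))
    company_id = cur.execute(
        "SELECT id FROM companies WHERE ticker = ?", (ticker,)
    ).fetchone()[0]
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            cur.execute(
                "INSERT OR IGNORE INTO trading_volumes (company_id, date, close, volume) "
                "VALUES (?, ?, ?, ?)",
                (company_id, row["Date"], float(row["Close"]), int(row["Volume"])),
            )
    conn.commit()
```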
Place rss_aapl.xml etc. into data/sample/, then:
python src/load_sample_news.py
This script:
- Parses XML with `xml.etree.ElementTree`.
- Extracts `<title>`, `<link>`, `<pubDate>`.
- Normalizes `pubDate` to `YYYY-MM-DD`.
- Inserts into `news_mentions`.
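The parsing step could be sketched like this, assuming a standard RSS 2.0 layout (`<channel><item>...`); the helper name is illustrative:

```python
# load_sample_news.py sketch -- stdlib-only RSS parsing with pubDate normalization.
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def parse_rss(xml_text):
    """Yield (title, link, date) tuples, with pubDate normalized to YYYY-MM-DD."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):          # finds <item> at any nesting depth
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        pub = item.findtext("pubDate")
        date = parsedate_to_datetime(pub).strftime("%Y-%m-%d") if pub else None
        yield title, link, date
```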
Place wiki_pageviews_sample.json in data/sample/, then:
python src/load_sample_wiki.py
This script:
- Parses JSON with `json.load`.
- Iterates over the `items` list (`timestamp`, `views`).
- Inserts into `wiki_pageviews`.
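A sketch of the JSON step, mirroring the shape of a Wikimedia Pageviews API response (results under an `items` list with `timestamp` and `views` fields); the helper name is illustrative:

```python
# load_sample_wiki.py sketch -- parse a Pageviews-API-style JSON document.
import json

def parse_pageviews(json_text):
    """Return (date, views) pairs, cutting the YYYYMMDDHH timestamp down to YYYY-MM-DD."""
    data = json.loads(json_text)
    rows = []
    for item in data["items"]:
        ts = item["timestamp"]                   # e.g. "2024011500"
        date = f"{ts[0:4]}-{ts[4:6]}-{ts[6:8]}"  # normalize for the wiki_pageviews table
        rows.append((date, int(item["views"])))
    return rows
```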
python src/compute_sentiment.py
This script:
- Reads all rows from `news_mentions`.
- Applies a simple sentiment function to each title.
- Inserts results into `sentiment_scores` (pos/neg counts + score).
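A minimal sketch of such a sentiment function. The word lists here are tiny illustrative stand-ins; a real run would use much longer lists:

```python
# compute_sentiment.py sketch -- basic positive/negative word counting on a headline.
POSITIVE = {"beats", "surge", "record", "growth", "up", "strong"}
NEGATIVE = {"misses", "drop", "lawsuit", "recall", "down", "weak"}

def score_headline(title):
    """Return (pos_count, neg_count, score), with score in [-1, 1]."""
    words = [w.strip(".,!?:;\"'()").lower() for w in title.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    score = (pos - neg) / total if total else 0.0  # neutral when no signal words
    return pos, neg, score
```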
python src/analyze_sql.py
Example tasks in this script:
- "Top 10 days by volume & news count for AAPL."
- "Average sentiment by month."
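As a sketch, the first task could be expressed as a single JOIN + GROUP BY; the column names are assumptions matching the schema described in the Data Model section:

```python
# analyze_sql.py sketch -- "top days by volume and news count" in pure SQL.
import sqlite3

TOP_DAYS_SQL = """
SELECT tv.date,
       tv.volume,
       COUNT(nm.id) AS news_count
FROM trading_volumes AS tv
JOIN companies AS c ON c.id = tv.company_id
LEFT JOIN news_mentions AS nm
       ON nm.company_id = tv.company_id AND nm.date = tv.date
WHERE c.ticker = ?
GROUP BY tv.date, tv.volume
ORDER BY tv.volume DESC
LIMIT 10
"""

def top_attention_days(conn, ticker):
    """Return (date, volume, news_count) rows for the highest-volume days."""
    return conn.execute(TOP_DAYS_SQL, (ticker,)).fetchall()
```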
python src/analyze_pandas.py
Example plot (inside the script):
# in analyze_pandas.py
plot_news_vs_volume("AAPL")
Generates a chart like:
docs/charts/aapl_volume_vs_news.png
You can then reference it in the README:

After Phase 1 works with sample files, the next step is to swap in live data:
- `fetch_live_prices.py`
  - Uses `yfinance` to download OHLCV for selected tickers.
  - Inserts into `trading_volumes` using the same schema.
- `fetch_live_wiki.py`
  - Calls the Wikipedia Pageviews API for each company's page.
  - Inserts into `wiki_pageviews`.
- `fetch_live_news.py`
  - Fetches RSS feeds directly from their URLs.
  - Parses items and inserts into `news_mentions`.
Because the database schema is unchanged, the analysis scripts should keep working with little or no modification.
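The only genuinely new piece is the transform step. Here's a sketch of converting a yfinance-style OHLCV DataFrame into rows for `trading_volumes`; `frame_to_rows` is a hypothetical helper, and in Phase 2 the frame would come from yfinance's `yf.download()`, which returns OHLCV data indexed by date:

```python
# fetch_live_prices.py sketch -- DataFrame-to-rows step only (no network access).
import pandas as pd

def frame_to_rows(company_id, frame):
    """Convert a DataFrame with a DatetimeIndex and Close/Volume columns into
    (company_id, date, close, volume) tuples ready for trading_volumes."""
    rows = []
    for ts, row in frame.iterrows():
        rows.append((company_id, ts.strftime("%Y-%m-%d"),
                     float(row["Close"]), int(row["Volume"])))
    return rows
```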
This project is designed to reinforce:
- Parsing CSV, JSON, XML, HTML in real scenarios.
- Designing & using a SQLite database from Python.
- Writing JOIN queries and aggregations.
- Using pandas as an analysis layer on top of raw SQL.
- Basic text processing & sentiment ideas.
- Building a reproducible, portfolio-quality repo (clear structure, README, scripts).
- SQLite database schema
- Sample data loaders (CSV, XML, JSON)
- Basic sentiment analysis
- SQL-based analysis scripts
- Integrate yfinance for real-time prices
- Wikipedia Pageviews API integration
- Live RSS feed parsing
- Automated daily updates
- Expand to 20+ tickers across sectors
- Advanced sentiment (VADER/TextBlob)
- Statistical significance testing (correlation p-values)
- Technical indicators (RSI, MACD)
- Interactive dashboard (Streamlit/Plotly)
- Automated daily data refresh (GitHub Actions)
- Deploy dashboard to Heroku/Streamlit Cloud
- Add alerts for significant attention spikes
MIT License - see LICENSE file for details
This project is part of my learning portfolio and available for educational purposes.
I'm Manuel, transitioning into data science after 6 years in trading. This project combines my domain expertise with new technical skills.
Connect with me:
Other Projects:
- Learning Journey - My complete learning roadmap (DA → ML → LLM)
- [Project 2] - Coming soon...
⭐ If you found this project interesting, please give it a star!
If you have ideas or feedback, feel free to open an issue or fork the project.