
📈 Trading Attention Tracker

Volume, News & Sentiment Analytics for FAANG Stocks


Capstone project correlating stock trading volume with news headlines, headline sentiment, and Wikipedia pageviews using Python, SQLite, pandas, and public data sources.

Built as my final project for the Python for Everybody specialization and as my first “serious” data project in my portfolio.


πŸ” Project Overview

The goal of this project is to answer a simple question:

“When a company gets more attention in the news and on Wikipedia, does that show up in trading activity?”

To explore this, the project:

  • Collects historical prices & volume for a small set of tickers (e.g. AAPL, AMZN).
  • Collects news headlines from RSS feeds.
  • Collects Wikipedia pageviews for the same companies.
  • Computes simple sentiment scores on headlines.
  • Stores everything in a normalized SQLite database.
  • Uses SQL + pandas + Matplotlib to analyze and visualize relationships over time.

The project is designed in two phases:

  1. Phase 1 – Sample / downloaded data (offline files, easy to reproduce).
  2. Phase 2 – Live APIs (yfinance, Wikipedia Pageviews API, live RSS feeds) using the same schema and scripts.

📸 Screenshots

Status: In active development (Week 1). Screenshots and visualizations will be added as features are completed.

📊 Volume vs News Analysis

[Visualization Preview]

This chart will show:
✓ Daily trading volume (blue line)
✓ News mention count (orange bars)
✓ Wikipedia pageviews (green line)
✓ Clear correlation patterns
✓ Interactive hover tooltips (Plotly version)

Status: In development - available after Phase 1 completion

πŸ—„οΈ Database Schema

Normalized SQLite database with 5 core tables:
- companies: Ticker metadata
- trading_volumes: Daily OHLCV data
- news_mentions: Scraped headlines
- wiki_pageviews: Daily attention metrics
- sentiment_scores: NLP analysis results

[Schema diagram coming soon]

💻 Sample Terminal Output

$ python src/analyze_sql.py --ticker AAPL --days 30

Loading data for AAPL (last 30 days)...
✓ Found 30 trading days
✓ Found 127 news mentions
✓ Found 30 Wikipedia datapoints

Top 5 High-Attention Days:
════════════════════════════════════════════════
Date       | Volume  | News | Views | Sentiment
-----------|---------|------|-------|----------
2024-01-15 | 95.2M   | 12   | 1.2M  | +0.65
[... more rows ...]

In the meantime, see the Repository Structure and Data Model sections for technical details.


💡 Why This Project?

Research Question:
Does public attention (news + Wikipedia) predict trading activity?

Personal Motivation:
With 6 years of trading experience, I've always wondered if "buzz" translates to volume. This project combines my trading background with new data skills to explore that question quantitatively.

What Makes It Unique:

  • Combines three attention signals (news, Wikipedia, sentiment)
  • Two-phase design (reproducible sample → live APIs)
  • Production-quality code structure from a learning project
  • Demonstrates full data pipeline: collection → storage → analysis → visualization

✨ Key Features

  • 🧾 Multiple data formats

    • CSV → prices & volume
    • XML → RSS news headlines
    • JSON → Wikipedia pageviews
    • HTML (optional) → article parsing with BeautifulSoup
  • 🗄️ Central SQLite database

    • companies, trading_volumes, news_mentions,
      wiki_pageviews, sentiment_scores
  • 🧠 Text sentiment

    • Basic positive/negative word counting on news headlines
    • Per-headline and per-day sentiment scores
  • 📊 Analysis & visualization

    • Daily aggregation by ticker (volume, news count, sentiment, views)
    • Time series charts: volume vs news mentions vs pageviews
    • Keyword exploration (top words in headlines per ticker)
  • 🧪 Educational focus

    • Heavy use of core Python concepts from Python for Everybody:
      • files, loops, lists, dictionaries
      • CSV / JSON / XML parsing
      • SQLite with sqlite3
      • plus pandas for higher-level analysis

🧱 Tech Stack

  • Language: Python 3
  • Data storage: SQLite (sqlite3)
  • Data analysis: pandas
  • Visualization: Matplotlib
  • Parsing / scraping:
    • csv, json, xml.etree.ElementTree
    • feedparser or RSS parsing via XML
    • beautifulsoup4 (optional HTML parsing)
  • Market data (Phase 2): yfinance
  • Attention data (Phase 2): Wikipedia Pageviews API

πŸ—‚οΈ Repository Structure

Planned structure (evolves as project grows):

trading_attention_tracker/
├─ src/
│  ├─ __init__.py
│  ├─ db_init.py               # create tables, get_connection()
│  ├─ load_sample_prices.py    # Phase 1 – CSV → SQLite
│  ├─ load_sample_news.py      # Phase 1 – XML (RSS) → SQLite
│  ├─ load_sample_wiki.py      # Phase 1 – JSON → SQLite
│  ├─ fetch_live_prices.py     # Phase 2 – yfinance → SQLite
│  ├─ fetch_live_news.py       # Phase 2 – live RSS → SQLite
│  ├─ fetch_live_wiki.py       # Phase 2 – Wikipedia API → SQLite
│  ├─ compute_sentiment.py     # headline sentiment scoring
│  ├─ analyze_sql.py           # pure-SQL style analysis (Py4E-style)
│  └─ analyze_pandas.py        # pandas-based analysis + plots
├─ data/
│  ├─ sample/                  # CSV / XML / JSON sample files
│  └─ live/                    # cached API responses (optional)
├─ db/
│  └─ market_news.sqlite       # SQLite database
├─ docs/
│  ├─ diagrams.md              # notes & architecture diagrams
│  └─ charts/                  # exported charts for README/portfolio
├─ .gitignore
├─ README.md
└─ requirements.txt

πŸ—ƒοΈ Data Model (SQLite Schema)

The core tables:

companies         – tickers and company metadata
trading_volumes   – daily close price & volume per company
news_mentions     – news headlines, dates, sources, URLs
wiki_pageviews    – daily pageviews per company
sentiment_scores  – sentiment metrics per headline

Simplified diagram:

companies (1) ────< trading_volumes
        │
        ├────< news_mentions (1) ────< sentiment_scores
        │
        └────< wiki_pageviews

🚀 Getting Started

🎯 Quick Start (For Reviewers)

Don't want to set up locally? The Sample Terminal Output, Repository Structure, and Data Model sections give a good feel for the project.

Want to run it? Follow the full setup below (about 5 minutes):

1. Clone the repo

git clone https://github.com/manuel-reyes-ml/trading_attention_tracker.git
cd trading_attention_tracker

2. Create & activate a virtual environment (recommended)

python3 -m venv .venv
# macOS / Linux
source .venv/bin/activate
# Windows (PowerShell)
# .venv\Scripts\Activate.ps1

3. Install dependencies

pip install -r requirements.txt

In Phase 1 you only need the core libraries: pandas, matplotlib, beautifulsoup4.
In Phase 2 you'll add yfinance and any HTTP / RSS helpers.

4. Initialize the database

python src/db_init.py

This creates db/market_news.sqlite and all tables (companies, trading_volumes, etc.).
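As a sketch, the tables db_init.py creates might look like the following. The table names come from this README; the exact column names are assumptions to illustrate the shape:

```python
import sqlite3

# Hypothetical DDL for the five core tables. Table names match the
# README; columns beyond ticker/date/volume/views are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS companies (
    id      INTEGER PRIMARY KEY,
    ticker  TEXT UNIQUE NOT NULL,
    name    TEXT
);
CREATE TABLE IF NOT EXISTS trading_volumes (
    id          INTEGER PRIMARY KEY,
    company_id  INTEGER REFERENCES companies(id),
    date        TEXT NOT NULL,       -- YYYY-MM-DD
    close       REAL,
    volume      INTEGER
);
CREATE TABLE IF NOT EXISTS news_mentions (
    id          INTEGER PRIMARY KEY,
    company_id  INTEGER REFERENCES companies(id),
    date        TEXT,
    title       TEXT,
    source      TEXT,
    url         TEXT
);
CREATE TABLE IF NOT EXISTS wiki_pageviews (
    id          INTEGER PRIMARY KEY,
    company_id  INTEGER REFERENCES companies(id),
    date        TEXT,
    views       INTEGER
);
CREATE TABLE IF NOT EXISTS sentiment_scores (
    id          INTEGER PRIMARY KEY,
    mention_id  INTEGER REFERENCES news_mentions(id),
    pos_count   INTEGER,
    neg_count   INTEGER,
    score       REAL
);
"""

conn = sqlite3.connect(":memory:")  # the real script opens db/market_news.sqlite
conn.executescript(SCHEMA)
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
```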


🧪 Phase 1: Working with Sample Data

Phase 1 is designed to be completely reproducible without calling any external API.

1️⃣ Load sample price data (CSV → SQLite)

Place files like prices_AAPL.csv, prices_AMZN.csv into data/sample/, then:

python src/load_sample_prices.py

This script:

  • Reads each CSV with csv.DictReader.
  • Inserts rows into companies and trading_volumes.
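That step can be sketched like this. The Date/Close/Volume headers and the inline sample rows are assumptions standing in for the real CSV files:

```python
import csv
import io
import sqlite3

# Sketch of the CSV -> SQLite step. The headers and sample rows below
# are assumptions; adjust to the actual prices_*.csv files.
sample_csv = io.StringIO(
    "Date,Close,Volume\n"
    "2024-01-15,185.92,95200000\n"
    "2024-01-16,183.63,65600000\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, ticker TEXT UNIQUE)")
conn.execute("CREATE TABLE trading_volumes "
             "(company_id INTEGER, date TEXT, close REAL, volume INTEGER)")

# One row per ticker in companies, then a foreign key into trading_volumes
conn.execute("INSERT OR IGNORE INTO companies (ticker) VALUES (?)", ("AAPL",))
company_id = conn.execute(
    "SELECT id FROM companies WHERE ticker = ?", ("AAPL",)).fetchone()[0]

for row in csv.DictReader(sample_csv):
    conn.execute(
        "INSERT INTO trading_volumes VALUES (?, ?, ?, ?)",
        (company_id, row["Date"], float(row["Close"]), int(row["Volume"])),
    )
conn.commit()
```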

2️⃣ Load sample news headlines (RSS XML → SQLite)

Place rss_aapl.xml etc. into data/sample/, then:

python src/load_sample_news.py

This script:

  • Parses XML with xml.etree.ElementTree.
  • Extracts <title>, <link>, <pubDate>.
  • Normalizes pubDate to YYYY-MM-DD.
  • Inserts into news_mentions.
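A sketch of the parsing part of that step; the feed snippet is invented, but real RSS items carry the same `<title>`/`<link>`/`<pubDate>` fields:

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Invented single-item feed for illustration.
rss = """<rss version="2.0"><channel>
  <item>
    <title>Apple unveils new chip</title>
    <link>https://example.com/a</link>
    <pubDate>Mon, 15 Jan 2024 14:30:00 GMT</pubDate>
  </item>
</channel></rss>"""

rows = []
for item in ET.fromstring(rss).iter("item"):
    # RSS dates are RFC 2822 strings; normalize to YYYY-MM-DD
    pub = parsedate_to_datetime(item.findtext("pubDate"))
    rows.append((pub.strftime("%Y-%m-%d"),
                 item.findtext("title"),
                 item.findtext("link")))
```

Each tuple in `rows` is then ready for an INSERT into news_mentions.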

3️⃣ Load sample Wikipedia pageviews (JSON → SQLite)

Place wiki_pageviews_sample.json in data/sample/, then:

python src/load_sample_wiki.py

This script:

  • Parses JSON with json.load.
  • Iterates over items list (timestamp, views).
  • Inserts into wiki_pageviews.
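A sketch of that step. The payload imitates the Wikimedia Pageviews API shape (an "items" list with "timestamp"/"views"); the field names in the actual sample file may differ:

```python
import json

# Invented two-day payload in the Pageviews API's response shape.
payload = json.loads("""{
  "items": [
    {"article": "Apple_Inc.", "timestamp": "2024011500", "views": 1200000},
    {"article": "Apple_Inc.", "timestamp": "2024011600", "views": 980000}
  ]
}""")

rows = []
for item in payload["items"]:
    ts = item["timestamp"]                  # YYYYMMDDHH
    date = f"{ts[:4]}-{ts[4:6]}-{ts[6:8]}"  # normalize to YYYY-MM-DD
    rows.append((date, item["views"]))
```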

4️⃣ Compute sentiment on headlines

python src/compute_sentiment.py

This script:

  • Reads all rows from news_mentions.
  • Applies a simple sentiment function to each title.
  • Inserts results into sentiment_scores (pos/neg counts + score).
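The word-counting idea can be sketched as below. The word lists are tiny illustrative stand-ins, not the project's actual lexicon:

```python
# Illustrative positive/negative word lists (assumptions).
POSITIVE = {"beat", "surge", "record", "growth", "rally"}
NEGATIVE = {"miss", "drop", "lawsuit", "recall", "selloff"}

def score_headline(title):
    """Return (pos_count, neg_count, score) for one headline."""
    words = [w.strip(".,!?").lower() for w in title.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # Score in [-1, 1]; 0.0 when no sentiment words are present
    return pos, neg, (pos - neg) / total if total else 0.0

pos, neg, score = score_headline("Apple sales surge to record high")
```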

5️⃣ Analyze & visualize

Option A – Pure SQL (Py4E style)

python src/analyze_sql.py

Example tasks in this script:

  • “Top 10 days by volume & news count for AAPL.”
  • “Average sentiment by month.”
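The first task boils down to a JOIN like the one below. Table and column names follow the schema section; the rows are made up:

```python
import sqlite3

# Tiny in-memory fixture so the query is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies       (id INTEGER PRIMARY KEY, ticker TEXT);
CREATE TABLE trading_volumes (company_id INTEGER, date TEXT, volume INTEGER);
CREATE TABLE news_mentions   (company_id INTEGER, date TEXT, title TEXT);
INSERT INTO companies VALUES (1, 'AAPL');
INSERT INTO trading_volumes VALUES (1, '2024-01-15', 95200000), (1, '2024-01-16', 65600000);
INSERT INTO news_mentions VALUES
    (1, '2024-01-15', 'a'), (1, '2024-01-15', 'b'), (1, '2024-01-16', 'c');
""")

# Top days by volume, with the news count joined in per day.
query = """
SELECT tv.date, tv.volume, COUNT(nm.title) AS news_count
FROM trading_volumes AS tv
JOIN companies AS c ON c.id = tv.company_id
LEFT JOIN news_mentions AS nm
       ON nm.company_id = tv.company_id AND nm.date = tv.date
WHERE c.ticker = ?
GROUP BY tv.date, tv.volume
ORDER BY tv.volume DESC
LIMIT 10
"""
top_days = conn.execute(query, ("AAPL",)).fetchall()
```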

Option B – pandas + Matplotlib

python src/analyze_pandas.py

Example plot (inside the script):

# in analyze_pandas.py
plot_news_vs_volume("AAPL")

Generates a chart like:

docs/charts/aapl_volume_vs_news.png

You can then reference it in the README:

![AAPL: Volume vs News Mentions](docs/charts/aapl_volume_vs_news.png)
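A minimal sketch of what plot_news_vs_volume might do under the hood. The helper name comes from this README; the data and styling below are invented for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented daily aggregate; the real script builds this from SQLite.
daily = pd.DataFrame({
    "date": ["2024-01-15", "2024-01-16", "2024-01-17"],
    "volume": [95_200_000, 65_600_000, 71_300_000],
    "news_count": [12, 4, 7],
})

fig, ax1 = plt.subplots(figsize=(8, 4))
ax1.plot(daily["date"], daily["volume"], color="tab:blue", label="Volume")
ax1.set_ylabel("Trading volume")

ax2 = ax1.twinx()  # second y-axis so the two scales don't fight
ax2.bar(daily["date"], daily["news_count"], color="tab:orange",
        alpha=0.4, label="News mentions")
ax2.set_ylabel("News mentions")

ax1.set_title("AAPL: Volume vs News Mentions")
fig.savefig("aapl_volume_vs_news.png")  # the real script saves under docs/charts/
plt.close(fig)
```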

🌐 Phase 2: Live Data (APIs)

After Phase 1 works with sample files, the next step is to swap in live data:

  • fetch_live_prices.py

    • Use yfinance to download OHLCV for selected tickers.
    • Insert into trading_volumes using the same schema.
  • fetch_live_wiki.py

    • Call the Wikipedia Pageviews API for each company’s page.
    • Insert into wiki_pageviews.
  • fetch_live_news.py

    • Fetch RSS feeds directly from their URLs.
    • Parse items and insert into news_mentions.

Because the database schema is unchanged, the analysis scripts should work with little or no modification.
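For the Wikipedia side, building the request URL is mostly string formatting. The path shape below follows the public Pageviews REST API (per-article, daily granularity); verify the exact segments against the API docs before relying on them:

```python
from datetime import date

# Endpoint template for daily per-article pageviews (assumed shape:
# project/access/agent/article/granularity/start/end).
BASE = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        "en.wikipedia/all-access/user/{article}/daily/{start}/{end}")

def pageviews_url(article, start, end):
    """Timestamps are YYYYMMDDHH; the API expects hour '00' for daily data."""
    return BASE.format(article=article,
                       start=start.strftime("%Y%m%d00"),
                       end=end.strftime("%Y%m%d00"))

url = pageviews_url("Apple_Inc.", date(2024, 1, 1), date(2024, 1, 31))
```

The JSON response can then be fed through the same normalization used in the Phase 1 loader.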


🎯 Learning Goals

This project is designed to reinforce:

  • Parsing CSV, JSON, XML, HTML in real scenarios.
  • Designing & using a SQLite database from Python.
  • Writing JOIN queries and aggregations.
  • Using pandas as an analysis layer on top of raw SQL.
  • Basic text processing & sentiment ideas.
  • Building a reproducible, portfolio-quality repo (clear structure, README, scripts).

📚 What I Learned

  • Coming soon...

πŸ›£οΈ Roadmap

✅ Phase 1: Foundation (In Progress)

  • SQLite database schema
  • Sample data loaders (CSV, XML, JSON)
  • Basic sentiment analysis
  • SQL-based analysis scripts

🚧 Phase 2: Live Data (Planned)

  • Integrate yfinance for real-time prices
  • Wikipedia Pageviews API integration
  • Live RSS feed parsing
  • Automated daily updates

📋 Phase 3: Enhanced Analysis (Planned)

  • Expand to 20+ tickers across sectors
  • Advanced sentiment (VADER/TextBlob)
  • Statistical significance testing (correlation p-values)
  • Technical indicators (RSI, MACD)

🚀 Phase 4: Production (Future)

  • Interactive dashboard (Streamlit/Plotly)
  • Automated daily data refresh (GitHub Actions)
  • Deploy dashboard to Heroku/Streamlit Cloud
  • Add alerts for significant attention spikes

📜 License

MIT License – see the LICENSE file for details.

This project is part of my learning portfolio and available for educational purposes.


👨‍💻 About Me

I'm Manuel, transitioning into data science after 6 years in trading. This project combines my domain expertise with new technical skills.

Connect with me:

Other Projects:

  • Learning Journey - My complete learning roadmap (DA → ML → LLM)
  • [Project 2] - Coming soon...

⭐ If you found this project interesting, please give it a star!

If you have ideas or feedback, feel free to open an issue or fork the project.
