
📈 Trading Attention Tracker

Volume, News & Sentiment Analytics for FAANG Stocks


Capstone project correlating stock trading volume with news headlines, headline sentiment, and Wikipedia pageviews using Python, SQLite, pandas, and public data sources.

Built as my final project for the Python for Everybody specialization and as my first “serious” data project in my portfolio.


πŸ” Project Overview

The goal of this project is to answer a simple question:

“When a company gets more attention in the news and on Wikipedia, does that show up in trading activity?”

To explore this, the project:

  • Collects historical prices & volume for a small set of tickers (e.g. AAPL, AMZN).
  • Collects news headlines from RSS feeds.
  • Collects Wikipedia pageviews for the same companies.
  • Computes simple sentiment scores on headlines.
  • Stores everything in a normalized SQLite database.
  • Uses SQL + pandas + Matplotlib to analyze and visualize relationships over time.

The project is designed in two phases:

  1. Phase 1 – Sample / downloaded data (offline files, easy to reproduce).
  2. Phase 2 – Live APIs (yfinance, Wikipedia Pageviews API, live RSS feeds) using the same schema and scripts.

📸 Screenshots

Status: In active development (Week 1). Screenshots and visualizations will be added as features are completed.

📊 Volume vs News Analysis

[Visualization Preview]

This chart will show:
✓ Daily trading volume (blue line)
✓ News mention count (orange bars)
✓ Wikipedia pageviews (green line)
✓ Clear correlation patterns
✓ Interactive hover tooltips (Plotly version)

Status: In development - available after Phase 1 completion

πŸ—„οΈ Database Schema

Normalized SQLite database with 5 core tables:
- companies: Ticker metadata
- trading_volumes: Daily OHLCV data
- news_mentions: Scraped headlines
- wiki_pageviews: Daily attention metrics
- sentiment_scores: NLP analysis results

[Schema diagram coming soon]

💻 Sample Terminal Output

$ python src/analyze_sql.py --ticker AAPL --days 30

Loading data for AAPL (last 30 days)...
✓ Found 30 trading days
✓ Found 127 news mentions
✓ Found 30 Wikipedia datapoints

Top 5 High-Attention Days:
════════════════════════════════════════════════
Date       | Volume  | News | Views | Sentiment
-----------|---------|------|-------|----------
2024-01-15 | 95.2M   | 12   | 1.2M  | +0.65
[... more rows ...]

In the meantime, see the Repository Structure and Data Model sections for technical details.


💡 Why This Project?

Research Question:
Does public attention (news + Wikipedia) predict trading activity?

Personal Motivation:
With 6 years of trading experience, I've always wondered if "buzz" translates to volume. This project combines my trading background with new data skills to explore that question quantitatively.

What Makes It Unique:

  • Combines three attention signals (news, Wikipedia, sentiment)
  • Two-phase design (reproducible sample → live APIs)
  • Production-quality code structure from a learning project
  • Demonstrates full data pipeline: collection → storage → analysis → visualization

✨ Key Features

  • 🧾 Multiple data formats

    • CSV → prices & volume
    • XML → RSS news headlines
    • JSON → Wikipedia pageviews
    • HTML (optional) → article parsing with BeautifulSoup
  • 🗄️ Central SQLite database

    • companies, trading_volumes, news_mentions,
      wiki_pageviews, sentiment_scores
  • 🧠 Text sentiment

    • Basic positive/negative word counting on news headlines
    • Per-headline and per-day sentiment scores
  • 📊 Analysis & visualization

    • Daily aggregation by ticker (volume, news count, sentiment, views)
    • Time series charts: volume vs news mentions vs pageviews
    • Keyword exploration (top words in headlines per ticker)
  • 🧪 Educational focus

    • Heavy use of core Python concepts from Python for Everybody:
      • files, loops, lists, dictionaries
      • CSV / JSON / XML parsing
      • SQLite with sqlite3
      • plus pandas for higher-level analysis

🧱 Tech Stack

  • Language: Python 3
  • Data storage: SQLite (sqlite3)
  • Data analysis: pandas
  • Visualization: Matplotlib
  • Parsing / scraping:
    • csv, json, xml.etree.ElementTree
    • feedparser or RSS parsing via XML
    • beautifulsoup4 (optional HTML parsing)
  • Market data (Phase 2): yfinance
  • Attention data (Phase 2): Wikipedia Pageviews API

πŸ—‚οΈ Repository Structure

Planned structure (evolves as project grows):

trading_attention_tracker/
├─ src/
│  ├─ __init__.py
│  ├─ db_init.py               # create tables, get_connection()
│  ├─ load_sample_prices.py    # Phase 1 – CSV → SQLite
│  ├─ load_sample_news.py      # Phase 1 – XML (RSS) → SQLite
│  ├─ load_sample_wiki.py      # Phase 1 – JSON → SQLite
│  ├─ fetch_live_prices.py     # Phase 2 – yfinance → SQLite
│  ├─ fetch_live_news.py       # Phase 2 – live RSS → SQLite
│  ├─ fetch_live_wiki.py       # Phase 2 – Wikipedia API → SQLite
│  ├─ compute_sentiment.py     # headline sentiment scoring
│  ├─ analyze_sql.py           # pure-SQL style analysis (Py4E-style)
│  └─ analyze_pandas.py        # pandas-based analysis + plots
├─ data/
│  ├─ sample/                  # CSV / XML / JSON sample files
│  └─ live/                    # cached API responses (optional)
├─ db/
│  └─ market_news.sqlite       # SQLite database
├─ docs/
│  ├─ diagrams.md              # notes & architecture diagrams
│  └─ charts/                  # exported charts for README/portfolio
├─ .gitignore
├─ README.md
└─ requirements.txt

πŸ—ƒοΈ Data Model (SQLite Schema)

The core tables:

companies         – tickers and company metadata
trading_volumes   – daily close price & volume per company
news_mentions     – news headlines, dates, sources, URLs
wiki_pageviews    – daily pageviews per company
sentiment_scores  – sentiment metrics per headline

Simplified diagram:

companies (1) ────< trading_volumes
        │
        ├────< news_mentions (1) ────< sentiment_scores
        │
        └────< wiki_pageviews

🚀 Getting Started

🎯 Quick Start (For Reviewers)

Don't want to set up locally? The Sample Terminal Output, Repository Structure, and Data Model sections give a good feel for the project.

Want to run it? Follow the full setup below (about 5 minutes):

1. Clone the repo

git clone https://github.com/manuel-reyes-ml/trading_attention_tracker.git
cd trading_attention_tracker

2. Create & activate a virtual environment (recommended)

python3 -m venv .venv
# macOS / Linux
source .venv/bin/activate
# Windows (PowerShell)
# .venv\Scripts\Activate.ps1

3. Install dependencies

pip install -r requirements.txt

In Phase 1 you only need the core libraries: pandas, matplotlib, beautifulsoup4.
In Phase 2 you'll add yfinance and any HTTP / RSS helpers.

4. Initialize the database

python src/db_init.py

This creates db/market_news.sqlite and all tables (companies, trading_volumes, etc.).
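As a sketch, the tables db_init.py creates might look like the following. The table names come from this README; the exact column names are assumptions to illustrate the shape:

```python
import sqlite3

# Hypothetical DDL for the five core tables. Table names match the
# README; columns beyond ticker/date/volume/views are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS companies (
    id      INTEGER PRIMARY KEY,
    ticker  TEXT UNIQUE NOT NULL,
    name    TEXT
);
CREATE TABLE IF NOT EXISTS trading_volumes (
    id          INTEGER PRIMARY KEY,
    company_id  INTEGER REFERENCES companies(id),
    date        TEXT NOT NULL,       -- YYYY-MM-DD
    close       REAL,
    volume      INTEGER
);
CREATE TABLE IF NOT EXISTS news_mentions (
    id          INTEGER PRIMARY KEY,
    company_id  INTEGER REFERENCES companies(id),
    date        TEXT,
    title       TEXT,
    source      TEXT,
    url         TEXT
);
CREATE TABLE IF NOT EXISTS wiki_pageviews (
    id          INTEGER PRIMARY KEY,
    company_id  INTEGER REFERENCES companies(id),
    date        TEXT,
    views       INTEGER
);
CREATE TABLE IF NOT EXISTS sentiment_scores (
    id          INTEGER PRIMARY KEY,
    mention_id  INTEGER REFERENCES news_mentions(id),
    pos_count   INTEGER,
    neg_count   INTEGER,
    score       REAL
);
"""

conn = sqlite3.connect(":memory:")  # the real script opens db/market_news.sqlite
conn.executescript(SCHEMA)
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
```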


🧪 Phase 1: Working with Sample Data

Phase 1 is designed to be completely reproducible without calling any external API.

1️⃣ Load sample price data (CSV → SQLite)

Place files like prices_AAPL.csv, prices_AMZN.csv into data/sample/, then:

python src/load_sample_prices.py

This script:

  • Reads each CSV with csv.DictReader.
  • Inserts rows into companies and trading_volumes.
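That step can be sketched like this. The Date/Close/Volume headers and the inline sample rows are assumptions standing in for the real CSV files:

```python
import csv
import io
import sqlite3

# Sketch of the CSV -> SQLite step. The headers and sample rows below
# are assumptions; adjust to the actual prices_*.csv files.
sample_csv = io.StringIO(
    "Date,Close,Volume\n"
    "2024-01-15,185.92,95200000\n"
    "2024-01-16,183.63,65600000\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, ticker TEXT UNIQUE)")
conn.execute("CREATE TABLE trading_volumes "
             "(company_id INTEGER, date TEXT, close REAL, volume INTEGER)")

# One row per ticker in companies, then a foreign key into trading_volumes
conn.execute("INSERT OR IGNORE INTO companies (ticker) VALUES (?)", ("AAPL",))
company_id = conn.execute(
    "SELECT id FROM companies WHERE ticker = ?", ("AAPL",)).fetchone()[0]

for row in csv.DictReader(sample_csv):
    conn.execute(
        "INSERT INTO trading_volumes VALUES (?, ?, ?, ?)",
        (company_id, row["Date"], float(row["Close"]), int(row["Volume"])),
    )
conn.commit()
```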

2️⃣ Load sample news headlines (RSS XML → SQLite)

Place rss_aapl.xml etc. into data/sample/, then:

python src/load_sample_news.py

This script:

  • Parses XML with xml.etree.ElementTree.
  • Extracts <title>, <link>, <pubDate>.
  • Normalizes pubDate to YYYY-MM-DD.
  • Inserts into news_mentions.
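A sketch of the parsing part of that step; the feed snippet is invented, but real RSS items carry the same `<title>`/`<link>`/`<pubDate>` fields:

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Invented single-item feed for illustration.
rss = """<rss version="2.0"><channel>
  <item>
    <title>Apple unveils new chip</title>
    <link>https://example.com/a</link>
    <pubDate>Mon, 15 Jan 2024 14:30:00 GMT</pubDate>
  </item>
</channel></rss>"""

rows = []
for item in ET.fromstring(rss).iter("item"):
    # RSS dates are RFC 2822 strings; normalize to YYYY-MM-DD
    pub = parsedate_to_datetime(item.findtext("pubDate"))
    rows.append((pub.strftime("%Y-%m-%d"),
                 item.findtext("title"),
                 item.findtext("link")))
```

Each tuple in `rows` is then ready for an INSERT into news_mentions.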

3️⃣ Load sample Wikipedia pageviews (JSON → SQLite)

Place wiki_pageviews_sample.json in data/sample/, then:

python src/load_sample_wiki.py

This script:

  • Parses JSON with json.load.
  • Iterates over items list (timestamp, views).
  • Inserts into wiki_pageviews.
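A sketch of that step. The payload imitates the Wikimedia Pageviews API shape (an "items" list with "timestamp"/"views"); the field names in the actual sample file may differ:

```python
import json

# Invented two-day payload in the Pageviews API's response shape.
payload = json.loads("""{
  "items": [
    {"article": "Apple_Inc.", "timestamp": "2024011500", "views": 1200000},
    {"article": "Apple_Inc.", "timestamp": "2024011600", "views": 980000}
  ]
}""")

rows = []
for item in payload["items"]:
    ts = item["timestamp"]                  # YYYYMMDDHH
    date = f"{ts[:4]}-{ts[4:6]}-{ts[6:8]}"  # normalize to YYYY-MM-DD
    rows.append((date, item["views"]))
```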

4️⃣ Compute sentiment on headlines

python src/compute_sentiment.py

This script:

  • Reads all rows from news_mentions.
  • Applies a simple sentiment function to each title.
  • Inserts results into sentiment_scores (pos/neg counts + score).
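The word-counting idea can be sketched as below. The word lists are tiny illustrative stand-ins, not the project's actual lexicon:

```python
# Illustrative positive/negative word lists (assumptions).
POSITIVE = {"beat", "surge", "record", "growth", "rally"}
NEGATIVE = {"miss", "drop", "lawsuit", "recall", "selloff"}

def score_headline(title):
    """Return (pos_count, neg_count, score) for one headline."""
    words = [w.strip(".,!?").lower() for w in title.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # Score in [-1, 1]; 0.0 when no sentiment words are present
    return pos, neg, (pos - neg) / total if total else 0.0

pos, neg, score = score_headline("Apple sales surge to record high")
```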

5️⃣ Analyze & visualize

Option A – Pure SQL (Py4E style)

python src/analyze_sql.py

Example tasks in this script:

  • “Top 10 days by volume & news count for AAPL.”
  • “Average sentiment by month.”
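The first task boils down to a JOIN like the one below. Table and column names follow the schema section; the rows are made up:

```python
import sqlite3

# Tiny in-memory fixture so the query is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies       (id INTEGER PRIMARY KEY, ticker TEXT);
CREATE TABLE trading_volumes (company_id INTEGER, date TEXT, volume INTEGER);
CREATE TABLE news_mentions   (company_id INTEGER, date TEXT, title TEXT);
INSERT INTO companies VALUES (1, 'AAPL');
INSERT INTO trading_volumes VALUES (1, '2024-01-15', 95200000), (1, '2024-01-16', 65600000);
INSERT INTO news_mentions VALUES
    (1, '2024-01-15', 'a'), (1, '2024-01-15', 'b'), (1, '2024-01-16', 'c');
""")

# Top days by volume, with the news count joined in per day.
query = """
SELECT tv.date, tv.volume, COUNT(nm.title) AS news_count
FROM trading_volumes AS tv
JOIN companies AS c ON c.id = tv.company_id
LEFT JOIN news_mentions AS nm
       ON nm.company_id = tv.company_id AND nm.date = tv.date
WHERE c.ticker = ?
GROUP BY tv.date, tv.volume
ORDER BY tv.volume DESC
LIMIT 10
"""
top_days = conn.execute(query, ("AAPL",)).fetchall()
```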

Option B – pandas + Matplotlib

python src/analyze_pandas.py

Example plot (inside the script):

# in analyze_pandas.py
plot_news_vs_volume("AAPL")

Generates a chart like:

docs/charts/aapl_volume_vs_news.png

You can then reference it in the README:

![AAPL: Volume vs News Mentions](docs/charts/aapl_volume_vs_news.png)
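A minimal sketch of what plot_news_vs_volume might do under the hood. The helper name comes from this README; the data and styling below are invented for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented daily aggregate; the real script builds this from SQLite.
daily = pd.DataFrame({
    "date": ["2024-01-15", "2024-01-16", "2024-01-17"],
    "volume": [95_200_000, 65_600_000, 71_300_000],
    "news_count": [12, 4, 7],
})

fig, ax1 = plt.subplots(figsize=(8, 4))
ax1.plot(daily["date"], daily["volume"], color="tab:blue", label="Volume")
ax1.set_ylabel("Trading volume")

ax2 = ax1.twinx()  # second y-axis so the two scales don't fight
ax2.bar(daily["date"], daily["news_count"], color="tab:orange",
        alpha=0.4, label="News mentions")
ax2.set_ylabel("News mentions")

ax1.set_title("AAPL: Volume vs News Mentions")
fig.savefig("aapl_volume_vs_news.png")  # the real script saves under docs/charts/
plt.close(fig)
```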

🌐 Phase 2: Live Data (APIs)

After Phase 1 works with sample files, the next step is to swap in live data:

  • fetch_live_prices.py

    • Use yfinance to download OHLCV for selected tickers.
    • Insert into trading_volumes using the same schema.
  • fetch_live_wiki.py

    • Call the Wikipedia Pageviews API for each company’s page.
    • Insert into wiki_pageviews.
  • fetch_live_news.py

    • Fetch RSS feeds directly from their URLs.
    • Parse items and insert into news_mentions.

Because the database schema is unchanged, the analysis scripts should work with little or no modification.
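For the Wikipedia side, building the request URL is mostly string formatting. The path shape below follows the public Pageviews REST API (per-article, daily granularity); verify the exact segments against the API docs before relying on them:

```python
from datetime import date

# Endpoint template for daily per-article pageviews (assumed shape:
# project/access/agent/article/granularity/start/end).
BASE = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        "en.wikipedia/all-access/user/{article}/daily/{start}/{end}")

def pageviews_url(article, start, end):
    """Timestamps are YYYYMMDDHH; the API expects hour '00' for daily data."""
    return BASE.format(article=article,
                       start=start.strftime("%Y%m%d00"),
                       end=end.strftime("%Y%m%d00"))

url = pageviews_url("Apple_Inc.", date(2024, 1, 1), date(2024, 1, 31))
```

The JSON response can then be fed through the same normalization used in the Phase 1 loader.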


🎯 Learning Goals

This project is designed to reinforce:

  • Parsing CSV, JSON, XML, HTML in real scenarios.
  • Designing & using a SQLite database from Python.
  • Writing JOIN queries and aggregations.
  • Using pandas as an analysis layer on top of raw SQL.
  • Basic text processing & sentiment ideas.
  • Building a reproducible, portfolio-quality repo (clear structure, README, scripts).

📚 What I Learned

  • Coming soon...

πŸ›£οΈ Roadmap

✅ Phase 1: Foundation (In Progress)

  • SQLite database schema
  • Sample data loaders (CSV, XML, JSON)
  • Basic sentiment analysis
  • SQL-based analysis scripts

🚧 Phase 2: Live Data (Planned)

  • Integrate yfinance for real-time prices
  • Wikipedia Pageviews API integration
  • Live RSS feed parsing
  • Automated daily updates

📋 Phase 3: Enhanced Analysis (Planned)

  • Expand to 20+ tickers across sectors
  • Advanced sentiment (VADER/TextBlob)
  • Statistical significance testing (correlation p-values)
  • Technical indicators (RSI, MACD)

🚀 Phase 4: Production (Future)

  • Interactive dashboard (Streamlit/Plotly)
  • Automated daily data refresh (GitHub Actions)
  • Deploy dashboard to Heroku/Streamlit Cloud
  • Add alerts for significant attention spikes

📜 License

MIT License – see the LICENSE file for details.

This project is part of my learning portfolio and available for educational purposes.


👨‍💻 About Me

I'm Manuel, transitioning into data science after 6 years in trading. This project combines my domain expertise with new technical skills.

Connect with me:

Other Projects:

  • Learning Journey - My complete learning roadmap (DA → ML → LLM)
  • [Project 2] - Coming soon...

⭐ If you found this project interesting, please give it a star!

If you have ideas or feedback, feel free to open an issue or fork the project.
