News Curation System for Executives


An AI-powered daily news curation system that aggregates, processes, and delivers executive-level intelligence briefings via Slack.

Features · Quick Start · Architecture · Pipeline · API Reference


Overview

This system is designed to provide management teams and executives with a one-stop daily briefing containing:

  • Target Company News — News about your company (default: Akatsuki)
  • Competitor Intelligence — Japan local & global competitor updates
  • Stock Market Data — Real-time stock performance tracking
  • AI-Powered Insights — Summaries, sentiment analysis, and importance scoring

Target Audience

| Role | Use Case |
|------|----------|
| Executives | Quick daily briefing for strategic decisions |
| Product Managers | Track competitor product launches |
| Investors | Monitor stock movements and market trends |
| Business Analysts | Understand the industry landscape |

Key Features

| Feature | Description |
|---------|-------------|
| Automated Pipeline | Daily automated data collection and processing |
| AI Summarization | Gemini 2.0 Flash-powered article summaries |
| Stock Tracking | Real-time stock data via Yahoo Finance |
| Slack Delivery | Beautifully formatted digests delivered to Slack |
| Admin Controls | Toggle news categories directly from Slack |
| Supabase Database | Cloud-native PostgreSQL storage |
| Keyword Filtering | Dynamic keyword-based article relevance |
| Bilingual Support | Japanese & English content support |

System Architecture

High-Level Overview

graph TB
    subgraph "Data Sources"
        RSS[RSS/News Feeds]
        STOCK[Stock APIs<br/>Yahoo Finance]
        SOCIAL[Social Media<br/>Reddit/Reviews]
    end
    
    subgraph "Processing Engine"
        INGEST[Ingestion Layer]
        DEDUP[Deduplication]
        FILTER[Keyword Filtering]
        AI[AI Processor<br/>Gemini 2.0]
    end
    
    subgraph "Storage"
        DB[(Supabase<br/>PostgreSQL)]
    end
    
    subgraph "Delivery"
        DIGEST[Digest Builder]
        SLACK[Slack Webhook]
    end
    
    RSS --> INGEST
    STOCK --> INGEST
    SOCIAL --> INGEST
    
    INGEST --> DEDUP
    DEDUP --> FILTER
    FILTER --> AI
    AI --> DB
    
    DB --> DIGEST
    DIGEST --> SLACK
    
    style AI fill:#ff9800,color:#fff
    style DB fill:#6b46c1,color:#fff
    style SLACK fill:#4a154b,color:#fff
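The Deduplication stage in the diagram can be sketched as a fingerprint check over normalized titles and URLs. This is a minimal illustration; the function names and article fields are assumptions, not the project's actual processing/deduplication.py API:

```python
import hashlib
from urllib.parse import urlsplit

def article_fingerprint(title: str, url: str) -> str:
    """Hash a normalized title + URL so near-identical articles collide."""
    parts = urlsplit(url.strip().lower())
    # Drop query strings and fragments, which often differ between feeds.
    normalized_url = f"{parts.scheme}://{parts.netloc}{parts.path}".rstrip("/")
    normalized_title = " ".join(title.lower().split())
    return hashlib.sha256(f"{normalized_title}|{normalized_url}".encode()).hexdigest()

def deduplicate(articles: list[dict]) -> list[dict]:
    """Keep only the first article seen for each fingerprint."""
    seen: set[str] = set()
    unique = []
    for article in articles:
        fp = article_fingerprint(article["title"], article["source_url"])
        if fp not in seen:
            seen.add(fp)
            unique.append(article)
    return unique
```

Dropping the query string means the same story syndicated with different tracking parameters collapses to one entry.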

Module Architecture

graph LR
    subgraph "config/"
        SETTINGS[settings.py]
        FEEDS[feed_sources.json]
        COMP[competitor_cache.json]
    end
    
    subgraph "ingestion/"
        NEWS[news_ingestion.py]
        STOCK[stock_ingestion.py]
        SOCIAL[social_ingestion.py]
    end
    
    subgraph "processing/"
        AIP[ai_processor.py]
        SUMM[news_summarizer.py]
        CANA[competitor_analysis.py]
        STKA[stock_analysis.py]
        DEDUP[deduplication.py]
        FILT[filtering.py]
    end
    
    subgraph "delivery/"
        SFMT[slack_formatter.py]
        SSND[slack_sender.py]
    end
    
    subgraph "database/"
        CONN[connection.py]
        MODELS[models.py]
        SCHEMA[schema.sql]
    end
    
    SETTINGS --> NEWS
    SETTINGS --> STOCK
    NEWS --> DEDUP
    STOCK --> DB[(Supabase)]
    DEDUP --> FILT
    FILT --> AIP
    AIP --> SUMM
    SUMM --> SFMT
    SFMT --> SSND

Daily Pipeline

The system runs a 5-phase pipeline orchestrated by run_daily_pipeline.py:

flowchart TD
    START([Pipeline Start]) --> P1
    
    subgraph P1["Phase 1: General News"]
        P1A[Fetch RSS Feeds] --> P1B[Remove Duplicates]
        P1B --> P1C[Keyword Filtering]
        P1C --> P1D[AI Scoring & Summarization]
        P1D --> P1E[Save to Database]
    end
    
    P1 --> P2
    
    subgraph P2["Phase 2: Competitor News"]
        P2A[Load Competitor Feeds] --> P2B[Fetch Competitor News]
        P2B --> P2C[Categorize Japan/Global]
        P2C --> P2D[Process & Score]
        P2D --> P2E[Store Results]
    end
    
    P2 --> P3
    
    subgraph P3["Phase 3: Stock Analysis"]
        P3A[Fetch Stock Tickers] --> P3B[Get Yahoo Finance Data]
        P3B --> P3C[Calculate Changes]
        P3C --> P3D[Generate AI Analysis]
        P3D --> P3E[Store Stock Data]
    end
    
    P3 --> P4
    
    subgraph P4["Phase 4: News Summarization"]
        P4A[Fetch Recent Articles] --> P4B[Group by Category]
        P4B --> P4C[Generate Category Summaries]
        P4C --> P4D[Format Article Listings]
    end
    
    P4 --> P5
    
    subgraph P5["Phase 5: Slack Delivery"]
        P5A[Build Complete Digest] --> P5B[Format for Slack]
        P5B --> P5C[Add Control Links]
        P5C --> P5D[Send via Webhook]
    end
    
    P5 --> DONE([Complete])
    
    style P1 fill:#e3f2fd
    style P2 fill:#fce4ec
    style P3 fill:#e8f5e9
    style P4 fill:#fff3e0
    style P5 fill:#f3e5f5

Pipeline Flow Details

| Phase | Script | Duration | Description |
|-------|--------|----------|-------------|
| 1 | process_articles.py | ~30s | Ingests news, filters by keywords, AI-scores top articles |
| 2 | process_competitor_news.py | ~20s | Fetches and categorizes competitor news |
| 3 | process_competitor_stocks.py | ~15s | Fetches stock data for the target company and competitors |
| 4 | processing/news_summarizer.py | ~25s | Generates category summaries with AI |
| 5 | delivery/slack_formatter.py | ~5s | Formats and sends the Slack digest |
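A hedged sketch of how run_daily_pipeline.py might chain these phases sequentially, stopping on the first failure. The actual orchestrator may invoke modules directly rather than as subprocesses:

```python
import logging
import subprocess
import sys
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

# Phase order and script names mirror the table above.
PHASES = [
    ("General News", [sys.executable, "process_articles.py"]),
    ("Competitor News", [sys.executable, "process_competitor_news.py"]),
    ("Stock Analysis", [sys.executable, "process_competitor_stocks.py"]),
    ("News Summarization", [sys.executable, "-m", "processing.news_summarizer"]),
    ("Slack Delivery", [sys.executable, "-m", "delivery.slack_formatter"]),
]

def run_pipeline(phases=PHASES) -> bool:
    """Run each phase in order; abort on the first non-zero exit code."""
    for name, cmd in phases:
        start = time.monotonic()
        result = subprocess.run(cmd)
        elapsed = time.monotonic() - start
        if result.returncode != 0:
            log.error("Phase %r failed after %.1fs", name, elapsed)
            return False
        log.info("Phase %r finished in %.1fs", name, elapsed)
    return True
```

Aborting on failure keeps a broken upstream phase (e.g. no articles ingested) from producing an empty Slack digest downstream.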

Database Schema

Entity Relationship Diagram

erDiagram
    raw_articles ||--o{ processed_articles : "processes"
    company_config ||--o{ competitors : "has"
    competitors ||--o{ competitor_rss_feeds : "has"
    daily_digests ||--|{ processed_articles : "contains"
    
    raw_articles {
        uuid id PK
        text title
        text content
        text source_url UK
        text source_name
        timestamp published_at
        text category
        text article_type
        text related_entity
        boolean processed
    }
    
    processed_articles {
        uuid id PK
        uuid raw_article_id FK
        text summary
        decimal importance_score
        text[] relevance_tags
        text[] key_points
        text sentiment
        boolean is_competitor_news
        jsonb ai_metadata
    }
    
    stock_data {
        uuid id PK
        text ticker
        text company_name
        date date
        decimal open_price
        decimal close_price
        decimal change_percent
        bigint volume
    }
    
    daily_digests {
        uuid id PK
        date digest_date
        text digest_content
        boolean slack_sent
        text company_name
        int company_articles_count
        int competitor_articles_count
    }
    
    company_config {
        uuid id PK
        text company_name UK
        text stock_ticker
        text industry
        text[] keywords
    }
    
    competitors {
        uuid id PK
        text company_name FK
        text competitor_name
        text competitor_ticker
        text competition_level
        int priority
    }

Core Tables

| Table | Purpose | Key Fields |
|-------|---------|------------|
| raw_articles | Stores ingested news articles | title, source_url, category |
| processed_articles | AI-processed article summaries | summary, importance_score, sentiment |
| stock_data | Daily stock price data | ticker, close_price, change_percent |
| daily_digests | Generated daily reports | digest_content, slack_sent |
| company_config | Target company settings | company_name, stock_ticker, keywords |
| competitors | Competitor tracking | competitor_name, priority |
| admin_keywords | Dynamic filtering keywords | keyword, is_active |
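For illustration, a processed_articles row can be modeled in Python. The project's database/models.py uses Pydantic; this stdlib-dataclass sketch only mirrors the fields and constraints listed above and is not the project's actual model:

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4

VALID_SENTIMENTS = {"positive", "negative", "neutral"}

@dataclass
class ProcessedArticle:
    """Illustrative mirror of the processed_articles table."""
    raw_article_id: UUID
    summary: str
    importance_score: float  # 1-10 scale from the AI scorer
    sentiment: str           # positive / negative / neutral
    relevance_tags: list[str] = field(default_factory=list)
    key_points: list[str] = field(default_factory=list)
    is_competitor_news: bool = False
    id: UUID = field(default_factory=uuid4)

    def __post_init__(self):
        # Enforce the same invariants the AI processor is expected to produce.
        if not 1 <= self.importance_score <= 10:
            raise ValueError("importance_score must be in [1, 10]")
        if self.sentiment not in VALID_SENTIMENTS:
            raise ValueError(f"sentiment must be one of {VALID_SENTIMENTS}")
```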

Quick Start

Prerequisites

  • Python 3.9+
  • Supabase account (free tier works)
  • Gemini API key (Google AI Studio)
  • Slack webhook URL

Installation

# 1. Clone the repository
git clone <repository-url>
cd jan26_intern_C

# 2. Create virtual environment
python -m venv myenv
myenv\Scripts\activate  # Windows
# source myenv/bin/activate  # macOS/Linux

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment variables
copy .env.example .env  # Windows
# cp .env.example .env  # macOS/Linux
# Then edit .env with your keys

Environment Configuration

Create a .env file with the following variables:

# Database (Supabase)
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-service-role-key

# AI (Gemini)
GEMINI_API_KEY=your-gemini-api-key

# Slack
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx/yyy/zzz

# Target Company (Optional - defaults to Akatsuki)
TARGET_COMPANY=Akatsuki
TARGET_COMPANY_TICKER=3932.T

# Server (Optional)
API_HOST=127.0.0.1
API_PORT=8000
LOG_LEVEL=INFO
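The project loads these through Pydantic settings in config/settings.py; a minimal stdlib equivalent that fails fast when a required variable is missing might look like this (illustrative only):

```python
import os

REQUIRED = ("SUPABASE_URL", "SUPABASE_KEY", "GEMINI_API_KEY", "SLACK_WEBHOOK_URL")

def load_settings(env=os.environ) -> dict:
    """Collect required and optional variables, raising early if any required one is unset."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {
        **{k: env[k] for k in REQUIRED},
        # Defaults mirror the optional values documented above.
        "TARGET_COMPANY": env.get("TARGET_COMPANY", "Akatsuki"),
        "TARGET_COMPANY_TICKER": env.get("TARGET_COMPANY_TICKER", "3932.T"),
        "API_HOST": env.get("API_HOST", "127.0.0.1"),
        "API_PORT": int(env.get("API_PORT", "8000")),
        "LOG_LEVEL": env.get("LOG_LEVEL", "INFO"),
    }
```

Failing at startup is preferable to the pipeline dying mid-run, e.g. discovering a missing Slack webhook only at Phase 5.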

Database Setup

Run the schema in your Supabase SQL Editor:

-- Copy contents of database/schema.sql
-- Execute in Supabase SQL Editor

Running the System

# Option 1: Run full daily pipeline
python run_daily_pipeline.py

# Option 2: Run individual components
python process_articles.py          # General news only
python process_competitor_news.py   # Competitor news only
python process_competitor_stocks.py # Stock data only

# Option 3: Start the API server
python main.py
# Or with uvicorn:
uvicorn main:app --reload --host 127.0.0.1 --port 8000

API Reference

Available Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | / | API info and status |
| GET | /health | Health check with DB status |
| GET | /status | Detailed system status |
| POST | /trigger/daily-pipeline | Manually trigger the pipeline |
| GET | /api/articles | Get processed articles |
| GET | /admin/genres/toggle | Toggle a news category |
| POST | /send/daily-digest | Send the digest immediately |

Example API Usage

# Check health
curl http://localhost:8000/health

# Trigger pipeline
curl -X POST http://localhost:8000/trigger/daily-pipeline

# Get top articles
curl "http://localhost:8000/api/articles?limit=10"

API Documentation

FastAPI serves interactive documentation automatically:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc
Project Structure

jan26_intern_C/
├── main.py                    # FastAPI application entry point
├── run_daily_pipeline.py      # Main pipeline orchestrator
├── process_articles.py        # General news processing
├── process_competitor_news.py # Competitor news processing
├── process_competitor_stocks.py # Stock data processing
├── requirements.txt           # Python dependencies
│
├── config/                    # Configuration files
│   ├── settings.py            # Pydantic settings
│   ├── feed_sources.json      # RSS feed configurations
│   ├── competitor_cache.json  # Competitor definitions
│   └── competitor_feeds.json  # Competitor RSS feeds
│
├── database/                  # Database layer
│   ├── connection.py          # Supabase client
│   ├── models.py              # Pydantic data models
│   └── schema.sql             # SQL schema definitions
│
├── ingestion/                 # Data ingestion modules
│   ├── news_ingestion.py      # RSS feed parser
│   ├── stock_ingestion.py     # Yahoo Finance integration
│   ├── social_ingestion.py    # Social media scraper
│   └── storage.py             # Temporary storage helpers
│
├── processing/                # Data processing modules
│   ├── ai_processor.py        # Main AI processing
│   ├── news_summarizer.py     # Article summarization
│   ├── competitor_analysis.py # Competitor insights
│   ├── stock_analysis.py      # Stock market analysis
│   ├── deduplication.py       # Duplicate detection
│   ├── filtering.py           # Keyword filtering
│   └── news_analysis.py       # News categorization
│
├── delivery/                  # Output delivery modules
│   ├── slack_formatter.py     # Slack message formatting
│   └── slack_sender.py        # Slack webhook integration
│
├── digest/                    # Digest generation
│   ├── digest_builder.py      # Daily digest builder
│   ├── generator.py           # Digest generation logic
│   └── templates.py           # Slack templates
│
├── scheduler/                 # Job scheduling
│   └── cron_jobs.py           # Async pipeline jobs
│
├── services/                  # Business logic services
│   └── keyword_service.py     # Keyword management
│
└── utils/                     # Utility modules
    ├── gemini_client.py       # Google Gemini AI client
    ├── logger.py              # Logging configuration
    ├── slack_admin.py         # Slack admin utilities
    ├── slack_sender.py        # Slack message sender
    └── url_validator.py       # URL validation helpers

Configuration

Feed Sources (config/feed_sources.json)

Configure RSS feeds by category:

{
  "feeds": {
    "japan_games": [
      {"name": "Famitsu", "url": "https://...", "enabled": true},
      {"name": "4Gamer", "url": "https://...", "enabled": true}
    ],
    "global": [
      {"name": "IGN", "url": "https://...", "enabled": true}
    ]
  },
  "settings": {
    "days_lookback": 7,
    "enabled_genres": {
      "japan_games": true,
      "global": true
    }
  }
}
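A small helper can honor both the per-feed enabled flag and the per-genre enabled_genres switch. This is an illustrative sketch, not the project's actual loader in ingestion/news_ingestion.py:

```python
import json
from pathlib import Path

def load_feed_config(path: str = "config/feed_sources.json") -> dict:
    """Read the feed configuration file shown above."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

def load_enabled_feeds(config: dict) -> dict[str, list[dict]]:
    """Return enabled feeds for genres that are switched on in settings.enabled_genres."""
    genres = config.get("settings", {}).get("enabled_genres", {})
    return {
        genre: [feed for feed in feeds if feed.get("enabled", True)]
        for genre, feeds in config.get("feeds", {}).items()
        if genres.get(genre, True)  # genres absent from the map default to enabled
    }
```

This double gate is what lets the Slack admin controls disable a whole category without editing individual feed entries.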

Competitor Configuration (config/competitor_cache.json)

Define competitors to track:

{
  "competitors": [
    {
      "name": "Bandai Namco",
      "ticker": "7832.T",
      "market": "japan",
      "priority": 1
    },
    {
      "name": "Electronic Arts",
      "ticker": "EA",
      "market": "global",
      "priority": 2
    }
  ]
}
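To build the digest's Japan/global sections, competitors can be grouped by market and ordered by priority (lowest number first). A hedged sketch; the helper name is an assumption:

```python
from collections import defaultdict

def competitors_by_market(config: dict) -> dict[str, list[dict]]:
    """Group competitor entries by market, highest priority (lowest number) first."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for competitor in config.get("competitors", []):
        grouped[competitor.get("market", "global")].append(competitor)
    for market in grouped:
        grouped[market].sort(key=lambda c: c.get("priority", 99))
    return dict(grouped)
```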

AI Processing

Gemini Integration

The system uses Google Gemini 2.0 Flash for:

graph LR
    subgraph "AI Capabilities"
        SUMM[Summarization<br/>2-3 sentence summaries]
        SCORE[Importance Scoring<br/>1-10 scale]
        SENT[Sentiment Analysis<br/>positive/negative/neutral]
        KEY[Key Point Extraction<br/>3 main points]
    end
    
    ARTICLE[Raw Article] --> SUMM
    ARTICLE --> SCORE
    ARTICLE --> SENT
    ARTICLE --> KEY
    
    SUMM --> PROC[Processed Article]
    SCORE --> PROC
    SENT --> PROC
    KEY --> PROC

AI Functions

| Function | Purpose | Output |
|----------|---------|--------|
| summarize_article() | Generate a concise summary | 2-3 sentences |
| calculate_importance_score() | Rate relevance (1-10) | Integer score |
| analyze_sentiment() | Determine sentiment | positive/negative/neutral |
| extract_key_points() | Pull main points | List of 3 strings |
| generate_text() | General text generation | Custom prompt response |
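A scoring call typically reduces to building a prompt and defensively parsing the model's free-text reply. The prompt wording and helper names below are assumptions, not the project's actual utils/gemini_client.py code:

```python
import re

SCORE_PROMPT = (
    "Rate the importance of this article for executives at {company} "
    "on a 1-10 integer scale. Reply with the number only.\n\n"
    "Title: {title}\nBody: {body}"
)

def build_score_prompt(company: str, title: str, body: str) -> str:
    """Fill the scoring prompt, truncating the body to keep the request small."""
    return SCORE_PROMPT.format(company=company, title=title, body=body[:2000])

def parse_score(reply: str, default: int = 5) -> int:
    """Pull the first integer out of a model reply, clamped to [1, 10]."""
    match = re.search(r"\d+", reply)
    if not match:
        return default  # fall back to a neutral score on unparseable replies
    return max(1, min(10, int(match.group())))
```

Clamping and a neutral default matter in practice: LLM replies occasionally include prose ("Score: 8/10") or no number at all, and the pipeline should not crash on either.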

Slack Digest Format

Sample Output

══════════════════════════════════════════════════
*Akatsuki デイリーインテリジェンスレポート*
2026/01/24
══════════════════════════════════════════════════

*ニュース概要*

*国内ニュース* (5件)
Akatsuki announced new gacha game collaboration with 
popular anime franchise, expected Q2 launch...

*グローバルニュース* (3件)
Mobile gaming market shows 15% growth in Asia Pacific 
region, with Japan leading the expansion...

────────────────────────────────────
*注目記事*

*国内ニュース*
1. <https://...|Akatsuki新作発表> _(Famitsu)_
   _新しいガチャゲームのコラボレーションを発表_

────────────────────────────────────
*競合株価スナップショット*

企業名                 銘柄       株価        日次     週次

>Akatsuki              3932.T    ¥1,250    +2.50%  +5.20%
 Bandai Namco          7832.T    ¥3,850    +1.20%  +3.10%
 Capcom                9697.T    ¥2,100    -0.50%  +2.80%

*マーケット分析*
• 業界全体で堅調な動き
• Akatsukiは週次で競合を上回るパフォーマンス
• 新製品発表により投資家心理が改善

──────────────────────────────────────────────────
_生成時刻: 09:00 UTC | Akatsuki_

*Content Filters*
<http://localhost:8000/admin/genres/open|Add / Remove News Categories>

Security Best Practices

Warning

Never commit .env files or API keys to version control!

| Security Measure | Implementation |
|------------------|----------------|
| API keys | Store in .env; never hardcode |
| Supabase key | Use the Service Role key (not the anon key) |
| Slack webhook | Keep the URL private |
| Database | Use Row Level Security (RLS) |

Extending the System

Adding New Data Sources

  1. Create new ingestion module in ingestion/
  2. Add configuration to config/feed_sources.json
  3. Integrate into pipeline via run_daily_pipeline.py

Adding New Delivery Channels

  1. Create new sender in delivery/ (e.g., email_sender.py)
  2. Add formatter for channel-specific format
  3. Integrate into digest builder
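One way to keep delivery channels interchangeable is a small sender interface. The class names below are hypothetical, and a real EmailSender would use smtplib or an email-service API rather than a local outbox:

```python
from abc import ABC, abstractmethod

class DigestSender(ABC):
    """Hypothetical common interface; the project's delivery/slack_sender.py may differ."""

    @abstractmethod
    def send(self, digest: str) -> bool:
        """Deliver a formatted digest; return True on success."""

class EmailSender(DigestSender):
    """Example new channel from step 1 above (illustrative)."""

    def __init__(self, recipients: list[str]):
        self.recipients = recipients
        self.outbox: list[tuple[list[str], str]] = []

    def send(self, digest: str) -> bool:
        # Real code would call smtplib/an ESP API; record locally for illustration.
        self.outbox.append((self.recipients, digest))
        return True
```

With a shared interface, the digest builder can fan out to every configured sender without knowing channel specifics.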

Customizing AI Processing

  1. Modify prompts in utils/gemini_client.py
  2. Add new AI functions as needed
  3. Update processing/ai_processor.py

Testing

# Run tests
pytest

# Test specific module
pytest tests/test_ingestion.py

# Test with coverage
pytest --cov=. --cov-report=html

Troubleshooting

Common Issues

| Issue | Solution |
|-------|----------|
| SUPABASE_URL is missing | Check that the .env file exists and is loaded |
| Gemini API error | Verify the API key is valid and has quota |
| Slack message failed | Confirm the webhook URL is correct |
| No articles fetched | Check that the RSS feed URLs are accessible |
| Import errors | Run pip install -r requirements.txt |

Debug Mode

# Enable debug logging
LOG_LEVEL=DEBUG python run_daily_pipeline.py

Technology Stack

| Category | Technology |
|----------|------------|
| Backend | FastAPI, Python 3.9+ |
| Database | Supabase (PostgreSQL) |
| AI/ML | Google Gemini 2.0 Flash |
| Data Feeds | feedparser, yfinance |
| Messaging | Slack SDK |
| Async | asyncio, aiohttp |
| Validation | Pydantic |

Contributing

We welcome contributions from the community! If you'd like to contribute to this project, here's how you can get started:

  1. Fork the repository
  2. Create a feature branch with a descriptive name, such as feature/your-feature-name or fix/bug-description
  3. Make your changes — write clean, well-documented code and add tests where applicable
  4. Commit with clear messages describing what changed and why
  5. Push the branch to your fork
  6. Open a Pull Request with a clear description of your changes

Please ensure your code follows the existing style and includes appropriate documentation. We'll review your contribution and provide feedback.


License

This project is licensed under the MIT License. See the LICENSE file for details.

The MIT License is a permissive license that allows you to use, modify, and distribute this software with minimal restrictions. You are free to use this project for commercial or personal purposes.


Contact

For questions, suggestions, or support regarding this project:

  • Team: Product & Data Team
  • Target Company: Akatsuki Inc. (Configurable via environment variables)
