News Curation System for Executives


An AI-powered daily news curation system that aggregates, processes, and delivers executive-level intelligence briefings via Slack.

Features · Quick Start · Architecture · Pipeline · API Reference


Overview

This system is designed to provide management teams and executives with a one-stop daily briefing containing:

  • Target Company News — News about your company (default: Akatsuki)
  • Competitor Intelligence — Japan local & global competitor updates
  • Stock Market Data — Real-time stock performance tracking
  • AI-Powered Insights — Summaries, sentiment analysis, and importance scoring

Target Audience

| Role | Use Case |
|------|----------|
| Executives | Quick daily briefing for strategic decisions |
| Product Managers | Track competitor product launches |
| Investors | Monitor stock movements and market trends |
| Business Analysts | Understand the industry landscape |

Key Features

| Feature | Description |
|---------|-------------|
| Automated Pipeline | Daily automated data collection and processing |
| AI Summarization | Gemini 2.0 Flash-powered article summaries |
| Stock Tracking | Real-time stock data via Yahoo Finance |
| Slack Delivery | Beautifully formatted digests delivered to Slack |
| Admin Controls | Toggle news categories directly from Slack |
| Supabase Database | Cloud-native PostgreSQL storage |
| Keyword Filtering | Dynamic keyword-based article relevance |
| Bilingual Support | Japanese & English content support |

System Architecture

High-Level Overview

graph TB
    subgraph "Data Sources"
        RSS[RSS/News Feeds]
        STOCK[Stock APIs<br/>Yahoo Finance]
        SOCIAL[Social Media<br/>Reddit/Reviews]
    end
    
    subgraph "Processing Engine"
        INGEST[Ingestion Layer]
        DEDUP[Deduplication]
        FILTER[Keyword Filtering]
        AI[AI Processor<br/>Gemini 2.0]
    end
    
    subgraph "Storage"
        DB[(Supabase<br/>PostgreSQL)]
    end
    
    subgraph "Delivery"
        DIGEST[Digest Builder]
        SLACK[Slack Webhook]
    end
    
    RSS --> INGEST
    STOCK --> INGEST
    SOCIAL --> INGEST
    
    INGEST --> DEDUP
    DEDUP --> FILTER
    FILTER --> AI
    AI --> DB
    
    DB --> DIGEST
    DIGEST --> SLACK
    
    style AI fill:#ff9800,color:#fff
    style DB fill:#6b46c1,color:#fff
    style SLACK fill:#4a154b,color:#fff
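The Deduplication stage in the diagram can be sketched as a fingerprint check over normalized titles and URLs. This is a minimal illustration; the function names and article fields are assumptions, not the project's actual processing/deduplication.py API:

```python
import hashlib
from urllib.parse import urlsplit

def article_fingerprint(title: str, url: str) -> str:
    """Hash a normalized title + URL so near-identical articles collide."""
    parts = urlsplit(url.strip().lower())
    # Drop query strings and fragments, which often differ between feeds.
    normalized_url = f"{parts.scheme}://{parts.netloc}{parts.path}".rstrip("/")
    normalized_title = " ".join(title.lower().split())
    return hashlib.sha256(f"{normalized_title}|{normalized_url}".encode()).hexdigest()

def deduplicate(articles: list[dict]) -> list[dict]:
    """Keep only the first article seen for each fingerprint."""
    seen: set[str] = set()
    unique = []
    for article in articles:
        fp = article_fingerprint(article["title"], article["source_url"])
        if fp not in seen:
            seen.add(fp)
            unique.append(article)
    return unique
```

Dropping the query string means the same story syndicated with different tracking parameters collapses to one entry.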

Module Architecture

graph LR
    subgraph "config/"
        SETTINGS[settings.py]
        FEEDS[feed_sources.json]
        COMP[competitor_cache.json]
    end
    
    subgraph "ingestion/"
        NEWS[news_ingestion.py]
        STOCK[stock_ingestion.py]
        SOCIAL[social_ingestion.py]
    end
    
    subgraph "processing/"
        AIP[ai_processor.py]
        SUMM[news_summarizer.py]
        CANA[competitor_analysis.py]
        STKA[stock_analysis.py]
        DEDUP[deduplication.py]
        FILT[filtering.py]
    end
    
    subgraph "delivery/"
        SFMT[slack_formatter.py]
        SSND[slack_sender.py]
    end
    
    subgraph "database/"
        CONN[connection.py]
        MODELS[models.py]
        SCHEMA[schema.sql]
    end
    
    SETTINGS --> NEWS
    SETTINGS --> STOCK
    NEWS --> DEDUP
    STOCK --> DB[(Supabase)]
    DEDUP --> FILT
    FILT --> AIP
    AIP --> SUMM
    SUMM --> SFMT
    SFMT --> SSND

Daily Pipeline

The system runs a 5-phase pipeline orchestrated by run_daily_pipeline.py:

flowchart TD
    START([Pipeline Start]) --> P1
    
    subgraph P1["Phase 1: General News"]
        P1A[Fetch RSS Feeds] --> P1B[Remove Duplicates]
        P1B --> P1C[Keyword Filtering]
        P1C --> P1D[AI Scoring & Summarization]
        P1D --> P1E[Save to Database]
    end
    
    P1 --> P2
    
    subgraph P2["Phase 2: Competitor News"]
        P2A[Load Competitor Feeds] --> P2B[Fetch Competitor News]
        P2B --> P2C[Categorize Japan/Global]
        P2C --> P2D[Process & Score]
        P2D --> P2E[Store Results]
    end
    
    P2 --> P3
    
    subgraph P3["Phase 3: Stock Analysis"]
        P3A[Fetch Stock Tickers] --> P3B[Get Yahoo Finance Data]
        P3B --> P3C[Calculate Changes]
        P3C --> P3D[Generate AI Analysis]
        P3D --> P3E[Store Stock Data]
    end
    
    P3 --> P4
    
    subgraph P4["Phase 4: News Summarization"]
        P4A[Fetch Recent Articles] --> P4B[Group by Category]
        P4B --> P4C[Generate Category Summaries]
        P4C --> P4D[Format Article Listings]
    end
    
    P4 --> P5
    
    subgraph P5["Phase 5: Slack Delivery"]
        P5A[Build Complete Digest] --> P5B[Format for Slack]
        P5B --> P5C[Add Control Links]
        P5C --> P5D[Send via Webhook]
    end
    
    P5 --> DONE([Complete])
    
    style P1 fill:#e3f2fd
    style P2 fill:#fce4ec
    style P3 fill:#e8f5e9
    style P4 fill:#fff3e0
    style P5 fill:#f3e5f5

Pipeline Flow Details

| Phase | Script | Duration | Description |
|-------|--------|----------|-------------|
| 1 | process_articles.py | ~30s | Ingests news, filters by keywords, AI-scores top articles |
| 2 | process_competitor_news.py | ~20s | Fetches and categorizes competitor news |
| 3 | process_competitor_stocks.py | ~15s | Fetches stock data for the target company and competitors |
| 4 | processing/news_summarizer.py | ~25s | Generates category summaries with AI |
| 5 | delivery/slack_formatter.py | ~5s | Formats and sends the Slack digest |
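A hedged sketch of how run_daily_pipeline.py might chain these phases sequentially, stopping on the first failure. The actual orchestrator may invoke modules directly rather than as subprocesses:

```python
import logging
import subprocess
import sys
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

# Phase order and script names mirror the table above.
PHASES = [
    ("General News", [sys.executable, "process_articles.py"]),
    ("Competitor News", [sys.executable, "process_competitor_news.py"]),
    ("Stock Analysis", [sys.executable, "process_competitor_stocks.py"]),
    ("News Summarization", [sys.executable, "-m", "processing.news_summarizer"]),
    ("Slack Delivery", [sys.executable, "-m", "delivery.slack_formatter"]),
]

def run_pipeline(phases=PHASES) -> bool:
    """Run each phase in order; abort on the first non-zero exit code."""
    for name, cmd in phases:
        start = time.monotonic()
        result = subprocess.run(cmd)
        elapsed = time.monotonic() - start
        if result.returncode != 0:
            log.error("Phase %r failed after %.1fs", name, elapsed)
            return False
        log.info("Phase %r finished in %.1fs", name, elapsed)
    return True
```

Aborting on failure keeps a broken upstream phase (e.g. no articles ingested) from producing an empty Slack digest downstream.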

Database Schema

Entity Relationship Diagram

erDiagram
    raw_articles ||--o{ processed_articles : "processes"
    company_config ||--o{ competitors : "has"
    competitors ||--o{ competitor_rss_feeds : "has"
    daily_digests ||--|{ processed_articles : "contains"
    
    raw_articles {
        uuid id PK
        text title
        text content
        text source_url UK
        text source_name
        timestamp published_at
        text category
        text article_type
        text related_entity
        boolean processed
    }
    
    processed_articles {
        uuid id PK
        uuid raw_article_id FK
        text summary
        decimal importance_score
        text[] relevance_tags
        text[] key_points
        text sentiment
        boolean is_competitor_news
        jsonb ai_metadata
    }
    
    stock_data {
        uuid id PK
        text ticker
        text company_name
        date date
        decimal open_price
        decimal close_price
        decimal change_percent
        bigint volume
    }
    
    daily_digests {
        uuid id PK
        date digest_date
        text digest_content
        boolean slack_sent
        text company_name
        int company_articles_count
        int competitor_articles_count
    }
    
    company_config {
        uuid id PK
        text company_name UK
        text stock_ticker
        text industry
        text[] keywords
    }
    
    competitors {
        uuid id PK
        text company_name FK
        text competitor_name
        text competitor_ticker
        text competition_level
        int priority
    }

Core Tables

| Table | Purpose | Key Fields |
|-------|---------|------------|
| raw_articles | Stores ingested news articles | title, source_url, category |
| processed_articles | AI-processed article summaries | summary, importance_score, sentiment |
| stock_data | Daily stock price data | ticker, close_price, change_percent |
| daily_digests | Generated daily reports | digest_content, slack_sent |
| company_config | Target company settings | company_name, stock_ticker, keywords |
| competitors | Competitor tracking | competitor_name, priority |
| admin_keywords | Dynamic filtering keywords | keyword, is_active |
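For illustration, a processed_articles row can be modeled in Python. The project's database/models.py uses Pydantic; this stdlib-dataclass sketch only mirrors the fields and constraints listed above and is not the project's actual model:

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4

VALID_SENTIMENTS = {"positive", "negative", "neutral"}

@dataclass
class ProcessedArticle:
    """Illustrative mirror of the processed_articles table."""
    raw_article_id: UUID
    summary: str
    importance_score: float  # 1-10 scale from the AI scorer
    sentiment: str           # positive / negative / neutral
    relevance_tags: list[str] = field(default_factory=list)
    key_points: list[str] = field(default_factory=list)
    is_competitor_news: bool = False
    id: UUID = field(default_factory=uuid4)

    def __post_init__(self):
        # Enforce the same invariants the AI processor is expected to produce.
        if not 1 <= self.importance_score <= 10:
            raise ValueError("importance_score must be in [1, 10]")
        if self.sentiment not in VALID_SENTIMENTS:
            raise ValueError(f"sentiment must be one of {VALID_SENTIMENTS}")
```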

Quick Start

Prerequisites

  • Python 3.9+
  • Supabase account (free tier works)
  • Gemini API key (Google AI Studio)
  • Slack webhook URL

Installation

# 1. Clone the repository
git clone <repository-url>
cd jan26_intern_C

# 2. Create virtual environment
python -m venv myenv
myenv\Scripts\activate  # Windows
# source myenv/bin/activate  # macOS/Linux

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment variables
copy .env.example .env  # Windows
# cp .env.example .env  # macOS/Linux
# Then edit .env with your keys

Environment Configuration

Create a .env file with the following variables:

# Database (Supabase)
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-service-role-key

# AI (Gemini)
GEMINI_API_KEY=your-gemini-api-key

# Slack
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx/yyy/zzz

# Target Company (Optional - defaults to Akatsuki)
TARGET_COMPANY=Akatsuki
TARGET_COMPANY_TICKER=3932.T

# Server (Optional)
API_HOST=127.0.0.1
API_PORT=8000
LOG_LEVEL=INFO
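The project loads these through Pydantic settings in config/settings.py; a minimal stdlib equivalent that fails fast when a required variable is missing might look like this (illustrative only):

```python
import os

REQUIRED = ("SUPABASE_URL", "SUPABASE_KEY", "GEMINI_API_KEY", "SLACK_WEBHOOK_URL")

def load_settings(env=os.environ) -> dict:
    """Collect required and optional variables, raising early if any required one is unset."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {
        **{k: env[k] for k in REQUIRED},
        # Defaults mirror the optional values documented above.
        "TARGET_COMPANY": env.get("TARGET_COMPANY", "Akatsuki"),
        "TARGET_COMPANY_TICKER": env.get("TARGET_COMPANY_TICKER", "3932.T"),
        "API_HOST": env.get("API_HOST", "127.0.0.1"),
        "API_PORT": int(env.get("API_PORT", "8000")),
        "LOG_LEVEL": env.get("LOG_LEVEL", "INFO"),
    }
```

Failing at startup is preferable to the pipeline dying mid-run, e.g. discovering a missing Slack webhook only at Phase 5.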

Database Setup

Run the schema in your Supabase SQL Editor:

-- Copy contents of database/schema.sql
-- Execute in Supabase SQL Editor

Running the System

# Option 1: Run full daily pipeline
python run_daily_pipeline.py

# Option 2: Run individual components
python process_articles.py          # General news only
python process_competitor_news.py   # Competitor news only
python process_competitor_stocks.py # Stock data only

# Option 3: Start the API server
python main.py
# Or with uvicorn:
uvicorn main:app --reload --host 127.0.0.1 --port 8000

API Reference

Available Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | / | API info and status |
| GET | /health | Health check with DB status |
| GET | /status | Detailed system status |
| POST | /trigger/daily-pipeline | Manually trigger the pipeline |
| GET | /api/articles | Get processed articles |
| GET | /admin/genres/toggle | Toggle a news category |
| POST | /send/daily-digest | Send the digest immediately |

Example API Usage

# Check health
curl http://localhost:8000/health

# Trigger pipeline
curl -X POST http://localhost:8000/trigger/daily-pipeline

# Get top articles
curl "http://localhost:8000/api/articles?limit=10"

API Documentation

FastAPI serves interactive documentation automatically:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc
Project Structure

jan26_intern_C/
├── main.py                    # FastAPI application entry point
├── run_daily_pipeline.py      # Main pipeline orchestrator
├── process_articles.py        # General news processing
├── process_competitor_news.py # Competitor news processing
├── process_competitor_stocks.py # Stock data processing
├── requirements.txt           # Python dependencies
│
├── config/                    # Configuration files
│   ├── settings.py            # Pydantic settings
│   ├── feed_sources.json      # RSS feed configurations
│   ├── competitor_cache.json  # Competitor definitions
│   └── competitor_feeds.json  # Competitor RSS feeds
│
├── database/                  # Database layer
│   ├── connection.py          # Supabase client
│   ├── models.py              # Pydantic data models
│   └── schema.sql             # SQL schema definitions
│
├── ingestion/                 # Data ingestion modules
│   ├── news_ingestion.py      # RSS feed parser
│   ├── stock_ingestion.py     # Yahoo Finance integration
│   ├── social_ingestion.py    # Social media scraper
│   └── storage.py             # Temporary storage helpers
│
├── processing/                # Data processing modules
│   ├── ai_processor.py        # Main AI processing
│   ├── news_summarizer.py     # Article summarization
│   ├── competitor_analysis.py # Competitor insights
│   ├── stock_analysis.py      # Stock market analysis
│   ├── deduplication.py       # Duplicate detection
│   ├── filtering.py           # Keyword filtering
│   └── news_analysis.py       # News categorization
│
├── delivery/                  # Output delivery modules
│   ├── slack_formatter.py     # Slack message formatting
│   └── slack_sender.py        # Slack webhook integration
│
├── digest/                    # Digest generation
│   ├── digest_builder.py      # Daily digest builder
│   ├── generator.py           # Digest generation logic
│   └── templates.py           # Slack templates
│
├── scheduler/                 # Job scheduling
│   └── cron_jobs.py           # Async pipeline jobs
│
├── services/                  # Business logic services
│   └── keyword_service.py     # Keyword management
│
└── utils/                     # Utility modules
    ├── gemini_client.py       # Google Gemini AI client
    ├── logger.py              # Logging configuration
    ├── slack_admin.py         # Slack admin utilities
    ├── slack_sender.py        # Slack message sender
    └── url_validator.py       # URL validation helpers

Configuration

Feed Sources (config/feed_sources.json)

Configure RSS feeds by category:

{
  "feeds": {
    "japan_games": [
      {"name": "Famitsu", "url": "https://...", "enabled": true},
      {"name": "4Gamer", "url": "https://...", "enabled": true}
    ],
    "global": [
      {"name": "IGN", "url": "https://...", "enabled": true}
    ]
  },
  "settings": {
    "days_lookback": 7,
    "enabled_genres": {
      "japan_games": true,
      "global": true
    }
  }
}
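A small helper can honor both the per-feed enabled flag and the per-genre enabled_genres switch. This is an illustrative sketch, not the project's actual loader in ingestion/news_ingestion.py:

```python
import json
from pathlib import Path

def load_feed_config(path: str = "config/feed_sources.json") -> dict:
    """Read the feed configuration file shown above."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

def load_enabled_feeds(config: dict) -> dict[str, list[dict]]:
    """Return enabled feeds for genres that are switched on in settings.enabled_genres."""
    genres = config.get("settings", {}).get("enabled_genres", {})
    return {
        genre: [feed for feed in feeds if feed.get("enabled", True)]
        for genre, feeds in config.get("feeds", {}).items()
        if genres.get(genre, True)  # genres absent from the map default to enabled
    }
```

This double gate is what lets the Slack admin controls disable a whole category without editing individual feed entries.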

Competitor Configuration (config/competitor_cache.json)

Define competitors to track:

{
  "competitors": [
    {
      "name": "Bandai Namco",
      "ticker": "7832.T",
      "market": "japan",
      "priority": 1
    },
    {
      "name": "Electronic Arts",
      "ticker": "EA",
      "market": "global",
      "priority": 2
    }
  ]
}
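To build the digest's Japan/global sections, competitors can be grouped by market and ordered by priority (lowest number first). A hedged sketch; the helper name is an assumption:

```python
from collections import defaultdict

def competitors_by_market(config: dict) -> dict[str, list[dict]]:
    """Group competitor entries by market, highest priority (lowest number) first."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for competitor in config.get("competitors", []):
        grouped[competitor.get("market", "global")].append(competitor)
    for market in grouped:
        grouped[market].sort(key=lambda c: c.get("priority", 99))
    return dict(grouped)
```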

AI Processing

Gemini Integration

The system uses Google Gemini 2.0 Flash for:

graph LR
    subgraph "AI Capabilities"
        SUMM[Summarization<br/>2-3 sentence summaries]
        SCORE[Importance Scoring<br/>1-10 scale]
        SENT[Sentiment Analysis<br/>positive/negative/neutral]
        KEY[Key Point Extraction<br/>3 main points]
    end
    
    ARTICLE[Raw Article] --> SUMM
    ARTICLE --> SCORE
    ARTICLE --> SENT
    ARTICLE --> KEY
    
    SUMM --> PROC[Processed Article]
    SCORE --> PROC
    SENT --> PROC
    KEY --> PROC

AI Functions

| Function | Purpose | Output |
|----------|---------|--------|
| summarize_article() | Generate a concise summary | 2-3 sentences |
| calculate_importance_score() | Rate relevance (1-10) | Integer score |
| analyze_sentiment() | Determine sentiment | positive/negative/neutral |
| extract_key_points() | Pull main points | List of 3 strings |
| generate_text() | General text generation | Custom prompt response |
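A scoring call typically reduces to building a prompt and defensively parsing the model's free-text reply. The prompt wording and helper names below are assumptions, not the project's actual utils/gemini_client.py code:

```python
import re

SCORE_PROMPT = (
    "Rate the importance of this article for executives at {company} "
    "on a 1-10 integer scale. Reply with the number only.\n\n"
    "Title: {title}\nBody: {body}"
)

def build_score_prompt(company: str, title: str, body: str) -> str:
    """Fill the scoring prompt, truncating the body to keep the request small."""
    return SCORE_PROMPT.format(company=company, title=title, body=body[:2000])

def parse_score(reply: str, default: int = 5) -> int:
    """Pull the first integer out of a model reply, clamped to [1, 10]."""
    match = re.search(r"\d+", reply)
    if not match:
        return default  # fall back to a neutral score on unparseable replies
    return max(1, min(10, int(match.group())))
```

Clamping and a neutral default matter in practice: LLM replies occasionally include prose ("Score: 8/10") or no number at all, and the pipeline should not crash on either.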

Slack Digest Format

Sample Output

══════════════════════════════════════════════════
*Akatsuki デイリーインテリジェンスレポート*
2026/01/24
══════════════════════════════════════════════════

*ニュース概要*

*国内ニュース* (5件)
Akatsuki announced new gacha game collaboration with 
popular anime franchise, expected Q2 launch...

*グローバルニュース* (3件)
Mobile gaming market shows 15% growth in Asia Pacific 
region, with Japan leading the expansion...

────────────────────────────────────
*注目記事*

*国内ニュース*
1. <https://...|Akatsuki新作発表> _(Famitsu)_
   _新しいガチャゲームのコラボレーションを発表_

────────────────────────────────────
*競合株価スナップショット*

企業名                 銘柄       株価        日次     週次

>Akatsuki              3932.T    ¥1,250    +2.50%  +5.20%
 Bandai Namco          7832.T    ¥3,850    +1.20%  +3.10%
 Capcom                9697.T    ¥2,100    -0.50%  +2.80%

*マーケット分析*
• 業界全体で堅調な動き
• Akatsukiは週次で競合を上回るパフォーマンス
• 新製品発表により投資家心理が改善

──────────────────────────────────────────────────
_生成時刻: 09:00 UTC | Akatsuki_

*Content Filters*
<http://localhost:8000/admin/genres/open|Add / Remove News Categories>

Security Best Practices

Warning

Never commit .env files or API keys to version control!

| Security Measure | Implementation |
|------------------|----------------|
| API keys | Store in .env; never hardcode |
| Supabase key | Use the Service Role key (not the anon key) |
| Slack webhook | Keep the URL private |
| Database | Use Row Level Security (RLS) |

Extending the System

Adding New Data Sources

  1. Create new ingestion module in ingestion/
  2. Add configuration to config/feed_sources.json
  3. Integrate into pipeline via run_daily_pipeline.py

Adding New Delivery Channels

  1. Create new sender in delivery/ (e.g., email_sender.py)
  2. Add formatter for channel-specific format
  3. Integrate into digest builder
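One way to keep delivery channels interchangeable is a small sender interface. The class names below are hypothetical, and a real EmailSender would use smtplib or an email-service API rather than a local outbox:

```python
from abc import ABC, abstractmethod

class DigestSender(ABC):
    """Hypothetical common interface; the project's delivery/slack_sender.py may differ."""

    @abstractmethod
    def send(self, digest: str) -> bool:
        """Deliver a formatted digest; return True on success."""

class EmailSender(DigestSender):
    """Example new channel from step 1 above (illustrative)."""

    def __init__(self, recipients: list[str]):
        self.recipients = recipients
        self.outbox: list[tuple[list[str], str]] = []

    def send(self, digest: str) -> bool:
        # Real code would call smtplib/an ESP API; record locally for illustration.
        self.outbox.append((self.recipients, digest))
        return True
```

With a shared interface, the digest builder can fan out to every configured sender without knowing channel specifics.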

Customizing AI Processing

  1. Modify prompts in utils/gemini_client.py
  2. Add new AI functions as needed
  3. Update processing/ai_processor.py

Testing

# Run tests
pytest

# Test specific module
pytest tests/test_ingestion.py

# Test with coverage
pytest --cov=. --cov-report=html

Troubleshooting

Common Issues

| Issue | Solution |
|-------|----------|
| SUPABASE_URL is missing | Check that the .env file exists and is loaded |
| Gemini API error | Verify the API key is valid and has quota |
| Slack message failed | Confirm the webhook URL is correct |
| No articles fetched | Check that the RSS feed URLs are accessible |
| Import errors | Run pip install -r requirements.txt |

Debug Mode

# Enable debug logging
LOG_LEVEL=DEBUG python run_daily_pipeline.py

Technology Stack

| Category | Technology |
|----------|------------|
| Backend | FastAPI, Python 3.9+ |
| Database | Supabase (PostgreSQL) |
| AI/ML | Google Gemini 2.0 Flash |
| Data Feeds | feedparser, yfinance |
| Messaging | Slack SDK |
| Async | asyncio, aiohttp |
| Validation | Pydantic |

Contributing

We welcome contributions from the community! If you'd like to contribute to this project, here's how you can get started:

  1. Fork the repository
  2. Create a feature branch with a descriptive name, such as feature/your-feature-name or fix/bug-description
  3. Make your changes — write clean, well-documented code and add tests where applicable
  4. Commit with clear messages describing what changed and why
  5. Push the branch to your fork
  6. Open a Pull Request with a clear description of your changes

Please ensure your code follows the existing style and includes appropriate documentation. We'll review your contribution and provide feedback.


License

This project is licensed under the MIT License. See the LICENSE file for details.

The MIT License is a permissive license that allows you to use, modify, and distribute this software with minimal restrictions. You are free to use this project for commercial or personal purposes.


Contact

For questions, suggestions, or support regarding this project:

  • Team: Product & Data Team
  • Target Company: Akatsuki Inc. (Configurable via environment variables)
