News Daily Digest AI Recommender

A news recommendation system that uses CrewAI agents, PostgreSQL, a vector database, and semantic search to recommend articles based on each user's reading history. It aggregates news from multiple sources and delivers the results as a daily digest.

Installation and Setup

1. Environment Setup

Install dependencies using uv:

# Create virtual environment and install dependencies
uv sync

2. Database Setup

Ensure your local PostgreSQL database is running and accessible. The system requires a PostgreSQL instance for storing articles, user data, and reading history.
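
As a quick sanity check before running any script, the sketch below tests whether something is listening on the PostgreSQL port. It assumes the default host and port (localhost:5432); the function name is hypothetical and not part of this repository. It only confirms that a TCP connection can be opened, not that the credentials or database name are valid.

```python
import socket


def postgres_is_reachable(host: str = "localhost", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("PostgreSQL reachable:", postgres_is_reachable())
```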

3. Environment Variables

Set up the required API keys:

export OPENAI_API_KEY="your-openai-api-key"
export NEWSAPI_KEY="your-newsapi-key"
export SERPER_API_KEY="your-serper-api-key"
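
A missing key usually only surfaces when a script makes its first API call. The sketch below checks the three variables named above up front; the helper name `missing_keys` is an assumption for illustration and does not exist in the repository.

```python
import os

# The three variables exported above
REQUIRED_KEYS = ("OPENAI_API_KEY", "NEWSAPI_KEY", "SERPER_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required API keys are set.")
```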

Getting Started

1. Initialize Database

First, create the necessary database tables:

python scripts/create_news_table.py

This creates all required tables for storing news responses, articles, users, and reading history.

Data Pipeline

News Data Source

The system uses NewsAPI.ai to fetch news articles from multiple sources. You'll need an API key from NewsAPI.ai (2,500 requests are free for testing).

2. Run Complete Pipeline

Execute the full data processing pipeline:

NEWSAPI_KEY='<news-api-key>' OPENAI_API_KEY='<openai-api-key>' PYTHONPATH=src python3 scripts/pipeline_runner.py

This orchestrates the following steps:

Individual Pipeline Scripts:

newsapi_extractor.py - Extracts articles from NewsAPI.ai
- Searches for articles using specified keywords across multiple news sources
- Supports multiple keywords and date ranges
- Stores raw API responses in PostgreSQL
- Default: searches for Technology, Finance, Health articles
process_response_to_articles.py - Processes API responses
- Converts raw API responses into individual article records
- Extracts article metadata (title, URL, content, etc.)
- Stores processed articles in the database
create_vector_db.py - Creates vector embeddings
- Generates semantic embeddings for articles using OpenAI
- Stores embeddings in a FAISS index for similarity search
- Enables semantic article recommendations
create_mock_users.py - Sets up test users
- Creates mock user accounts
- Generates reading history for the primary user
- Establishes user-article relationships
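
The order of the four scripts above can be sketched as a small sequential runner. This is an illustration only: how scripts/pipeline_runner.py actually chains the steps is not shown in this README, and `run_pipeline` and its `runner` parameter are hypothetical names. The sketch launches each script as a subprocess and stops at the first non-zero exit code, so a failed extraction never feeds the later steps.

```python
import subprocess
import sys

# Script order as listed above; paths follow the project structure (scripts/)
PIPELINE_STEPS = [
    "scripts/newsapi_extractor.py",
    "scripts/process_response_to_articles.py",
    "scripts/create_vector_db.py",
    "scripts/create_mock_users.py",
]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each script in order; raise on the first failure.

    `runner` is injectable so the sequencing can be exercised without
    launching real processes (an assumption made for this sketch).
    """
    completed = []
    for script in steps:
        result = runner([sys.executable, script], check=False)
        if result.returncode != 0:
            raise RuntimeError(f"{script} exited with code {result.returncode}")
        completed.append(script)
    return completed
```

Running the steps individually in this same order is also a practical way to debug a single stage.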

Agent System

Prerequisites for Agent Execution

MCP Server: Local Model Context Protocol server for PostgreSQL database access
Node.js: Required to run the MCP server via npx
Serper API Key: Required for web search functionality (get from serper.dev)

MCP Server Setup

The system uses the Model Context Protocol (MCP) server for database access. Start the MCP server for PostgreSQL:

# npx ships with npm (5.2 and later); confirm it is available
npx --version

# Start MCP server (replace with your PostgreSQL connection string)
npx @modelcontextprotocol/server-postgres postgresql://username:password@localhost:5432/database_name
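
Passwords containing characters such as `@`, `:` or `/` break the connection string unless they are percent-encoded. A minimal sketch for building the URL safely follows; the helper name `postgres_url` is hypothetical, and the defaults mirror the placeholders in the command above.

```python
from urllib.parse import quote


def postgres_url(user, password, host="localhost", port=5432, database="database_name"):
    """Build a postgresql:// URL, percent-encoding the user and password."""
    # safe="" makes quote() encode every reserved character, including "/"
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"postgresql://{user_enc}:{password_enc}@{host}:{port}/{database}"


if __name__ == "__main__":
    print(postgres_url("app", "p@ss:w/rd", database="news"))
    # postgresql://app:p%40ss%3Aw%2Frd@localhost:5432/news
```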

3. Run Recommendation Agent

Execute the AI agent system:

SERPER_API_KEY='<serper-api-key>' OPENAI_API_KEY='<openai-api-key>' PYTHONPATH=src python3 scripts/run_crew.py

Agent Architecture

The system uses CrewAI with three specialized agents:

Agents (`src/llm/agent/agents.py`)

DatabaseAgent - Database Analyst
- Queries PostgreSQL to analyze user reading habits
- Performs cluster analysis on user's article history
- Identifies reading patterns and preferences
RecommenderAgent - Article Recommender
- Uses vector similarity search to find relevant articles
- Combines semantic search with database queries
- Selects articles matching user's interest clusters
ReportWriterAgent - Content Report Writer
- Creates personalized markdown reports
- Searches for related articles and timelines
- Generates engaging, contextual content
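
To illustrate the vector similarity search that RecommenderAgent relies on, here is a minimal pure-Python sketch. It is not the project's implementation: it swaps the vector database for a brute-force scan over a small dictionary, and the names `cosine_similarity` and `top_k` as well as the toy two-dimensional vectors are assumptions. Real embeddings would come from OpenAI and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, article_vectors, k=3, exclude=()):
    """Return the ids of the k articles most similar to the query vector.

    `exclude` holds ids to skip, for example articles the user has already read.
    """
    scored = [
        (cosine_similarity(query, vector), article_id)
        for article_id, vector in article_vectors.items()
        if article_id not in exclude
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [article_id for _, article_id in scored[:k]]


if __name__ == "__main__":
    vectors = {"a1": [1.0, 0.0], "a2": [0.9, 0.1], "a3": [0.0, 1.0]}
    print(top_k([1.0, 0.0], vectors, k=2, exclude={"a1"}))  # ['a2', 'a3']
```

A dedicated vector index does the same ranking without scanning every article, which matters once the collection grows beyond a few thousand entries.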

Tasks (`src/llm/agent/tasks.py`)

Analysis Task - User reading pattern analysis
Recommendation Task - Article recommendation with context
Report Generation Task - Personalized markdown report creation

Tools (`src/llm/agent/tools.py`)

DatabaseTools - PostgreSQL access via MCP server
VectorDatabaseTool - Semantic similarity search
Web Search Tools - SerperDev for related article discovery
Web Scraping Tools - Content extraction for timelines

Project Structure

rohlik_agent/
├── README.md
├── pyproject.toml                   # Project dependencies and configuration
├── data/
│   ├── reports/                     # Generated recommendation reports
│   └── vector_store/                # FAISS vector database files
├── scripts/                         # Executable scripts
│   ├── create_news_table.py         # Database initialization
│   ├── newsapi_extractor.py         # News API data extraction
│   ├── process_response_to_articles.py  # Process API responses
│   ├── create_vector_db.py          # Generate vector embeddings
│   ├── create_mock_users.py         # Create test users and history
│   ├── pipeline_runner.py           # Complete data pipeline orchestrator
│   └── run_crew.py                  # Agent execution
└── src/                             # Source code
    ├── db_utils/                    # Database utilities and connections
    ├── etl/                         # Extract, Transform, Load modules
    │   ├── newsapi_client.py        # NewsAPI integration
    │   ├── article_processor.py     # Article data processing
    │   └── vector_store.py          # Vector database operations
    └── llm/                         # AI agent system
        └── agent/                   # CrewAI agent implementation
            ├── agents.py            # Agent definitions
            ├── tasks.py             # Task specifications
            ├── tools.py             # Custom tools
            └── crew.py              # Crew orchestration

Output

The system generates personalized news recommendation reports in the data/reports/ directory, featuring:

Tailored article selections based on reading history from multiple news sources
Chronological timelines for each recommended article
Related article discovery and context
Engaging, personalized content presentation
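
For a sense of what such a report could look like on disk, the sketch below renders recommendations with a timeline into markdown and writes the file. Everything about the layout is an assumption: the real reports are written by ReportWriterAgent, and the dictionary keys (`title`, `url`, `reason`, `timeline`), the function names, and the file name are hypothetical. The data/reports default follows the project structure.

```python
from datetime import date
from pathlib import Path


def render_report(user_name, recommendations, report_date=None):
    """Render a markdown digest; each recommendation is a dict (hypothetical schema)."""
    report_date = report_date or date.today()
    lines = [f"# Daily Digest for {user_name} ({report_date.isoformat()})", ""]
    for rec in recommendations:
        lines.append(f"## [{rec['title']}]({rec['url']})")
        lines.append("")
        lines.append(rec["reason"])
        lines.append("")
        # Timeline entries are (date string, event description) pairs
        for event_date, event in rec.get("timeline", []):
            lines.append(f"- {event_date}: {event}")
        lines.append("")
    return "\n".join(lines)


def save_report(markdown, directory="data/reports", filename="digest.md"):
    """Write the report, creating the directory if needed, and return its path."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_text(markdown, encoding="utf-8")
    return target
```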

Features

Multi-source News Aggregation: Fetches articles from various news sources via NewsAPI.ai
Semantic Search: Uses OpenAI embeddings and FAISS for similarity-based recommendations
Personalized Recommendations: Analyzes user reading history to suggest relevant articles
Timeline Generation: Creates chronological context for recommended articles
Automated Reporting: Generates comprehensive markdown reports with personalized insights
MCP Integration: Uses Model Context Protocol for secure database access

Technical Stack

Backend: Python 3.12+, PostgreSQL, FAISS
AI Framework: CrewAI, OpenAI GPT models
Data Processing: Custom ETL pipeline with vector embeddings
Database: PostgreSQL for structured data, FAISS for vector similarity
APIs: NewsAPI.ai, Serper.dev, OpenAI
