Ankit052003/AI-NEWS-AGENT

AI News Research Agent

An autonomous AI news research product that turns a user query into a saved, source-backed Markdown research report.

The MVP is complete through Phase 7 of the project plan. It can accept a research query, collect sources, extract article text, summarize findings, generate a cited report, persist the result, and let the user reopen saved reports from a Next.js interface.

MVP Status

| Phase | Status | Result |
| --- | --- | --- |
| Phase 1 | Complete | FastAPI backend and Next.js frontend structure |
| Phase 2 | Complete | Typed POST /research API contract |
| Phase 3 | Complete | NewsAPI search, source normalization, deduping, ranking |
| Phase 4 | Complete | Article fetching and clean text extraction |
| Phase 5 | Complete | Article summaries and cited Markdown reports |
| Phase 6 | Complete | SQLAlchemy persistence and research history APIs |
| Phase 7 | Complete | Browser UI for research, saved reports, detail view, dashboard |

Advanced phases such as LangGraph orchestration, multi-agent workflows, memory, RAG, background jobs, scheduling, and exports are intentionally deferred to the next stage, now that the MVP pipeline works end to end.

What It Does

Example query:

Give me the latest AI startup funding news this week.

MVP flow:

User query
  -> FastAPI research endpoint
  -> Web search through NewsAPI when configured
  -> Mock source fallback when no search key is available
  -> Source normalization, deduplication, and ranking
  -> Article HTML fetching and text extraction
  -> Article-level summarization
  -> Markdown report generation with citations
  -> Database persistence
  -> Frontend report browsing and dashboard
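
The normalization, deduplication, and ranking step in the flow above could look roughly like the sketch below. It is an illustration only, not the project's actual WebSearchService: the `Source` class and the `normalize_url`, `dedupe`, and `rank` helpers are hypothetical names, and the keyword-overlap score is an assumed stand-in for whatever relevance ranking the backend really uses.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Source:
    """Hypothetical normalized search result."""
    title: str
    url: str
    description: str = ""


def normalize_url(url: str) -> str:
    """Lowercase scheme and host; drop query string, fragment, and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def dedupe(sources: list[Source]) -> list[Source]:
    """Keep the first source seen for each normalized URL."""
    seen: set[str] = set()
    unique: list[Source] = []
    for source in sources:
        key = normalize_url(source.url)
        if key not in seen:
            seen.add(key)
            unique.append(source)
    return unique


def rank(sources: list[Source], query: str) -> list[Source]:
    """Order sources by how many query terms appear in the title or description."""
    terms = {term for term in query.lower().split() if len(term) > 2}

    def score(source: Source) -> int:
        text = f"{source.title} {source.description}".lower()
        return sum(1 for term in terms if term in text)

    # sorted() is stable, so sources with equal scores keep their original order.
    return sorted(sources, key=score, reverse=True)
```

Dropping the query string treats tracking-parameter variants of one article as duplicates, but it would also merge sites that identify articles by query parameter, so a real implementation may want a narrower rule.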

Why This Project Is Recruiter-Friendly

This is not just a chatbot wrapper. It demonstrates product-minded AI engineering across the full stack:

  • API-first backend design with typed Pydantic request and response schemas.
  • Real web integration with a configurable search provider.
  • Robust fallback behavior so the app works without paid API keys.
  • Article extraction with timeout handling, content filtering, and skipped failures instead of broken research runs.
  • Report generation that keeps citations tied to source links.
  • Persistence of research queries, sources, extracted article text, summaries, and final reports.
  • A usable frontend with query submission, loading states, saved report list, report detail page, source browsing, and a lightweight dashboard.
  • Modular services that are ready to evolve into LangGraph nodes and specialized agents in later phases.
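
As a rough illustration of keeping citations tied to source links, the sketch below renders a Markdown report in which every finding carries a numbered marker that matches an entry in a sources list. The `ArticleSummary` class, the `build_report` function, and the report headings are assumptions made for this example; the real report layout may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleSummary:
    """Hypothetical per-article summary with its source link."""
    title: str
    url: str
    summary: str


def build_report(query: str, summaries: list[ArticleSummary]) -> str:
    """Render a Markdown report whose numbered citations map to source links."""
    lines = [f"# Research Report: {query}", "", "## Findings", ""]
    for index, item in enumerate(summaries, start=1):
        # Each finding ends with a [n] marker that matches an entry under Sources.
        lines.append(f"- {item.summary} [{index}]")
    lines += ["", "## Sources", ""]
    for index, item in enumerate(summaries, start=1):
        lines.append(f"{index}. [{item.title}]({item.url})")
    return "\n".join(lines)
```

Because both loops enumerate the same list in the same order, a citation number can never point at the wrong link.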

Current Features

Backend:

  • GET /health health check.
  • POST /research to run a research request.
  • GET /research/history to list saved reports.
  • GET /research/reports/{report_id} to reopen one saved report.
  • NewsAPI-backed web search when SEARCH_API_KEY is configured.
  • Mocked development sources when no search key is present.
  • Article extraction with httpx and a lightweight HTML parser.
  • Local deterministic summarization fallback.
  • Optional OpenAI-compatible summarization path.
  • SQLAlchemy models for users, research queries, articles, reports, and user memory.
  • Alembic scaffold for database migrations.
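
One way a local deterministic summarization fallback can work is plain extractive scoring: rank sentences by word frequency and return the top few in reading order. The sketch below assumes that approach; the `summarize` function, the stopword list, and the scoring rule are illustrative and are not taken from the project's SummarizationService.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be longer.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
    "is", "are", "was", "with", "that", "this", "it", "as", "by", "at",
}
WORD_PATTERN = re.compile(r"[a-z0-9']+")


def summarize(text: str, max_sentences: int = 2) -> str:
    """Pick the highest-scoring sentences by word frequency, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    if len(sentences) <= max_sentences:
        return " ".join(sentences)

    words = [w for w in WORD_PATTERN.findall(text.lower()) if w not in STOPWORDS]
    frequency = Counter(words)

    def score(sentence: str) -> int:
        # Stopwords are absent from the counter, so they contribute zero.
        return sum(frequency[w] for w in WORD_PATTERN.findall(sentence.lower()))

    # Rank by score, break ties by position, then restore reading order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

The same input always yields the same summary, which makes this kind of fallback convenient for tests and for demos without an API key.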

Frontend:

  • Main research workspace at /.
  • Saved reports page at /reports.
  • Saved report detail page at /reports/[id].
  • Activity dashboard at /dashboard.
  • Markdown report rendering.
  • Source link panels and extracted-article metadata.

Architecture

frontend/
  Next.js App Router
  Tailwind CSS
  API client
  Report pages and dashboard
        |
        v
backend/
  FastAPI routes
        |
        v
  ResearchService
        |
        +--> WebSearchService
        |      - NewsAPI integration
        |      - normalization
        |      - duplicate removal
        |      - relevance ranking
        |
        +--> ContentExtractionService
        |      - HTML fetching
        |      - page cleanup
        |      - article text extraction
        |      - metadata capture
        |
        +--> SummarizationService
        |      - article summaries
        |      - key points
        |      - Markdown report synthesis
        |
        +--> ResearchHistoryService
               - SQLAlchemy persistence
               - saved report list
               - saved report detail
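
To make the ContentExtractionService steps in the diagram more concrete, here is a minimal sketch of page cleanup and article text extraction using only the standard library `html.parser` module. The `ArticleTextParser` class, the `extract_article_text` function, and the set of skipped tags are assumptions for illustration; the `min_words` default mirrors `CONTENT_EXTRACTION_MIN_WORDS=80` from the sample environment file. Fetching the HTML (with httpx in this project) is left out.

```python
from html.parser import HTMLParser
from typing import Optional

# Tags whose contents are treated as page chrome rather than article text.
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class ArticleTextParser(HTMLParser):
    """Collect text inside <p> tags, ignoring scripts, styles, and page chrome."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._in_paragraph = False
        self._current: list[str] = []
        self.paragraphs: list[str] = []

    def _flush(self) -> None:
        # Collapse whitespace runs and keep only non-empty paragraphs.
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []
        self._in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "p" and self._skip_depth == 0:
            if self._in_paragraph:  # previous <p> was never closed
                self._flush()
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p" and self._in_paragraph:
            self._flush()

    def handle_data(self, data):
        if self._in_paragraph and self._skip_depth == 0:
            self._current.append(data)


def extract_article_text(html: str, min_words: int = 80) -> Optional[str]:
    """Return cleaned article text, or None when the page has too little content."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    text = "\n\n".join(parser.paragraphs)
    return text if len(text.split()) >= min_words else None
```

Returning `None` for thin pages gives the caller a simple signal to skip that source instead of summarizing boilerplate.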

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js, React, Tailwind CSS |
| Backend | FastAPI, Python |
| Validation | Pydantic |
| Search | NewsAPI |
| HTTP | httpx |
| Persistence | SQLAlchemy, Alembic |
| Local DB | SQLite |
| Production DB Target | PostgreSQL via DATABASE_URL |
| LLM | Local fallback by default, OpenAI optional |
| Linting | Ruff, ESLint |

API Overview

Health

GET /health

Example response:

{
  "status": "ok",
  "service": "News Research Agent",
  "environment": "development"
}

Run Research

POST /research

Request:

{
  "query": "Latest AI startup funding news this week",
  "max_sources": 3,
  "date_range": {
    "start_date": "2026-05-01",
    "end_date": "2026-05-15"
  }
}
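
The request body above can be checked before any search runs. The project uses Pydantic schemas for this; the sketch below is a standard-library stand-in built on dataclasses so it runs without extra dependencies. The field names come from the example request, while the class names, the `parse_research_request` helper, and the specific validation rules (non-empty query, at least one source, start date not after end date) are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DateRange:
    start_date: date
    end_date: date


@dataclass(frozen=True)
class ResearchRequest:
    query: str
    max_sources: int = 5  # mirrors DEFAULT_MAX_SOURCES in the sample .env
    date_range: Optional[DateRange] = None


def parse_research_request(payload: dict) -> ResearchRequest:
    """Build a ResearchRequest from decoded JSON, raising ValueError on bad input."""
    query = str(payload.get("query", "")).strip()
    if not query:
        raise ValueError("query must not be empty")

    max_sources = int(payload.get("max_sources", 5))
    if max_sources < 1:
        raise ValueError("max_sources must be at least 1")

    date_range = None
    raw_range = payload.get("date_range")
    if raw_range:
        # date.fromisoformat raises ValueError for malformed dates.
        start = date.fromisoformat(raw_range["start_date"])
        end = date.fromisoformat(raw_range["end_date"])
        if start > end:
            raise ValueError("start_date must not be after end_date")
        date_range = DateRange(start, end)

    return ResearchRequest(query, max_sources, date_range)
```

Rejecting bad input at the boundary keeps the search, extraction, and summarization services free of defensive checks.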

Response includes:

  • query_id
  • report_id
  • summary
  • Markdown report
  • normalized sources
  • extracted articles
  • structured article_summaries
  • generated_at

Saved Reports

GET /research/history
GET /research/reports/{report_id}

The saved report detail includes the original query, final report, source list, article records, summaries, extracted text where available, and timestamps.

Project Structure

backend/
  app/
    agents/
    memory/
    models/
      research.py
    rag/
    routes/
      health.py
      research.py
    schemas/
      article.py
      research.py
      search.py
    services/
      content_extraction.py
      history.py
      research_service.py
      summarization.py
      web_search.py
    utils/
    workflows/
    config.py
    database.py
    main.py
  alembic/
  alembic.ini
  requirements.txt
  pyproject.toml

frontend/
  app/
    dashboard/
    reports/
    globals.css
    layout.tsx
    page.tsx
  components/
  services/
  package.json

docs/
  phase-01-setup.md
  phase-02-api-foundation.md
  phase-03-web-search.md
  phase-04-article-extraction.md
  phase-05-summarization-report.md
  phase-06-database-history.md
  phase-07-frontend-mvp.md

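Run Locally

The commands below use Windows PowerShell syntax. On macOS or Linux, activate the virtual environment with source .venv/bin/activate, use npm in place of npm.cmd, and use curl in place of Invoke-RestMethod.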

1. Backend

cd backend
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m uvicorn app.main:app --reload --host 127.0.0.1 --port 8000

Health check:

Invoke-RestMethod http://127.0.0.1:8000/health

2. Frontend

In a second terminal:

cd frontend
npm install
npm.cmd run dev

Open:

http://localhost:3000

Environment Variables

Create .env from .env.example.

APP_NAME="News Research Agent"
ENVIRONMENT=development
CORS_ORIGINS=["http://localhost:3000","http://127.0.0.1:3000"]
LOG_LEVEL=INFO

OPENAI_API_KEY=
GEMINI_API_KEY=
OPENAI_API_BASE_URL=https://api.openai.com/v1/chat/completions

SEARCH_API_KEY=
SEARCH_PROVIDER=newsapi
SEARCH_API_BASE_URL=https://newsapi.org/v2/everything
SEARCH_LANGUAGE=en
SEARCH_SORT_BY=relevancy
SEARCH_TIMEOUT_SECONDS=10

CONTENT_EXTRACTION_TIMEOUT_SECONDS=10
CONTENT_EXTRACTION_MIN_WORDS=80
CONTENT_EXTRACTION_USER_AGENT=NewsResearchAgent/0.1

DEFAULT_LLM_PROVIDER=mock
DEFAULT_MODEL=mock-news-researcher
DEFAULT_MAX_SOURCES=5
LLM_TIMEOUT_SECONDS=30
LLM_TEMPERATURE=0.2

DATABASE_URL=sqlite:///./news_agent.db
REDIS_URL=

NEXT_PUBLIC_API_BASE_URL=http://localhost:8000

API keys are optional for local Phase 7 verification. Without SEARCH_API_KEY, the backend uses mocked sources and still generates and saves reports. For live news, set SEARCH_API_KEY. For OpenAI summaries, set DEFAULT_LLM_PROVIDER=openai, provide OPENAI_API_KEY, and choose an OpenAI chat model in DEFAULT_MODEL.
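
The fallback rules described above can be summarized as a small decision function over the environment. This is a sketch under assumptions, not the project's config.py: `RuntimeMode` and `resolve_runtime_mode` are hypothetical names, and falling back to the local summarizer when `DEFAULT_LLM_PROVIDER=openai` is set without `OPENAI_API_KEY` is an assumed behavior. It also shows that `CORS_ORIGINS` in the sample file is a JSON array and can be parsed as one.

```python
import json
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RuntimeMode:
    search_mode: str          # "mock" or the configured provider, e.g. "newsapi"
    summarizer: str           # "openai" or "local"
    cors_origins: list[str]


def resolve_runtime_mode(env: Mapping[str, str] = os.environ) -> RuntimeMode:
    """Decide which search and summarization paths the settings enable."""
    search_key = env.get("SEARCH_API_KEY", "").strip()
    provider = env.get("DEFAULT_LLM_PROVIDER", "mock").strip().lower()
    openai_key = env.get("OPENAI_API_KEY", "").strip()

    # Without a search key, fall back to mocked sources.
    search_mode = env.get("SEARCH_PROVIDER", "newsapi") if search_key else "mock"
    # Assumption: OpenAI summaries need both the provider flag and a key.
    summarizer = "openai" if provider == "openai" and openai_key else "local"
    cors_origins = json.loads(env.get("CORS_ORIGINS", "[]"))
    return RuntimeMode(search_mode, summarizer, cors_origins)
```

Passing a plain dictionary instead of `os.environ` makes the decision logic easy to exercise in tests.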

Verification

Backend:

cd backend
ruff check app alembic
python -m compileall app alembic

Backend smoke test:

cd backend
python -c "from fastapi.testclient import TestClient; from app.main import app; client=TestClient(app); r=client.post('/research', json={'query':'Latest AI startup funding news this week','max_sources':3}); data=r.json(); print(r.status_code, data['report_id']); print(client.get('/research/history').status_code); print(client.get('/research/reports/'+data['report_id']).status_code)"

Frontend:

cd frontend
npm.cmd run lint
npm.cmd run build

Roadmap

Completed:

  • Phase 1: Project setup and repository structure.
  • Phase 2: Backend API foundation.
  • Phase 3: Web search system.
  • Phase 4: Article extraction.
  • Phase 5: Summarization and Markdown report MVP.
  • Phase 6: Database and research history.
  • Phase 7: Frontend MVP.

Next:

  • Phase 8: Convert the pipeline into a LangGraph workflow.
  • Phase 9: Split workflow responsibilities into specialized agents.
  • Phase 10: Add memory and RAG over previous research.
  • Phase 11: Add deeper trend analytics and charts.
  • Phase 12+: Add background jobs, scheduling, exports, credibility controls, testing, deployment, and portfolio polish.

Engineering Notes

  • The response shape is intentionally stable so the frontend can use the same report data immediately after generation and later from saved history.
  • Search, extraction, summarization, and persistence live in separate services, making the current MVP ready for workflow orchestration.
  • External failures are isolated. Missing search credentials use mocked sources, and extraction failures are logged and skipped per source.
  • SQLite is the default for easy local demos, while the schema and SQLAlchemy setup are prepared for PostgreSQL.
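
The per-source failure isolation mentioned in these notes can be sketched as a loop that logs and skips any source whose extraction fails. The `extract_all` function and its `extract` callback are hypothetical names for this example; the real ResearchService may structure this differently.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("research")


def extract_all(
    urls: list[str],
    extract: Callable[[str], Optional[str]],
) -> dict[str, str]:
    """Run extract on every URL, logging and skipping the ones that fail."""
    articles: dict[str, str] = {}
    for url in urls:
        try:
            text = extract(url)
        except Exception as exc:  # one bad source must not abort the whole run
            logger.warning("Skipping %s: %s", url, exc)
            continue
        if text:  # None or empty text means the page had no usable content
            articles[url] = text
    return articles
```

A run over three sources where one times out and one is too thin still returns the remaining article, so the report can be generated from whatever succeeded.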

Author

Ankit Kumar Singh
