Deep Research Kit

A multi-agent system built with LangGraph for conducting deep, iterative research on any topic. Given a research task, it plans queries, searches the web, crawls and analyzes sources, generates follow-up questions, and synthesizes a cited report — all autonomously.

How it works

The system is a LangGraph state graph driven by a set of specialized agents:

| Agent | Responsibility |
| --- | --- |
| Topic Analyst | Breaks the task into an initial set of search queries and generates follow-up queries as findings accumulate. |
| Search Agent | Runs queries against a search provider and filters results by relevance. |
| Content Analyzer | Crawls sources, chunks their content, and extracts structured findings with importance/confidence scores. |
| Report Generator | Synthesizes findings into a structured, cited Markdown report. |
| Supervisor | Oversees the loop and decides when the research is deep enough to stop. |

The graph runs as a loop: search → analyze → decide next step, then either follow up with new queries or write the report. The loop repeats up to a configurable max_depth.
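The control flow of that loop can be sketched in plain Python. This is an illustration only, not the project's LangGraph code: `ResearchState`, `search`, `analyze`, `should_continue`, `follow_up`, and `run_loop` are hypothetical stand-ins for the agents listed above, and the stop rule here is a bare depth check, whereas the real Supervisor presumably weighs the findings as well.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchState:
    """Hypothetical state passed between steps (not the project's schema)."""
    task: str
    depth: int = 0
    queries: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)


def search(state: ResearchState) -> list[str]:
    # Stand-in for the Search Agent: one fake result per query.
    return [f"result for {q!r}" for q in state.queries]


def analyze(state: ResearchState, results: list[str]) -> None:
    # Stand-in for the Content Analyzer: keep every result as a finding.
    state.findings.extend(results)


def should_continue(state: ResearchState, max_depth: int) -> bool:
    # Stand-in for the Supervisor: stop once max_depth rounds have run.
    return state.depth < max_depth


def follow_up(state: ResearchState) -> list[str]:
    # Stand-in for the Topic Analyst's follow-up queries.
    return [f"{state.task} (follow-up {state.depth})"]


def run_loop(task: str, max_depth: int) -> str:
    state = ResearchState(task=task, queries=[task])
    while True:
        analyze(state, search(state))
        state.depth += 1
        if not should_continue(state, max_depth):
            break
        state.queries = follow_up(state)
    # Stand-in for the Report Generator.
    return f"# Report: {task}\n" + "\n".join(f"- {f}" for f in state.findings)
```

Calling `run_loop("some topic", max_depth=2)` runs two search/analyze rounds and then returns a short Markdown string with one bullet per finding.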

Architecture highlights

  • Pluggable providers — swap LLMs, search engines, and crawlers via config:
    • LLM: OpenAI, Anthropic, Google (Gemini, default).
    • Search: SerpAPI (default), Google CSE, plus a provider interface that can accommodate Tavily or Bing.
    • Crawler: Firecrawl.
  • Custom async HTTP client (deep_research/providers/http) with a middleware pipeline: retries, response caching (in-memory or Redis), client-side rate limiting, circuit breaking, structured error handling, and metrics collection.
  • Typed configuration — every knob (timeouts, thresholds, concurrency, model choice) is a Pydantic model with sensible defaults; override any of them via get_graph_config(**overrides).
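To make the middleware idea concrete, here is a minimal sketch of an async pipeline with two of the stages named above, retries and in-memory caching. It does not use or mirror the API of deep_research/providers/http: `Handler`, `Middleware`, `with_retries`, `with_cache`, `build_pipeline`, and the fake transport are all names assumed for illustration, and real stages would also handle backoff, cache expiry, and so on.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[str], Awaitable[str]]   # takes a URL, returns a body
Middleware = Callable[[Handler], Handler]   # wraps one handler in another


def with_retries(max_attempts: int) -> Middleware:
    """Retry on ConnectionError; max_attempts must be at least 1."""
    def wrap(next_handler: Handler) -> Handler:
        async def handler(url: str) -> str:
            for attempt in range(1, max_attempts + 1):
                try:
                    return await next_handler(url)
                except ConnectionError:
                    if attempt == max_attempts:
                        raise
            raise ValueError("max_attempts must be at least 1")
        return handler
    return wrap


def with_cache() -> Middleware:
    """Cache successful responses in a dict keyed by URL."""
    cache: dict[str, str] = {}

    def wrap(next_handler: Handler) -> Handler:
        async def handler(url: str) -> str:
            if url not in cache:
                cache[url] = await next_handler(url)
            return cache[url]
        return handler
    return wrap


def build_pipeline(transport: Handler, middlewares: list[Middleware]) -> Handler:
    # The first middleware in the list becomes the outermost wrapper.
    handler = transport
    for middleware in reversed(middlewares):
        handler = middleware(handler)
    return handler


async def demo() -> tuple[str, str, int]:
    calls = 0

    async def flaky_transport(url: str) -> str:
        # Fake transport: fails on the first call, succeeds afterwards.
        nonlocal calls
        calls += 1
        if calls == 1:
            raise ConnectionError("simulated transient failure")
        return f"body of {url}"

    fetch = build_pipeline(flaky_transport, [with_cache(), with_retries(3)])
    first = await fetch("https://example.com")
    second = await fetch("https://example.com")  # served from the cache
    return first, second, calls
```

Running `asyncio.run(demo())` shows the ordering choice: the cache sits outside the retry stage, so only a response that survived the retries is stored, and the repeated request never reaches the transport.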

Installation

Requires Python ≥ 3.10.

git clone https://github.com/maghamalyan/deep-research-kit.git
cd deep-research-kit
pip install -e .

Configuration

Copy the example env file and fill in keys for the providers you intend to use:

cp env.example .env

With the default configuration you need:

  • GOOGLE_API_KEY — for the default Gemini LLM
  • SERPAPI_API_KEY — for the default search provider
  • FIRECRAWL_API_KEY — for content extraction

Other supported keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, TAVILY_API_KEY, BING_SEARCH_API_KEY, GOOGLE_SEARCH_API_KEY / GOOGLE_SEARCH_CX) let you switch providers via config. Keys are read from the environment — none are ever hardcoded.
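A quick pre-flight check for the three default keys might look like the sketch below. `DEFAULT_KEYS` and `missing_keys` are hypothetical helpers, not part of the package, and the sketch assumes the values from .env have already been loaded into the process environment by whatever mechanism you use.

```python
import os

# Keys required by the default configuration, as listed above.
DEFAULT_KEYS = ("GOOGLE_API_KEY", "SERPAPI_API_KEY", "FIRECRAWL_API_KEY")


def missing_keys(required=DEFAULT_KEYS, env=None):
    """Return the required key names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit("Missing API keys: " + ", ".join(missing))
    print("All default provider keys are set.")
```

If you switch providers, pass the matching key names as `required` instead of the defaults.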

Usage

Command line

python main.py \
  --research-task "Impact of agentic AI on scientific research workflows" \
  --research-context "Focus on peer-reviewed work from the last two years" \
  --max-depth 3 \
  --output report.md

Interactive mode

python main.py --interactive

Visualize the graph

python main.py --visualize   # writes research_graph.png

Programmatic

import asyncio
from deep_research import run_research
from deep_research.graph.config import get_graph_config

async def main():
    result = await run_research(
        research_task="State of the art in retrieval-augmented generation",
        research_context="Prioritize open-source approaches",
        config=get_graph_config(graph__max_depth=3),
    )
    if result.final_report:
        print(result.final_report.to_markdown())

asyncio.run(main())
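The `graph__max_depth=3` argument above suggests that a double underscore separates a config section from a field inside it. The sketch below shows one way such overrides could be resolved. It is an assumption-laden illustration, not the library's implementation: it uses stdlib dataclasses rather than Pydantic, and `GraphSettings`, `SearchSettings`, `Config`, `apply_overrides`, and the default values are all invented for the example.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class GraphSettings:
    max_depth: int = 2          # hypothetical default


@dataclass(frozen=True)
class SearchSettings:
    max_results: int = 10       # hypothetical field and default


@dataclass(frozen=True)
class Config:
    graph: GraphSettings = field(default_factory=GraphSettings)
    search: SearchSettings = field(default_factory=SearchSettings)


def apply_overrides(config: Config, **overrides) -> Config:
    """Return a copy of config with 'section__field' overrides applied."""
    for key, value in overrides.items():
        section, sep, name = key.partition("__")
        if not sep:
            raise KeyError(f"expected 'section__field', got {key!r}")
        current = getattr(config, section)          # AttributeError if unknown
        updated = replace(current, **{name: value})  # TypeError if unknown
        config = replace(config, **{section: updated})
    return config
```

For example, `apply_overrides(Config(), graph__max_depth=3)` returns a new config whose `graph.max_depth` is 3 while every other field keeps its default. Because the dataclasses are frozen, the original config object is never mutated.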

Development

pip install -e ".[dev,test]"   # dev + test extras
pre-commit install
pytest tests/                  # run the test suite

Linting, formatting, and type checking run through pre-commit hooks (black, isort, flake8, mypy).

License

MIT — see LICENSE.
