CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

DeepScrap is an agent-powered deep research tool that produces LaTeX-to-PDF investment research reports on publicly traded companies. It uses a multi-agent architecture with a Synthesis-driven feedback loop, and it emphasizes forensic due diligence, rigorous valuation, and forward-looking catalyst analysis.

Development Commands

# Install in dev mode
pip install -e ".[dev]"

# Run all tests
pytest tests/ -v

# Run single test
pytest tests/path/test_file.py::TestClass::test_name -v

# Run the tool (from project root; or use python -m deepscrap.cli.main)
deepscrap analyze AAPL --depth shallow
deepscrap analyze AAPL --depth medium
deepscrap analyze AAPL --depth deep --verbose

Architecture

Multi-agent system with hub-and-spoke design:

  • Orchestrator (agents/orchestrator.py) — coordinates generic research agents, runs iterative feedback loop
  • Research Agents (Sonnet, agents/research.py) — N generic agents (agent_1 through agent_N) that autonomously plan which data sources to query based on their directive. Count is configurable via Settings.max_sub_agents (default 6).
  • Synthesis Agent (Opus, agents/synthesis.py) — the brain; evaluates coverage, directs research agents by name, produces final qualitative analysis. Dynamic prompt includes agent count so it only references valid agent names.
  • Adversarial Reviewer (GPT, agents/adversarial.py) — challenges Synthesis output
  • Report Generator (report/generator.py) — Jinja2 LaTeX templates → PDF. Auto-populates appendix (sources list from store metadata, methodology description).
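The hub-and-spoke loop described above can be sketched roughly as follows. All names here are illustrative stand-ins, not the actual classes in agents/; the real agents are LLM-backed.

```python
from dataclasses import dataclass, field

@dataclass
class Directive:
    agent_name: str  # e.g. "agent_1" .. "agent_N"
    task: str

@dataclass
class StoreStub:
    # stand-in for the real ResearchStore; just accumulates findings per agent
    findings: dict = field(default_factory=dict)

    def record(self, agent_name: str, result: str) -> None:
        self.findings.setdefault(agent_name, []).append(result)

def run_feedback_loop(directives, research_fn, evaluate_fn, max_iterations=3):
    """Dispatch directives to research agents, then let Synthesis evaluate
    coverage and issue new directives, until satisfied or the limit is hit."""
    store = StoreStub()
    for _ in range(max_iterations):
        for d in directives:
            store.record(d.agent_name, research_fn(d))
        satisfied, directives = evaluate_fn(store)
        if satisfied or not directives:
            break
    return store
```

In this sketch, `research_fn` stands in for a Sonnet research agent and `evaluate_fn` for the Opus Synthesis agent's coverage evaluation.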

Key layers:

  • llm/ — Provider abstraction (Claude + OpenAI) with role-based model routing via registry.py. Supports adaptive thinking and effort levels (output_config.effort) mapped from analysis depth.
  • sources/ — Pluggable data sources (Yahoo Finance, SEC EDGAR, Serper.dev, Google News, OpenInsider)
  • store/ — Pydantic models + ResearchStore with JSON persistence and coverage tracking
  • report/ — Chart generation (matplotlib) + LaTeX templates (Jinja2 with <% %> delimiters) + PDF compilation
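Jinja2 supports LaTeX-friendly delimiters via its `Environment` constructor. This file only specifies <% %>, so the variable and comment delimiters below are assumptions:

```python
import jinja2

# Jinja2 environment with delimiters that don't collide with LaTeX syntax.
# <% %> (block delimiters) comes from this file; << >> and <# #> are guesses.
latex_env = jinja2.Environment(
    block_start_string="<%",
    block_end_string="%>",
    variable_start_string="<<",
    variable_end_string=">>",
    comment_start_string="<#",
    comment_end_string="#>",
    autoescape=False,  # we emit LaTeX, not HTML
)

template = latex_env.from_string(r"\section{<< title >>}")
rendered = template.render(title="Executive Summary")  # \section{Executive Summary}
```

Custom delimiters matter here because Jinja2's default `{{ }}` and `{% %}` collide with braces that are everywhere in LaTeX source.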

Report sections (12-section structure):

  1. Executive Summary
  2. Business Analysis
  3. Financial Deep Dive (includes accounting quality & red flags)
  4. Valuation Analysis (comps, DCF, sum-of-parts, pricing verdict)
  5. Leadership Assessment
  6. Forensic & Red Flag Analysis (related-party deals, suspicious M&A, SEC inquiries, insider patterns)
  7. Bull Case
  8. Bear Case
  9. Risk Matrix
  10. Forward-Looking Analysis (next earnings preview, catalyst timeline, what to watch)
  11. Conclusion
  12. Confidence Assessment (overall, per-section breakdown, evidence gaps, source quality)

Research flow:

  1. Orchestrator dispatches the research agents (default 6, per Settings.max_sub_agents) with targeted initial directives:
    • Company overview + suspicious deals
    • Deep financial analysis + accounting red flags
    • Valuation models + analyst targets
    • Executive team + insider trading patterns
    • Forward-looking analysis + catalyst events
    • Industry/regulatory/sentiment + short interest
  2. Synthesis evaluates coverage map, issues targeted directives to agent_1..agent_N
  3. Iterate until satisfied (or depth limit reached)
  4. Synthesis produces 12-section qualitative-first analysis
  5. GPT adversarially reviews → Opus revises/rebuts
  6. Report agent renders LaTeX → PDF (appendix auto-populated from store)
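Step 5 of the flow (adversarial review, then revision) reduces to a small review loop. A minimal sketch, where the callables are stand-ins for the real LLM-backed agents:

```python
def adversarial_round(draft, review_fn, revise_fn, rounds=1):
    """GPT critiques the Synthesis draft; Opus revises or rebuts.
    review_fn and revise_fn are hypothetical stand-ins for the agents."""
    for _ in range(rounds):
        critique = review_fn(draft)
        draft = revise_fn(draft, critique)
    return draft
```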

Reference system:

  • Research agents register ReferenceEntry objects in the store as they fetch data
  • SEC filings are downloaded to {output_dir}/references/ during research
  • Report generator builds a numbered bibliography with \href{} links
  • Inline [Source: ...] citations are linked to bibliography entries via keyword matching
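A rough sketch of the bibliography step. The field names on the entry are assumed; the real ReferenceEntry is a Pydantic model in store/:

```python
from dataclasses import dataclass

@dataclass
class ReferenceEntryStub:
    # assumed shape; mirrors what a numbered bibliography would need
    title: str
    url: str
    source: str  # e.g. "SEC EDGAR"

def format_bibliography(entries):
    """Build numbered bibliography lines with \\href{} links,
    roughly as report/generator.py does."""
    return [
        f"[{i}] \\href{{{e.url}}}{{{e.title}}} ({e.source})"
        for i, e in enumerate(entries, start=1)
    ]
```

The entry numbers produced here are what the inline [Source: ...] citations would be matched against by keyword.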

Effort levels (Claude API):

  • --depth shallow → effort medium
  • --depth medium → effort high
  • --depth deep → effort max (Opus only)

All Claude calls use adaptive thinking (thinking: {type: "adaptive"}).
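The depth-to-effort mapping above can be expressed as a small lookup. The fallback to high effort for non-Opus models at deep depth is my reading of the "Opus only" rule, not something this file states:

```python
EFFORT_BY_DEPTH = {"shallow": "medium", "medium": "high", "deep": "max"}

def output_config(depth: str, model_role: str) -> dict:
    """Map CLI --depth to a Claude output_config. 'max' effort is reserved
    for Opus (the synthesis role); the downgrade to 'high' for other
    roles is an assumption."""
    effort = EFFORT_BY_DEPTH[depth]
    if effort == "max" and model_role != "synthesis":
        effort = "high"  # assumed fallback for Sonnet research agents
    return {"effort": effort, "thinking": {"type": "adaptive"}}
```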

Configuration

API keys via environment variables or .env file: ANTHROPIC_API_KEY, OPENAI_API_KEY, SERPER_API_KEY

Model routing defined in llm/config.py — Sonnet for research agents, Opus for synthesis, GPT for adversarial review.
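The role-to-model routing can be sketched as a simple lookup table. Model identifiers are placeholders; the real table lives in llm/config.py and registry.py:

```python
# Hypothetical sketch of role-based model routing.
ROLE_MODELS = {
    "research": {"provider": "anthropic", "family": "sonnet"},
    "synthesis": {"provider": "anthropic", "family": "opus"},
    "adversarial": {"provider": "openai", "family": "gpt"},
}

def resolve_model(role: str) -> dict:
    """Look up the provider and model family for an agent role."""
    if role not in ROLE_MODELS:
        raise ValueError(f"unknown role: {role!r}")
    return ROLE_MODELS[role]
```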