🤖 AgentLead - AI-Powered Research & Personalized Outreach

A lead intelligence platform that combines multi-agent AI research with personalized outreach generation. AI agents research each company in depth and write every message from the resulting business insights, with no templates and no sequences.

⚡ Quick Start

# 1. Create and activate a virtual environment
python -m venv agent-lead
source agent-lead/bin/activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Set up environment variables
cp .env.template .env
# Add your API keys (see Required API Keys under Installation)

# 4. Run the full AI pipeline (progress tracking enabled by default)
python run.py outreach --leads-file your_leads.csv --output results --format both

# 5. View your personalized outreach
cat results_outreach_*.json

🎯 What Makes AgentLead Different

Traditional Tools:

Mail merge with static templates
Basic data append services
Generic personalization tokens

AgentLead:

AI-native research on every company using multiple sources
Unique content generation - no templates or sequences
Multi-agent orchestration with CrewAI for sophisticated workflows
Real-time intelligence from web, news, and social sources
True personalization based on actual business insights

🧠 Architecture Overview

CSV Input → LeadProfile → Multi-Agent Research → AI Personalization → Export
                              ↓
                    ┌─────────────────────────┐
                    │   Research Agents       │
                    ├─────────────────────────┤
                    │ • FirecrawlSource       │ ← Website scraping + AI extraction
                    │ • TavilySearchSource    │ ← Intelligent web search  
                    │ • TavilyExtractSource   │ ← Deep content analysis
                    │ • GrokDeepSearchSource  │ ← Real-time social/news intel
                    └─────────────────────────┘
                              ↓
                    ┌─────────────────────────┐
                    │  Intelligence Synthesis │ ← Combine all sources
                    └─────────────────────────┘
                              ↓
                    ┌─────────────────────────┐
                    │ Personalization Agent   │ ← Generate unique outreach
                    └─────────────────────────┘

Core Components

Multi-Agent Orchestration: CrewAI manages the entire pipeline
Research Layer: 4+ AI-powered sources gather business intelligence
Intelligence Synthesis: Combines insights into unified company profiles
Personalization Engine: Generates unique, contextual outreach
Export System: Multiple formats with analytics and visualization
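
The flow above (lead in, several research sources queried, findings merged) can be sketched in a few lines. This is only an illustration under stated assumptions: `LeadProfile` here is a simplified stand-in for the project's model, and `ResearchSource`, `run_research`, and `synthesize` are hypothetical names, not the actual CrewAI-based implementation. The personalization step (an LLM call) is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LeadProfile:
    """One CSV row; field names are illustrative, not the project's model."""
    company_name: str
    company_domain: str
    contact_name: str
    contact_title: str


# Hypothetical interface: a source takes a lead and returns text findings.
ResearchSource = Callable[[LeadProfile], List[str]]


def run_research(lead: LeadProfile,
                 sources: Dict[str, ResearchSource]) -> Dict[str, List[str]]:
    """Query every source; a failing source yields no findings instead of aborting."""
    findings: Dict[str, List[str]] = {}
    for name, source in sources.items():
        try:
            findings[name] = source(lead)
        except Exception:
            findings[name] = []
    return findings


def synthesize(findings: Dict[str, List[str]]) -> List[str]:
    """Merge findings from all sources, dropping duplicates but keeping order."""
    merged: List[str] = []
    seen = set()
    for items in findings.values():
        for item in items:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# Stub sources stand in for the real website, search, and social integrations.
stub_sources: Dict[str, ResearchSource] = {
    "website": lambda lead: [f"{lead.company_name} offers consulting"],
    "search": lambda lead: [f"{lead.company_name} offers consulting",
                            "opened a second office"],
}
lead = LeadProfile("Acme Corp", "acme.com", "John Smith", "CEO")
insights = synthesize(run_research(lead, stub_sources))
print(insights)
```

Keeping each source behind the same small interface is what lets one source fail or be disabled without affecting the others.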

🚀 Core Features

🔍 Multi-Source AI Research

Website Intelligence: AI-powered extraction from company websites
Web Search: Intelligent search across the internet for recent information
Deep Content Analysis: Extract insights from documents and long-form content
Real-time Social Intelligence: Live data from news, social media, and market discussions

🎯 AI-Powered Personalization

No Templates: Every email is uniquely generated
Context-Aware: References specific company information and challenges
Intelligent Hooks: AI identifies the most relevant pain points and opportunities
Professional Quality: Maintains professional tone while being highly specific

📊 Enterprise Features

Real-time Progress Tracking: Monitor research and generation in real-time
Streaming CSV Export: Results write to CSV immediately as they complete (no waiting!)
Batch Processing: Handle hundreds of leads efficiently
Multiple Export Formats: JSON, CSV, Excel, HTML with analytics
Quality Scoring: Confidence scores and personalization metrics
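
Streaming export means each result is written as soon as it is ready rather than after the whole batch finishes. A minimal sketch of the idea using only the standard library follows; the `stream_results` function and the `FIELDS` column list are assumptions for illustration, not AgentLead's exporter.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Hypothetical subset of output columns.
FIELDS = ["company_name", "personalization_score", "subject_line"]


def stream_results(results: Iterable[Dict[str, object]], out: TextIO) -> int:
    """Write each result row as it arrives and flush so it is visible at once."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in results:
        writer.writerow(row)
        out.flush()  # make the row readable before the next lead finishes
        count += 1
    return count


def fake_pipeline():
    """Stand-in generator; a real pipeline would yield as each lead completes."""
    yield {"company_name": "25 Again", "personalization_score": 0.85,
           "subject_line": "Your holistic approach at 25 Again caught my eye"}


buffer = io.StringIO()
written = stream_results(fake_pipeline(), buffer)
print(written, "row(s) written")
```

With a real file handle in place of `io.StringIO`, a crash midway through a large run still leaves every completed row on disk.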

🛠️ Installation

Prerequisites

Python 3.8+
Virtual environment (recommended)

Setup

# Clone repository
git clone https://github.com/your-org/AgentLead.git
cd AgentLead

# Create virtual environment
python -m venv agent-lead
source agent-lead/bin/activate  # On Windows: agent-lead\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.template .env

Required API Keys

# Core AI APIs
OPENAI_API_KEY=your_openai_key_here

# Research Sources (at least one required)
FIRECRAWL_API_KEY=your_firecrawl_key_here
TAVILY_API_KEY=your_tavily_key_here
XAI_API_KEY=your_grok_key_here  # Optional

# Optional: For enhanced features
ANTHROPIC_API_KEY=your_claude_key_here
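
The rule above (one core key plus at least one research source key) can be checked before a run. The sketch below is not what `python run.py validate` does internally; `check_api_keys` is a hypothetical helper, and it assumes any one of the three research keys, including the optional `XAI_API_KEY`, satisfies the "at least one" requirement.

```python
from typing import List, Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY",)
# Assumption: any one of these counts as a configured research source.
RESEARCH_KEYS = ("FIRECRAWL_API_KEY", "TAVILY_API_KEY", "XAI_API_KEY")


def check_api_keys(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"{key} is missing" for key in REQUIRED_KEYS if not env.get(key)]
    if not any(env.get(key) for key in RESEARCH_KEYS):
        problems.append("no research source key is set")
    return problems


# Example with an explicit mapping; pass os.environ in a real script.
print(check_api_keys({"OPENAI_API_KEY": "sk-example"}))
```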

💡 Usage Examples

Basic Research & Outreach

# Research companies and generate personalized outreach (progress enabled by default)
python run.py outreach --leads-file leads.csv --output campaign --format both

Sample Output:

{
  "company_name": "25 Again",
  "services": [
    "Hormone Therapy",
    "Weight Loss Program", 
    "Age Management",
    "Aesthetic Services"
  ],
  "growth_signals": [
    "Comprehensive health and aesthetics membership program",
    "Multiple locations across Kentucky and Indiana"
  ],
  "personalized_outreach": {
    "subject_line": "Your holistic approach at 25 Again caught my eye",
    "email_body": "Hi Kasey,\n\nNoticed 25 Again's unique approach combining hormone therapy with aesthetic services...",
    "personalization_score": 0.85
  }
}

Research Only

# Just research without outreach generation
python run.py research --leads-file leads.csv --output research --export-format json --analytics

Full Pipeline with Options

# Complete pipeline with all features
python run.py full-pipeline \
  --leads-file medical_practices.csv \
  --output complete_campaign \
  --output-format both \
  --max-leads 50 \
  --progress \
  --analytics \
  --charts

Batch Processing

# Process large datasets efficiently
python run.py batch --config batch_config.json --progress --resume-from 100

📋 CSV Input Format

Your CSV should contain these columns:

companyName,companyDomain,contactName,contactTitle,contactEmail
"Acme Corp","acme.com","John Smith","CEO","john@acme.com"
"Tech Startup","techstartup.com","Jane Doe","CTO","jane@techstartup.com"

Required Fields:

companyName: Company name to research
companyDomain: Company website domain
contactName: Contact person name
contactTitle: Contact's job title

Optional Fields:

contactEmail: Contact email
companyIndustry: Industry classification
companyLocation: Company location
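
Checking a leads file against the required fields before a run avoids spending API calls on incomplete rows. A minimal sketch, assuming the column names listed above; `validate_leads` is a hypothetical helper, not part of the CLI.

```python
import csv
import io
from typing import Dict, List, Tuple

REQUIRED_COLUMNS = ["companyName", "companyDomain", "contactName", "contactTitle"]


def validate_leads(csv_text: str) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split rows into valid leads and error messages for incomplete ones."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [], [f"missing column: {col}" for col in missing]

    valid: List[Dict[str, str]] = []
    errors: List[str] = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        empty = [col for col in REQUIRED_COLUMNS if not (row.get(col) or "").strip()]
        if empty:
            errors.append(f"row {line_no}: empty {', '.join(empty)}")
        else:
            valid.append(row)
    return valid, errors


sample = (
    "companyName,companyDomain,contactName,contactTitle,contactEmail\n"
    '"Acme Corp","acme.com","John Smith","CEO","john@acme.com"\n'
    '"Tech Startup","","Jane Doe","CTO","jane@techstartup.com"\n'
)
leads, problems = validate_leads(sample)
print(len(leads), problems)
```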

🎛️ CLI Reference

Main Commands

| Command | Description | Example |
|---|---|---|
| `research` | Multi-source company research | `python run.py research --leads-file leads.csv` |
| `outreach` | Generate personalized messages | `python run.py outreach --leads-file leads.csv` |
| `full-pipeline` | Complete research + outreach | `python run.py full-pipeline --leads-file leads.csv` |
| `export-research` | Export existing research | `python run.py export-research --input data.json` |

System Commands

| Command | Description | Example |
|---|---|---|
| `health` | System health check | `python run.py health --full` |
| `validate` | Validate configuration | `python run.py validate` |

Common Options

| Option | Description | Example |
|---|---|---|
| `--leads-file` | Input CSV file | `--leads-file leads.csv` |
| `--output` | Output file prefix | `--output campaign_results` |
| `--format` | Export format | `--format json`, `--format csv`, `--format both` |
| `--limit` | Max leads to process | `--limit 50` |
| `--progress/--no-progress` | Show real-time progress (default: on) | `--no-progress` |
| `--analytics` | Include analytics report | `--analytics` |

📊 Output Formats

JSON Output (Detailed)

{
  "metadata": {
    "timestamp": "20250124_143022",
    "total_leads": 3,
    "successful": 3,
    "average_personalization_score": 0.87
  },
  "results": [
    {
      "company_name": "Health Conscious Living",
      "research_confidence": 0.82,
      "sources_used": [
        "FirecrawlSource",
        "TavilySearchSource",
        "https://healthconsciouslivinginc.com"
      ],
      "services": [
        "Health Conscious Living Podcast",
        "Spiritual Self-Development",
        "Private Sessions"
      ],
      "growth_signals": [
        "Featured on Authority Magazine and Apple Podcasts",
        "Integration of science, medicine, and spirituality"
      ],
      "personalized_outreach": {
        "subject_line": "Your spiritual guidance scaling approach?",
        "opening_line": "Noticed your Health Conscious Living podcast is making waves...",
        "email_body": "Hi Dr. Myers,\n\nNoticed your Health Conscious Living podcast...",
        "personalization_score": 0.9
      }
    }
  ]
}
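
Because the JSON export is plain data, it is easy to post-process. The sketch below assumes the structure shown above and picks out companies whose `personalization_score` falls under a threshold so they can be reviewed by hand; `needs_review` is a hypothetical helper name.

```python
import json
from typing import List


def needs_review(report_json: str, threshold: float = 0.7) -> List[str]:
    """Return company names whose personalization score is below the threshold."""
    report = json.loads(report_json)
    return [
        result["company_name"]
        for result in report["results"]
        if result["personalized_outreach"]["personalization_score"] < threshold
    ]


example = json.dumps({
    "metadata": {"total_leads": 2},
    "results": [
        {"company_name": "Health Conscious Living",
         "personalized_outreach": {"personalization_score": 0.9}},
        {"company_name": "Example Co",
         "personalized_outreach": {"personalization_score": 0.55}},
    ],
})
print(needs_review(example))
```

In practice the string would come from reading one of the exported `.json` files.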

CSV Output (Spreadsheet-friendly)

company_name,contact_name,research_confidence,personalization_score,subject_line,services,growth_signals
"Health Conscious Living","Gayle Myers MD",0.82,0.9,"Your spiritual guidance scaling approach?","Health Conscious Living Podcast; Spiritual Self-Development","Featured on Authority Magazine and Apple Podcasts"

🔧 Configuration

Environment Variables

# Core Configuration
LOG_LEVEL=INFO
DEBUG=false

# API Settings
OPENAI_MODEL=gpt-4o-mini
MAX_CONCURRENT_AGENTS=3
REQUEST_TIMEOUT=120

# API Control
MAX_TOKENS_PER_REQUEST=4000

# Research Settings
MAX_SOURCES_PER_LEAD=5
RESEARCH_TIMEOUT=300
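
Environment variables arrive as strings, so numeric and boolean settings need converting. A minimal sketch of one way to load a few of the variables above, assuming the values shown are also the defaults (the README does not state that); `Settings` and `load_settings` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_model: str
    max_concurrent_agents: int
    request_timeout: int
    debug: bool


def load_settings(env: Mapping[str, str]) -> Settings:
    """Read settings from a mapping such as os.environ, converting types."""
    return Settings(
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        max_concurrent_agents=int(env.get("MAX_CONCURRENT_AGENTS", "3")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "120")),
        # Treat common truthy spellings as True; anything else is False.
        debug=env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
    )


# Explicit mapping for the example; pass os.environ in a real script.
settings = load_settings({"MAX_CONCURRENT_AGENTS": "5", "DEBUG": "true"})
print(settings)
```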

Research Source Configuration

Each research source can be configured:

# In your config
research_config = {
    'sources': {
        'firecrawl': {
            'enabled': True,
            'priority': 1,
            'timeout': 30
        },
        'tavily_search': {
            'enabled': True,
            'priority': 2,
            'search_depth': 'advanced'
        }
    }
}
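
One plausible reading of the `enabled` and `priority` fields is that disabled sources are skipped and the rest run in ascending priority order. The sketch below shows that interpretation only; it is an assumption about the semantics, and `enabled_sources` is a hypothetical helper.

```python
from typing import Any, Dict, List


def enabled_sources(config: Dict[str, Any]) -> List[str]:
    """Names of enabled sources, lowest priority number first (assumed order)."""
    sources = config.get("sources", {})
    active = [
        (options.get("priority", 99), name)  # 99: arbitrary fallback for unset priority
        for name, options in sources.items()
        if options.get("enabled", False)
    ]
    return [name for _, name in sorted(active)]


research_config = {
    "sources": {
        "tavily_search": {"enabled": True, "priority": 2, "search_depth": "advanced"},
        "firecrawl": {"enabled": True, "priority": 1, "timeout": 30},
        "grok": {"enabled": False, "priority": 3},  # hypothetical disabled entry
    }
}
print(enabled_sources(research_config))
```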

🎯 Best Practices

Input Data Quality

Clean domains: Ensure company domains are valid
Complete contact info: Include names and titles for better personalization
Industry context: Add industry information when available

API Usage

Rate limiting: Built-in rate limiting prevents API throttling
Batch processing: Use batch mode for large datasets (500+ leads)
Cost estimation: Use small batches (--limit 10) to estimate costs for larger datasets

Personalization Quality

Review samples: Always review a few generated emails before sending
A/B testing: Test different approaches on small segments
Industry-specific: Consider industry-specific customizations

🔍 Troubleshooting

Common Issues

"No API key configured"

# Check your .env file
grep API_KEY .env

# Validate configuration
python run.py validate

"Rate limit exceeded"

# Use built-in delays
python run.py outreach --leads-file leads.csv --delay 2.0

# Process in smaller batches
python run.py outreach --leads-file leads.csv --limit 10

"Research quality is low"

# Check input data quality
python run.py health --full

# Enable more research sources
# Add TAVILY_API_KEY and XAI_API_KEY to .env

Performance Optimization

For large datasets (1000+ leads):

# Use batch processing
python run.py batch --config batch_config.json --concurrent 5

# Process in chunks
python run.py outreach --leads-file leads.csv --limit 100 --output batch1
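
The README does not show how to select the next chunk of a file, so one option is to split the lead list beforehand and run each piece separately. A minimal sketch of the chunking step; `chunked` is a hypothetical helper and the chunk size of 100 simply mirrors the command above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# 250 placeholder leads split into batches of 100.
batches = list(chunked(list(range(250)), 100))
print([len(batch) for batch in batches])  # → [100, 100, 50]
```

Each batch can then be written to its own CSV and passed to `--leads-file` with a distinct `--output` prefix.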

For cost estimation:

# Get an upfront cost estimate before processing any dataset
python run.py outreach --leads-file leads.csv --limit 2500

# Prints an estimated cost and asks for confirmation before processing:
# 💰 Estimated cost: ~$1,217.41 ($0.488 per lead)
#    OpenAI: $0.471 per lead (gpt-4o-mini/gpt-4o-mini)
#    External APIs: $0.017 per lead
# ⚠️  Continue with outreach generation? [y/N]
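
The per-lead figures in the sample prompt combine as leads × (OpenAI cost + external API cost). A minimal sketch of that arithmetic, assuming a simple linear model; `estimate_cost` is a hypothetical function, and the tool's own estimator may work from unrounded figures.

```python
def estimate_cost(leads: int, openai_per_lead: float, external_per_lead: float) -> float:
    """Total estimated cost in dollars, rounded to cents."""
    return round(leads * (openai_per_lead + external_per_lead), 2)


# Per-lead figures taken from the sample prompt above.
total = estimate_cost(2500, openai_per_lead=0.471, external_per_lead=0.017)
print(f"${total:,.2f}")  # → $1,220.00
```

With the rounded per-lead figures this gives $1,220.00, close to the $1,217.41 in the sample output; the small gap presumably comes from the per-lead values being rounded for display.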

📈 Performance Metrics

Typical Performance

Research Success Rate: 85-95% (depending on data quality)
Personalization Quality: 0.8+ average personalization score
Processing Speed: 1-2 leads per minute (with full research)
Precise Cost Estimation: Real-time calculation using actual token counts and API pricing
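
The quoted processing speed translates directly into wall-clock time for a run. A rough sketch, assuming throughput stays constant at the stated rate; `runtime_hours` is a hypothetical helper.

```python
def runtime_hours(leads: int, leads_per_minute: float) -> float:
    """Approximate run time in hours at a constant throughput."""
    if leads_per_minute <= 0:
        raise ValueError("leads_per_minute must be positive")
    return round(leads / leads_per_minute / 60, 1)


# 100 leads at the fast and slow ends of the 1-2 leads/minute range.
print(runtime_hours(100, 2), runtime_hours(100, 1))  # → 0.8 1.7
```

By the same arithmetic, 2,500 leads would take roughly 21 to 42 hours, which is worth knowing before starting a large batch.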

Quality Benchmarks

High Quality (0.85+ score): Specific references to company services/challenges
Medium Quality (0.7-0.84 score): General personalization with some specifics
Low Quality (<0.7 score): Generic messaging (review input data)
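
The three bands above map onto a simple threshold check, which can be handy when triaging exported results. A minimal sketch; `quality_tier` is a hypothetical helper that uses the cut-offs listed above.

```python
def quality_tier(score: float) -> str:
    """Map a personalization score to the README's quality bands."""
    if score >= 0.85:
        return "high"
    if score >= 0.7:
        return "medium"
    return "low"


for score in (0.9, 0.78, 0.55):
    print(score, quality_tier(score))
```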

🔐 Security & Privacy

API Key Security: All keys stored in environment variables
Data Privacy: Leads are processed in memory; nothing is persisted apart from the export files you request
Rate Limiting: Built-in protections against API abuse
Error Handling: Graceful handling of API failures

📝 Development

Project Structure

AgentLead/
├── src/
│   ├── agents/              # Multi-agent components
│   │   ├── models.py           # Data models (LeadProfile, CompanyIntelligence)
│   │   ├── research_agent.py   # Research orchestration
│   │   ├── personalization_agent.py  # Outreach generation
│   │   ├── firecrawl_source.py # Website scraping + AI extraction
│   │   ├── tavily_*_source.py  # Web search and extraction
│   │   └── crew_manager.py     # CrewAI orchestration
│   └── core/                # Core utilities
│       ├── research_exporter.py # Export functionality
│       └── progress_tracker.py # Real-time progress
├── config/                  # Configuration files
├── tests/                   # Test suite
└── run.py                   # Main CLI application

Running Tests

# Run all tests
pytest tests/ -v

# Test specific components
pytest tests/test_research_agent.py -v

🤝 Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Add tests for new functionality
4. Ensure all tests pass: `pytest`
5. Submit a pull request

📄 License

MIT License - see LICENSE for details.

AgentLead - Transforming cold outreach with AI-powered research and personalization. No templates, no sequences, just intelligent, contextual communication that gets results.

Questions? Open an issue or check the troubleshooting section above.
