🤖 AgentLead - AI-Powered Research & Personalized Outreach

Next-generation lead intelligence platform that combines multi-agent AI research with personalized outreach generation. No templates, no sequences - every message is uniquely crafted based on real business intelligence.

Transform your cold outreach with AI agents that research each company in depth and generate personalized messages grounded in what that research finds.

⚡ Quick Start

# 1. Create and activate a virtual environment
python -m venv agent-lead
source agent-lead/bin/activate  # On Windows: agent-lead\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Set up environment variables
cp .env.template .env
# Add your API keys (see the Required API Keys section)

# 4. Run the full AI pipeline (progress tracking enabled by default)
python run.py outreach --leads-file your_leads.csv --output results --format both

# 5. View your personalized outreach
cat results_outreach_*.json

🎯 What Makes AgentLead Different

Traditional Tools:

  • Mail merge with static templates
  • Basic data append services
  • Generic personalization tokens

AgentLead:

  • AI-native research on every company using multiple sources
  • Unique content generation - no templates or sequences
  • Multi-agent orchestration with CrewAI for sophisticated workflows
  • Real-time intelligence from web, news, and social sources
  • True personalization based on actual business insights

🧠 Architecture Overview

CSV Input → LeadProfile → Multi-Agent Research → AI Personalization → Export
                              ↓
                    ┌─────────────────────────┐
                    │   Research Agents       │
                    ├─────────────────────────┤
                    │ • FirecrawlSource       │ ← Website scraping + AI extraction
                    │ • TavilySearchSource    │ ← Intelligent web search
                    │ • TavilyExtractSource   │ ← Deep content analysis
                    │ • GrokDeepSearchSource  │ ← Real-time social/news intel
                    └─────────────────────────┘
                              ↓
                    ┌─────────────────────────┐
                    │  Intelligence Synthesis │ ← Combine all sources
                    └─────────────────────────┘
                              ↓
                    ┌─────────────────────────┐
                    │ Personalization Agent   │ ← Generate unique outreach
                    └─────────────────────────┘

Core Components

  1. Multi-Agent Orchestration: CrewAI manages the entire pipeline
  2. Research Layer: 4+ AI-powered sources gather business intelligence
  3. Intelligence Synthesis: Combines insights into unified company profiles
  4. Personalization Engine: Generates unique, contextual outreach
  5. Export System: Multiple formats with analytics and visualization
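
The synthesis step (component 3) can be pictured as a merge of per-source findings into one profile. The sketch below is illustrative only and is not AgentLead's code: `CompanyProfile` and `synthesize` are hypothetical names, and the field names are borrowed from the sample output shown later in this README.

```python
from dataclasses import dataclass, field


@dataclass
class CompanyProfile:
    """Hypothetical unified profile; field names mirror the sample output."""
    company_name: str
    services: list = field(default_factory=list)
    growth_signals: list = field(default_factory=list)
    sources_used: list = field(default_factory=list)


def synthesize(company_name, findings):
    """Merge per-source findings into one profile, skipping duplicate entries.

    `findings` maps a source name to a dict that may contain
    "services" and "growth_signals" lists.
    """
    profile = CompanyProfile(company_name)
    for source_name, data in findings.items():
        profile.sources_used.append(source_name)
        for key in ("services", "growth_signals"):
            target = getattr(profile, key)
            for item in data.get(key, []):
                if item not in target:  # keep first occurrence, preserve order
                    target.append(item)
    return profile


findings = {
    "FirecrawlSource": {"services": ["Hormone Therapy", "Age Management"]},
    "TavilySearchSource": {
        "services": ["Age Management"],
        "growth_signals": ["Multiple locations across Kentucky and Indiana"],
    },
}
profile = synthesize("25 Again", findings)
print(profile.services)
```

A real synthesis step would likely also reconcile conflicting facts and weigh sources; this sketch only shows the deduplicating merge.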

🚀 Core Features

🔍 Multi-Source AI Research

  • Website Intelligence: AI-powered extraction from company websites
  • Web Search: Intelligent search across the internet for recent information
  • Deep Content Analysis: Extract insights from documents and long-form content
  • Real-time Social Intelligence: Live data from news, social media, and market discussions

🎯 AI-Powered Personalization

  • No Templates: Every email is uniquely generated
  • Context-Aware: References specific company information and challenges
  • Intelligent Hooks: AI identifies the most relevant pain points and opportunities
  • Professional Quality: Maintains professional tone while being highly specific

📊 Enterprise Features

  • Real-time Progress Tracking: Monitor research and generation in real-time
  • Streaming CSV Export: Results write to CSV immediately as they complete (no waiting!)
  • Batch Processing: Handle hundreds of leads efficiently
  • Multiple Export Formats: JSON, CSV, Excel, HTML with analytics
  • Quality Scoring: Confidence scores and personalization metrics

🛠️ Installation

Prerequisites

  • Python 3.8+
  • Virtual environment (recommended)

Setup

# Clone repository
git clone https://github.com/your-org/AgentLead.git
cd AgentLead

# Create virtual environment
python -m venv agent-lead
source agent-lead/bin/activate  # On Windows: agent-lead\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.template .env

Required API Keys

# Core AI APIs
OPENAI_API_KEY=your_openai_key_here

# Research Sources (at least one required)
FIRECRAWL_API_KEY=your_firecrawl_key_here
TAVILY_API_KEY=your_tavily_key_here
XAI_API_KEY=your_grok_key_here  # Optional

# Optional: For enhanced features
ANTHROPIC_API_KEY=your_claude_key_here

💡 Usage Examples

Basic Research & Outreach

# Research companies and generate personalized outreach (progress enabled by default)
python run.py outreach --leads-file leads.csv --output campaign --format both

Sample Output:

{
  "company_name": "25 Again",
  "services": [
    "Hormone Therapy",
    "Weight Loss Program", 
    "Age Management",
    "Aesthetic Services"
  ],
  "growth_signals": [
    "Comprehensive health and aesthetics membership program",
    "Multiple locations across Kentucky and Indiana"
  ],
  "personalized_outreach": {
    "subject_line": "Your holistic approach at 25 Again caught my eye",
    "email_body": "Hi Kasey,\n\nNoticed 25 Again's unique approach combining hormone therapy with aesthetic services...",
    "personalization_score": 0.85
  }
}

Research Only

# Just research without outreach generation
python run.py research --leads-file leads.csv --output research --export-format json --analytics

Full Pipeline with Options

# Complete pipeline with all features
python run.py full-pipeline \
  --leads-file medical_practices.csv \
  --output complete_campaign \
  --output-format both \
  --max-leads 50 \
  --progress \
  --analytics \
  --charts

Batch Processing

# Process large datasets efficiently
python run.py batch --config batch_config.json --progress --resume-from 100

📋 CSV Input Format

Your CSV should contain these columns:

companyName,companyDomain,contactName,contactTitle,contactEmail
"Acme Corp","acme.com","John Smith","CEO","john@acme.com"
"Tech Startup","techstartup.com","Jane Doe","CTO","jane@techstartup.com"

Required Fields:

  • companyName: Company name to research
  • companyDomain: Company website domain
  • contactName: Contact person name
  • contactTitle: Contact's job title

Optional Fields:

  • contactEmail: Contact email
  • companyIndustry: Industry classification
  • companyLocation: Company location
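
Checking a leads file against the required fields before a run avoids spending API calls on incomplete rows. This is a minimal sketch using only the standard library; `validate_leads` and `validate_leads_file` are hypothetical helpers, not AgentLead functions, and the column names come from the lists above.

```python
import csv

REQUIRED_COLUMNS = ["companyName", "companyDomain", "contactName", "contactTitle"]


def validate_leads(rows):
    """Return (row_number, missing_fields) for each row lacking a required value."""
    problems = []
    for number, row in enumerate(rows, start=2):  # row 1 is the header
        missing = [c for c in REQUIRED_COLUMNS if not (row.get(c) or "").strip()]
        if missing:
            problems.append((number, missing))
    return problems


def validate_leads_file(path):
    """Validate a CSV file on disk; a missing column counts as a missing value."""
    with open(path, newline="", encoding="utf-8") as handle:
        return validate_leads(csv.DictReader(handle))
```

For example, `validate_leads_file("leads.csv")` returns an empty list when every row has all four required fields.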

🎛️ CLI Reference

Main Commands

| Command | Description | Example |
|---|---|---|
| research | Multi-source company research | python run.py research --leads-file leads.csv |
| outreach | Generate personalized messages | python run.py outreach --leads-file leads.csv |
| full-pipeline | Complete research + outreach | python run.py full-pipeline --leads-file leads.csv |
| export-research | Export existing research | python run.py export-research --input data.json |

System Commands

| Command | Description | Example |
|---|---|---|
| health | System health check | python run.py health --full |
| validate | Validate configuration | python run.py validate |

Common Options

| Option | Description | Example |
|---|---|---|
| --leads-file | Input CSV file | --leads-file leads.csv |
| --output | Output file prefix | --output campaign_results |
| --format | Export format | --format json, --format csv, --format both |
| --limit | Max leads to process | --limit 50 |
| --progress/--no-progress | Show real-time progress (default: on) | --no-progress |
| --analytics | Include analytics report | --analytics |

📊 Output Formats

JSON Output (Detailed)

{
  "metadata": {
    "timestamp": "20250124_143022",
    "total_leads": 3,
    "successful": 3,
    "average_personalization_score": 0.87
  },
  "results": [
    {
      "company_name": "Health Conscious Living",
      "research_confidence": 0.82,
      "sources_used": [
        "FirecrawlSource",
        "TavilySearchSource",
        "https://healthconsciouslivinginc.com"
      ],
      "services": [
        "Health Conscious Living Podcast",
        "Spiritual Self-Development",
        "Private Sessions"
      ],
      "growth_signals": [
        "Featured on Authority Magazine and Apple Podcasts",
        "Integration of science, medicine, and spirituality"
      ],
      "personalized_outreach": {
        "subject_line": "Your spiritual guidance scaling approach?",
        "opening_line": "Noticed your Health Conscious Living podcast is making waves...",
        "email_body": "Hi Dr. Myers,\n\nNoticed your Health Conscious Living podcast...",
        "personalization_score": 0.9
      }
    }
  ]
}
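
Since every result carries a `personalization_score`, a short script can flag the messages worth a manual look before sending. The sketch below assumes the JSON layout shown above; `low_scoring` and `load_report` are hypothetical helper names, and the 0.7 default threshold is just an illustrative cutoff.

```python
import json


def low_scoring(report, threshold=0.7):
    """Return company names whose personalization score is missing or below threshold."""
    flagged = []
    for result in report.get("results", []):
        outreach = result.get("personalized_outreach") or {}
        score = outreach.get("personalization_score")
        if score is None or score < threshold:
            flagged.append(result.get("company_name", "<unknown>"))
    return flagged


def load_report(path):
    """Load an exported JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Results with no outreach block at all are flagged too, so failed generations are not silently skipped.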

CSV Output (Spreadsheet-friendly)

company_name,contact_name,research_confidence,personalization_score,subject_line,services,growth_signals
"Health Conscious Living","Gayle Myers MD",0.82,0.9,"Your spiritual guidance scaling approach?","Health Conscious Living Podcast; Spiritual Self-Development","Featured on Authority Magazine and Apple Podcasts"
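
The CSV export flattens list fields into a single cell. The sketch below reads such a file back into Python structures; `read_results` is a hypothetical helper, and the `;` separator is inferred from the sample row above rather than from any documented format.

```python
import csv
import io


def read_results(handle, list_columns=("services", "growth_signals"), sep=";"):
    """Yield result rows with scores parsed as floats and list columns split."""
    for row in csv.DictReader(handle):
        for column in list_columns:
            cell = row.get(column) or ""
            row[column] = [part.strip() for part in cell.split(sep) if part.strip()]
        for column in ("research_confidence", "personalization_score"):
            row[column] = float(row[column]) if row.get(column) else None
        yield row


# Header and row copied from the sample output above.
sample = io.StringIO(
    "company_name,contact_name,research_confidence,personalization_score,"
    "subject_line,services,growth_signals\n"
    '"Health Conscious Living","Gayle Myers MD",0.82,0.9,'
    '"Your spiritual guidance scaling approach?",'
    '"Health Conscious Living Podcast; Spiritual Self-Development",'
    '"Featured on Authority Magazine and Apple Podcasts"\n'
)
rows = list(read_results(sample))
print(rows[0]["services"])
```

For a file on disk, open it with `newline=""` and pass the handle to `read_results`.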

🔧 Configuration

Environment Variables

# Core Configuration
LOG_LEVEL=INFO
DEBUG=false

# API Settings
OPENAI_MODEL=gpt-4o-mini
MAX_CONCURRENT_AGENTS=3
REQUEST_TIMEOUT=120

# API Control
MAX_TOKENS_PER_REQUEST=4000

# Research Settings
MAX_SOURCES_PER_LEAD=5
RESEARCH_TIMEOUT=300
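
One way to consume these variables in Python is a small typed settings object. This is a sketch, not AgentLead's actual loader: `Settings` and `load_settings` are hypothetical names, and treating the values listed above as defaults is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    log_level: str
    debug: bool
    openai_model: str
    max_concurrent_agents: int
    request_timeout: int
    max_tokens_per_request: int
    max_sources_per_lead: int
    research_timeout: int


def _as_bool(value):
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Build Settings from a mapping, falling back to the values listed above."""
    env = os.environ if env is None else env
    return Settings(
        log_level=env.get("LOG_LEVEL", "INFO"),
        debug=_as_bool(env.get("DEBUG", "false")),
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        max_concurrent_agents=int(env.get("MAX_CONCURRENT_AGENTS", "3")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "120")),
        max_tokens_per_request=int(env.get("MAX_TOKENS_PER_REQUEST", "4000")),
        max_sources_per_lead=int(env.get("MAX_SOURCES_PER_LEAD", "5")),
        research_timeout=int(env.get("RESEARCH_TIMEOUT", "300")),
    )
```

Parsing into typed fields up front means a malformed value such as `REQUEST_TIMEOUT=abc` fails at startup with a `ValueError` instead of partway through a run.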

Research Source Configuration

Each research source can be configured:

# In your config
research_config = {
    'sources': {
        'firecrawl': {
            'enabled': True,
            'priority': 1,
            'timeout': 30
        },
        'tavily_search': {
            'enabled': True,
            'priority': 2,
            'search_depth': 'advanced'
        }
    }
}
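
To show how a dictionary like this might be consumed, the sketch below picks the enabled sources and orders them by `priority`. `enabled_sources` is a hypothetical helper, and the convention that a lower number runs first is an assumption, not something this README states.

```python
def enabled_sources(research_config):
    """Return names of enabled sources, ordered by ascending priority."""
    sources = research_config.get("sources", {})
    active = [
        (options.get("priority", float("inf")), name)  # unprioritized sources go last
        for name, options in sources.items()
        if options.get("enabled", False)
    ]
    return [name for _, name in sorted(active)]


research_config = {
    "sources": {
        "firecrawl": {"enabled": True, "priority": 1, "timeout": 30},
        "tavily_search": {"enabled": True, "priority": 2, "search_depth": "advanced"},
    }
}
print(enabled_sources(research_config))
```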

🎯 Best Practices

Input Data Quality

  • Clean domains: Ensure company domains are valid
  • Complete contact info: Include names and titles for better personalization
  • Industry context: Add industry information when available

API Usage

  • Rate limiting: Built-in rate limiting prevents API throttling
  • Batch processing: Use batch mode for large datasets (500+ leads)
  • Cost estimation: Use small batches (--limit 10) to estimate costs for larger datasets

Personalization Quality

  • Review samples: Always review a few generated emails before sending
  • A/B testing: Test different approaches on small segments
  • Industry-specific: Consider industry-specific customizations

🔍 Troubleshooting

Common Issues

"No API key configured"

# Check your .env file
grep API_KEY .env

# Validate configuration
python run.py validate

"Rate limit exceeded"

# Use built-in delays
python run.py outreach --leads-file leads.csv --delay 2.0

# Process in smaller batches
python run.py outreach --leads-file leads.csv --limit 10

"Research quality is low"

# Run the full system health check
python run.py health --full

# Enable more research sources
# Add TAVILY_API_KEY and XAI_API_KEY to .env

Performance Optimization

For large datasets (1000+ leads):

# Use batch processing
python run.py batch --config batch_config.json --concurrent 5

# Process in chunks
python run.py outreach --leads-file leads.csv --limit 100 --output batch1
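
Since `--limit` caps how many leads a run processes, working through a large file in chunks is easier if the file is split first. The helper below is a sketch and not part of AgentLead; `split_leads` and the `_partN` file naming are hypothetical.

```python
import csv
from pathlib import Path


def split_leads(path, chunk_size=100, out_dir="."):
    """Write <stem>_partN.csv files of at most chunk_size rows; return their paths."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = list(reader)

    out_paths = []
    for start in range(0, len(rows), chunk_size):
        part_number = start // chunk_size + 1
        out_path = Path(out_dir) / f"{Path(path).stem}_part{part_number}.csv"
        with open(out_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()  # every part keeps the original header
            writer.writerows(rows[start:start + chunk_size])
        out_paths.append(out_path)
    return out_paths
```

Each part can then be passed to `--leads-file` with its own `--output` prefix, for example `leads_part1.csv` with `--output batch1`.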

For cost estimation:

# Get an upfront cost estimate for any dataset
python run.py outreach --leads-file leads.csv --limit 2500

# Shows the estimated cost and asks for confirmation before processing:
# 💰 Estimated cost: ~$1,217.41 ($0.488 per lead)
#    OpenAI: $0.471 per lead (gpt-4o-mini/gpt-4o-mini)
#    External APIs: $0.017 per lead
# ⚠️  Continue with outreach generation? [y/N]

📈 Performance Metrics

Typical Performance

  • Research Success Rate: 85-95% (depending on data quality)
  • Personalization Quality: 0.8+ average personalization score
  • Processing Speed: 1-2 leads per minute (with full research)
  • Precise Cost Estimation: Real-time calculation using actual token counts and API pricing

Quality Benchmarks

  • High Quality (0.85+ score): Specific references to company services/challenges
  • Medium Quality (0.7-0.84 score): General personalization with some specifics
  • Low Quality (<0.7 score): Generic messaging (review input data)
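
These bands map directly onto a small classifier, which is handy when triaging exported results. `quality_tier` is a hypothetical helper. Scores between 0.84 and 0.85 are not covered by the bands as written, so the sketch assumes they fall into the medium tier.

```python
def quality_tier(score):
    """Map a personalization score to the benchmark tiers listed above."""
    if score >= 0.85:
        return "high"
    if score >= 0.7:
        return "medium"  # assumption: includes scores between 0.84 and 0.85
    return "low"
```

For instance, the sample score of 0.9 lands in the high tier, while anything under 0.7 is flagged as low and suggests reviewing the input data.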

🔐 Security & Privacy

  • API Key Security: All keys stored in environment variables
  • Data Privacy: No data stored permanently, processed in memory
  • Rate Limiting: Built-in protections against API abuse
  • Error Handling: Graceful handling of API failures

📝 Development

Project Structure

AgentLead/
├── src/
│   ├── agents/              # Multi-agent components
│   │   ├── models.py           # Data models (LeadProfile, CompanyIntelligence)
│   │   ├── research_agent.py   # Research orchestration
│   │   ├── personalization_agent.py  # Outreach generation
│   │   ├── firecrawl_source.py # Website scraping + AI extraction
│   │   ├── tavily_*_source.py  # Web search and extraction
│   │   └── crew_manager.py     # CrewAI orchestration
│   └── core/                # Core utilities
│       ├── research_exporter.py # Export functionality
│       └── progress_tracker.py # Real-time progress
├── config/                  # Configuration files
├── tests/                   # Test suite
└── run.py                   # Main CLI application

Running Tests

# Run all tests
pytest tests/ -v

# Test specific components
pytest tests/test_research_agent.py -v

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Add tests for new functionality
  4. Ensure all tests pass: pytest
  5. Submit a pull request

📄 License

MIT License - see LICENSE for details.


AgentLead - Transforming cold outreach with AI-powered research and personalization. No templates, no sequences, just intelligent, contextual communication that gets results.

Questions? Open an issue or check the troubleshooting section above.