🕵️‍♂️ OSINT-Harvester

OSINT-Harvester is a Python-based command-line tool with a FastAPI backend for harvesting email addresses, subdomains, and related intelligence from multiple data sources. It supports browser emulation, proxy rotation, and CAPTCHA solving, and it scores risk using VirusTotal and AbuseIPDB. The tool integrates with SpiderFoot and Maltego and can query FOFA, ZoomEye, Shodan, and Censys.


🚀 Features

  • ✅ Email & Subdomain Harvesting:

    • Search Engines: Google, Bing, Yahoo, DuckDuckGo
    • Certificate Transparency: crt.sh
    • DNS Discovery: DNSDumpster
  • 🧠 Intelligence & Scoring:

    • VirusTotal domain/IP scoring
    • AbuseIPDB abuse classification
    • Risk-level output
  • 🧩 Tool Integrations:

    • SpiderFoot (API)
    • Maltego (export & transforms)
  • 🧱 Modular Architecture:

    • Browser-based scraping (Playwright)
    • Proxy rotation
    • CAPTCHA solving (2Captcha, AntiCaptcha)
    • User-Agent spoofing
  • 🖥️ Custom Output:

    • Formats: JSON, CSV, PDF, TXT
    • Slack/Email alerts
  • ⚑ FastAPI Backend:

    • REST API for remote or frontend access
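At its core, the email and subdomain harvesting listed above means pulling addresses and hostnames that belong to the target domain out of fetched page text. The following is a minimal, stdlib-only sketch of that filtering step, assuming the text has already been downloaded; `extract` and `belongs_to` are hypothetical helpers, not the tool's actual code.

```python
import re

# Hypothetical patterns: a loose email matcher (domain captured in group 1)
# and a hostname matcher requiring at least one dot and an alphabetic TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
HOST_RE = re.compile(r"\b((?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,})\b")


def belongs_to(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    host, domain = host.lower().rstrip("."), domain.lower()
    return host == domain or host.endswith("." + domain)


def extract(text: str, domain: str) -> tuple:
    """Return (emails, subdomains) found in text for the target domain."""
    emails = {
        m.group(0).lower()
        for m in EMAIL_RE.finditer(text)
        if belongs_to(m.group(1), domain)
    }
    # Subdomains only: the bare target domain itself is excluded.
    hosts = {
        h.lower()
        for h in HOST_RE.findall(text)
        if belongs_to(h, domain) and h.lower() != domain.lower()
    }
    return emails, hosts
```

Matching on a trailing `.example.com` suffix, rather than a plain substring, keeps look-alike domains such as `evil-example.com` out of the results.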

📦 Installation

  1. Clone the repo
git clone https://github.com/yourusername/osint-harvester.git
cd osint-harvester
  2. Install dependencies
pip install -r requirements.txt
  3. Install Playwright browsers
playwright install

πŸ” Environment Setup

Create a .env file in the root directory with the following:

VT_API_KEY=your_virustotal_key
ABUSEIPDB_API_KEY=your_abuseipdb_key
SPIDERFOOT_API_KEY=your_spiderfoot_key
FOFA_EMAIL=your_email
FOFA_KEY=your_fofa_key
SHODAN_API_KEY=your_shodan_key
CENSYS_API_ID=your_censys_id
CENSYS_API_SECRET=your_censys_secret
2CAPTCHA_API_KEY=your_2captcha_key
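The snippet below is a stdlib-only sketch of reading that file and reporting which keys are still empty. The tool's real configuration loader is not shown in this README, so `parse_env` and `missing_keys` are hypothetical; libraries such as python-dotenv handle quoting and inline comments more thoroughly. Note that `2CAPTCHA_API_KEY` starts with a digit: a file parser like this accepts it, but a POSIX shell does not allow it as a variable name, so it cannot be set with `export`.

```python
from typing import Dict, List

# Keys taken from the .env template above. Which ones are strictly
# required depends on the sources you enable (assumption).
EXPECTED_KEYS = [
    "VT_API_KEY", "ABUSEIPDB_API_KEY", "SPIDERFOOT_API_KEY",
    "FOFA_EMAIL", "FOFA_KEY", "SHODAN_API_KEY",
    "CENSYS_API_ID", "CENSYS_API_SECRET", "2CAPTCHA_API_KEY",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and one layer of quotes.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: Dict[str, str], required: List[str]) -> List[str]:
    """Return required keys that are absent or have an empty value."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as fh:
        env = parse_env(fh.read())
    print("Missing:", missing_keys(env, EXPECTED_KEYS) or "none")
```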

🧪 CLI Usage

python harvest_tool.py --domain example.com --sources google,bing,crtsh --output json

CLI Options

| Option | Description |
|---|---|
| --domain | Target domain (e.g., example.com) |
| --sources | Comma-separated list of sources |
| --output | Output format: json, csv, pdf, txt |
| --proxy | Enable proxy rotation |
| --headless | Run in headless browser mode |
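The options in this table map naturally onto Python's `argparse`. The sketch below is one way to declare them, assuming `harvest_tool.py` parses its arguments this way; the actual parser is not shown here, and the defaults for `--sources` and `--output` are assumptions because the README does not state any.

```python
import argparse

OUTPUT_FORMATS = ("json", "csv", "pdf", "txt")


def comma_list(value: str) -> list:
    """Turn 'google, Bing,crtsh' into ['google', 'bing', 'crtsh']."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="harvest_tool.py", description="OSINT-Harvester CLI (sketch)"
    )
    parser.add_argument("--domain", required=True,
                        help="Target domain (e.g., example.com)")
    # Assumed default. argparse runs string defaults through `type`,
    # so this becomes ['google'].
    parser.add_argument("--sources", type=comma_list, default="google",
                        help="Comma-separated list of sources")
    # Assumed default.
    parser.add_argument("--output", choices=OUTPUT_FORMATS, default="json",
                        help="Output format")
    parser.add_argument("--proxy", action="store_true",
                        help="Enable proxy rotation")
    parser.add_argument("--headless", action="store_true",
                        help="Run in headless browser mode")
    return parser


if __name__ == "__main__":
    print(vars(build_parser().parse_args()))
```

Using `choices` for `--output` makes argparse reject unsupported formats before any scanning starts.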

🌐 FastAPI Backend

Start the backend:

uvicorn backend.main:app --reload

Sample API Request (POST /api/scan)

{
  "domain": "example.com",
  "sources": ["google", "crtsh"],
  "use_virustotal": true,
  "use_abuseipdb": true,
  "output_format": "json"
}
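A request like this can be sent from Python with only the standard library. This sketch assumes the backend is listening on uvicorn's default address (`http://127.0.0.1:8000`). `build_scan_request` and `run_scan` are hypothetical helpers, and because the response schema is not documented here, the parsed JSON is returned unchanged.

```python
import json
import urllib.request

# Assumption: uvicorn's default host and port.
BASE_URL = "http://127.0.0.1:8000"


def build_scan_request(domain, sources, use_virustotal=True,
                       use_abuseipdb=True, output_format="json",
                       base_url=BASE_URL):
    """Build a POST /api/scan request with the sample body shown above."""
    payload = {
        "domain": domain,
        "sources": list(sources),
        "use_virustotal": use_virustotal,
        "use_abuseipdb": use_abuseipdb,
        "output_format": output_format,
    }
    return urllib.request.Request(
        f"{base_url}/api/scan",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_scan(req, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(run_scan(build_scan_request("example.com", ["google", "crtsh"])))
```

Building the request separately from sending it makes the payload easy to inspect or reuse without a running server.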

📀 Output Options

  • Console display via the rich library
  • JSON, CSV, PDF, or TXT files
  • Slack/Email alerts (optional)
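The file formats above can be produced from a flat list of findings. This stdlib-only sketch covers JSON, CSV, and TXT and assumes a hypothetical record shape such as `{"type": "email", "value": "..."}`. PDF is left out because it needs a third-party library, and the tool's actual exporter is not shown in this README.

```python
import csv
import io
import json
from typing import Dict, List


def export_results(records: List[Dict[str, str]], fmt: str) -> str:
    """Serialize findings to JSON, CSV, or TXT and return the text."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        # Union of keys across records, sorted for a stable column order.
        fields = sorted({key for record in records for key in record})
        writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    if fmt == "txt":
        # One record per line: key=value pairs separated by " | ".
        return "".join(
            " | ".join(f"{k}={v}" for k, v in record.items()) + "\n"
            for record in records
        )
    raise ValueError(f"unsupported format: {fmt!r}")
```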

πŸ” Supported Sources

  • Search Engines

    • Google
    • Bing
    • Yahoo
    • DuckDuckGo
  • DNS & Certificates

    • crt.sh
    • DNSDumpster
    • Censys
  • APIs & Directories

    • FOFA
    • Shodan
    • ZoomEye
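Of these sources, crt.sh is the simplest to illustrate. It can return certificate search results as JSON, commonly fetched as `https://crt.sh/?q=%25.example.com&output=json`, where each entry's `name_value` field holds one or more newline-separated hostnames. The sketch below works on already-downloaded JSON. `subdomains_from_crtsh` is a hypothetical helper, not the tool's own crt.sh module.

```python
import json
import sys
from typing import Dict, List


def subdomains_from_crtsh(entries: List[Dict], domain: str) -> List[str]:
    """Collect unique hostnames under `domain` from crt.sh JSON entries."""
    domain = domain.lower().rstrip(".")
    found = set()
    for entry in entries:
        # name_value may list several SAN entries, one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # fold wildcard certs into their base name
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    # Usage: python crtsh_parse.py saved_crtsh.json example.com
    with open(sys.argv[1], encoding="utf-8") as fh:
        print("\n".join(subdomains_from_crtsh(json.load(fh), sys.argv[2])))
```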

🔄 Integration Support

| Tool | Support Type |
|---|---|
| SpiderFoot | API integration |
| Maltego | Export & transform |
| VirusTotal | Domain/IP scoring |
| AbuseIPDB | IP abuse lookup |
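The risk-level output combines the VirusTotal and AbuseIPDB lookups. A VirusTotal v3 domain or IP object reports per-verdict engine counts in `last_analysis_stats`, and AbuseIPDB's check endpoint returns an `abuseConfidenceScore` from 0 to 100. The sketch below maps those two signals to LOW/MEDIUM/HIGH. The thresholds and the `risk_level` function are illustrative assumptions, not the tool's actual scoring rules.

```python
from typing import Mapping, Optional

# Illustrative thresholds (assumptions, not the tool's actual values).
HIGH_VT_MALICIOUS = 3
HIGH_ABUSE_SCORE = 75
MEDIUM_ABUSE_SCORE = 25


def risk_level(vt_stats: Optional[Mapping[str, int]] = None,
               abuse_confidence: Optional[int] = None) -> str:
    """Combine VirusTotal and AbuseIPDB signals into LOW/MEDIUM/HIGH.

    vt_stats: VirusTotal `last_analysis_stats`, e.g. {"malicious": 2, ...}.
    abuse_confidence: AbuseIPDB `abuseConfidenceScore` (0-100).
    Missing inputs are treated as "no signal".
    """
    stats = vt_stats or {}
    malicious = stats.get("malicious", 0)
    suspicious = stats.get("suspicious", 0)
    abuse = abuse_confidence or 0

    if malicious >= HIGH_VT_MALICIOUS or abuse >= HIGH_ABUSE_SCORE:
        return "HIGH"
    if malicious > 0 or suspicious > 0 or abuse >= MEDIUM_ABUSE_SCORE:
        return "MEDIUM"
    return "LOW"
```

Treating a missing lookup as "no signal" lets domain-only scans, which have no AbuseIPDB score, still receive a level.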

πŸ›‘οΈ CAPTCHA & Redirects

  • JavaScript redirect handling
  • CAPTCHA solving:
    • ✅ 2Captcha
    • ✅ AntiCaptcha
  • Headless Playwright emulation
  • Modern browser support (Chrome, Brave, Arc)
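Headless Playwright executes page scripts and follows redirects itself. A cheaper static pre-check can still flag pages that are only redirect stubs. The sketch below is a hypothetical heuristic: `find_redirect` is not part of the tool. It looks for a `<meta http-equiv="refresh">` tag (with `http-equiv` written before `content`) or a literal `location = "..."`, `location.replace(...)`, or `location.assign(...)` call. Redirects computed at runtime will be missed, which is why a real browser remains the fallback.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# <meta http-equiv="refresh" content="0; url=/next">
META_REFRESH_RE = re.compile(
    r"""<meta[^>]+http-equiv=["']?refresh["']?[^>]*content=["']?\s*\d+\s*;\s*url=([^"'>\s]+)""",
    re.IGNORECASE,
)
# location = "...", location.href = "...", location.replace/assign("...")
JS_REDIRECT_RE = re.compile(
    r"""\blocation(?:\.href)?\s*=\s*["']([^"']+)["']"""
    r"""|\blocation\.(?:replace|assign)\(\s*["']([^"']+)["']\s*\)""",
    re.IGNORECASE,
)


def find_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target found in html, or None."""
    m = META_REFRESH_RE.search(html)
    if m:
        return urljoin(base_url, m.group(1))
    m = JS_REDIRECT_RE.search(html)
    if m:
        # Group 1 is the assignment form, group 2 the replace/assign form.
        return urljoin(base_url, m.group(1) or m.group(2))
    return None
```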

🔧 Developer Tools

  • Proxy support
  • .env config manager
  • Modular plugins
  • Graceful error handling
  • Extendable data pipelines
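Modular plugins and graceful error handling can be combined in a small source registry. In this sketch, each source registers under the name used in `--sources`, and a failing source is logged and yields an empty result instead of aborting the whole run. The registry, `register`, `run_sources`, and the two demo plugins are hypothetical; the tool's real plugin interface is not documented here.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("osint_harvester.sketch")

# Hypothetical registry: source name -> function(domain) -> findings.
SourceFn = Callable[[str], List[str]]
SOURCES: Dict[str, SourceFn] = {}


def register(name: str) -> Callable[[SourceFn], SourceFn]:
    """Decorator that adds a source function to the registry."""
    def decorator(fn: SourceFn) -> SourceFn:
        SOURCES[name] = fn
        return fn
    return decorator


def run_sources(domain: str, names: List[str]) -> Dict[str, List[str]]:
    """Run each requested source; one failure does not stop the others."""
    results: Dict[str, List[str]] = {}
    for name in names:
        fn = SOURCES.get(name)
        if fn is None:
            logger.warning("unknown source: %s", name)
            continue
        try:
            results[name] = fn(domain)
        except Exception as exc:  # degrade per source, keep going
            logger.error("source %s failed: %s", name, exc)
            results[name] = []
    return results


@register("static")
def static_source(domain: str) -> List[str]:
    # Demo plugin that returns a fixed finding.
    return [f"www.{domain}"]


@register("broken")
def broken_source(domain: str) -> List[str]:
    # Demo plugin that always fails, to show error isolation.
    raise RuntimeError("simulated network error")
```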