This is a simple website crawler for recon, with colorful output and easy-to-read response status codes.


🌐 WEB-CRAWLER

An elite, asynchronous, high-performance web crawler built with Python — tailored for penetration testers, bug bounty hunters, and advanced recon workflows. It intelligently discovers hidden files, directories, keywords, and misconfigurations across large websites with speed and stealth.


⚙️ Features

  • ✅ Asynchronous & concurrent crawling (built with asyncio & aiohttp)
  • 🧠 Smart scope control (same domain, subdomains, or full)
  • 🔍 Keyword/Regex match on URLs, titles, and HTML
  • 📁 Scans sensitive paths & files (e.g. /.env, /admin, /.git)
  • 📜 Optional robots.txt ignoring (for authorized scans)
  • 🔁 User-agent rotation, delays, retries & URL deduplication
  • 📄 Exports results in CSV + JSON (with status, title, match info)
  • 🧪 Designed for offensive security use in CI or CLI pipelines
  • ☠️ Graceful Ctrl+C stop (saves partial data)
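The asynchronous, concurrency-limited crawl pattern behind the first two features can be sketched as below. This is not the tool's actual code — `fetch()` is a stand-in for the real aiohttp request so the pattern runs without a network, and the `concurrency`/`delay` defaults simply mirror the CLI options documented further down:

```python
import asyncio

async def fetch(url: str) -> str:
    """Stand-in for an aiohttp GET; returns a fake response body."""
    await asyncio.sleep(0)          # where session.get(url) would await I/O
    return f"<title>{url}</title>"

async def crawl(urls, concurrency=15, delay=0.0):
    sem = asyncio.Semaphore(concurrency)  # cap simultaneous requests
    results = {}

    async def worker(url):
        async with sem:
            results[url] = await fetch(url)
            await asyncio.sleep(delay)    # politeness delay between requests

    # Launch all workers at once; the semaphore enforces the concurrency cap.
    await asyncio.gather(*(worker(u) for u in urls))
    return results

results = asyncio.run(crawl(["https://example.com/", "https://example.com/admin"]))
```

In the real crawler, newly discovered links would be fed back into the queue instead of being fixed up front.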

🚀 Quick Start

  1. Clone the repository:

     git clone https://github.com/0warn/WEB-CRAWLER.git
     cd WEB-CRAWLER

  2. Install dependencies:

     pip install -r requirements.txt

  3. Run the crawler:

     python3 elite_crawler.py https://example.com \
       --max-depth 3 \
       --output scan_results \
       --keywords admin login password \
       --concurrency 20 \
       --delay 0.3 \
       --url-budget 1000

📌 Command-Line Options

Argument          Description
url               Target base URL to crawl
--max-depth       How deep to follow links (default: 3)
--output          Output file prefix (e.g., results → results.csv, results.json)
--keywords        Keywords or regex to trigger alerts (e.g., admin login)
--user-agents     Custom user-agent list
--concurrency     Number of concurrent requests (default: 15)
--delay           Delay between requests (seconds)
--url-budget      Stop after N total URLs crawled
--domain-scope    Scope control: same, subdomains, or all
--ignore-robots   Ignore robots.txt (for authorized targets only)
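A minimal argparse sketch of the interface above — a hedged reconstruction from the table, not the tool's actual parser (defaults are taken from the table where stated, guessed otherwise):

```python
import argparse

# Hypothetical parser mirroring the documented flags.
parser = argparse.ArgumentParser(description="web crawler (sketch)")
parser.add_argument("url", help="target base URL to crawl")
parser.add_argument("--max-depth", type=int, default=3)
parser.add_argument("--output", default="results", help="output file prefix")
parser.add_argument("--keywords", nargs="*", default=[])
parser.add_argument("--user-agents", nargs="*", default=None)
parser.add_argument("--concurrency", type=int, default=15)
parser.add_argument("--delay", type=float, default=0.0)
parser.add_argument("--url-budget", type=int, default=None)
parser.add_argument("--domain-scope", choices=["same", "subdomains", "all"],
                    default="same")
parser.add_argument("--ignore-robots", action="store_true")

# Parse the Quick Start invocation (abbreviated) instead of sys.argv.
args = parser.parse_args(
    ["https://example.com", "--max-depth", "3", "--keywords", "admin", "login"]
)
```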


🎯 Example Use Cases

  • ✅ Recon and discovery in bug bounty programs
  • 🔍 Internal web asset scanning for exposed secrets
  • 🛡️ Red teaming infrastructure enumeration
  • 🧪 CI-integrated automated recon scans

🛠 Tech Stack

  • 🐍 Python 3.8+
  • ⚙️ aiohttp / asyncio – fast, async HTTP client
  • 🧠 BeautifulSoup – HTML parsing
  • 🎨 Colorama – CLI colors
  • 🧰 argparse – CLI argument parsing

💡 Tips

  • Set User-Agent strings that match your engagement type.
  • Rotate proxies for stealth crawling.
  • Use --url-budget to avoid infinite loops.
  • Use --keywords with regex like 'api_key|password|token' to find secrets.
  • Always scan ethically and with permission.
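The regex tip above likely works along these lines: the keyword terms are joined into one alternation and tested against the URL, title, and body of each page. A hedged sketch (the field names here are illustrative, not the tool's internals):

```python
import re

# Join the --keywords terms into a single case-insensitive alternation.
keywords = ["api_key", "password", "token"]
pattern = re.compile("|".join(keywords), re.IGNORECASE)

# A crawled page reduced to the three places the README says are matched.
page = {
    "url": "https://example.com/login",
    "title": "Login",
    "body": "Enter your PASSWORD",
}
hit = any(pattern.search(text) for text in page.values())
```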

📦 Output Format

  • results.csv → Contains: url, status, title, hit
  • results.json → Full structured output for integration

Example:

[
  {
    "url": "https://example.com/admin",
    "status": 200,
    "title": "Admin Panel",
    "hit": true
  }
]
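For integration, results.json can be consumed with nothing but the standard library. A sketch using the schema shown in the example above (url, status, title, hit); the JSON is inlined here instead of read from disk:

```python
import json

# Same shape as the example output; in practice: json.load(open("results.json"))
raw = '''[
  {"url": "https://example.com/admin", "status": 200,
   "title": "Admin Panel", "hit": true}
]'''

records = json.loads(raw)

# Keep only the URLs where a keyword matched.
hits = [r["url"] for r in records if r["hit"]]
```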

🧑‍💻 Developer Setup

Install manually without setup script:

pip install aiohttp beautifulsoup4 colorama

Make executable:

chmod +x elite_crawler.py
./elite_crawler.py https://example.com --max-depth 2 --output scan

🧯 Troubleshooting

Problem            Fix
ImportError        Re-run pip install -r requirements.txt
Permission denied  chmod +x elite_crawler.py
Blocked URLs       Check robots.txt or use --ignore-robots (for authorized scans only)
SSL errors         Add --insecure flag (planned) or verify the target's certificate
Nothing crawled    Check the base URL syntax and the depth/scope limits

🤝 Community & Support

For discussions, help, or suggestions:

📬 GitHub Issues → Submit feature requests or bugs


🛡️ Legal & Ethics Notice

This tool is provided for legal penetration testing, bug bounty research, and authorized reconnaissance.

⚠️ Do not scan any target without explicit permission.

Author is not responsible for misuse of this tool.


📄 License

MIT License — feel free to modify, reuse, and contribute.


Made with 🐍 by 0warn
