An elite, asynchronous, high-performance web crawler built with Python — tailored for penetration testers, bug bounty hunters, and advanced recon workflows. It intelligently discovers hidden files, directories, keywords, and misconfigurations across large websites with speed and stealth.
- ✅ Asynchronous & concurrent crawling (built with asyncio & aiohttp)
- 🧠 Smart scope control (same domain, subdomains, or full)
- 🔍 Keyword/Regex match on URLs, titles, and HTML
- 📁 Scans sensitive paths & files (e.g. /.env, /admin, /.git)
- 📜 Optional robots.txt ignoring (for authorized scans)
- 🔁 User-agent rotation, delays, retries & URL deduplication
- 📄 Exports results in CSV + JSON (with status, title, match info)
- 🧪 Designed for offensive security use in CI or CLI pipelines
- ☠️ Graceful Ctrl+C stop (saves partial data)
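The concurrency model behind these features can be sketched with the stdlib alone. This is an illustrative sketch, not the tool's actual internals: `fetch` is a stub standing in for an aiohttp GET, and the names (`crawl`, `seen`, `worker`) are assumptions.

```python
import asyncio

async def fetch(url: str) -> list[str]:
    # Stub standing in for an aiohttp GET; returns the links found on the page.
    await asyncio.sleep(0)  # simulate network latency
    fake_links = {"https://example.com": ["https://example.com/admin",
                                          "https://example.com/login"]}
    return fake_links.get(url, [])

async def crawl(seed: str, concurrency: int = 15, url_budget: int = 1000):
    seen = {seed}                          # URL deduplication
    queue = asyncio.Queue()
    queue.put_nowait(seed)
    sem = asyncio.Semaphore(concurrency)   # cap in-flight requests
    results = []

    async def worker():
        while True:
            url = await queue.get()
            try:
                async with sem:
                    links = await fetch(url)
                results.append(url)
                for link in links:
                    if link not in seen and len(seen) < url_budget:
                        seen.add(link)     # dedupe before enqueueing
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await queue.join()                     # wait until every queued URL is done
    for w in workers:
        w.cancel()
    return results

print(asyncio.run(crawl("https://example.com")))
```

The semaphore bounds concurrency while the queue plus `seen` set handles deduplication; swapping the stubbed `fetch` for a real `aiohttp.ClientSession.get` is the main change a real crawler needs.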
- Clone the repository:
git clone https://github.com/0warn/WEB-CRAWLER.git
cd WEB-CRAWLER
- Install dependencies:
pip install -r requirements.txt
- Run the crawler:
python3 elite_crawler.py https://example.com \
--max-depth 3 \
--output scan_results \
--keywords admin login password \
--concurrency 20 \
--delay 0.3 \
--url-budget 1000

| Argument | Description |
|---|---|
| url | Target base URL to crawl |
| --max-depth | How deep to follow links (default: 3) |
| --output | Output file prefix (e.g., results → results.csv, results.json) |
| --keywords | Keywords or regex to trigger alerts (e.g., admin login) |
| --user-agents | Custom user-agent list |
| --concurrency | Number of concurrent requests (default: 15) |
| --delay | Delay between requests (seconds) |
| --url-budget | Stop after N total URLs crawled |
| --domain-scope | Scope control: same, subdomains, or all |
| --ignore-robots | Ignore robots.txt (for authorized targets only) |
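The table above maps naturally onto argparse. The sketch below mirrors the flags and defaults from the table; it is an assumption about how the CLI could be declared, not the tool's actual source.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flags and defaults follow the argument table above.
    p = argparse.ArgumentParser(description="Async recon crawler (sketch)")
    p.add_argument("url", help="Target base URL to crawl")
    p.add_argument("--max-depth", type=int, default=3)
    p.add_argument("--output", default="results", help="Output file prefix")
    p.add_argument("--keywords", nargs="+", default=[])
    p.add_argument("--user-agents", nargs="+", default=[])
    p.add_argument("--concurrency", type=int, default=15)
    p.add_argument("--delay", type=float, default=0.0)
    p.add_argument("--url-budget", type=int, default=None)
    p.add_argument("--domain-scope", choices=["same", "subdomains", "all"],
                   default="same")
    p.add_argument("--ignore-robots", action="store_true")
    return p

# Parse a sample command line instead of sys.argv.
args = build_parser().parse_args(
    ["https://example.com", "--max-depth", "2", "--keywords", "admin", "login"]
)
print(args.max_depth, args.keywords)  # → 2 ['admin', 'login']
```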
- ✅ Recon and discovery in bug bounty programs
- 🔍 Internal web asset scanning for exposed secrets
- 🛡️ Red teaming infrastructure enumeration
- 🧪 CI-integrated automated recon scans
- 🐍 Python 3.8+
- ⚙️ aiohttp / asyncio – fast, async HTTP client
- 🧠 BeautifulSoup – HTML parsing
- 🎨 Colorama – CLI colors
- 🧰 argparse – CLI argument parsing
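BeautifulSoup does the HTML parsing in the stack above; for illustration, a dependency-free equivalent of the link-extraction step using only the stdlib `html.parser` looks like this (the class name and sample HTML are illustrative):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute hrefs from <a> tags (stdlib stand-in for BeautifulSoup)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))

html = '<html><body><a href="/admin">Admin</a> <a href="login.php">Login</a></body></html>'
parser = LinkExtractor("https://example.com/")
parser.feed(html)
print(parser.links)  # → ['https://example.com/admin', 'https://example.com/login.php']
```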
- Set User-Agent strings that match your engagement type.
- Rotate proxies for stealth crawling.
- Use --url-budget to avoid infinite loops.
- Use --keywords with regex like 'api_key|password|token' to find secrets.
- Always scan ethically and with permission.
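The regex tip above behaves like this; the sample page bodies are made up for illustration:

```python
import re

# Same alternation pattern as the tip above.
pattern = re.compile(r"api_key|password|token", re.IGNORECASE)

pages = {
    "https://example.com/config.js": 'var api_key = "abc123";',
    "https://example.com/about": "Nothing sensitive here.",
}

# Flag each URL whose body matches the pattern.
hits = {url: bool(pattern.search(body)) for url, body in pages.items()}
print(hits)
```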
- results.csv → Contains: url, status, title, hit
- results.json → Full structured output for integration
Example:
[
{
"url": "https://example.com/admin",
"status": 200,
"title": "Admin Panel",
"hit": true
}
]
Install manually without setup script:
pip install aiohttp beautifulsoup4 colorama
Make executable:
chmod +x elite_crawler.py
./elite_crawler.py https://example.com --max-depth 2 --output scan

| Problem | Fix |
|---|---|
| ImportError | Re-run pip install -r requirements.txt |
| Permission denied | chmod +x elite_crawler.py |
| Blocked URLs | Check robots.txt or use --ignore-robots (for authorized scans only) |
| SSL errors | Add --insecure flag (planned) or verify target cert |
| Nothing crawled | Check base URL syntax and depth / scope limits |
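For the "Blocked URLs" row, you can check how a site's robots.txt treats a path before crawling. The stdlib `urllib.robotparser` does this; here the robots.txt content is supplied inline rather than fetched, and the rules are an example:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())  # parse inline rules instead of fetching

print(rp.can_fetch("*", "https://example.com/admin"))   # → False
print(rp.can_fetch("*", "https://example.com/public"))  # → True
```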
For discussions, help, or suggestions:
📬 GitHub Issues → Submit feature requests or bugs
This tool is provided for legal penetration testing, bug bounty research, and authorized reconnaissance.
Author is not responsible for misuse of this tool.
MIT License — feel free to modify, reuse, and contribute.
Made with 🐍 by 0warn