A Python tool that scrapes URLs for a target from the Wayback Machine, filters for interesting parameters (e.g., ?id=, ?redirect=, etc.), and uses a headless browser (Playwright) to verify them.
- Wayback Machine Scraping: Fetches all known URLs for a domain.
- Parameter Filtering: Targets a huge list of sensitive/interesting query parameters (IDORs, Redirects, etc.).
- Headless Verification: Uses Playwright to visit each URL and capture real HTTP status codes and response lengths, filtering out dead links.
- Advanced Options: Custom headers, rate limiting, timeouts, and extension filtering.
- Async Concurrency: Fast verification using asyncio.
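The parameter filter can be sketched as a pure function over query-string keys. This is not the tool's actual code, and the keyword set below is a small illustrative subset of the "huge list" mentioned above:

```python
from urllib.parse import urlparse, parse_qs

# Illustrative subset of "interesting" parameter names; the real tool
# ships a much larger list (IDOR, open-redirect, file-inclusion hints, etc.).
INTERESTING = {"id", "redirect", "url", "file", "page", "return", "next"}

def is_interesting(url: str) -> bool:
    """True if any query parameter name looks worth verifying."""
    params = parse_qs(urlparse(url).query)
    return any(name.lower() in INTERESTING for name in params)

urls = [
    "https://example.com/view?id=42",
    "https://example.com/static/logo.png",
    "https://example.com/login?next=/admin",
]
hits = [u for u in urls if is_interesting(u)]
# URLs without a matching parameter (like the .png) are dropped.
```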
Follow these steps to set up the tool on Kali Linux (or any Debian-based system).
Ensure your system is up to date and has pip installed.
```bash
sudo apt update
sudo apt install python3-pip python3-venv -y
```

Modern Linux distributions (including Kali) mark the system Python as externally managed, so packages must be installed inside a virtual environment.
```bash
# Create a virtual environment named 'venv'
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate
```

(You will see `(venv)` in your terminal prompt after activation.)
Install the required Python libraries.
```bash
pip install -r requirements.txt
```

If you don't have a requirements.txt yet, run:

```bash
pip install requests playwright colorama
```

Playwright needs its own browser binaries to work:

```bash
playwright install chromium
```

If you encounter errors, try running `python -m playwright install chromium` instead.
Basic command structure:
```bash
python wayback_miner.py -d <target_domain> [options]
```

| Argument | Description |
|---|---|
| `-d, --domain` | Target domain (e.g., example.com) |
| `-l, --list` | Path to a file containing a list of domains |
| `-o, --output` | File to save the results to |
| `-v, --verbose` | [NEW] Enable verbose logging (shows real-time visiting) |
| `-c, --concurrency` | Number of concurrent browser tabs (default: 5) |
| `--exclude` | [NEW] Comma-separated extensions to exclude (e.g., css,png) |
| `--delay` | [NEW] Random delay in seconds between requests (e.g., 2.0) |
| `--timeout` | [NEW] Navigation timeout in ms (default: 15000) |
| `--headers` | [NEW] JSON string of custom headers (e.g., `{"Cookie": "..."}`) |
| `--proxy` | Proxy URL (e.g., http://127.0.0.1:8080) |
| `--screenshot` | Directory to save screenshots of valid URLs |
| `--match` | Regex pattern to match in response body (e.g., admin) |
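The concurrency cap behind `-c` is typically enforced with an `asyncio.Semaphore`. The sketch below shows the pattern only; `verify()` is a stand-in for the real Playwright page visit, not the tool's implementation:

```python
import asyncio

async def verify(url: str, sem: asyncio.Semaphore) -> tuple[str, str]:
    """Stand-in for a Playwright visit; returns (url, status)."""
    async with sem:                  # at most `concurrency` visits in flight
        await asyncio.sleep(0.01)    # placeholder for the real page.goto(url)
        return (url, "200")

async def verify_all(urls: list[str], concurrency: int = 5) -> list[tuple[str, str]]:
    sem = asyncio.Semaphore(concurrency)
    return await asyncio.gather(*(verify(u, sem) for u in urls))

results = asyncio.run(
    verify_all([f"https://example.com/?id={i}" for i in range(10)], concurrency=5)
)
```

The semaphore keeps memory and target load bounded no matter how many URLs the Wayback Machine returns, while `asyncio.gather` preserves input order in the results.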
Quickly scan a domain and save results.
```bash
python wayback_miner.py -d tesla.com -o results.txt
```

Enable verbose mode to see what's happening, add a delay to avoid rate limiting, and exclude images.
```bash
python wayback_miner.py -d tesla.com -v --delay 2.0 --exclude png,jpg,css,svg
```

Pass a session cookie or API key to access protected endpoints.
```bash
python wayback_miner.py -d example.com --headers '{"Cookie": "session=YOUR_SESSION_ID", "Authorization": "Bearer TOKEN"}'
```

Route traffic through a local proxy.
```bash
python wayback_miner.py -d example.com --proxy http://127.0.0.1:8080 --json
```

Look for specific keywords in the response body (e.g., "password", "admin", "error").
```bash
python wayback_miner.py -d example.com --match "admin|password|syntax error" -o sensitive.txt
```

The console output is color-coded. The file output (if specified) follows this format:
```
200 | 15302 | https://example.com/page?id=123
403 | 502 | https://example.com/admin?redirect=login
...
```

(Status Code | Response Length | URL)
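Because the file format is stable pipe-separated fields, downstream filtering is straightforward. A minimal parser sketch, with field names taken from the legend above:

```python
def parse_result_line(line: str) -> dict:
    """Split a 'STATUS | LENGTH | URL' result line into typed fields."""
    status, length, url = (part.strip() for part in line.split("|", 2))
    return {"status": int(status), "length": int(length), "url": url}

row = parse_result_line("200 | 15302 | https://example.com/page?id=123")
```

Splitting with `maxsplit=2` keeps the URL intact even if it happens to contain a `|` character.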