mrdineshpathro/wayback_miner
Wayback Parameter Miner

A Python tool that scrapes URLs for a target from the Wayback Machine, filters for interesting parameters (e.g., ?id=, ?redirect=, etc.), and uses a headless browser (Playwright) to verify them.

Features

  • Wayback Machine Scraping: Fetches all known URLs for a domain.
  • Parameter Filtering: Targets a huge list of sensitive/interesting query parameters (IDORs, Redirects, etc.).
  • Headless Verification: Uses Playwright to visit each URL and capture real HTTP status codes and response lengths, filtering out dead links.
  • Advanced Options: Custom headers, rate limiting, timeouts, and extension filtering.
  • Async Concurrency: Fast verification using asyncio.
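The first two features (scraping and filtering) can be sketched roughly as below. The CDX endpoint is the Wayback Machine's public API; the parameter list here is a small illustrative subset, not the tool's actual list:

```python
from urllib.parse import urlparse, parse_qs

# Public Wayback Machine CDX endpoint; fl=original returns the bare archived URLs.
CDX_API = "http://web.archive.org/cdx/search/cdx"

# Illustrative subset of "interesting" parameter names; the real list is much larger.
INTERESTING_PARAMS = {"id", "redirect", "url", "next", "file", "page", "user"}

def build_cdx_query(domain: str) -> str:
    """Build the CDX query URL that lists every archived URL for a domain."""
    return f"{CDX_API}?url={domain}/*&output=text&fl=original&collapse=urlkey"

def filter_interesting(urls: list[str]) -> list[str]:
    """Keep only URLs whose query string contains an interesting parameter."""
    hits = []
    for url in urls:
        params = parse_qs(urlparse(url).query)
        if INTERESTING_PARAMS & set(params):
            hits.append(url)
    return hits
```

In practice the CDX response would be fetched (e.g. with `requests.get(build_cdx_query(domain))`) and split into lines before filtering.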

🛠️ Installation on Kali Linux

Follow these steps to set up the tool on Kali Linux (or any Debian-based system).

1. Update System & Install Python3-pip

Ensure your system is up to date and has pip installed.

sudo apt update
sudo apt install python3-pip python3-venv -y

2. Set Up a Virtual Environment (Recommended)

Modern Debian-based distributions mark the system Python as externally managed (PEP 668), so Python packages must be installed inside a virtual environment.

# Create a virtual environment named 'venv'
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate

(You will see (venv) in your terminal prompt after activation.)

3. Install Dependencies

Install the required Python libraries.

pip install -r requirements.txt

If you don't have a requirements.txt yet, run:

pip install requests playwright colorama

4. Install Playwright Browsers

Playwright needs its own browser binaries to work.

playwright install chromium

If you encounter errors, try running python -m playwright install chromium.


🚀 Usage

Basic command structure:

python wayback_miner.py -d <target_domain> [options]

Arguments

-d, --domain       Target domain (e.g., example.com)
-l, --list         Path to a file containing a list of domains
-o, --output       File to save the results to
-v, --verbose      [NEW] Enable verbose logging (shows real-time visiting)
-c, --concurrency  Number of concurrent browser tabs (default: 5)
--exclude          [NEW] Comma-separated extensions to exclude (e.g., css,png)
--delay            [NEW] Random delay in seconds between requests (e.g., 2.0)
--timeout          [NEW] Navigation timeout in ms (default: 15000)
--headers          [NEW] JSON string of custom headers (e.g., {"Cookie": "..."})
--proxy            Proxy URL (e.g., http://127.0.0.1:8080)
--screenshot       Directory to save screenshots of valid URLs
--match            Regex pattern to match in response body (e.g., admin)
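Since --headers takes a JSON string, the value has to be parsed before it can be passed to the browser. A hedged sketch of how that parsing might look (json.loads is standard library; the validation shown is an assumption, not necessarily what wayback_miner does):

```python
import json

def parse_headers(raw: str) -> dict[str, str]:
    """Parse the --headers argument (a JSON object) into a header dict.

    Raises ValueError with a readable message on malformed input.
    """
    try:
        headers = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"--headers must be valid JSON: {exc}") from exc
    if not isinstance(headers, dict):
        raise ValueError('--headers must be a JSON object, e.g. {"Cookie": "..."}')
    # Coerce keys/values to strings so they are safe to send as HTTP headers.
    return {str(k): str(v) for k, v in headers.items()}
```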

💡 Examples

1. Basic Scan

Quickly scan a domain and save results.

python wayback_miner.py -d tesla.com -o results.txt

2. Verbose & Stealth Scan

Enable verbose mode to see what's happening, add a delay to avoid rate limiting, and exclude images.

python wayback_miner.py -d tesla.com -v --delay 2.0 --exclude png,jpg,css,svg

3. Authenticated Scan (Custom Headers)

Pass a session cookie or API key to access protected endpoints.

python wayback_miner.py -d example.com --headers '{"Cookie": "session=YOUR_SESSION_ID", "Authorization": "Bearer TOKEN"}'

4. Proxy Scan (Burp Suite / Tor)

Route traffic through a local proxy.

python wayback_miner.py -d example.com --proxy http://127.0.0.1:8080 --json

5. Content Matching (Hunt for Secrets)

Look for specific keywords in the response body (e.g., "password", "admin", "error").

python wayback_miner.py -d example.com --match "admin|password|syntax error" -o sensitive.txt
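The -c concurrency limit from the options above maps naturally onto an asyncio.Semaphore. This is a minimal sketch of that pattern only; the stubbed visit() stands in for the real Playwright page.goto() call and its status/length capture:

```python
import asyncio

async def verify_all(urls: list[str], concurrency: int = 5) -> list[tuple[str, int]]:
    """Visit URLs with at most `concurrency` in flight at once."""
    sem = asyncio.Semaphore(concurrency)

    async def visit(url: str) -> tuple[str, int]:
        async with sem:  # at most `concurrency` coroutines run past this point
            await asyncio.sleep(0)  # stand-in for the real Playwright page visit
            return (url, 200)       # stubbed status; the real tool reads it from the response

    # gather() preserves input order in its results
    return await asyncio.gather(*(visit(u) for u in urls))
```

The semaphore is what keeps only N browser tabs open at a time regardless of how many URLs are queued.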

Output Format

The console output is color-coded. The file output (if specified) follows this format:

200 | 15302 | https://example.com/page?id=123
403 | 502 | https://example.com/admin?redirect=login
...

(Status Code | Response Length | URL)
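Because each line follows a fixed `status | length | url` shape, results files are easy to post-process. This small parsing helper is an assumption about downstream use, not part of the tool itself:

```python
def parse_result_line(line: str) -> tuple[int, int, str]:
    """Split a 'status | length | url' output line into typed fields."""
    # maxsplit=2 keeps the URL intact even if it contains a '|'
    status, length, url = (part.strip() for part in line.split("|", 2))
    return int(status), int(length), url
```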
