🔍 Account Hunter v2 — LLM-Driven Smart Data Extractor
A local-first tool that lets you describe the data you're looking for in plain English, then automatically scans your files to extract it. It combines LLM-generated rules with regex matching and comes with a clean web UI and a REST API.
✨ What Makes It Different

| Tool | Approach | Drawback |
|---|---|---|
| grep / ripgrep | Fast regex search | You must write the regex |
| gitleaks / trufflehog | Secret detection | Predefined patterns; custom rules must be hand-written |
| LangChain Extraction | LLM extraction | Web/cloud-oriented |
| Account Hunter | Natural language → auto-generated rules → local file scan | ✅ None: just describe it |

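
The "auto-generated rules → local file scan" step can be pictured with the minimal sketch below. It assumes a ruleset is simply a mapping of rule names to regex strings; the format the LLM actually emits is not documented here, and `RULESET`, `scan_directory`, and the extension filter are hypothetical illustrations, not the project's API.

```python
import re
from pathlib import Path

# Hypothetical ruleset: the README does not document the format the LLM emits,
# so this sketch assumes a plain mapping of rule name -> regex string.
RULESET = {
    "email_password": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+:\S+",
}


def scan_directory(root, ruleset, extensions=(".txt", ".log", ".csv")):
    """Apply every rule to each text file under `root` and collect the hits."""
    compiled = {name: re.compile(pattern) for name, pattern in ruleset.items()}
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only files with an allowed extension are read (assumed filter).
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        # Undecodable bytes are dropped rather than aborting the whole scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name, regex in compiled.items():
            for match in regex.finditer(text):
                hits.append({"rule": name, "file": str(path), "value": match.group(0)})
    return hits
```

Compiling each pattern once before walking the tree keeps the per-file loop cheap, which matters when a scan covers whole drives.
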
🚀 Features
- Natural-language queries: type "find email:password pairs" and the tool generates the matching rules for you
- 8 built-in presets that run instantly without any LLM: emails, credentials, API keys, phone numbers, crypto wallets, IPs, URLs, SSH keys
- Real-time progress via WebSocket: watch files being scanned live
- Ruleset preview: inspect the LLM-generated regex before scanning
- Scan history: every scan is stored in SQLite, so you can browse past results at any time
- Multiple export formats: TXT, JSON, CSV
- Safe by default: LLM-generated normalization code is not executed unless you pass `--allow-unsafe-code`
- Docker support: one-command deployment
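
The presets need no LLM because they are fixed patterns. The sketch below shows what a few of them could look like; the regexes and the `PRESETS` / `run_preset` names are hypothetical, since the project's actual preset patterns are not shown in this README.

```python
import re

# Hypothetical patterns for three of the presets (not the project's own regexes).
PRESETS = {
    "emails": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}\b"),
    # Each octet is restricted to 0-255, so "10.0.0.256" is rejected.
    "ips": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    # The last character may not be sentence punctuation, so "login." yields "login".
    "urls": re.compile(r"https?://[^\s\"'<>]*[^\s\"'<>.,;:)]"),
}


def run_preset(name, text):
    """Return unique matches for one preset, in order of first appearance."""
    return list(dict.fromkeys(m.group(0) for m in PRESETS[name].finditer(text)))
```

Deduplicating with `dict.fromkeys` keeps the first-seen order, which is usually what you want when the same credential appears in several dumps.
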
🖥️ Web UI
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Start the server
python server.py
# → Web UI available at http://localhost:8000
# → API docs at http://localhost:8000/docs
```
Pages:
- `/`: submit a query or pick a preset, watch live scan progress, and view results
- `/static/history.html`: all past scans with stats and export links
- `/static/scan.html?id=N`: full result detail for a specific scan
💻 CLI (also works without the server)
```bash
# Using Ollama (local, free)
python main.py "find all gmail accounts" --scan-dir "C:\Downloads"

# Using OpenAI
python main.py "find API keys and tokens" \
  --provider openai --model gpt-4o --api-key sk-...

# Presets (no LLM needed) are available via the web UI / API

# Scan all drives
python main.py "find email:password pairs" --all-drives

# Allow LLM normalization code (disabled by default for safety)
python main.py "find gmail accounts and remove dot aliases" --allow-unsafe-code
```
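
The flags in these examples can be summarized as an `argparse` parser. This is a hypothetical reconstruction inferred only from the commands above, not the actual contents of `main.py`; in particular, the `ollama` default for `--provider` is an assumption based on the Ollama example passing no provider flag.

```python
import argparse


def build_parser():
    """Hypothetical argparse setup mirroring the flags used in the CLI examples."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Describe the data you want in plain English and scan local files for it.",
    )
    parser.add_argument("query", help='What to find, e.g. "find all gmail accounts"')
    parser.add_argument("--scan-dir", help="Directory to scan")
    parser.add_argument("--all-drives", action="store_true", help="Scan all drives")
    # Assumed default: the Ollama example passes no --provider flag.
    parser.add_argument("--provider", default="ollama", help="LLM provider, e.g. ollama or openai")
    parser.add_argument("--model", help="Model name, e.g. gpt-4o")
    parser.add_argument("--api-key", help="API key for hosted providers such as OpenAI")
    parser.add_argument(
        "--allow-unsafe-code",
        action="store_true",
        help="Allow LLM-generated normalization code (off by default)",
    )
    return parser
```

Because `--allow-unsafe-code` uses `store_true`, the safe behavior is what you get whenever the flag is omitted.
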
🐳 Docker
```bash
# Set your scan directory and optional LLM key
cp .env.example .env
# Edit .env: set SCAN_DIR, plus OPENAI_API_KEY if using OpenAI
docker-compose up --build
# → http://localhost:8000
```
⚙️ Configuration
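
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key (only needed for the OpenAI provider) | none |
| `SCAN_DIR` | Host directory mounted into the Docker container for scanning | `.` |
| `AH_DB_PATH` | SQLite database path | `hunter_data.db` |
| `AH_OUTPUT_DIR` | Output directory | `output/` |
| `PORT` | Server port | `8000` |
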
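
Reading these variables with the documented defaults might look like the minimal sketch below. `Settings` and `load_settings` are hypothetical names; `SCAN_DIR` is left out on the assumption that it is consumed by the Docker setup rather than by the application.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    db_path: str
    output_dir: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the README defaults."""
    return Settings(
        # An unset or empty key means "no OpenAI provider available".
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        db_path=env.get("AH_DB_PATH", "hunter_data.db"),
        output_dir=env.get("AH_OUTPUT_DIR", "output/"),
        port=int(env.get("PORT", "8000")),
    )
```

Accepting the environment as a parameter makes the defaults easy to test without touching the real process environment.
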
📡 REST API

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/presets` | List built-in presets |
| POST | `/api/ruleset/preview` | Preview the LLM-generated ruleset (no scan) |
| POST | `/api/scan` | Start a scan |
| GET | `/api/scan/{id}/status` | Scan status |
| GET | `/api/scan/{id}/results` | Paginated results |
| GET | `/api/scans` | All past scans |
| GET | `/api/scan/{id}/export?fmt=txt\|json\|csv` | Download results |
| WS | `/ws/scan/{id}` | Real-time progress events |

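
A small standard-library client for these endpoints could look like the sketch below. The paths come from the table above, but the JSON field names are assumptions: the `"status"` key and the `"completed"`/`"failed"` state names are hypothetical, so check the schemas at `/docs` before relying on them.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"
EXPORT_FORMATS = ("txt", "json", "csv")


def export_url(scan_id, fmt="json", base=BASE_URL):
    """Build the download URL for GET /api/scan/{id}/export."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"fmt must be one of {EXPORT_FORMATS}")
    query = urllib.parse.urlencode({"fmt": fmt})
    return f"{base}/api/scan/{scan_id}/export?{query}"


def get_json(path, base=BASE_URL):
    """GET an endpoint and decode its JSON body."""
    with urllib.request.urlopen(base + path) as resp:
        return json.load(resp)


def post_json(path, payload, base=BASE_URL):
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(
        base + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_scan(scan_id, fetch_status=None, poll_seconds=1.0, timeout=300.0,
                  done_states=("completed", "failed")):
    """Poll GET /api/scan/{id}/status until the scan reaches a terminal state.

    The "status" field name and the terminal state names are assumptions.
    """
    fetch = fetch_status or (lambda sid: get_json(f"/api/scan/{sid}/status"))
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(scan_id)
        if status.get("status") in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"scan {scan_id} still running after {timeout}s")
        time.sleep(poll_seconds)
```

A typical flow would be `post_json("/api/scan", {"query": "find API keys and tokens"})`, then `wait_for_scan(...)` with the returned scan id, then a download from `export_url(...)`; the `query` request field and the `id` in the response are likewise assumed. Polling keeps the client dependency-free; for live per-file progress, the `/ws/scan/{id}` WebSocket is the better fit.
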
🧪 Testing
```bash
# Run the core pipeline integration test (no LLM required)
python test_pipeline.py

# Start the dev server with hot-reload
make dev

# Run tests + linting
make test
```