Vestigo: Firmware analysis & crypto-detection pipeline

Vestigo is a collection of tools, scripts, and services that automate (1) building cross-compiled test binaries, (2) static and dynamic analysis of firmware and binaries, (3) extraction of ML-ready features, and (4) generation of datasets and inference results for cryptographic-function detection. The repo combines headless Ghidra-based extraction, Qiling-based dynamic tracing, a dataset generation pipeline (with optional LLM-assisted labeling), and a small backend and frontend for web access.

This README gives a concise, practical overview and quickstart so you can get the pipeline running and contribute.

Key project goals

Extract function-level and trace-level features suitable for ML
Provide utilities for static (Ghidra) and dynamic (Qiling) analysis
Offer scripts to build training CSVs and run inference
Provide a backend API and frontend for file upload and analysis

Quick facts / highlights

Languages: Python (main tooling & backend), TypeScript/React frontend
Major folders: ghidra_scripts, qiling_analysis, ml, backend, frontend
Important entry points:
  - scripts/generate_dataset.py - create ML CSVs from Ghidra JSONs
  - scripts/analyzer.py, bare_metal.py, main.py - orchestrate analysis flows
  - factory/builder.py - cross-compile sources across an arch/opt matrix
  - qiling_analysis/ - dynamic tracing & batch extraction pipeline
  - backend/ - FastAPI backend with analysis endpoints

Quick Setup

1. Automated Installation

./setup.sh
source activate_vestigo.sh

What gets installed:

Python environment with all dependencies (FastAPI, Qiling, ML libraries)
Ghidra headless analyzer (/opt/ghidra)
Qiling framework + rootfs
Cross-compiler toolchains (ARM, MIPS, AArch64)
Container runtime (Podman/Docker)

Options: --minimal | --skip-ghidra | --skip-ml | --help

2. Manual Steps Required

Frontend (Node.js 18+):

# Install Node.js for your OS, then:
cd frontend && npm install && cd ..

Database (PostgreSQL):

# Option A: Local
sudo apt install postgresql && sudo -u postgres createdb vestigo

# Option B: Cloud (https://neon.tech - recommended)
# Get connection string and add to .env

Configure .env:

DATABASE_URL=postgresql://user:pass@host:5432/vestigo
OPENAI_API_KEY=sk-your-key-here  # Get from platform.openai.com

Initialize Database:

cd backend && prisma db push && prisma generate && cd ..
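
Before initializing the database, it can help to confirm that .env actually defines both variables shown above. The following is a minimal sketch using only the standard library. The helper names `parse_env` and `missing_keys` are hypothetical and not part of Vestigo, and the parser only handles simple `KEY=value` lines with an optional trailing `#` comment.

```python
from pathlib import Path

REQUIRED_KEYS = ("DATABASE_URL", "OPENAI_API_KEY")  # the two keys listed in this README


def parse_env(text):
    """Parse simple KEY=value lines; skips blanks and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment" such as the one after OPENAI_API_KEY.
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(parse_env(text))
    print("OK" if not missing else "Missing in .env: " + ", ".join(missing))
```

Run it from the repository root. It only checks that the keys are present and non-empty, not that the connection string or API key is valid.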

Usage

Always activate the environment first: source activate_vestigo.sh

Static Analysis (Ghidra)

python3 scripts/analyzer.py <binary>

Dynamic Analysis (Qiling)

python3 qiling_analysis/tests/verify_crypto.py <binary>

Generate ML Dataset

python3 scripts/generate_dataset.py --input-dir ghidra_output --output dataset.csv
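
Once the CSV exists, a quick sanity check is to look at its header and row count. The column layout is not documented in this README, so the sketch below makes no assumption about specific column names. `summarize_csv` is a hypothetical helper, not part of Vestigo.

```python
import csv
import sys
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    # Defaults to the output name used in the command above.
    path = sys.argv[1] if len(sys.argv) > 1 else "dataset.csv"
    if Path(path).is_file():
        columns, rows = summarize_csv(path)
        print(f"{path}: {rows} rows, {len(columns)} columns")
        print("columns:", ", ".join(columns))
    else:
        print(f"{path} not found; run generate_dataset.py first")
```

An empty file yields an empty header and zero rows, which usually means the input directory held no Ghidra JSON output.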

Batch Processing

python3 qiling_analysis/batch_extract_features.py \
    --dataset-dir ./dataset_binaries --output-dir ./results --parallel 4
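
The general pattern behind a flag like `--parallel 4` is to fan per-binary work out across a fixed number of workers. The sketch below illustrates that pattern only; it does not reproduce `batch_extract_features.py`. It wraps the `scripts/analyzer.py <binary>` command from the Static Analysis section, and the function names are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(binary, script="scripts/analyzer.py"):
    """Command line for one binary, mirroring the Static Analysis usage."""
    return [sys.executable, script, str(binary)]


def run_one(binary):
    """Run the analyzer on one binary and return (file_name, exit_code)."""
    result = subprocess.run(build_command(binary), capture_output=True, text=True)
    return Path(binary).name, result.returncode


def run_batch(binaries, workers=4, runner=run_one):
    """Apply `runner` to every binary using a bounded thread pool.

    Threads are enough here because each task only waits on a child process.
    Results come back in the same order as the input.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, binaries))


if __name__ == "__main__":
    dataset_dir = Path("./dataset_binaries")
    targets = []
    if dataset_dir.is_dir():
        targets = sorted(p for p in dataset_dir.iterdir() if p.is_file())
    for name, code in run_batch(targets):
        status = "ok" if code == 0 else f"failed ({code})"
        print(f"{name}: {status}")
```

Passing a different `runner` makes the same pool usable for other per-binary steps, and keeps the fan-out logic testable without launching any subprocess.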

Cross-Compile Binaries

python3 factory/builder.py --source algorithm.c
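
To make the "arch/opt matrix" idea concrete, here is a small sketch that enumerates one compile command per architecture and optimization level. It is not taken from `factory/builder.py`. The compiler executable names (common Debian/Ubuntu cross-GCC names), the list of optimization levels, the `-static` flag, and the output naming scheme are all assumptions made for illustration.

```python
from itertools import product
from pathlib import Path

# Assumed toolchain executables for the three architectures named in this README.
COMPILERS = {
    "arm": "arm-linux-gnueabi-gcc",
    "mips": "mips-linux-gnu-gcc",
    "aarch64": "aarch64-linux-gnu-gcc",
}
# Assumed optimization levels; the real builder may use a different set.
OPT_LEVELS = ("-O0", "-O1", "-O2", "-O3", "-Os")


def build_matrix(source, out_dir="build"):
    """Yield (output_path, command) for every arch/optimization pair."""
    stem = Path(source).stem
    for (arch, compiler), opt in product(COMPILERS.items(), OPT_LEVELS):
        output = Path(out_dir) / f"{stem}_{arch}_{opt.lstrip('-')}.elf"
        # -static is an assumption: static binaries are easier to emulate.
        yield output, [compiler, opt, "-static", "-o", str(output), str(source)]


if __name__ == "__main__":
    # Print the commands instead of running them, so no toolchain is required.
    for output, command in build_matrix("algorithm.c"):
        print(" ".join(command))
```

Three architectures times five levels gives fifteen binaries per source file, which is why a single algorithm can contribute many samples to a dataset.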

LLM Crypto Analysis

python3 qiling_analysis/tests/llm/crypto_deep_analyzer.py --strace trace.log --output analysis.json

Run Web Interface

# Backend (terminal 1)
cd backend && uvicorn main:app --reload

# Frontend (terminal 2)
cd frontend && npm run dev
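
To confirm the backend came up, you can probe it from a third terminal. The sketch below assumes uvicorn's default bind address (127.0.0.1:8000) and FastAPI's default interactive docs route (`/docs`); adjust the URL if the backend overrides either. `backend_is_up` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def backend_is_up(url="http://127.0.0.1:8000/docs", timeout=3.0):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx HTTP errors.
        return False


if __name__ == "__main__":
    print("backend reachable" if backend_is_up() else "backend not reachable")
```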

Project Structure

vestigo-data/
├── setup.sh                 # Automated installation
├── activate_vestigo.sh      # Environment activation
├── backend/                 # FastAPI server
├── frontend/                # React UI
├── factory/                 # Cross-compilation tools
├── ghidra_scripts/          # Ghidra analysis scripts
├── qiling_analysis/         # Dynamic tracing pipeline
├── ml/                      # ML models and training
├── scripts/                 # Analysis orchestration
└── dataset_binaries/        # Sample binaries

Key Scripts:

scripts/analyzer.py - Ghidra static analysis
scripts/generate_dataset.py - Create ML datasets
qiling_analysis/tests/verify_crypto.py - Dynamic analysis
factory/builder.py - Cross-compilation

Troubleshooting

| Issue | Solution |
| --- | --- |
| Virtual environment not found | Run `./setup.sh` |
| Import errors | `pip install -r requirements.txt -r backend/requirements.txt` |
| Qiling rootfs missing | `git clone --depth 1 https://github.com/qilingframework/rootfs.git qiling_analysis/rootfs` |
| Ghidra not found | Set `export GHIDRA_HOME=/opt/ghidra` |
| Database errors | Check `DATABASE_URL` in `.env`, run `prisma generate` |
| OpenAI quota exceeded | Check billing at platform.openai.com |
| Frontend won't start | `cd frontend && rm -rf node_modules && npm install` |
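
Several of these issues can be detected up front. The following sketch checks a few of them from the repository root. `preflight` is a hypothetical helper, not a Vestigo script, and it relies on one assumption about Ghidra: that the headless launcher sits at `support/analyzeHeadless` inside the install directory, as in standard Ghidra distributions.

```python
import os
from pathlib import Path


def preflight(root=".", environ=os.environ):
    """Return a list of human-readable problems; empty means all checks passed."""
    root = Path(root)
    problems = []

    ghidra_home = environ.get("GHIDRA_HOME")
    if not ghidra_home:
        problems.append("GHIDRA_HOME is not set (try: export GHIDRA_HOME=/opt/ghidra)")
    elif not (Path(ghidra_home) / "support" / "analyzeHeadless").exists():
        # Assumed location of the headless launcher inside a Ghidra install.
        problems.append(f"no support/analyzeHeadless under {ghidra_home}")

    if not (root / "qiling_analysis" / "rootfs").is_dir():
        problems.append("qiling_analysis/rootfs is missing (clone the Qiling rootfs repo)")

    if not (root / ".env").is_file():
        problems.append(".env is missing (DATABASE_URL must be defined there)")

    if not (root / "frontend" / "node_modules").is_dir():
        problems.append("frontend/node_modules is missing (run npm install in frontend/)")

    return problems


if __name__ == "__main__":
    found = preflight()
    for problem in found:
        print("WARN:", problem)
    print("all checks passed" if not found else f"{len(found)} problem(s) found")
```

The checks are read-only, so the script is safe to run at any point during setup.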

System Requirements

OS: Ubuntu/Debian, Fedora/RHEL, Arch, macOS
RAM: 8GB min, 16GB recommended
Disk: ~10GB
Python: 3.9+ (3.11 recommended)
Node.js: 18+ (for frontend)
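
A quick way to compare a machine against these numbers is sketched below, using only the standard library. The helper names are hypothetical. RAM is not checked because the standard library has no portable call for it, and the Node.js check assumes `node --version` prints a string such as `v18.19.0`.

```python
import re
import shutil
import subprocess
import sys


def python_ok(version=sys.version_info, minimum=(3, 9)):
    """True if the interpreter meets the minimum (major, minor) version."""
    return tuple(version[:2]) >= minimum


def node_major(version_text):
    """Parse the major version from `node --version` output, or None."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    return int(match.group(1)) if match else None


def free_gb(path="."):
    """Free disk space on the filesystem holding `path`, in gigabytes."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    print("Python 3.9+:", python_ok())
    node = shutil.which("node")
    if node:
        out = subprocess.run([node, "--version"], capture_output=True, text=True).stdout
        major = node_major(out)
        print("Node.js 18+:", major is not None and major >= 18)
    else:
        print("Node.js: not found on PATH")
    print(f"Free disk: {free_gb():.1f} GB (about 10 GB needed)")
```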

Documentation

qiling_analysis/QUICKSTART_GUIDE.md - Dynamic analysis guide
CONTRIBUTING.md - Contribution guidelines

License

Apache-2.0 - See LICENSE
