SecFlow — Analyzers

This document describes each of SecFlow's five analyzer microservices: their purpose, Docker service names, real API endpoints, tools used, and output contracts.

Common Interface

Each analyzer is an independent Docker microservice. The Orchestrator never imports analyzer code — it always calls via HTTP over the secflow-net Docker bridge. All containers listen on port 5000 internally.

Analyzer	Docker service	Host port	Request format
Malware	`malware-analyzer`	5001	`multipart/form-data` file
Steganography	`steg-analyzer`	5002	`multipart/form-data` file (async)
Reconnaissance	`recon-analyzer`	5003	JSON `{"query": "..."}`
Web Vulnerability	`web-analyzer`	5005	JSON `{"url": "..."}`
Macro / Office	`macro-analyzer`	5006	`multipart/form-data` file

Each service returns its own native JSON. An adapter inside the Orchestrator (orchestrator/app/adapters/<name>_adapter.py) translates that into the SecFlow contract:

{
    "analyzer":   str,         # "malware" | "steg" | "recon" | "web" | "macro"
    "pass":       int,         # 1-indexed loop pass number
    "input":      str,         # the exact value passed in
    "findings":   list[dict],  # normalised finding objects
    "risk_score": float,       # aggregate risk for this pass, 0.0–10.0
    "raw_output": str          # full text output (AI reads this for IOC extraction)
}

Each finding object:

{
    "type":     str,   # finding type string (e.g. "malware_detection", "av_detection")
    "detail":   str,   # human-readable description
    "severity": str,   # "info" | "low" | "medium" | "high" | "critical"
    "evidence": str,   # raw evidence — rendered intelligently in the HTML report
}

Analyzer services must never crash the Orchestrator. The adapter wraps every HTTP call in try/except and returns an error-shaped finding dict if the service is unreachable or returns a non-200 response.
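The error-handling rule above can be sketched as follows. This is a minimal illustration, not the project's adapter code: `error_result` and `safe_call` are hypothetical names, and the `"info"` severity and `0.0` risk score for a failed call are assumptions. Only the contract keys come from this document.

```python
from typing import Any, Callable


def error_result(analyzer: str, pass_num: int, input_value: str,
                 exc: Exception) -> dict[str, Any]:
    """Build a contract-shaped result that carries a single error finding."""
    return {
        "analyzer": analyzer,
        "pass": pass_num,
        "input": input_value,
        "findings": [{
            "type": "error",
            "detail": f"{analyzer} analyzer call failed",
            "severity": "info",   # assumed severity for transport errors
            "evidence": repr(exc),
        }],
        "risk_score": 0.0,        # assumed: a failed call contributes no risk
        "raw_output": "",
    }


def safe_call(analyzer: str, pass_num: int, input_value: str,
              call: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Run `call` and convert any exception into an error-shaped result."""
    try:
        return call()
    except Exception as exc:  # broad on purpose: nothing may reach the Orchestrator
        return error_result(analyzer, pass_num, input_value, exc)
```

Passing the HTTP call in as a callable keeps the wrapper independent of any one analyzer's request format.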

1. Malware Analyzer

Source: backend/Malware-Analyzer/
Docker service: malware-analyzer
Host port: 5001 → container port 5000
Base image: eclipse-temurin:21-jdk-jammy (JDK 21 required for Ghidra JVM)
Adapter: orchestrator/app/adapters/malware_adapter.py

Real Endpoints

Method	Route	Timeout	Purpose
`GET`	`/api/malware-analyzer/health`	—	Health check
`POST`	`/api/malware-analyzer/file-analysis`	60s	VirusTotal API v3 lookup
`POST`	`/api/malware-analyzer/decompile`	180s	Ghidra decompile + objdump -d
`POST`	`/api/malware-analyzer/ai-summary`	—	Gemini narrative (internal, not used by orchestrator)

There is no bare POST /api/malware-analyzer/ route.

How the Orchestrator Calls It

import requests

# Call 1: VirusTotal threat intel
with open(path, "rb") as fh:
    vt_resp = requests.post(f"{_MALWARE_BASE}/file-analysis", files={"file": fh}, timeout=60)

# Call 2: Ghidra decompile (slow: JVM startup plus full analysis)
with open(path, "rb") as fh:
    decompile_resp = requests.post(f"{_MALWARE_BASE}/decompile", files={"file": fh}, timeout=180)

# Both responses are merged before being handed to the adapter
raw = {"vt": vt_resp.json(), "decompile": decompile_resp.json()}

Analysis Tools

Tool	Purpose
`pyghidra` + Ghidra 12.0.1	Full decompilation, auto-analysis of all binary functions
`objdump -d`	Assembly-level disassembly
VirusTotal API v3	70+ AV engine detections, behavioral tags, file stats

Supported Extensions

exe, dll, so, elf, bin, o, out — other extensions return HTTP 400.
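A caller can check the extension before uploading and avoid the HTTP 400 round trip. The sketch below is illustrative only: `is_supported_binary` is a hypothetical helper, and treating the extension as case-insensitive is an assumption about the service's behavior.

```python
from pathlib import Path

# Extensions listed above; the service rejects anything else with HTTP 400.
SUPPORTED_EXTENSIONS = {"exe", "dll", "so", "elf", "bin", "o", "out"}


def is_supported_binary(filename: str) -> bool:
    """Return True if the file's final extension is in the supported set."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS
```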

Required Env Vars

VIRUSTOTAL_API_KEY — required for /file-analysis
GEMINI_API_KEY — only needed for /ai-summary and /diagram-generator

Finding Types Generated by Adapter

Finding type	Severity	Description
`malware_detection`	critical/high/info	VT detection stats
`av_detection`	high/medium	Individual AV engine results
`malware_clean`	info	No VT detections
`decompile_result`	medium/info	Ghidra decompiled code
`suspicious_string`	high	URL/IP/C2 found in decompile

2. Steganography Analyzer

Source: backend/Steg-Analyzer/
Docker service: steg-analyzer
Host port: 5002 → container port 5000
Adapter: orchestrator/app/adapters/steg_adapter.py

Real Endpoints

Method	Route	Purpose
`POST`	`/api/steg-analyzer/upload`	Submit file, returns `{hash}`
`GET`	`/api/steg-analyzer/status/{hash}`	Poll analysis status
`GET`	`/api/steg-analyzer/result/{hash}`	Fetch final results

How the Orchestrator Calls It

The steg analyzer is asynchronous — upload, then poll:

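import time
import requests

# Step 1: upload
with open(path, "rb") as fh:
    r = requests.post(f"{_STEG_BASE}/upload", files={"file": fh}, timeout=30)
hash_ = r.json()["hash"]

# Step 2: poll until done, with an overall deadline so the loop cannot run forever
deadline = time.monotonic() + 300  # assumed limit in seconds
while time.monotonic() < deadline:
    r = requests.get(f"{_STEG_BASE}/status/{hash_}", timeout=10)
    if r.json()["status"] == "done":
        break
    time.sleep(2)
else:
    raise TimeoutError(f"steg analysis {hash_} did not finish in time")

# Step 3: fetch results
r = requests.get(f"{_STEG_BASE}/result/{hash_}", timeout=30)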

Analysis Tools

LSB analysis (pixel-level encoding detection)
binwalk — file carving and embedded file extraction
zsteg (PNG) — steganography detection
steghide (JPEG/BMP) — extraction
ExifTool — metadata inspection
outguess, pngcheck, graphicsmagick

Dependencies

The Steg Analyzer depends on three companion services for asynchronous job queuing: PostgreSQL (steg-postgres), Redis (steg-redis), and an RQ worker (steg-worker).

3. Reconnaissance Analyzer

Source: backend/Recon-Analyzer/src/
Docker service: recon-analyzer
Host port: 5003 → container port 5000
API prefix: /api/Recon-Analyzer (capital R and A, exact)
Adapter: orchestrator/app/adapters/recon_adapter.py

Real Endpoints

Method	Route	Purpose	Input
`GET`	`/api/Recon-Analyzer/health`	Health check	—
`POST`	`/api/Recon-Analyzer/scan`	IP/domain threat intel	`{"query": "ip_or_domain"}`
`POST`	`/api/Recon-Analyzer/footprint`	Email/phone/username OSINT	`{"query": "email"}`

The request body key is query, not target.

How the Orchestrator Calls It

# IP or domain
requests.post(f"{_RECON_BASE}/scan", json={"query": ip_or_domain}, timeout=60)

# Email / phone / username OSINT (when AI chains from macro IOCs)
requests.post(f"{_RECON_BASE}/footprint", json={"query": email_or_username}, timeout=60)

Analysis Modules (wired into main.py)

Module	What it checks
`ipapi.py`	Country, ISP, ASN, city, timezone via ip-api.com
`talos.py`	Cisco Talos IP blocklist (local `talos.txt`, auto-downloaded)
`tor.py`	Tor exit node list (local `tor.txt`, auto-downloaded)
`tranco.py`	Tranco domain ranking (domains only)
`threatfox.py`	ThreatFox IOC lookup — malware family, confidence (domains only)
`xposedornot.py`	Email breach check (email footprint)
`phone.py`	NumVerify phone validation (phone footprint)
`username.py`	Sagemode multi-site username OSINT (username footprint)

Supported Input Auto-Detection

Valid IPv4 regex → runs ipapi + talos + tor
Valid domain regex → resolves IP, runs ipapi + talos + tor + tranco + threatfox
Email regex → footprint: xposedornot breach check
Phone regex → footprint: NumVerify
Else → footprint: Sagemode username OSINT
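The detection order above can be sketched as a small classifier. The regular expressions here are simplified assumptions, not the patterns used in the service, and `classify_query` is a hypothetical name. It returns the route from the endpoint table together with the detected input kind.

```python
import re

# Simplified patterns (assumptions); the service's own regexes may differ.
IPV4_RE = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)$"
)
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?\d[\d\s().-]{6,}\d$")


def classify_query(query: str) -> tuple[str, str]:
    """Return (route, kind), checking patterns in the order listed above."""
    q = query.strip()
    if IPV4_RE.match(q):
        return ("scan", "ipv4")
    if DOMAIN_RE.match(q):
        return ("scan", "domain")
    if EMAIL_RE.match(q):
        return ("footprint", "email")
    if PHONE_RE.match(q):
        return ("footprint", "phone")
    return ("footprint", "username")  # fallback, as in the "Else" rule
```

Order matters: the IPv4 check runs first so that a dotted quad is never mistaken for a phone number by the looser phone pattern.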

Optional Env Vars

NUMVERIFY_API_KEY — phone validation
THREATFOX_API_KEY — higher rate limit
ipAPI_KEY — ip-api.com Pro

4. Web Vulnerability Analyzer

Source: backend/Web-Analyzer/
Docker service: web-analyzer
Host port: 5005 → container port 5000
Adapter: orchestrator/app/adapters/web_adapter.py

Real Endpoint

Method	Route	Input
`POST`	`/api/web-analyzer/`	JSON `{"url": "https://..."}`

Analysis Capabilities

HTTP response analysis (status code, headers, redirect chain)
Security header audit (CSP, HSTS, X-Frame-Options, X-Content-Type-Options, etc.)
Technology fingerprinting
Basic vulnerability scanning
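As an illustration of the security header audit, the sketch below reports which of the four headers named above are absent from a response. It is not the service's implementation: `audit_security_headers` is a hypothetical name, the `"medium"` severity is an assumption, and the `missing_header` finding type is borrowed from the finding format given later in this document.

```python
# Full names of the headers listed above; the "etc." in the list means the
# real audit likely checks more than these four.
EXPECTED_HEADERS = {
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
}


def audit_security_headers(headers: dict[str, str]) -> list[dict[str, str]]:
    """Return one finding per expected header missing from `headers`."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    findings = []
    for name in sorted(EXPECTED_HEADERS):
        if name.lower() not in present:
            findings.append({
                "type": "missing_header",
                "detail": f"{name} header absent",
                "severity": "medium",  # assumed severity
                "evidence": "",
            })
    return findings
```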

Required Env Vars

GEMINI_API_KEY — used internally for enhanced analysis (optional)

5. Macro / Office Analyzer

Source: backend/macro-analyzer/
Docker service: macro-analyzer
Host port: 5006 → container port 5000
Adapter: orchestrator/app/adapters/macro_adapter.py

Real Endpoints

Method	Route	Purpose
`GET`	`/api/macro-analyzer/health`	Health check
`POST`	`/api/macro-analyzer/analyze`	Full VBA + VirusTotal analysis

How the Orchestrator Calls It

with open(path, "rb") as fh:
    requests.post(f"{_MACRO_BASE}/analyze",
                  files={"file": (original_name, fh)},
                  timeout=60)

Supported File Types

.doc, .docx, .xls, .xlsx, .xlsm, .xlsb, .ppt, .pptx, .pptm, .rtf, .docm

Analysis Tools

Tool	Purpose
`oletools` / `olevba`	VBA macro extraction, indicator analysis, IOC extraction
VirusTotal API v3	SHA-256 hash lookup → upload → poll for analysis results

olevba Indicator Categories

Category	Severity	Meaning
`AutoExec`	critical	Macro runs automatically on open/close
`Suspicious`	high	Suspicious API calls (Shell, CreateObject, etc.)
`IOC`	high	Embedded URLs, IPs, file paths
`Hex String`	medium	Hex-encoded obfuscated content
`Base64 String`	medium	Base64-obfuscated content
`Dridex String`	critical	Dridex banking trojan string encoding

Risk Level Mapping

olevba risk_level	Risk score	Condition
`malicious`	9.5 (base)	AutoExec + Suspicious flags both present
`suspicious`	6.5 (base)	Suspicious or IOC or obfuscated
`macro_present`	3.0	Macros found, no suspicious flags
`clean`	0.5	No macros

If VirusTotal confirms malicious hits, risk score is raised: 1+ detections → max(base, 7.0); 5+ → max(base, 9.5).
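The mapping and the VirusTotal escalation rule can be expressed in a few lines. This is a sketch of the rule as stated above; `macro_risk_score` and its parameter names are hypothetical.

```python
# Base scores from the risk level table above.
BASE_SCORES = {
    "malicious": 9.5,
    "suspicious": 6.5,
    "macro_present": 3.0,
    "clean": 0.5,
}


def macro_risk_score(risk_level: str, vt_detections: int = 0) -> float:
    """Apply the base score, then raise it according to VirusTotal detections."""
    score = BASE_SCORES[risk_level]
    if vt_detections >= 5:
        score = max(score, 9.5)
    elif vt_detections >= 1:
        score = max(score, 7.0)
    return score
```

Because the escalation uses `max`, VirusTotal hits can only raise a score, never lower one that olevba already rated higher.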

Finding Types Generated by Adapter

Finding type	Description
`macro_malicious` / `macro_suspicious` / `macro_present`	Overall VBA verdict
`macro_indicator_autoexec`	AutoExec indicators
`macro_indicator_suspicious`	Suspicious API calls
`macro_indicator_ioc`	Extracted IOCs
`macro_ioc`	IOC chip list (enables AI to chain to recon/web)
`macro_source`	Full VBA source (collapsible in report)
`macro_xlm`	Excel 4 (XLM) deobfuscated macros
`malware_detection`	VirusTotal stats table
`av_detection`	Per-engine AV detection (up to 10)
`payload_downloaded`	Always shown when file was fetched from a URL

1. Malware Analyzer

Service: backend/malware-analyzer/ (POST http://malware-analyzer:5001/api/malware-analyzer/)
Adapter: orchestrator/app/adapters/malware_adapter.py

Purpose

Detect malicious characteristics in executables, PE binaries, and extracted binary payloads.

Accepted Input

File path to an .exe, .dll, .bin, or .elf file, or to a payload extracted during another analyzer pass

Analysis Techniques

Technique	Description
File hashing	Compute MD5, SHA1, SHA256
YARA scanning	Match bundled YARA rule set
PE header analysis	Parse PE sections, imports, exports, timestamps
String extraction	Extract printable strings; flag suspicious patterns (URLs, IPs, registry keys, API names)
Entropy analysis	High entropy sections → possible packing/encryption
(Optional) VirusTotal	Hash lookup via VT API if key is configured
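The entropy technique in the table can be illustrated with a Shannon entropy calculation over raw bytes. This is a sketch under stated assumptions: the function names are hypothetical, and the 7.0 bits-per-byte threshold is a commonly used heuristic, not a value taken from this document.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_high_entropy(data: bytes, threshold: float = 7.0) -> bool:
    """Flag data whose entropy suggests packing or encryption (assumed threshold)."""
    return shannon_entropy(data) >= threshold
```

In practice the calculation would be applied per section of the binary rather than to the whole file, so that one packed section is not averaged away by low-entropy padding.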

Finding Object Format

{
    "type": "signature_match" | "suspicious_string" | "pe_metadata" | "hash" | "entropy" | "error",
    "detail": str,     # human-readable description
    "severity": "info" | "low" | "medium" | "high" | "critical",
    "evidence": str    # raw evidence snippet
}

Example Findings

[
  { "type": "hash", "detail": "SHA256: abc123...", "severity": "info", "evidence": "" },
  { "type": "signature_match", "detail": "YARA rule: Trojan.GenericKDZ matched", "severity": "critical", "evidence": "offset 0x200" },
  { "type": "suspicious_string", "detail": "HTTP callout found", "severity": "high", "evidence": "http://192.168.1.100/beacon" }
]

Planned Libraries

yara-python — YARA rule matching
pefile — PE binary parsing
hashlib — File hashing (stdlib)
strings (system) or regex — String extraction

2. Steganography Analyzer

Service: backend/steg-analyzer/ (POST http://steg-analyzer:5002/api/steg-analyzer/)
Adapter: orchestrator/app/adapters/steg_adapter.py

Purpose

Detect and extract hidden data embedded within image files using steganographic or watermarking techniques.

Accepted Input

File path to: .png, .jpg, .jpeg, .bmp, .gif, .tiff

Analysis Techniques

Technique	Description
LSB analysis	Detect least-significant-bit encoding in pixel data
Metadata inspection	ExifTool — check for hidden data in EXIF/IPTC/XMP
Embedded file extraction	binwalk — detect and extract appended/embedded files
Tool-based detection	zsteg (PNG), stegdetect (JPEG), steghide (JPEG/BMP)
Strings scan	Run strings on the image binary, flag suspicious patterns
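To make the LSB technique concrete, the sketch below reads the least-significant bit of each channel value and packs the bits into bytes. It covers only the bit-plane extraction step, not a statistical detector, and it works on a plain list of integers rather than a loaded image. Both function names are hypothetical, and `embed_lsb_bytes` exists only to produce test input.

```python
from typing import Iterable


def extract_lsb_bytes(channel_values: Iterable[int]) -> bytes:
    """Pack the least-significant bit of each value into bytes, MSB first."""
    out = bytearray()
    current = 0
    nbits = 0
    for value in channel_values:
        current = (current << 1) | (value & 1)
        nbits += 1
        if nbits == 8:
            out.append(current)
            current = 0
            nbits = 0
    return bytes(out)  # trailing bits that do not fill a byte are dropped


def embed_lsb_bytes(channel_values: list[int], payload: bytes) -> list[int]:
    """Demo helper: hide `payload` in the low bits of a copy of the values."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(channel_values):
        raise ValueError("payload does not fit in the cover data")
    stego = list(channel_values)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the low bit, then set it
    return stego
```

Each value changes by at most 1, which is why LSB embedding is invisible to the eye and has to be found by inspecting the bit plane.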

Finding Object Format

{
    "type": "embedded_file" | "lsb_data" | "metadata_anomaly" | "suspicious_string" | "error",
    "detail": str,
    "severity": "info" | "low" | "medium" | "high" | "critical",
    "evidence": str,
    "extracted_path": str | None   # path to extracted file if applicable
}

Example Findings

[
  { "type": "embedded_file", "detail": "binwalk found embedded PE binary", "severity": "critical", "evidence": "offset 0x8200", "extracted_path": "/tmp/secflow/extracted/steg_payload.exe" },
  { "type": "metadata_anomaly", "detail": "EXIF GPS data present", "severity": "low", "evidence": "GPS: 37.7749,-122.4194", "extracted_path": null }
]

Planned Tools/Libraries

binwalk (system) — File carving, embedded file extraction
zsteg (system/gem) — PNG steg detection
stegdetect (system) — JPEG steg detection
steghide (system) — Steghide extraction
pyexiftool or exiftool (system) — Metadata inspection
Pillow — Image loading and pixel-level analysis

3. Reconnaissance Analyzer

Service: backend/recon-analyzer/ (POST http://recon-analyzer:5003/api/recon-analyzer/)
Adapter: orchestrator/app/adapters/recon_adapter.py

Purpose

Gather OSINT and infrastructure intelligence on IPs, domains, and hostnames.

Accepted Input

IP address string (e.g., "192.168.1.100")
Domain or hostname string (e.g., "evil.example.com")

Analysis Techniques

Technique	Description
WHOIS lookup	Registrant, registrar, creation/expiry dates
DNS records	A, AAAA, MX, NS, TXT, CNAME records
Reverse DNS	PTR record lookup
Port scanning	Top ports scan via nmap
Geolocation	Country, ASN, ISP
Threat intel	Shodan lookup (optional), AbuseIPDB (optional)
Certificate info	TLS cert subjects and SANs (for domains)

Finding Object Format

{
    "type": "whois" | "dns" | "port" | "geolocation" | "threat_intel" | "cert" | "error",
    "detail": str,
    "severity": "info" | "low" | "medium" | "high" | "critical",
    "evidence": str
}

Example Findings

[
  { "type": "port", "detail": "Open ports detected", "severity": "medium", "evidence": "22/tcp open ssh, 80/tcp open http, 443/tcp open https" },
  { "type": "threat_intel", "detail": "IP found in Shodan with malware tag", "severity": "critical", "evidence": "tags: malware, c2" },
  { "type": "whois", "detail": "Domain registered 2 days ago", "severity": "high", "evidence": "created: 2026-03-04" }
]

Planned Libraries/Tools

python-whois — WHOIS lookups
dnspython — DNS queries
nmap (system) + python-nmap — Port scanning
shodan — Shodan API (optional; requires SHODAN_API_KEY)
requests — AbuseIPDB / threat intel APIs
socket — Reverse DNS

4. Web Vulnerability Analyzer

Service: backend/web-analyzer/ (POST http://web-analyzer:5005/api/web-analyzer/)
Adapter: orchestrator/app/adapters/web_adapter.py

Purpose

Analyze URLs and web endpoints for vulnerabilities, misconfigurations, and security weaknesses.

Accepted Input

Full URL string (e.g., "http://192.168.1.100/beacon", "https://example.com/login")

Analysis Techniques

Technique	Description
HTTP response analysis	Status code, response headers, redirect chain
Security header audit	Check for missing CSP, HSTS, X-Frame-Options, etc.
Technology fingerprinting	Identify server, framework, CMS versions
Cookie security	Inspect Secure, HttpOnly, SameSite flags
Basic vuln scanning	nuclei (optional), common path probing
TLS/SSL inspection	Certificate validity, weak ciphers
URL reputation	VirusTotal URL scan (optional)

Finding Object Format

{
    "type": "missing_header" | "vuln" | "tech_fingerprint" | "tls_issue" | "redirect" | "cookie" | "error",
    "detail": str,
    "severity": "info" | "low" | "medium" | "high" | "critical",
    "evidence": str
}

Example Findings

[
  { "type": "missing_header", "detail": "Content-Security-Policy header absent", "severity": "medium", "evidence": "" },
  { "type": "tech_fingerprint", "detail": "Apache 2.4.49 detected (known CVE)", "severity": "critical", "evidence": "Server: Apache/2.4.49" },
  { "type": "tls_issue", "detail": "TLS 1.0 supported (deprecated)", "severity": "high", "evidence": "TLSv1.0 cipher accepted" }
]

Planned Libraries/Tools

requests — HTTP requests and response analysis
Wappalyzer (or builtwith) — Technology fingerprinting
nuclei (system, optional) — Template-based vuln scanning
sslyze or ssl (stdlib) — TLS/SSL analysis
urllib (stdlib) — URL parsing

Risk Score Calculation

Each analyzer computes a risk_score (0.0–10.0) for the pass based on the severity distribution of its findings:

Severity	Weight
`critical`	4.0
`high`	2.5
`medium`	1.0
`low`	0.3
`info`	0.0

Score = min(10.0, sum of severity weights)

The Report Generator computes an overall risk score as the maximum risk score observed across all passes.
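The two formulas above translate directly into code. This is a sketch of the calculation as stated, with hypothetical function names; returning `0.0` when there are no passes is an assumption.

```python
# Weights from the severity table above.
SEVERITY_WEIGHTS = {
    "critical": 4.0,
    "high": 2.5,
    "medium": 1.0,
    "low": 0.3,
    "info": 0.0,
}


def pass_risk_score(findings: list[dict]) -> float:
    """Sum the severity weights of one pass's findings, capped at 10.0."""
    total = sum((SEVERITY_WEIGHTS[f["severity"]] for f in findings), 0.0)
    return min(10.0, total)


def overall_risk_score(pass_scores: list[float]) -> float:
    """Overall score is the maximum across passes (0.0 if there are none)."""
    return max(pass_scores, default=0.0)
```

For example, one critical, one high, and one medium finding give 4.0 + 2.5 + 1.0 = 7.5, while three critical findings sum to 12.0 and are capped at 10.0.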

Adding a New Analyzer

Create a new Docker service directory under backend/<name>-analyzer/ with its own Dockerfile and requirements.txt.
Add the service to backend/compose.yml on the secflow-net network.
Create orchestrator/app/adapters/<name>_adapter.py to translate the service's native response into the SecFlow contract.
Add the analyzer name to the routing rules in orchestrator/app/classifier/rules.py.
Add the analyzer name to the available tools list in orchestrator/app/ai/engine.py.
Document the service and its endpoint in this file.
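The adapter step is the one with the most room for variation, so a skeleton may help. Everything about the native response here is hypothetical: the `pdf` analyzer name and the `issues`, `kind`, `message`, `level`, `snippet`, and `score` keys are invented for illustration. Only the output keys follow the SecFlow contract described in this document.

```python
import json
from typing import Any


def adapt(raw: dict[str, Any], pass_num: int, input_value: str) -> dict[str, Any]:
    """Translate a hypothetical native response into the SecFlow contract."""
    findings = [
        {
            "type": item.get("kind", "unknown"),
            "detail": item.get("message", ""),
            "severity": item.get("level", "info"),
            "evidence": item.get("snippet", ""),
        }
        for item in raw.get("issues", [])  # assumed native key
    ]
    return {
        "analyzer": "pdf",                            # hypothetical analyzer name
        "pass": pass_num,
        "input": input_value,
        "findings": findings,
        "risk_score": float(raw.get("score", 0.0)),   # assumed native key
        "raw_output": json.dumps(raw),
    }
```

Using `.get` with defaults throughout means a partial or unexpected native response still yields a well-formed contract rather than a `KeyError`.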
