SecFlow — Analyzers

This document describes each of SecFlow's five analyzer microservices: their purpose, Docker service names, real API endpoints, tools used, and output contracts.


Common Interface

Each analyzer is an independent Docker microservice. The Orchestrator never imports analyzer code — it always calls via HTTP over the secflow-net Docker bridge. All containers listen on port 5000 internally.

| Analyzer | Docker service | Host port | Request format |
| --- | --- | --- | --- |
| Malware | malware-analyzer | 5001 | multipart/form-data file |
| Steganography | steg-analyzer | 5002 | multipart/form-data file (async) |
| Reconnaissance | recon-analyzer | 5003 | JSON {"query": "..."} |
| Web Vulnerability | web-analyzer | 5005 | JSON {"url": "..."} |
| Macro / Office | macro-analyzer | 5006 | multipart/form-data file |

Each service returns its own native JSON. An adapter inside the Orchestrator (orchestrator/app/adapters/<name>_adapter.py) translates that into the SecFlow contract:

{
    "analyzer":   str,         # "malware" | "steg" | "recon" | "web" | "macro"
    "pass":       int,         # 1-indexed loop pass number
    "input":      str,         # the exact value passed in
    "findings":   list[dict],  # normalised finding objects
    "risk_score": float,       # aggregate risk for this pass, 0.0–10.0
    "raw_output": str          # full text output (AI reads this for IOC extraction)
}

Each finding object:

{
    "type":     str,   # finding type string (e.g. "malware_detection", "av_detection")
    "detail":   str,   # human-readable description
    "severity": str,   # "info" | "low" | "medium" | "high" | "critical"
    "evidence": str,   # raw evidence — rendered intelligently in the HTML report
}

Analyzer services must never crash the Orchestrator. The adapter wraps all HTTP calls in try/except and returns an error-shaped finding dict if the service is unreachable.
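The error-handling rule above can be sketched as follows. This is a minimal illustration, not the actual adapter code: the helper names `error_result` and `safe_call` are hypothetical, and the `"info"` severity and `0.0` risk score used for failures are assumptions the document does not specify. The `call` argument stands in for whatever function performs the HTTP request and translation.

```python
from typing import Any, Callable


def error_result(analyzer: str, pass_no: int, input_value: str,
                 exc: Exception) -> dict[str, Any]:
    """Build a SecFlow-contract dict describing a failed analyzer call."""
    return {
        "analyzer": analyzer,
        "pass": pass_no,
        "input": input_value,
        "findings": [{
            "type": "error",
            "detail": f"{analyzer} analyzer call failed",
            "severity": "info",   # assumption: severity for errors is not documented
            "evidence": repr(exc),
        }],
        "risk_score": 0.0,        # assumption: a failed pass contributes no risk
        "raw_output": "",
    }


def safe_call(analyzer: str, pass_no: int, input_value: str,
              call: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Run `call`; on any exception return an error-shaped result instead."""
    try:
        return call()
    except Exception as exc:  # broad on purpose: the Orchestrator must not crash
        return error_result(analyzer, pass_no, input_value, exc)
```

Because `requests` raises subclasses of `Exception` for connection failures and timeouts, wrapping the HTTP call in `call` is enough for an unreachable service to produce a finding rather than a traceback.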


1. Malware Analyzer

Source: backend/Malware-Analyzer/
Docker service: malware-analyzer
Host port: 5001 → container port 5000
Base image: eclipse-temurin:21-jdk-jammy (JDK 21 required for Ghidra JVM)
Adapter: orchestrator/app/adapters/malware_adapter.py

Real Endpoints

| Method | Route | Timeout | Purpose |
| --- | --- | --- | --- |
| GET | /api/malware-analyzer/health | | Health check |
| POST | /api/malware-analyzer/file-analysis | 60s | VirusTotal API v3 lookup |
| POST | /api/malware-analyzer/decompile | 180s | Ghidra decompile + objdump -d |
| POST | /api/malware-analyzer/ai-summary | | Gemini narrative (internal, not used by orchestrator) |

There is no bare POST /api/malware-analyzer/ route.

How the Orchestrator Calls It

import requests

# Call 1: VirusTotal threat intel
with open(path, "rb") as f:
    vt_resp = requests.post(f"{_MALWARE_BASE}/file-analysis", files={"file": f}, timeout=60)

# Call 2: Ghidra decompile (slow: JVM startup plus full analysis)
with open(path, "rb") as f:
    decompile_resp = requests.post(f"{_MALWARE_BASE}/decompile", files={"file": f}, timeout=180)

# Both native JSON responses are merged before being handed to the adapter:
raw = {"vt": vt_resp.json(), "decompile": decompile_resp.json()}

Analysis Tools

| Tool | Purpose |
| --- | --- |
| pyghidra + Ghidra 12.0.1 | Full decompilation, auto-analysis of all binary functions |
| objdump -d | Assembly-level disassembly |
| VirusTotal API v3 | 70+ AV engine detections, behavioral tags, file stats |

Supported Extensions

exe, dll, so, elf, bin, o, out — other extensions return HTTP 400.
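A caller can avoid the HTTP 400 by checking the extension before uploading. The sketch below is illustrative only: `is_supported` is a hypothetical helper, and treating extensions case-insensitively is an assumption, since the document does not say how the service compares them.

```python
from pathlib import Path

# Extensions listed above as accepted by the Malware Analyzer.
SUPPORTED_EXTENSIONS = {"exe", "dll", "so", "elf", "bin", "o", "out"}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is in the supported set."""
    ext = Path(filename).suffix.lower().lstrip(".")  # assumption: case-insensitive
    return ext in SUPPORTED_EXTENSIONS
```

Only the final suffix is examined, so a versioned shared object such as `libfoo.so.1` would be rejected by this check.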

Required Env Vars

  • VIRUSTOTAL_API_KEY — required for /file-analysis
  • GEMINI_API_KEY — only needed for /ai-summary and /diagram-generator

Finding Types Generated by Adapter

| Finding type | Severity | Description |
| --- | --- | --- |
| malware_detection | critical/high/info | VT detection stats |
| av_detection | high/medium | Individual AV engine results |
| malware_clean | info | No VT detections |
| decompile_result | medium/info | Ghidra decompiled code |
| suspicious_string | high | URL/IP/C2 found in decompile |

2. Steganography Analyzer

Source: backend/Steg-Analyzer/
Docker service: steg-analyzer
Host port: 5002 → container port 5000
Adapter: orchestrator/app/adapters/steg_adapter.py

Real Endpoints

| Method | Route | Purpose |
| --- | --- | --- |
| POST | /api/steg-analyzer/upload | Submit file, returns {hash} |
| GET | /api/steg-analyzer/status/{hash} | Poll analysis status |
| GET | /api/steg-analyzer/result/{hash} | Fetch final results |

How the Orchestrator Calls It

The steg analyzer is asynchronous — upload, then poll:

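import time
import requests

# Step 1: upload (the 30s upload timeout is an illustrative value)
with open(path, "rb") as f:
    r = requests.post(f"{_STEG_BASE}/upload", files={"file": f}, timeout=30)
hash_ = r.json()["hash"]

# Step 2: poll until done
# Note: this loop never exits if the job fails to reach "done"; bound it with a deadline in practice.
while True:
    r = requests.get(f"{_STEG_BASE}/status/{hash_}", timeout=10)
    if r.json()["status"] == "done":
        break
    time.sleep(2)

# Step 3: fetch results
r = requests.get(f"{_STEG_BASE}/result/{hash_}", timeout=30)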
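Since the polling step depends on the job eventually reporting `"done"`, a bounded variant may be safer. The following is a sketch under stated assumptions: `wait_until_done` is a hypothetical helper, the 300-second cap is an arbitrary illustrative default, and `"done"` is the only status value the document names. The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def wait_until_done(get_status: Callable[[], str],
                    interval: float = 2.0,
                    max_wait: float = 300.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `get_status` until it returns "done" or `max_wait` seconds pass.

    Returns True when the job finished, False when the deadline was reached.
    """
    deadline = clock() + max_wait
    while True:
        if get_status() == "done":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In the flow above, `get_status` could be a small closure that performs the `GET .../status/{hash_}` request and returns the `"status"` field; a `False` return would then be turned into an error-shaped finding by the adapter.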

Analysis Tools

  • LSB analysis (pixel-level encoding detection)
  • binwalk — file carving and embedded file extraction
  • zsteg (PNG) — steganography detection
  • steghide (JPEG/BMP) — extraction
  • ExifTool — metadata inspection
  • outguess, pngcheck, graphicsmagick

Dependencies

The Steg Analyzer runs with PostgreSQL (steg-postgres) + Redis (steg-redis) + RQ worker (steg-worker) for async job queuing.


3. Reconnaissance Analyzer

Source: backend/Recon-Analyzer/src/
Docker service: recon-analyzer
Host port: 5003 → container port 5000
API prefix: /api/Recon-Analyzer (capital R and A, exact)
Adapter: orchestrator/app/adapters/recon_adapter.py

Real Endpoints

| Method | Route | Purpose | Input |
| --- | --- | --- | --- |
| GET | /api/Recon-Analyzer/health | Health check | |
| POST | /api/Recon-Analyzer/scan | IP/domain threat intel | {"query": "ip_or_domain"} |
| POST | /api/Recon-Analyzer/footprint | Email/phone/username OSINT | {"query": "email"} |

The request body key is query, not target.

How the Orchestrator Calls It

# IP or domain
requests.post(f"{_RECON_BASE}/scan", json={"query": ip_or_domain}, timeout=60)

# Email / phone / username OSINT (when AI chains from macro IOCs)
requests.post(f"{_RECON_BASE}/footprint", json={"query": email_or_username}, timeout=60)

Analysis Modules (wired into main.py)

| Module | What it checks |
| --- | --- |
| ipapi.py | Country, ISP, ASN, city, timezone via ip-api.com |
| talos.py | Cisco Talos IP blocklist (local talos.txt, auto-downloaded) |
| tor.py | Tor exit node list (local tor.txt, auto-downloaded) |
| tranco.py | Tranco domain ranking (domains only) |
| threatfox.py | ThreatFox IOC lookup: malware family, confidence (domains only) |
| xposedornot.py | Email breach check (email footprint) |
| phone.py | NumVerify phone validation (phone footprint) |
| username.py | Sagemode multi-site username OSINT (username footprint) |

Supported Input Auto-Detection

  • Valid IPv4 regex → runs ipapi + talos + tor
  • Valid domain regex → resolves IP, runs ipapi + talos + tor + tranco + threatfox
  • Email regex → footprint: xposedornot breach check
  • Phone regex → footprint: NumVerify
  • Else → footprint: Sagemode username OSINT
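The detection order above can be sketched as a small classifier. This is not the service's code: the function name, the regular expressions, and the module mapping dict are illustrative assumptions, and the IPv4 check uses the standard-library `ipaddress` module in place of the regex the service is described as using.

```python
import ipaddress
import re

# Hypothetical patterns; the service's real regexes are not shown in this document.
_DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
_PHONE_RE = re.compile(r"^\+?\d[\d\s\-()]{6,18}\d$")

# Modules run per input kind, as listed above.
MODULES = {
    "ip": ["ipapi", "talos", "tor"],
    "domain": ["ipapi", "talos", "tor", "tranco", "threatfox"],
    "email": ["xposedornot"],
    "phone": ["phone"],
    "username": ["username"],
}


def classify_query(query: str) -> str:
    """Return "ip", "domain", "email", "phone", or "username" for a query."""
    q = query.strip()
    try:
        ipaddress.IPv4Address(q)
        return "ip"
    except ValueError:
        pass
    if _DOMAIN_RE.match(q):
        return "domain"
    if _EMAIL_RE.match(q):
        return "email"
    if _PHONE_RE.match(q):
        return "phone"
    return "username"  # fallback, mirroring the "Else" rule
```

For example, `MODULES[classify_query("example.com")]` yields the five modules run for a domain.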

Optional Env Vars

  • NUMVERIFY_API_KEY — phone validation
  • THREATFOX_API_KEY — higher rate limit
  • ipAPI_KEY — ip-api.com Pro

4. Web Vulnerability Analyzer

Source: backend/Web-Analyzer/
Docker service: web-analyzer
Host port: 5005 → container port 5000
Adapter: orchestrator/app/adapters/web_adapter.py

Real Endpoint

| Method | Route | Input |
| --- | --- | --- |
| POST | /api/web-analyzer/ | JSON {"url": "https://..."} |

Analysis Capabilities

  • HTTP response analysis (status code, headers, redirect chain)
  • Security header audit (CSP, HSTS, X-Frame-Options, X-Content-Type-Options, etc.)
  • Technology fingerprinting
  • Basic vulnerability scanning
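As an illustration of the security header audit, the sketch below checks a response's headers against the four headers named above and emits findings in the SecFlow finding shape. It is a hypothetical example, not the analyzer's implementation: the function name, the `missing_header` type label, and the fixed `"medium"` severity are assumptions.

```python
# Headers named in the capability list above.
REQUIRED_HEADERS = {
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
}


def missing_security_headers(headers: dict[str, str]) -> list[dict[str, str]]:
    """Return one finding per required security header absent from `headers`."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    findings = []
    for name in sorted(REQUIRED_HEADERS):
        if name.lower() not in present:
            findings.append({
                "type": "missing_header",
                "detail": f"{name} header absent",
                "severity": "medium",  # assumption: real severities may vary per header
                "evidence": "",
            })
    return findings
```

With the `requests` library, the `headers` argument could be `dict(response.headers)` from a prior GET of the target URL.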

Optional Env Vars

  • GEMINI_API_KEY — used internally for enhanced analysis (optional)

5. Macro / Office Analyzer

Source: backend/macro-analyzer/
Docker service: macro-analyzer
Host port: 5006 → container port 5000
Adapter: orchestrator/app/adapters/macro_adapter.py

Real Endpoints

| Method | Route | Purpose |
| --- | --- | --- |
| GET | /api/macro-analyzer/health | Health check |
| POST | /api/macro-analyzer/analyze | Full VBA + VirusTotal analysis |

How the Orchestrator Calls It

# Open the file in a context manager so the handle is closed after the upload
with open(path, "rb") as f:
    requests.post(f"{_MACRO_BASE}/analyze",
                  files={"file": (original_name, f)},
                  timeout=60)

Supported File Types

.doc, .docx, .xls, .xlsx, .xlsm, .xlsb, .ppt, .pptx, .pptm, .rtf, .docm

Analysis Tools

| Tool | Purpose |
| --- | --- |
| oletools / olevba | VBA macro extraction, indicator analysis, IOC extraction |
| VirusTotal API v3 | SHA-256 hash lookup → upload → poll for analysis results |

olevba Indicator Categories

| Category | Severity | Meaning |
| --- | --- | --- |
| AutoExec | critical | Macro runs automatically on open/close |
| Suspicious | high | Suspicious API calls (Shell, CreateObject, etc.) |
| IOC | high | Embedded URLs, IPs, file paths |
| Hex String | medium | Hex-encoded obfuscated content |
| Base64 String | medium | Base64-obfuscated content |
| Dridex String | critical | Dridex banking trojan string encoding |

Risk Level Mapping

| olevba risk_level | Risk score | Condition |
| --- | --- | --- |
| malicious | 9.5 (base) | AutoExec + Suspicious flags both present |
| suspicious | 6.5 (base) | Suspicious or IOC or obfuscated |
| macro_present | 3.0 | Macros found, no suspicious flags |
| clean | 0.5 | No macros |

If VirusTotal confirms malicious hits, risk score is raised: 1+ detections → max(base, 7.0); 5+ → max(base, 9.5).
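The mapping and the VirusTotal adjustment can be expressed in a few lines. This is a sketch of the rule as stated here, with a hypothetical function name; the adapter's actual code may differ.

```python
# Base scores per olevba risk_level, taken from the table above.
BASE_SCORES = {
    "malicious": 9.5,
    "suspicious": 6.5,
    "macro_present": 3.0,
    "clean": 0.5,
}


def macro_risk_score(risk_level: str, vt_detections: int = 0) -> float:
    """Combine the olevba base score with the VirusTotal detection count."""
    score = BASE_SCORES[risk_level]
    if vt_detections >= 5:
        score = max(score, 9.5)
    elif vt_detections >= 1:
        score = max(score, 7.0)
    return score
```

For instance, a file with macros but no suspicious flags (base 3.0) that has a single VirusTotal detection is raised to 7.0, while a file already rated malicious stays at 9.5.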

Finding Types Generated by Adapter

| Finding type | Description |
| --- | --- |
| macro_malicious / macro_suspicious / macro_present | Overall VBA verdict |
| macro_indicator_autoexec | AutoExec indicators |
| macro_indicator_suspicious | Suspicious API calls |
| macro_indicator_ioc | Extracted IOCs |
| macro_ioc | IOC chip list (enables AI to chain to recon/web) |
| macro_source | Full VBA source (collapsible in report) |
| macro_xlm | Excel 4 (XLM) deobfuscated macros |
| malware_detection | VirusTotal stats table |
| av_detection | Per-engine AV detection (up to 10) |
| payload_downloaded | Always shown when file was fetched from a URL |

Each service returns its own native JSON. An adapter inside the Orchestrator (orchestrator/app/adapters/<name>_adapter.py) translates that into the SecFlow contract:

{
    "analyzer": str,         # "malware" | "steg" | "recon" | "web" | "macro"
    "pass": int,             # 1-indexed loop pass number
    "input": str,            # the exact value passed in
    "findings": list[dict],  # see per-analyzer finding format below
    "risk_score": float,     # aggregate risk for this pass, 0.0–10.0
    "raw_output": str        # concatenated raw tool output (for AI consumption)
}

Analyzer services must never crash the Orchestrator. The adapter must wrap the HTTP call in try/except and return an error-shaped finding dict if the service is unreachable or returns a non-200 response.


1. Malware Analyzer

Service: backend/malware-analyzer/
POST http://malware-analyzer:5001/api/malware-analyzer/
Adapter: orchestrator/app/adapters/malware_adapter.py

Purpose

Detect malicious characteristics in executables, PE binaries, and extracted binary payloads.

Accepted Input

  • File path to: .exe, .dll, .bin, .elf, extracted payload from another analyzer pass

Analysis Techniques

| Technique | Description |
| --- | --- |
| File hashing | Compute MD5, SHA1, SHA256 |
| YARA scanning | Match bundled YARA rule set |
| PE header analysis | Parse PE sections, imports, exports, timestamps |
| String extraction | Extract printable strings; flag suspicious patterns (URLs, IPs, registry keys, API names) |
| Entropy analysis | High entropy sections → possible packing/encryption |
| (Optional) VirusTotal | Hash lookup via VT API if key is configured |
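To make the entropy technique concrete, here is a minimal Shannon entropy calculation over raw bytes. The function name is hypothetical and the document does not specify a threshold; as a general rule, values approaching 8 bits per byte indicate data that looks random, which is typical of compressed or encrypted sections.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A section-level check would apply this to each section's bytes (for example as obtained from a PE parser) rather than to the whole file, since a single packed section can be hidden by low-entropy padding elsewhere.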

Finding Object Format

{
    "type": "signature_match" | "suspicious_string" | "pe_metadata" | "hash" | "entropy" | "error",
    "detail": str,     # human-readable description
    "severity": "info" | "low" | "medium" | "high" | "critical",
    "evidence": str    # raw evidence snippet
}

Example Findings

[
  { "type": "hash", "detail": "SHA256: abc123...", "severity": "info", "evidence": "" },
  { "type": "signature_match", "detail": "YARA rule: Trojan.GenericKDZ matched", "severity": "critical", "evidence": "offset 0x200" },
  { "type": "suspicious_string", "detail": "HTTP callout found", "severity": "high", "evidence": "http://192.168.1.100/beacon" }
]

Planned Libraries

  • yara-python — YARA rule matching
  • pefile — PE binary parsing
  • hashlib — File hashing (stdlib)
  • strings (system) or regex — String extraction

2. Steganography Analyzer

Service: backend/steg-analyzer/
POST http://steg-analyzer:5002/api/steg-analyzer/
Adapter: orchestrator/app/adapters/steg_adapter.py

Purpose

Detect and extract hidden data embedded within image files using steganographic or watermarking techniques.

Accepted Input

  • File path to: .png, .jpg, .jpeg, .bmp, .gif, .tiff

Analysis Techniques

| Technique | Description |
| --- | --- |
| LSB analysis | Detect least-significant-bit encoding in pixel data |
| Metadata inspection | ExifTool: check for hidden data in EXIF/IPTC/XMP |
| Embedded file extraction | binwalk: detect and extract appended/embedded files |
| Tool-based detection | zsteg (PNG), stegdetect (JPEG), steghide (JPEG/BMP) |
| Strings scan | Run strings on the image binary, flag suspicious patterns |

Finding Object Format

{
    "type": "embedded_file" | "lsb_data" | "metadata_anomaly" | "suspicious_string" | "error",
    "detail": str,
    "severity": "low" | "medium" | "high" | "critical",
    "evidence": str,
    "extracted_path": str | None   # path to extracted file if applicable
}

Example Findings

[
  { "type": "embedded_file", "detail": "binwalk found embedded PE binary", "severity": "critical", "evidence": "offset 0x8200", "extracted_path": "/tmp/secflow/extracted/steg_payload.exe" },
  { "type": "metadata_anomaly", "detail": "EXIF GPS data present", "severity": "low", "evidence": "GPS: 37.7749,-122.4194", "extracted_path": null }
]

Planned Tools/Libraries

  • binwalk (system) — File carving, embedded file extraction
  • zsteg (system/gem) — PNG steg detection
  • stegdetect (system) — JPEG steg detection
  • steghide (system) — Steghide extraction
  • pyexiftool or exiftool (system) — Metadata inspection
  • Pillow — Image loading and pixel-level analysis

3. Reconnaissance Analyzer

Service: backend/recon-analyzer/
POST http://recon-analyzer:5003/api/recon-analyzer/
Adapter: orchestrator/app/adapters/recon_adapter.py

Purpose

Gather OSINT and infrastructure intelligence on IPs, domains, and hostnames.

Accepted Input

  • IP address string (e.g., "192.168.1.100")
  • Domain or hostname string (e.g., "evil.example.com")

Analysis Techniques

| Technique | Description |
| --- | --- |
| WHOIS lookup | Registrant, registrar, creation/expiry dates |
| DNS records | A, AAAA, MX, NS, TXT, CNAME records |
| Reverse DNS | PTR record lookup |
| Port scanning | Top ports scan via nmap |
| Geolocation | Country, ASN, ISP |
| Threat intel | Shodan lookup (optional), AbuseIPDB (optional) |
| Certificate info | TLS cert subjects and SANs (for domains) |

Finding Object Format

{
    "type": "whois" | "dns" | "port" | "geolocation" | "threat_intel" | "cert" | "error",
    "detail": str,
    "severity": "info" | "low" | "medium" | "high" | "critical",
    "evidence": str
}

Example Findings

[
  { "type": "port", "detail": "Open ports detected", "severity": "medium", "evidence": "22/tcp open ssh, 80/tcp open http, 443/tcp open https" },
  { "type": "threat_intel", "detail": "IP found in Shodan with malware tag", "severity": "critical", "evidence": "tags: malware, c2" },
  { "type": "whois", "detail": "Domain registered 2 days ago", "severity": "high", "evidence": "created: 2026-03-04" }
]

Planned Libraries/Tools

  • python-whois — WHOIS lookups
  • dnspython — DNS queries
  • nmap (system) + python-nmap — Port scanning
  • shodan — Shodan API (optional; requires SHODAN_API_KEY)
  • requests — AbuseIPDB / threat intel APIs
  • socket — Reverse DNS

4. Web Vulnerability Analyzer

Service: backend/web-analyzer/
POST http://web-analyzer:5005/api/web-analyzer/
Adapter: orchestrator/app/adapters/web_adapter.py

Purpose

Analyze URLs and web endpoints for vulnerabilities, misconfigurations, and security weaknesses.

Accepted Input

  • Full URL string (e.g., "http://192.168.1.100/beacon", "https://example.com/login")

Analysis Techniques

| Technique | Description |
| --- | --- |
| HTTP response analysis | Status code, response headers, redirect chain |
| Security header audit | Check for missing CSP, HSTS, X-Frame-Options, etc. |
| Technology fingerprinting | Identify server, framework, CMS versions |
| Cookie security | Inspect Secure, HttpOnly, SameSite flags |
| Basic vuln scanning | nuclei (optional), common path probing |
| TLS/SSL inspection | Certificate validity, weak ciphers |
| URL reputation | VirusTotal URL scan (optional) |

Finding Object Format

{
    "type": "missing_header" | "vuln" | "tech_fingerprint" | "tls_issue" | "redirect" | "cookie" | "error",
    "detail": str,
    "severity": "info" | "low" | "medium" | "high" | "critical",
    "evidence": str
}

Example Findings

[
  { "type": "missing_header", "detail": "Content-Security-Policy header absent", "severity": "medium", "evidence": "" },
  { "type": "tech_fingerprint", "detail": "Apache 2.4.49 detected (known CVE)", "severity": "critical", "evidence": "Server: Apache/2.4.49" },
  { "type": "tls_issue", "detail": "TLS 1.0 supported (deprecated)", "severity": "high", "evidence": "TLSv1.0 cipher accepted" }
]

Planned Libraries/Tools

  • requests — HTTP requests and response analysis
  • Wappalyzer (or builtwith) — Technology fingerprinting
  • nuclei (system, optional) — Template-based vuln scanning
  • sslyze or ssl (stdlib) — TLS/SSL analysis
  • urllib (stdlib) — URL parsing

Risk Score Calculation

Each analyzer computes a risk_score (0.0–10.0) for the pass based on the severity distribution of its findings:

| Severity | Weight |
| --- | --- |
| critical | 4.0 |
| high | 2.5 |
| medium | 1.0 |
| low | 0.3 |
| info | 0.0 |

Score = min(10.0, sum of severity weights)

The Report Generator computes an overall risk score as the maximum risk score observed across all passes.
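The two formulas can be sketched directly from the weights table. The function names are hypothetical; an unknown severity raises `KeyError` here, which is a choice made for this sketch rather than documented behavior.

```python
# Severity weights from the table above.
SEVERITY_WEIGHTS = {
    "critical": 4.0,
    "high": 2.5,
    "medium": 1.0,
    "low": 0.3,
    "info": 0.0,
}


def pass_risk_score(findings: list[dict]) -> float:
    """Sum the severity weights of a pass's findings, capped at 10.0."""
    total = sum((SEVERITY_WEIGHTS[f["severity"]] for f in findings), 0.0)
    return min(10.0, total)


def overall_risk_score(pass_results: list[dict]) -> float:
    """Return the maximum per-pass risk_score, or 0.0 if there are no passes."""
    return max((p["risk_score"] for p in pass_results), default=0.0)
```

As a worked example, a pass with one info, one critical, and one high finding scores 0.0 + 4.0 + 2.5 = 6.5, while three critical findings sum to 12.0 and are capped at 10.0.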


Adding a New Analyzer

  1. Create a new Docker service directory under backend/<name>-analyzer/ with its own Dockerfile and requirements.txt.
  2. Add the service to backend/compose.yml on the secflow-net network.
  3. Create orchestrator/app/adapters/<name>_adapter.py to translate the service's native response into the SecFlow contract.
  4. Add the analyzer name to the routing rules in orchestrator/app/classifier/rules.py.
  5. Add the analyzer name to the available tools list in orchestrator/app/ai/engine.py.
  6. Document the service and its endpoint in this file.
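When writing the adapter in step 3, it can help to check its output against the SecFlow contract before wiring it into the loop. The validator below is a hypothetical development aid, not part of the codebase; the field names and severity levels are taken from the contract described in this document.

```python
# Top-level contract fields and their expected Python types.
CONTRACT_FIELDS = {
    "analyzer": str,
    "pass": int,
    "input": str,
    "findings": list,
    "risk_score": float,
    "raw_output": str,
}
FINDING_FIELDS = ("type", "detail", "severity", "evidence")
SEVERITIES = {"info", "low", "medium", "high", "critical"}


def contract_errors(result: dict) -> list[str]:
    """Return a list of contract violations; an empty list means valid."""
    errors = []
    for key, typ in CONTRACT_FIELDS.items():
        if key not in result:
            errors.append(f"missing field: {key}")
        elif not isinstance(result[key], typ):
            errors.append(f"{key} should be {typ.__name__}")
    score = result.get("risk_score")
    if isinstance(score, float) and not 0.0 <= score <= 10.0:
        errors.append("risk_score out of range 0.0-10.0")
    findings = result.get("findings")
    if isinstance(findings, list):
        for i, finding in enumerate(findings):
            if not isinstance(finding, dict):
                errors.append(f"findings[{i}] should be dict")
                continue
            for key in FINDING_FIELDS:
                if not isinstance(finding.get(key), str):
                    errors.append(f"findings[{i}].{key} should be str")
            if finding.get("severity") not in SEVERITIES:
                errors.append(f"findings[{i}].severity is not a known level")
    return errors
```

A unit test for a new adapter could feed it a recorded native response and assert that `contract_errors(...)` returns an empty list.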