Synthetic SIEM endpoints in real-vendor shapes — for SOAR / agent integration testing without touching real customer data.
siemulator is a small FastAPI service that emulates two SIEM REST surfaces
from a single pool of synthetic CrowdStrike-flavoured detections and
hand-crafted multi-source attack narratives:
| Mount | Shape | Auth |
|---|---|---|
| `/logscale/*` | Falcon LogScale (Humio REST API) | `Authorization: Bearer` or `?token=` |
| `/qradar/*` | IBM QRadar (offences + Ariel) | `SEC` header, Bearer, or `?token=` |
It's the thing you point a SOAR ingestion job, a detection-engineering test harness, or an agent-chain integration test at when you want a stable, reproducible stream of realistic alerts without standing up real SIEMs or touching customer telemetry.
A small web UI at / lets humans browse the scenarios, run
endpoints interactively, and copy curl snippets — try the live demo at
https://siemulator-y7uhf.ondigitalocean.app.
Status: v0.1.0 · MIT-licensed · Python 3.10+ · Docker (amd64 + arm64).
- Why
- Quickstart
- Configuration
- Web UI
- Endpoints
- Response shape — quick reference
- Scenario modes
- What's in the box
- Use as a test fixture
- Wire it into your SIEM / SOAR
- Debug endpoints
- Record / replay / diff
- Access log
- Safety markers
- Architecture
- What siemulator IS / ISN'T
- Performance & limits
- Roadmap
- Deploy on DigitalOcean App Platform
- Development
- FAQ
- License
Real SIEMs are slow to stand up for tests, real customer data can't be
replayed across environments, and "just hit a record-and-replay fixture"
fails the moment your integration code starts negotiating shape
(SEC vs Bearer, start_time as int-ms vs string, id vs offense_id,
…). siemulator lets you:
- Pin shape regressions in CI. Every endpoint has a contract test — fork them as your integration's golden-shape pins.
- Replay 38 hand-crafted multi-source attack scenarios (phishing → MFA fatigue → token theft → UEFI bootkit → insider exfil → 0-day SSTI → ProxyShell → Golden Ticket → BEC + 10 more). Each is tagged with a stable offence ID so dedup-by-ID works across replays — your SOAR doesn't create 47 incidents from one scenario when the poller runs every 60 s.
- Cross-token acceptance — either token works on either surface. Config-paste mistakes during initial integration setup don't burn you; both surfaces serve synthetic data so cross-acceptance has zero security impact.
- Three auth channels per surface: `Authorization: Bearer`, `SEC` header (QRadar canonical), and `?token=` query param. The query-param channel survives forward proxies that strip `Authorization` / `Sec-*` headers in egress.
- One-shot dedup mode: `?scenarios=all` returns each scenario ID exactly once per process lifetime, so a cron poller can drain the whole scenario library over N polls without re-ingesting the same incidents on every cycle.
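The three auth channels listed above can be exercised from Python roughly as follows. This is a minimal sketch that only builds the requests and sends nothing; the helper name `auth_variants` is hypothetical, and the URL and token are the local defaults used elsewhere in this README.

```python
from urllib.parse import urlencode
from urllib.request import Request


def auth_variants(url: str, token: str) -> dict[str, Request]:
    """Build one request per auth channel (hypothetical helper, not part of siemulator)."""
    sep = "&" if "?" in url else "?"
    return {
        # Channel 1: standard bearer token.
        "bearer": Request(url, headers={"Authorization": f"Bearer {token}"}),
        # Channel 2: QRadar-style SEC header.
        "sec": Request(url, headers={"SEC": token}),
        # Channel 3: query parameter, for proxies that strip auth headers.
        "query": Request(f"{url}{sep}{urlencode({'token': token})}"),
    }


variants = auth_variants(
    "http://localhost:8080/qradar/api/siem/offenses", "qradar-dev-token"
)
```

Passing any of these to `urllib.request.urlopen` against a running instance should return the same payload, which is a quick way to confirm that a proxy in between is not stripping headers.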
```bash
pip install siemulator
python -m siemulator   # listens on :8080 by default
```

Or with Docker (multi-arch, amd64 + arm64):

```bash
docker run -p 8080:8080 ghcr.io/sirp-labs/siemulator:latest
# or
docker compose up
```

Then:

```bash
# Health (no auth)
curl http://localhost:8080/logscale/api/v1/status

# LogScale alerts (default token: logscale-dev-token)
curl -H "Authorization: Bearer logscale-dev-token" \
  "http://localhost:8080/logscale/api/v1/repositories/detections/alerts?limit=3"

# QRadar offences
curl -H "SEC: qradar-dev-token" \
  "http://localhost:8080/qradar/api/siem/offenses"

# All 38 multi-source attack scenarios
curl "http://localhost:8080/qradar/api/siem/scenarios?token=qradar-dev-token"
```

Configuration is all via env vars. The defaults work for local testing; override them in production.
| Variable | Default | Purpose |
|---|---|---|
| `SIEMULATOR_LOGSCALE_TOKEN` | `logscale-dev-token` | Bearer token for `/logscale/*` |
| `SIEMULATOR_QRADAR_TOKEN` | `qradar-dev-token` | SEC / Bearer token for `/qradar/*` |
| `SIEMULATOR_ADMIN_KEY` | (empty, disabled) | Admin key for `/qradar/_debug/*` |
| `SIEMULATOR_LOGSCALE_PREFIX` | `/logscale` | URL prefix override |
| `SIEMULATOR_QRADAR_PREFIX` | `/qradar` | URL prefix override |
| `SIEMULATOR_HOST` | `0.0.0.0` | Bind host |
| `SIEMULATOR_PORT` | `8080` | Bind port |
| `SIEMULATOR_UI_ENABLED` | `true` | Web UI at `/`. Set `false` for pure-API mode |
| `SIEMULATOR_ACCESS_LOG_ENABLED` | `true` | Capture every API request to a ring + stdout (see Access log) |
| `SIEMULATOR_ACCESS_LOG_SIZE` | `5000` | In-memory ring capacity |
| `SIEMULATOR_ACCESS_LOG_SKIP_HEALTH` | `false` | Skip `/status` / `/api/help` to reduce noise |
| `SIEMULATOR_SESSIONS_ENABLED` | `true` | Record / replay / diff (see Record / replay / diff) |
| `SIEMULATOR_SESSIONS_DIR` | `./siemulator-sessions` | JSONL persistence directory |
See .env.example.
Prefix overrides are useful if you're emulating an existing
integration that was pointed at non-default URLs and you don't want to
change the consumer-side config. Setting
SIEMULATOR_LOGSCALE_PREFIX=/api/v1/falcon-logscale and
SIEMULATOR_QRADAR_PREFIX=/siem-mock is supported — both prefixes can
take any path.
GET / serves a single-page UI when SIEMULATOR_UI_ENABLED=true
(the default). It's a zero-dependency dark-themed page with:
- Hero + quickstart with copy-able curl snippets (auto-populated with whatever token you paste in the form).
- An interactive Try it panel that runs requests against the same origin — pick endpoint, paste token, see formatted JSON + status + latency.
- A scenario browser for the 38 multi-source attack narratives: click S1/S2/.../TEST-J/DEMO-A/SCAN-A/ENRICH-A chips to expand each chain with per-alert source labels and raw-alert JSON.
- A detection templates table with the 6 templates and their MITRE tactic + technique IDs.
- A debug-endpoint probe under a collapsed `<details>` block (paste `X-Admin-Key`, hit the gated endpoints).
For pure-API deployments, set SIEMULATOR_UI_ENABLED=false — / then
returns the same JSON metadata as /api/info (the always-JSON
machine-readable endpoint).
/api/info is always JSON regardless of UI state — use it for
liveness probes that should never see HTML.
LogScale surface (paths are relative to the `/logscale` prefix):

| Method | Path | Auth | Purpose |
|---|---|---|---|
| GET | `/api/v1/status` | none | Health (Humio version shape) |
| GET | `/api/v1/repositories` | none | List repos (always `[{detections}]`) |
| GET | `/api/v1/repositories/{repo}/alerts?limit=N` | ✅ | Synthetic Humio events (1-50) |
| GET | `/api/v1/repositories/{repo}/query?q=…&limit=N` | ✅ | Same shape; `q` accepted but ignored |
| POST | `/api/v1/repositories/{repo}/queryjobs` | ✅ | Async submit → returns `{id}` |
| GET | `/api/v1/repositories/{repo}/queryjobs/{id}` | ✅ | Poll; stable across repeated reads |
QRadar surface (paths are relative to the `/qradar` prefix):

| Method | Path | Auth | Purpose |
|---|---|---|---|
| GET | `/api/help` / `/api/help/capabilities` | none | Health |
| GET | `/api/siem/offenses[?scenarios=all\|batch\|replay\|mix]` | ✅ | Active offences + scenario modes |
| GET | `/api/siem/offenses/{id}` | ✅ | Single offence (`id` echoed back) |
| GET | `/api/siem/scenarios` | ✅ | All 38 multi-source attack narratives |
| GET | `/api/siem/source_addresses` | ✅ | IP context (3 synthetic rows) |
| POST | `/api/ariel/searches` | ✅ | Submit (returns `COMPLETED` immediately) |
| GET | `/api/ariel/searches/{id}` | ✅ | Status |
| GET | `/api/ariel/searches/{id}/results` | ✅ | Results `{events: [...]}` |
LogScale alerts (/logscale/api/v1/repositories/detections/alerts)
return a Humio-style envelope:
```json
{
  "events": [
    {
      "@timestamp": "2026-06-07T17:42:01.234Z",
      "@id": "8a3f4b5c6d7e8f90a1b2c3d4",
      "@rawstring": "2026-06-07T17:42:01.234Z CrowdStrike Falcon Sensor — Detection: Credential Dumping via Mimikatz on WIN-DESKTOP-01.example.local by EXAMPLE\\analyst",
      "#repo": "detections",
      "#type": "kv",
      "metadata.eventType": "DetectionSummaryEvent",
      "event.DetectId": "ldt:5b6c7d8e…",
      "event.DetectName": "Credential Dumping via Mimikatz",
      "event.Severity": 5,
      "event.SeverityName": "Critical",
      "event.Tactic": "Credential Access",
      "event.TacticId": "TA0006",
      "event.Technique": "OS Credential Dumping: LSASS Memory",
      "event.TechniqueId": "T1003.001",
      "event.ComputerName": "WIN-DESKTOP-01.example.local",
      "event.UserName": "EXAMPLE\\analyst",
      "event.CommandLine": "mimikatz.exe \"sekurlsa::logonpasswords\" exit",
      "event.MD5String": "a1b2c3d4…",
      "event.SHA256String": "0f2dd7587…",
      "event.FalconHostLink": "https://falcon.crowdstrike.com/activity/detections/detail/ldt:…",
      "x-mock-source": "siemulator"
    }
  ],
  "metadata": {
    "totalWork": 1,
    "doneWork": 1,
    "workInProgress": 0,
    "extraData": {
      "x-mock-source": "siemulator",
      "x-mock-version": "1.0",
      "x-server-timestamp": 1780839721234
    }
  }
}
```

QRadar offences (`/qradar/api/siem/offenses`) return a list (not
an envelope, which matches QRadar's actual API):
```json
[
  {
    "id": 95693,
    "offense_id": 95693,
    "description": "Lateral Movement via PsExec — PsExec service binary created on remote host; followed by service start from non-administrative user context.",
    "source_ip": "10.42.83.12",
    "destination_ip": "172.16.55.91",
    "severity": 7,
    "magnitude": 8,
    "credibility": 7,
    "relevance": 8,
    "status": "OPEN",
    "categories": ["Lateral Movement", "Custom Rule Engine"],
    "rules": [{"type": "CRE_RULE", "id": 158472}],
    "start_time": 1780839721000,
    "start_epochtime": 1780839721000,
    "event_count": 247,
    "log_sources": [
      {"type_name": "EventCRE", "id": 63, "name": "Custom Rule Engine-8 :: cre-primary", "type_id": 18},
      {"type_name": "MicrosoftWindows", "id": 168, "name": "WinEventLog @ WIN-DESKTOP-01.example.local", "type_id": 12}
    ],
    "domain_id": 1,
    "domain_name": "EXAMPLE",
    "_detection": {
      "DetectName": "Lateral Movement via PsExec",
      "Tactic": "Lateral Movement",
      "TechniqueId": "T1021.002",
      "MD5String": "75b55bb34dac9d029396fbb98ab8b8ff"
    },
    "x-mock-source": "siemulator"
  }
]
```

Shape pins worth knowing (break these and downstream ingestion
crashes; they're in `tests/test_qradar.py`):

- `id` is `int`, not string: consumers do `a['offense_id'] = a['id']`.
- `start_time` is INT MILLISECONDS EPOCH: consumers do `datetime.fromtimestamp(a['start_time']/1000)`.
- `severity` is `int` 1-10, not the LogScale `"Critical"` / `"High"` string.
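A consumer that relies on these three pins might normalise an offence along the following lines. This is a sketch only: `normalize_offence` and the output keys are hypothetical, and the sample values are taken from the offence shown above.

```python
from datetime import datetime, timezone


def normalize_offence(raw: dict) -> dict:
    """Consumer-side normalisation relying on the pinned shapes (hypothetical helper)."""
    # Pin 1: id is an int, so it can be copied straight into offense_id.
    if not isinstance(raw["id"], int):
        raise TypeError("offence id must be an int")
    # Pin 2: start_time is epoch milliseconds, so divide by 1000 for seconds.
    started = datetime.fromtimestamp(raw["start_time"] / 1000, tz=timezone.utc)
    # Pin 3: severity is an int in 1-10, not a "Critical"/"High" label.
    return {
        "offense_id": raw["id"],
        "started_at": started,
        "severity": int(raw["severity"]),
    }


sample = {"id": 95693, "start_time": 1780839721000, "severity": 7}
incident = normalize_offence(sample)
```

If any of the pinned shapes regressed (for example a string `id` or a seconds-based `start_time`), a helper like this would either raise or produce a visibly wrong date, which is why the pins are worth asserting in CI.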
`/qradar/api/siem/offenses?scenarios=…`:

- `all`: One-shot. Returns fresh scenarios only; each offence ID is served once per process lifetime. Use for cron-style pollers that would otherwise create duplicate incidents on every cycle. Reset via `POST /qradar/_debug/reset_scenarios` (admin-key gated).
- `batch`: Rotate one scenario per call (round-robin through all 38). Useful for slow-drip ingestion testing.
- `replay`: All 38 scenarios in one response, ignoring the one-shot dedup set. Useful for ad-hoc bulk ingestion tests.
- `mix`: All scenarios + N synthetic templates (N from the `Range: items=0-N` header). Useful for testing how your consumer handles a mixed pool.
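The difference between the modes is easiest to see in a toy model. The class below is an illustration under stated assumptions, not siemulator's implementation: `ScenarioPool` and its method names are hypothetical, and whether `batch` also marks IDs as served is not specified here, so the sketch leaves the served set untouched in that mode.

```python
class ScenarioPool:
    """Toy in-memory model of the scenario modes (not siemulator's code)."""

    def __init__(self, offence_ids: list[int]):
        self._ids = list(offence_ids)
        self._served: set[int] = set()
        self._cursor = 0

    def all(self) -> list[int]:
        # One-shot: return only IDs not served before, then remember them.
        fresh = [i for i in self._ids if i not in self._served]
        self._served.update(fresh)
        return fresh

    def batch(self) -> list[int]:
        # Round-robin: one ID per call, wrapping at the end of the pool.
        item = self._ids[self._cursor % len(self._ids)]
        self._cursor += 1
        return [item]

    def replay(self) -> list[int]:
        # Everything, ignoring the served set.
        return list(self._ids)

    def reset(self) -> None:
        # Same idea as POST /qradar/_debug/reset_scenarios.
        self._served.clear()


pool = ScenarioPool([90011, 90012, 90021])
first, second = pool.all(), pool.all()  # second poll finds nothing new
```

The second `all()` call coming back empty is the behaviour that keeps a 60-second poller from opening the same incidents on every cycle.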
Six templates form the rotating pool that LogScale /alerts and QRadar
default-mode /offenses draw from. Each carries MITRE tactic + technique
IDs, realistic command lines, MD5/SHA256, and host context. All shipped
in siemulator/templates.py — add your own by appending to
ALERT_TEMPLATES.
| Tactic | Technique | DetectName | Severity |
|---|---|---|---|
| TA0006 Credential Access | T1003.001 OS Credential Dumping: LSASS Memory | Credential Dumping via Mimikatz | Critical |
| TA0002 Execution | T1059.001 PowerShell | Suspicious PowerShell with Base64 Encoded Command | High |
| TA0008 Lateral Movement | T1021.002 SMB/Windows Admin Shares | Lateral Movement via PsExec | High |
| TA0001 Initial Access | T1566.001 Spearphishing Attachment | Phishing — Suspicious Outlook Attachment | Medium |
| TA0011 Command and Control | T1071.001 Application Layer Protocol: Web | Beaconing C2 Traffic to Known Bad Domain | Critical |
| TA0003 Persistence | T1547.001 Registry Run Keys / Startup Folder | Suspicious File Write to Startup Folder | Medium |
Thirty-eight hand-crafted offences spread across five batches, each tagged
with a stable offense_id in 90011-90098 and a _scenario_id label.
Each offence carries a _raw_alert block preserving the original
vendor-specific schema (Proofpoint TAP, Defender for Endpoint,
CrowdStrike Falcon, Zscaler ZIA, Entra ID, Eclypsium, Purview, WAF,
CloudWatch, …) so a downstream agent can analyse the multi-vendor
narrative end-to-end.
| `_scenario_id` | offence IDs | Narrative |
|---|---|---|
| S1 | 90011 → 90015 | Living-off-the-land supply chain (5 alerts, 47 min). Proofpoint clean email → Defender signed download → CrowdStrike DLL side-load + persistence → Defender LOLBin recon + certutil exfil → Zscaler Notion-API stego exfil. |
| S2 | 90021 | Identity attack chain. MFA fatigue → PRT theft → OAuth app consent → mailbox forwarding rule → admin role grant. |
| S3 | 90031 | UEFI firmware bootkit (BlackLotus-class) — pre-boot persistence detected by Eclypsium. |
| S4 | 90041 | Insider threat + steganographic exfiltration — ML-model weights hidden in PNG attachments. |
| S5 | 90051 → 90054 | Zero-day chain (4 alerts). WAF-blocked SSTI probe → WAF-bypassed SSTI success → webshell + XMRig + crontab persist → CloudWatch CPU spike + $847/day cost anomaly. |
Independent single-offence narratives covering attacker tradecraft where a single sophisticated event tells the whole story.
| `_scenario_id` | offence ID | Narrative |
|---|---|---|
| TEST-A | 90061 | Golden Ticket — Kerberos persistence (T1558.001) |
| TEST-B | 90062 | Exchange ProxyShell — webshell + backdoor user (CVE-2021-34473) |
| TEST-C | 90063 | DNS tunneling — dnscat2 exfiltration 12.4 MB (T1048.003) |
| TEST-D | 90064 | SIM swap → MFA bypass → Okta/AWS admin (T1111 + T1098) |
| TEST-E | 90065 | Linux LKM rootkit — syscall hooks + SSH key persistence (T1014) |
| TEST-F | 90066 | BEC CEO wire fraud — .CO TLD + Gmail reply-to (zero IOCs — tests semantic analysis) |
| TEST-G | 90067 | CI/CD compromise — GitHub Actions secret exfil + supply chain |
| TEST-H | 90068 | Medical infusion pump — drug-limit override 10×↑, patient at risk (CVE-2022-26390) |
| TEST-I | 90069 | Deepfake vishing — AI-synthesised CEO voice + BEC multi-channel |
| TEST-J | 90070 | GPO abuse — domain-wide scheduled-task + persistence (T1484.001) |
Eight synthetic-IOC fixtures for enrichment-bypass testing.
Every IOC except DEMO-H's Tor exit node uses a deliberately synthetic pattern that public TI sources
have no record of — RFC 5737 TEST-NET IPs (198.51.100.x,
192.0.2.x, 203.0.113.x), NetBIOS-shape names (CORPA /
CORPB / *.example.local), 48-char placeholder hashes (not
valid SHA-256/SHA-1/MD5), and fictional domains
(update-check-cdn.net, acme-portal-secure.net). Each IOC is
annotated with a pattern tag (rfc5737_testnet / fictional /
placeholder_48char / netbios_internal / tor_exit_node) so
downstream enrichment-bypass code can pattern-match and short-circuit
public-TI round-trips.
| `_scenario_id` | offence ID | Category | Narrative |
|---|---|---|---|
| DEMO-A | 90081 | 107 Malware | Admin-tool execution on managed-services workstation, 4 IOCs, expects BENIGN_AUTHORIZED roll-up |
| DEMO-B | 90082 | 108 Phishing → 123 | Credential-harvest link, RFC 5737 sender IP, fictional brand-spoofed domain |
| DEMO-C | 90083 | 110 Network Anomaly → 114 Cloud Sec | Outbound TCP 443 to RFC 5737 destination + fictional domain |
| DEMO-D | 90084 | 107 Malware | HR-workstation unsigned binary, placeholder hash + NetBIOS user identifier |
| DEMO-E | 90085 | 114 Cloud Security → 111 | AWS IAM AttachUserPolicy privilege-escalation from RFC 5737 source IP |
| DEMO-F | 90086 | 107 Malware | EDR-quarantined binary, synthetic-hash-only IOC |
| DEMO-G | 90087 | 107 Malware | CORPB service account running ad-hoc PowerShell AD-recon (NetBIOS-only IOC) |
| DEMO-H | 90088 | 108 Phishing | Sender IP is a real Tor exit node, the only IOC in the DEMO batch that public TI consistently identifies (tagged via TorProject) |
Three sibling alerts from the same actor (SECTEAM\\pentester-01) and same
source IP (10.50.5.42) targeting three different production hosts
over ~20 minutes. Designed to drive Entity Agent (Q1 actor lookup
via am_name: user IOCs) AND give related_incidents a
same_source_ip + same_user clustering anchor so the disposition
recommender can roll all three into a single "authorized pentest"
verdict.
Each independently looks like genuine internal recon (SEV3 magnitude
5, borderline s3_score 50-60, VERIFICATION_REQUIRED expected
verdict) so the auto-close decision matters; in aggregate the three
share enough anchors that any related-incidents-aware recommender
should close them as true_positive_benign_authorized.
| `_scenario_id` | offence ID | QRadar categories | Target | Scan technique | Expected disposition |
|---|---|---|---|---|---|
| SCAN-A | 90091 | Port Scan · Network Reconnaissance | WIN-PROD-DB-01 | TCP SYN scan (nmap -sS) | true_positive_benign_authorized |
| SCAN-B | 90092 | Network Reconnaissance · Suspicious Network Activity | WIN-PROD-APP-01 | Service version (nmap -sV) | true_positive_benign_authorized |
| SCAN-C | 90093 | Network Reconnaissance · Port Scan | WIN-PROD-WEB-01 | Vuln scripts (nmap --script vuln) | true_positive_benign_authorized |
Each _raw_alert ships a qradar_categories override (read by
_wrap_as_qradar_offence) so the QRadar offence's categories field
carries the realistic recon labels instead of the default
Sophisticated-Test tag. Each carries 4 typed IOCs
(source_ip / destination_ip / user / process) including the
load-bearing am_name: "user" artifact that triggers Entity Agent.
The positive-path complement to the DEMO batch — where DEMO deliberately uses synthetic IOCs that public TI sources can't enrich (so the enrichment-bypass detector has positive test cases), ENRICH uses real, well-documented, historical IOCs that AlienVault OTX / abuse.ch / VirusTotal / GreyNoise / TorProject reliably tag.
Every ENRICH IOC carries a pattern: "ti_*" tag — the discriminator
that tells downstream consumers "round-trip this to public TI" vs
DEMO's synthetic_* tags that mean "short-circuit, don't waste a
public lookup."
| `_scenario_id` | offence ID | Theme | Key IOCs (with pattern tag) | Expected verdict / disposition |
|---|---|---|---|---|
| ENRICH-A | 90094 | WannaCry ransomware (historical) | `ti_known_wannacry` SHA-256 `ed01ebfb…b9faaaaa` + `ti_known_wannacry_killswitch` kill-switch domain | MALICIOUS_CONFIRMED / true_positive_confirmed (SEV1) |
| ENRICH-B | 90095 | Stuxnet C2 callback (historical) | `ti_known_stuxnet` × 2: mypremierfutbol.com + todaysfutbol.com (sinkholed since 2010) | MALICIOUS_CONFIRMED / true_positive_confirmed_historical (SEV1) |
| ENRICH-C | 90096 | EICAR test file (universal positive control) | `ti_eicar_test` SHA-256 `275a021b…f651fd0f` + matching MD5; every AV identifies it | MALICIOUS_CONFIRMED / true_positive_benign_test (SEV4) |
| ENRICH-D | 90097 | Outbound to confirmed Tor exit | `ti_tor_exit_real` IP in 185.220.101.0/24 (TorProject directory + GreyNoise tag) | SUSPICIOUS / true_positive_requires_review (SEV2) |
| ENRICH-E | 90098 | Inbound from documented benign scanner | `ti_known_scanner` IP 71.6.146.185 (GreyNoise: Shodan-affiliated mass scanner) | BENIGN_AUTHORIZED / false_positive_scanner_noise (SEV5) |
Each scenario lists expected_ti_sources (e.g.
["VirusTotal", "AlienVault OTX", "ThreatFox", "MalwareBazaar"]) —
the test contract is that the enrichment agent should round-trip to
those sources and receive positive attribution. Verdicts span the
full disposition spectrum (MALICIOUS_CONFIRMED × 3 / SUSPICIOUS /
BENIGN_AUTHORIZED) so the entire enrichment-to-disposition pipeline
is exercised, not just the malicious path.
Pattern legend — the bypass-vs-enrich routing key:
| Prefix | Meaning | Batches using it | Enrichment behaviour |
|---|---|---|---|
| `synthetic_*` | Synthetic placeholder (no real-world TI attribution) | DEMO-A through DEMO-G | Short-circuit, return `synthetic_fixture` |
| `rfc5737_*` | Synthetic: RFC 5737 documentation IPs | DEMO scenarios | Short-circuit |
| `netbios_*` | Synthetic: NetBIOS-shape internal names | DEMO scenarios | Short-circuit |
| `fictional` | Synthetic: fabricated domain | DEMO scenarios | Short-circuit |
| `placeholder_*` | Synthetic: placeholder file hashes | DEMO scenarios | Short-circuit |
| `ti_*` | Real public-TI-attributable; should round-trip | ENRICH-A through ENRICH-E | Full enrichment |
| `tor_exit_*` | Tor exit node | DEMO-H, ENRICH-D | Enrich (TorProject + GreyNoise) |
| `authorized_pentest_*` | Authorized internal pentest actor | SCAN-A through SCAN-C | Cross-reference with `related_incidents` |
| `internal_corp_*` | Internal corp address ranges | SCAN scenarios | Skip external TI (internal-only) |
| `recon_tool` | Known recon tool (nmap, masscan, ...) | SCAN scenarios | Tool-attribution lookup |
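On the consumer side, the legend above maps naturally onto a small prefix router. The sketch below is an assumption-laden illustration: `route_ioc` and the route labels are hypothetical names, and sending unknown tags to full enrichment is a design choice made here, not documented siemulator behaviour.

```python
# Prefixes whose IOCs are synthetic and should skip public-TI lookups.
SHORT_CIRCUIT_PREFIXES = ("synthetic_", "rfc5737_", "netbios_", "placeholder_")

# Tags from the legend that are exact strings rather than prefixes.
EXACT_ROUTES = {"fictional": "short_circuit", "recon_tool": "tool_attribution"}

# Remaining prefix -> route pairs (route labels are hypothetical).
PREFIX_ROUTES = (
    ("ti_", "full_enrichment"),
    ("tor_exit_", "enrich_tor"),
    ("authorized_pentest_", "related_incidents"),
    ("internal_corp_", "skip_external_ti"),
)


def route_ioc(pattern: str) -> str:
    """Pick an enrichment route from an IOC's pattern tag (hypothetical helper)."""
    if pattern in EXACT_ROUTES:
        return EXACT_ROUTES[pattern]
    if pattern.startswith(SHORT_CIRCUIT_PREFIXES):
        return "short_circuit"
    for prefix, route in PREFIX_ROUTES:
        if pattern.startswith(prefix):
            return route
    # Assumption: an unrecognised tag is enriched rather than silently skipped.
    return "full_enrichment"


routes = {tag: route_ioc(tag) for tag in ("rfc5737_testnet", "ti_eicar_test", "tor_exit_node")}
```

Keeping the table as data rather than a chain of conditionals makes it easy to extend when a new batch introduces another prefix.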
The full scenario corpus lives in siemulator/scenarios.py — JSON-defined
payloads parsed once at import. Add your own by extending the registry
at the bottom of that file.
The most common consumer pattern: spin up siemulator in a pytest fixture and point your integration under test at it.
```python
# tests/conftest.py
import pytest
from fastapi.testclient import TestClient


@pytest.fixture
def siem(monkeypatch):
    monkeypatch.setenv("SIEMULATOR_QRADAR_TOKEN", "ci-token")
    monkeypatch.setenv("SIEMULATOR_LOGSCALE_TOKEN", "ci-token")
    from siemulator.app import create_app
    return TestClient(create_app())


def test_my_qradar_ingest(siem):
    # Your ingestion code under test:
    from myapp.ingest import poll_qradar_offences
    # TestClient serves the app in-process, so this assumes the poller can
    # route its HTTP calls through it rather than opening a real socket.
    offences = poll_qradar_offences(
        base_url=str(siem.base_url),
        token="ci-token",
    )
    assert len(offences) >= 1
    # Pin the shape contract; siemulator guarantees these on every poll:
    assert isinstance(offences[0]["id"], int)
    assert offences[0]["start_time"] > 1_000_000_000_000  # ms epoch
    assert "x-mock-source" in offences[0]
```

As a Compose service with a health check:

```yaml
# docker-compose.test.yml
services:
  siemulator:
    image: ghcr.io/sirp-labs/siemulator:latest
    environment:
      SIEMULATOR_QRADAR_TOKEN: ci-token
      SIEMULATOR_LOGSCALE_TOKEN: ci-token
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/logscale/api/v1/status').read()"]
      interval: 2s
      retries: 5
  my-app-under-test:
    build: .
    environment:
      SIEM_URL: http://siemulator:8080
      SIEM_TOKEN: ci-token
    depends_on:
      siemulator:
        condition: service_healthy
```

As a one-off from the shell:

```bash
docker run -d --name siem --rm -p 8080:8080 ghcr.io/sirp-labs/siemulator:latest
curl -H "SEC: qradar-dev-token" "http://localhost:8080/qradar/api/siem/offenses?scenarios=replay" \
  | jq '.[] | {id, _scenario_id, description}'
docker stop siem
```

To point a real SOAR (Cortex XSOAR, Splunk SOAR, IBM Resilient), SIEM (Splunk Enterprise, Microsoft Sentinel, Elastic), or workflow tool (Tines, n8n) at siemulator for ingestion testing, see docs/ingestion-guide.md. It has copy-paste recipes for the common platforms plus the patterns that apply across all of them (auth-channel choice, polling cadence, scenario-dedup, verification, going-to-production checklist).
The live demo at https://siemulator-y7uhf.ondigitalocean.app lets you validate the integration end-to-end before deploying your own instance.
Gated on SIEMULATOR_ADMIN_KEY. Leave the env var empty to disable
them entirely (every /_debug/* then returns 403).
| Method | Path | Purpose |
|---|---|---|
| GET | `/qradar/_debug/recent` | Last 100 requests this mock saw (path, headers, auth channel, response preview) |
| POST | `/qradar/_debug/reset_scenarios` | Clear served-scenarios set so `?scenarios=all` replays the pool |
| GET | `/qradar/_debug/scenarios_state` | Served vs remaining scenario IDs |
All require X-Admin-Key: <SIEMULATOR_ADMIN_KEY>. Use _debug/recent
to diagnose "my poller hits siemulator but my SOAR shows zero incidents"
— you can see the exact path, query params, auth channel, and first-row
preview that went over the wire.
Turns siemulator into a regression-testing tool for SOC tooling teams. The core flow:
- Run your consumer (XSOAR / Splunk SOAR / Sentinel playbook / custom integration) against siemulator while a named session is recording.
- Upgrade or modify your consumer.
- Run the new version against siemulator with a different session name.
- Diff the two sessions — "did the consumer's request stream change?" is your regression signal.
| Method | Path | Purpose |
|---|---|---|
| POST | `/api/sessions/{name}/start` | Begin recording into session `name` |
| POST | `/api/sessions/{name}/stop` | Finalize → flush JSONL to disk |
| GET | `/api/sessions` | List all sessions (in-memory + on-disk) |
| GET | `/api/sessions/{name}` | Metadata + `by_path` + `by_status` summary |
| GET | `/api/sessions/{name}/entries` | Full req+resp pairs (paginated; `?limit`, `?offset`) |
| DELETE | `/api/sessions/{name}` | Remove from memory + disk |
| GET | `/api/sessions/diff?a=X&b=Y` | Structured diff of two sessions |
Add ?replay_from=<session> to any bound endpoint. siemulator looks
up the first captured entry matching (method, path, query without meta-params) and returns the captured response verbatim —
preserved bytes, original status, original headers. Useful for
snapshot-pinning siemulator's own output so future code changes here
don't break your consumer's test suite.
```bash
curl -i "https://your-siemulator/qradar/api/siem/offenses?replay_from=xsoar-v1"
# Response headers include:
#   X-Replay-Match: hit
#   X-Replay-From:  xsoar-v1
#   X-Replay-Idx:   3
```

```bash
# Capture v1 behaviour
curl -X POST -H "X-Admin-Key: $K" $URL/api/sessions/xsoar-v1/start
xsoar-playbook-run --target $URL   # your CI step
curl -X POST -H "X-Admin-Key: $K" $URL/api/sessions/xsoar-v1/stop

# Upgrade XSOAR, capture v2
curl -X POST -H "X-Admin-Key: $K" $URL/api/sessions/xsoar-v2/start
xsoar-playbook-run --target $URL   # same step, new XSOAR version
curl -X POST -H "X-Admin-Key: $K" $URL/api/sessions/xsoar-v2/stop

# Diff: did v2 send different requests than v1?
curl -fsS -H "X-Admin-Key: $K" \
  "$URL/api/sessions/diff?a=xsoar-v1&b=xsoar-v2" | jq '.diffs'
```

A non-empty `diffs` array means the upgrade changed your consumer's
request stream, so investigate before promoting v2 to prod. The diff
surfaces method/path/query/status changes per entry, plus the body delta
(line + byte counts).
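To make the idea of a request-stream diff concrete, here is a minimal sketch that compares two captured sessions on the method/path/query/status fields named above. The entry layout, the positional pairing of entries, and the output shape are all assumptions made for illustration; the server's actual diff algorithm and schema may differ.

```python
from itertools import zip_longest

# Fields compared per entry, taken from the description above.
FIELDS = ("method", "path", "query", "status")


def diff_sessions(a: list[dict], b: list[dict]) -> list[dict]:
    """Positional diff of two request streams (hypothetical helper)."""
    diffs = []
    for idx, (ea, eb) in enumerate(zip_longest(a, b)):
        if ea is None or eb is None:
            # One session is longer: the extra entry was added or removed.
            diffs.append({"idx": idx, "change": "added" if ea is None else "removed"})
            continue
        changed = {f: (ea.get(f), eb.get(f)) for f in FIELDS if ea.get(f) != eb.get(f)}
        if changed:
            diffs.append({"idx": idx, "change": "modified", "fields": changed})
    return diffs


v1 = [{"method": "GET", "path": "/qradar/api/siem/offenses", "query": "", "status": 200}]
v2 = [
    {"method": "GET", "path": "/qradar/api/siem/offenses", "query": "scenarios=all", "status": 200},
    {"method": "GET", "path": "/qradar/api/help", "query": "", "status": 200},
]
diffs = diff_sessions(v1, v2)
```

In this example the second session changed one query string and added one request, so the result has two entries; an unchanged consumer would yield an empty list.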
Sessions persist as JSONL to SIEMULATOR_SESSIONS_DIR (default
./siemulator-sessions/) and are reloaded from disk on process restart. Mount
a persistent volume in your container if you want sessions to survive
redeploys.
Sensitive headers (Authorization, SEC, X-Admin-Key, Cookie) and
sensitive query params are recorded as *** markers, never as
values. A pinned regression test confirms that the literal secret strings never
echo through the captured entries.
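The redaction rule can be sketched as two small functions. This is an illustration, not the recorder's code: the helper names are hypothetical, and treating `token` as the only sensitive query param is an assumption, since the text above does not enumerate them.

```python
from urllib.parse import parse_qsl, urlencode

# Header names from the paragraph above, lower-cased for case-insensitive matching.
SENSITIVE_HEADERS = {"authorization", "sec", "x-admin-key", "cookie"}
# Assumption: only ?token= is treated as sensitive in this sketch.
SENSITIVE_PARAMS = {"token"}


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Replace sensitive header values with a *** marker."""
    return {k: "***" if k.lower() in SENSITIVE_HEADERS else v for k, v in headers.items()}


def redact_query(query: str) -> str:
    """Replace sensitive query-param values with a *** marker."""
    pairs = parse_qsl(query, keep_blank_values=True)
    masked = [(k, "***" if k.lower() in SENSITIVE_PARAMS else v) for k, v in pairs]
    # safe="*" keeps the marker readable instead of percent-encoding it.
    return urlencode(masked, safe="*")


clean_headers = redact_headers({"SEC": "qradar-dev-token", "User-Agent": "poller/1.0"})
clean_query = redact_query("scenarios=all&token=qradar-dev-token")
```

A regression test in the same spirit would serialise the captured entry and assert that the literal token string does not appear anywhere in it.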
| Variable | Default | Purpose |
|---|---|---|
| `SIEMULATOR_SESSIONS_ENABLED` | `true` | Disable middleware + admin endpoints |
| `SIEMULATOR_SESSIONS_DIR` | `./siemulator-sessions` | JSONL persistence directory |
Every request to /logscale/* and /qradar/* is captured into a
bounded in-memory ring AND emitted as a structured JSON line to stdout
(uvicorn forwards it to the platform log surface — DO Apps, Docker
logs, k8s, etc. pick it up for free).
Recorded per request: timestamp, method, path, redacted query
string, auth channel (bearer / sec / query / none), client IP
(X-Forwarded-For-aware), user-agent (truncated to 200 chars), status,
duration in ms, response bytes.
Never recorded: Bearer / SEC token values, ?token= query-param
value, X-Admin-Key, cookies, request body, response body. A pin test in
tests/test_access_log.py guarantees that the
literal token strings never leak.
Admin endpoints (require SIEMULATOR_ADMIN_KEY set + sent on the
request; 403 otherwise):
| Method | Path | Purpose |
|---|---|---|
| GET | `/api/access-log` | Recent entries, newest first. Filters: `?limit`, `?since`, `?path_prefix`, `?status`, `?auth` |
| GET | `/api/access-log/stats` | Aggregates: `by_status`, `by_auth`, `top_paths`, `top_clients`, `top_user_agents`, `duration_ms` (avg/p50/p95/p99/max), `total_response_bytes` |
| POST | `/api/access-log/clear` | Wipe the in-memory ring (stdout log untouched) |
Example: a "who consumed what" summary of the current ring:

```bash
curl -fsS -H "X-Admin-Key: $SIEMULATOR_ADMIN_KEY" \
  "https://your-siemulator/api/access-log/stats" | jq '{
    total,
    top_clients,
    top_user_agents,
    by_auth,
    by_status
  }'
```

Knobs (env vars):

| Variable | Default | Purpose |
|---|---|---|
| `SIEMULATOR_ACCESS_LOG_ENABLED` | `true` | Disable everything: middleware + admin endpoints |
| `SIEMULATOR_ACCESS_LOG_SIZE` | `5000` | Ring capacity (~3 days at 60-s polling cadence) |
| `SIEMULATOR_ACCESS_LOG_SKIP_HEALTH` | `false` | Skip noisy `/status` / `/api/help` (useful when DO Apps' 30-s probe would dominate the log) |
For platform-level retention beyond the in-memory ring, your platform log collector picks up the stdout JSON lines and routes them to your SIEM / log warehouse / Grafana Loki / wherever.
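For sizing the ring, a bounded deque is a reasonable mental model. The sketch below is not the middleware's code; `record` and `retention_days` are hypothetical names, and the estimate assumes a single poller making one request per cycle.

```python
from collections import deque

RING_SIZE = 5000  # default SIEMULATOR_ACCESS_LOG_SIZE

# A deque with maxlen behaves like a ring: appending at capacity drops the oldest entry.
ring: deque[dict] = deque(maxlen=RING_SIZE)


def record(entry: dict) -> None:
    """Append one access-log entry to the ring."""
    ring.append(entry)


def retention_days(ring_size: int, poll_interval_s: float, requests_per_poll: int = 1) -> float:
    """Rough time window covered by the ring for a steady poller."""
    return ring_size * poll_interval_s / requests_per_poll / 86_400


for i in range(RING_SIZE + 10):
    record({"seq": i})

window = retention_days(RING_SIZE, 60)  # one request every 60 s
```

With more pollers or health probes hitting the service, the covered window shrinks proportionally, which is when raising the ring size or skipping health endpoints starts to matter.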
Every response carries X-Mock-Source: siemulator (HTTP header) and
"x-mock-source": "siemulator" (JSON field). Detection events embed it
per-row too. This is the contract test every consumer should pin —
it's how you guarantee in CI that you're not accidentally pointed at a
real SIEM. The siemulator string is stable across versions.
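A consumer-side guard for this contract might look like the sketch below. The function names are hypothetical; the header name, JSON field, and marker value come from the paragraph above, and the handling of the LogScale `events` envelope follows the response shapes shown earlier.

```python
def marked_rows(body):
    """Return the rows expected to carry the per-row marker."""
    if isinstance(body, list):  # QRadar-style list of offences
        return body
    if isinstance(body, dict) and "events" in body:  # LogScale-style envelope
        return body["events"]
    return [body]


def assert_mock_source(headers: dict[str, str], body) -> None:
    """Fail loudly unless both the header and every row carry the mock marker."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if lowered.get("x-mock-source") != "siemulator":
        raise AssertionError("X-Mock-Source header missing: this may be a real SIEM")
    for row in marked_rows(body):
        if row.get("x-mock-source") != "siemulator":
            raise AssertionError("row without x-mock-source marker")


# Passes silently for a correctly marked response.
assert_mock_source(
    {"X-Mock-Source": "siemulator"},
    [{"id": 95693, "x-mock-source": "siemulator"}],
)
```

Calling a guard like this at the top of every integration test keeps a misconfigured CI job from quietly ingesting from production.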
siemulator/
├── app.py # FastAPI factory — mounts UI + both API routers
├── config.py # All env var reads (one function per var; no caching)
├── logscale.py # /logscale/* — Humio REST shape
├── qradar.py # /qradar/* — QRadar offences + Ariel
├── templates.py # 6 detection templates + HOSTNAMES + USERS pool
├── scenarios.py # 38 multi-source attack narratives
├── ui.py # Single-page web UI at / (inlined HTML/CSS/JS)
├── access_log.py # Middleware + /api/access-log endpoints
├── fault_inject.py # Chaos engineering — middleware + /api/faults
├── sessions.py # Record / replay / diff — middleware + /api/sessions
├── splunk.py # /splunk/* — Splunk REST search API
└── __main__.py # `python -m siemulator` entrypoint
Both routers are built by a build_router() factory that reads the
prefix env var at construction time and returns a fresh APIRouter.
Env vars are re-read per-request, not cached at import — so you can
monkeypatch.setenv() mid-test and the next request reflects the
change.
State: each surface keeps one in-memory dict, _query_jobs (LogScale
queryjobs, 256-entry FIFO cap) and _ariel_searches (QRadar Ariel
searches, 256-entry FIFO cap) respectively, and the QRadar surface additionally
keeps a 100-entry request-capture deque and a set of served scenario
IDs. Everything dies with the process; no persistence.
IS:
- A test fixture for SOAR ingestion, detection-engineering pipelines, and agent-chain integration tests.
- A way to pin SIEM-response shapes in CI so vendor-shape regressions fail fast.
- A reproducible source of multi-source attack narratives for end-to-end SOC tooling tests.
- Safe to deploy as a long-running internal service (the debug endpoints are admin-key-gated and disabled by default).
IS NOT:
- A real SIEM. There's no event ingest, no search engine, no correlation rules, no storage. Search queries are accepted and ignored; the alert pool is fixed per-process.
- Production-ready as a public endpoint without putting it behind your own auth layer. The default tokens are public sentinels — change them.
- A canonical reference for vendor APIs. Field coverage is "everything a typical consumer reads" plus enough adjacent fields to look real; if you're building a real LogScale or QRadar client, read the upstream vendor docs.
- A red-team training environment. The synthetic data is shape-realistic, not behaviour-realistic: running detections against siemulator output will not validate that your detections work against real attacks.
| Resource | Cap | Behaviour at cap |
|---|---|---|
| LogScale `?limit=N` | 1 ≤ N ≤ 50 | Clamped silently |
| QRadar `Range: items=0-N` | 1 ≤ N ≤ 50 | Clamped silently |
| In-memory LogScale queryjobs | 256 | Oldest evicted FIFO |
| In-memory Ariel searches | 256 | Oldest evicted FIFO |
| Request capture (`_debug/recent`) | 100 most recent | Oldest dropped |
| Served-scenarios set (`?scenarios=all`) | 38 (the pool size) | After 38, returns `[]` until reset |
Single-process throughput is whatever uvicorn + your CPU give you — typically a few thousand requests/second per worker on a modest host; each response is generated fresh (template choice + ID + timestamp), so there's no caching benefit from repeated requests.
Run multiple workers for higher throughput:
```bash
uvicorn siemulator.app:create_app --factory --workers 4 --port 8080
```

Note that the in-memory state (queryjobs, served-scenarios, debug-ring) is per-worker: multi-worker deployments will see the served-scenarios one-shot dedup operate independently in each worker. For a true single-state deployment, run one worker.
Contributions especially welcome on the highest-leverage gaps:
- New SIEM vendor shapes. Splunk REST, Microsoft Sentinel Log Analytics, Elastic Security, Google Chronicle. Each lives as a fresh module under siemulator/ reusing the existing template + scenario pool. See CONTRIBUTING.md.
- More detection templates. Realistic MITRE-mapped templates, especially for under-represented tactics (Defence Evasion, Discovery, Collection, Impact).
- More attack scenarios. Multi-source narratives covering ransomware deployment chains, cloud-native attack paths (IMDS abuse → cross-account role assumption → S3 exfil), and Mac / Linux endpoint chains. Open-source threat-intel reports are the best source.
- Streaming surface. A /logscale/api/v1/repositories/{repo}/stream SSE/WebSocket endpoint that pushes events at a configurable rate, so consumers testing push-style ingestion (rather than poll) can exercise their code paths.
- Deterministic mode. A SIEMULATOR_SEED env var that makes template choice + IDs + timestamps reproducible across runs, so snapshot tests can pin exact responses instead of shape-only contracts.
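
To make the deterministic-mode idea concrete, here is one way seeded generation could work. It is a sketch of the general technique (a single seeded RNG driving every random choice), not a description of any existing siemulator code; the template names, the fixed base time, and the function names are all placeholders.

```python
import random
import uuid
from datetime import datetime, timedelta, timezone

TEMPLATES = ["credential-dumping", "lateral-movement", "ransomware-note"]  # placeholder names
BASE_TIME = datetime(2026, 1, 1, tzinfo=timezone.utc)  # arbitrary fixed base time


def generate_alert(rng: random.Random) -> dict:
    """Draw template, ID, and timestamp from one seeded RNG."""
    return {
        "template": rng.choice(TEMPLATES),
        "id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "timestamp": (BASE_TIME + timedelta(seconds=rng.randrange(86_400))).isoformat(),
    }


def generate_run(seed: int, count: int) -> list[dict]:
    """Same seed and count always yield the same list of alerts."""
    rng = random.Random(seed)
    return [generate_alert(rng) for _ in range(count)]
```

With this shape, a snapshot test could compare `generate_run(42, 5)` against a stored fixture, because no value depends on wall-clock time or the global random state.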
A ready-to-apply spec ships at .do/app.yaml. It uses
GitHub auto-deploy from main, builds via the repo's Dockerfile, and
sizes for the stateless mock (1 instance, basic-xxs — ~$5/mo).
First-time create:
doctl apps create --spec .do/app.yaml
# → returns an app ID; note it down.
Set the three SECRET env vars (tokens + admin key) via either
the dashboard (Settings → web → Environment Variables) or by editing a
local copy of .do/app.yaml to inline the values and running
doctl apps update <app-id> --spec <local-copy>. DO encrypts the values
and stores them as EV[1:...] ciphertext — don't commit those.
Custom domain (optional): point siemulator.example.com at the
default <app>.ondigitalocean.app ingress via CNAME, then add a
domains: block to the spec and doctl apps update. DO provisions a
Let's Encrypt cert automatically.
Why instance_count: 1. Each instance keeps its own in-memory
served-scenarios set for ?scenarios=all one-shot dedup. Running
multiple instances would break that contract — a round-robin poller
would see the same scenario re-served on every other hit. If you need
horizontal scale and don't use one-shot dedup, bump the count freely
(every other endpoint is request-local and safe to scale).
Health checks poll /logscale/api/v1/status (no-auth) every 30 s
with a 10-second initial delay. Failures auto-roll back to the last
healthy deployment.
Updates flow. Every push to main triggers a fresh build + zero-downtime
deploy. CI must be green; if pytest or ruff fails, the
build never reaches the platform. Multi-arch Docker images keep getting
published to ghcr.io/sirp-labs/siemulator:<sha> in parallel — pin a
specific tag in the spec if you want immutable deploys instead of
"latest-on-main."
git clone https://github.com/sirp-labs/siemulator.git
cd siemulator
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest # 35 contract tests
ruff check . # lint
CI runs ruff + pytest on Python 3.10 / 3.11 / 3.12 plus a multi-arch
GHCR build with a container smoke test. See
.github/workflows/ for the exact pipeline.
See CONTRIBUTING.md for the contribution flow,
including adding new SIEM shapes, templates, and scenarios.
Q: Can I use this for load testing my SOAR?
A: It can produce as much traffic as your test harness can consume,
but every response is a fresh roll of dice. If you're load-testing
deduplication or correlation, prefer ?scenarios=batch or ?scenarios=replay
so the offence IDs are stable. For shape-only soak testing, the
default mode is fine.
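
On the consumer side, stable offence IDs only help if the ingestion code keys on them. The sketch below shows the general dedup-by-ID pattern; the `id` field name is an assumption for illustration, so substitute whichever identifier your integration actually reads.

```python
def dedupe_offences(polls: list[list[dict]], id_field: str = "id") -> list[dict]:
    """Return each offence once, keyed on its ID, in first-seen order."""
    seen: set = set()
    unique: list[dict] = []
    for batch in polls:
        for offence in batch:
            key = offence[id_field]
            if key not in seen:
                seen.add(key)
                unique.append(offence)
    return unique


# Three polls of the same replayed scenario should yield two incidents, not six.
replayed = [{"id": 9001, "description": "phishing"}, {"id": 9002, "description": "mfa fatigue"}]
incidents = dedupe_offences([replayed, replayed, replayed])
```

A load test that counts created incidents against `len(incidents)` will catch a dedup regression that a shape-only soak test would miss.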
Q: Does it support Splunk / Sentinel / Elastic?
A: Not yet. The architecture supports adding them — see the
Roadmap. Each new vendor shape is a fresh module under
siemulator/ that reuses the existing template + scenario pool. PRs
welcome.
Q: Are the scenarios real attacks?
A: They're realistic narratives modeled on published threat-intel reports and incident postmortems, but the IOCs, hostnames, usernames, file hashes, and timestamps are synthetic. Don't match them against real-world threat-intel feeds.
Q: How do I add my own templates / scenarios?
A: For templates, append to ALERT_TEMPLATES in
siemulator/templates.py — the schema is documented in that file's
module docstring. For scenarios, append an (offence_id, scenario_label, raw_alert) tuple to the registry at the bottom of siemulator/scenarios.py
and import a JSON payload via the _j() helper. Both surfaces pick up
the additions automatically on next process start.
Q: Is it safe to expose publicly?
A: The data is synthetic, so there's no data-leak risk. The default
tokens (logscale-dev-token, qradar-dev-token) are PUBLIC sentinels —
change them before exposing publicly so casual scanners don't get
free use of your service. The debug endpoints are disabled by default
and require SIEMULATOR_ADMIN_KEY to be set + sent on every request,
so they're safe to leave wired in.
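
A deploy script can guard against shipping the public sentinels, and a client can build the documented auth headers, with a few small helpers. This is a hedged sketch: the function names are hypothetical, and how the tokens reach your process (env vars, secret store) is left to you.

```python
PUBLIC_SENTINELS = {"logscale-dev-token", "qradar-dev-token"}  # defaults named in this README


def require_private_token(token: str) -> str:
    """Refuse to continue with an empty token or a public default."""
    if not token or token in PUBLIC_SENTINELS:
        raise ValueError("token is empty or a public default; set a private value")
    return token


def logscale_headers(token: str) -> dict[str, str]:
    """LogScale surface: standard Bearer authorization."""
    return {"Authorization": f"Bearer {token}"}


def qradar_headers(token: str) -> dict[str, str]:
    """QRadar surface: token sent in the SEC header."""
    return {"SEC": token}
```

Running `require_private_token` on each configured token before a public deploy turns the "change them" advice into a hard failure instead of a reminder.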
Q: Why one combined service instead of two repos for LogScale and QRadar?
A: Both surfaces share the same template + scenario pool. Splitting them would force every new template / scenario to be added in two places. Keeping them together lets one detection-template addition serve every vendor surface for free.
Q: How do I pin a regression test that catches "my consumer breaks
when siemulator changes shape"?
A: Fork the relevant tests from
tests/test_logscale.py /
tests/test_qradar.py into your own test
suite, pointing them at your consumer's response-handling code instead
of at siemulator. Those tests are the contract — if your consumer
makes them pass, siemulator changes that break the contract will
break your tests too.
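
If you want a starting point before forking the real tests, a shape pin can be as small as a field-and-type check. The sketch below is generic and does not mirror the contents of tests/test_qradar.py; the field names and types in `QRADAR_OFFENCE_SHAPE` are assumptions for illustration, so replace them with the fields your consumer reads.

```python
QRADAR_OFFENCE_SHAPE = {"id": int, "start_time": int}  # assumed fields and types


def shape_violations(record: dict, shape: dict[str, type]) -> list[str]:
    """List every field that is missing or has an unexpected type."""
    problems = []
    for field, expected in shape.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


def test_offence_shape_is_pinned() -> None:
    # In a real suite this record would come from a siemulator response.
    record = {"id": 9001, "start_time": 1767225600000}
    assert shape_violations(record, QRADAR_OFFENCE_SHAPE) == []
```

Returning a list of violations rather than failing on the first one makes the test output more useful when several fields change in one release.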
MIT — © 2026 SIRP Labs.