docs(rules): add per-rule README.md for all 26 rules #34
# R0001 — Unexpected Process Launched

| Field | Value |
|-------|-------|
| Severity | Low |
| MITRE Tactic | Execution (TA0002) |
| MITRE Technique | Command and Scripting Interpreter (T1059) |
| Platforms | Host, Kubernetes, ECS |
| Requires Application Profile | Yes |
## Description

Detects any process executed inside a host or container that was not observed in that host/container's learned application profile. The rule fires on every `exec` event whose executable is not part of the container's baseline. It produces a strong signal in steady-state workloads where the set of legitimate processes is small and predictable, and is one of the broadest catch-alls for execution-based attacks such as command injection, web-shell drops, post-exploitation tooling, and lateral-movement payloads.

## Attack Technique

Mapped to **MITRE T1059 — Command and Scripting Interpreter** under tactic **TA0002 — Execution**. Adversaries who land code in a container almost always need to execute *some* unexpected binary or interpreter to make progress: a downloaded shell script, a reverse-shell binary, a privilege-escalation tool, a network utility for reconnaissance, or a cryptominer. Because the baseline is built from real execution traces of the running workload, anything outside that set surfaces here — including techniques the rule library has no signature for.
## How It Works

The node agent builds a per-container **application profile** during a learning window, recording every `exec` event observed. After the profile is finalized, every subsequent `exec` event is evaluated against the profile.

Simplified CEL:

```cel
!ap.was_executed(containerId, parse.get_exec_path(args, comm))
&& (exepath == "" || !ap.was_executed(containerId, exepath))
```

Two paths are checked because `argv[0]` and the kernel-resolved `exepath` can disagree:

- **Relative `argv[0]`** (e.g. the process invoked itself as `./python`): the profile stores the resolved absolute path, so the `argv[0]` lookup would miss. `exepath` (`/usr/bin/python3`) catches it.
- **Empty `argv[0]`** (e.g. via `fexecve()` with `AT_EMPTY_PATH`, common from `sshd → unix_chkpwd`): again, the profile stores the resolved path, and the fallback to `exepath` matches.

The rule fires when neither lookup succeeds.
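The two-path fallback can be modeled in a few lines of Python (a simplified sketch, not the agent's actual code; `profile_execs` stands in for the learned profile):

```python
def is_unexpected_exec(profile_execs: set, argv0_path: str, exepath: str) -> bool:
    """Model of R0001: alert only when neither the argv[0]-derived path
    nor the kernel-resolved exepath was seen during learning."""
    argv0_known = argv0_path in profile_execs
    # Fallback: exepath covers a relative or empty argv[0] (e.g. fexecve).
    exepath_known = exepath != "" and exepath in profile_execs
    return not (argv0_known or exepath_known)


profile = {"/usr/bin/python3", "/usr/sbin/nginx"}
# Relative argv[0] misses the profile, but the resolved exepath matches:
assert not is_unexpected_exec(profile, "./python", "/usr/bin/python3")
# A dropped binary matches neither lookup, so the rule fires:
assert is_unexpected_exec(profile, "/tmp/payload", "/tmp/payload")
```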
## Investigation Steps

1. **Identify the process and its parent.** Look at the alert's `event.comm`, `event.exepath`, `event.pid`, and the parent process (`pcomm`/`ppid`). A new shell spawned by a web server (e.g. `bash` from `nginx`) is much more concerning than a known internal tool fired by a known parent.
2. **Confirm the binary was not legitimately added.** Cross-reference the workload's recent deployments — a new image version may legitimately introduce a process the profile never saw. Check container image digests and recent CI/CD activity.
3. **Inspect the executable on disk.** If accessible, hash the binary and look it up against threat-intel sources. Check its path (`/tmp`, `/dev/shm`, container working dir) — non-standard locations strengthen the signal.
4. **Pull surrounding events.** Look for other alerts on the same container in the same time window — file writes, network connections, capability changes, or other R0001 hits. Adversary tooling rarely fires only one rule.
5. **Decide: legitimate change or attack.** If legitimate, allowlist the binary (see Remediation). If suspicious, isolate the container and begin incident response.
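For step 3, hashing the executable needs nothing beyond the standard library (a sketch; the paths are illustrative):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream the file in chunks so large binaries aren't read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

# e.g. sha256_of("/proc/<pid>/exe") hashes the running executable even when
# the on-disk file was deleted after launch.
```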
## Remediation

**If the process is malicious:** isolate the container (network policy or seccomp profile), preserve disk/memory for forensics, rotate any credentials accessible from the container (see "blast radius"), and begin standard incident response. The parent process and ingress vector (which previous event allowed this `exec`?) usually reveal the entry point.

**If the process is legitimate but the profile is stale:**

- Suppress this specific binary for the workload via a per-rule allowlist policy.

**Do not blanket-disable R0001 on a workload** — it provides the deepest catch-all coverage in the rule library, and disabling it removes detection for a wide class of execution-based attacks. That said, some workloads are by definition not fit for anomaly detection: software orchestrators, CI/CD tools, runners — anything where the process-invocation cycle is detached from the container or host run cycle.

## False Positives

- **Periodic jobs that did not run during learning.** Weekly cron tasks, monthly batch jobs, ad-hoc maintenance scripts, and infrequent administrative tools can be missed by a short learning window. Lengthen the learning window or pre-warm the profile with the expected workload.
- **Runners or execution orchestrators.** Some software by definition has a different run cycle than the monitored host or container and may trigger false positives. One example is Apache Spark, where each job can ship its own binaries.
# R0002 — File Access Anomalies in Container

| Field | Value |
|-------|-------|
| Severity | Low |
| MITRE Tactic | Collection (TA0009) |
| MITRE Technique | Data from Local System (T1005) |
| Platforms | Host, Kubernetes, ECS |
| Requires Application Profile | Yes |
## Description

Detects reads of sensitive system locations inside a host or container that were not observed during the learning window. The rule watches a curated set of high-value directories (`/etc/`, `/var/log/`, `/var/run/`, `/run/`, `/var/spool/cron/`, `/var/www/`, `/var/lib/`, `/opt/`, `/usr/local/`, `/app/`, plus the marker files `/.dockerenv` and `/proc/self/environ`) and fires whenever a process opens a path under one of them that the application profile did not record. The rule is disabled by default because the steady-state false-positive rate depends heavily on the workload's file-access patterns.

## Attack Technique

Mapped to **MITRE T1005 — Data from Local System** under **TA0009 — Collection**. Once an adversary has code execution they typically read configuration, application code, secrets, and logs to understand the environment and to harvest material. This rule surfaces that reconnaissance against the workload's own observed file-access shape, catching reads of files the workload itself never needs.
## How It Works

The node agent records every `open` event in the watched prefixes during the learning window. After the profile is finalized, every subsequent `open` in those prefixes is checked against the profile, with three explicit suppressions baked into the rule body:

```text
event.path is under one of the watched prefixes
AND event.path is NOT under /run/secrets/kubernetes.io/serviceaccount
AND event.path is NOT under /var/run/secrets/kubernetes.io/serviceaccount
AND event.path is NOT under /tmp
AND !ap.was_path_opened(containerId, event.path)
```
The `/tmp` and Kubernetes service-account paths are excluded because they have their own dedicated rules; including them here would produce duplicate alerts on the same activity.
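The suppression logic amounts to a prefix test, sketched here in Python (a simplified model; `profile_paths` stands in for the learned profile):

```python
WATCHED = ("/etc/", "/var/log/", "/var/run/", "/run/", "/var/spool/cron/",
           "/var/www/", "/var/lib/", "/opt/", "/usr/local/", "/app/")
MARKERS = ("/.dockerenv", "/proc/self/environ")
EXCLUDED = ("/run/secrets/kubernetes.io/serviceaccount",
            "/var/run/secrets/kubernetes.io/serviceaccount",
            "/tmp")

def is_anomalous_open(profile_paths: set, path: str) -> bool:
    """Model of R0002: alert on an open under a watched prefix (or of a
    marker file) that is not excluded and was never recorded in learning."""
    watched = path in MARKERS or path.startswith(WATCHED)
    excluded = any(path == e or path.startswith(e + "/") for e in EXCLUDED)
    return watched and not excluded and path not in profile_paths


assert is_anomalous_open(set(), "/etc/shadow")            # novel sensitive read
assert not is_anomalous_open({"/etc/nsswitch.conf"},
                             "/etc/nsswitch.conf")        # in the baseline
assert not is_anomalous_open(set(),
    "/run/secrets/kubernetes.io/serviceaccount/token")    # dedicated rule covers it
```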
## Investigation Steps

1. **Identify the process and the file.** Look at `event.comm`, `event.pid`, and `event.path`. A new shell or a network-facing process reading `/etc/passwd`, `/etc/nginx/`, or `/var/lib/postgresql` is much more concerning than a known internal tool opening a known path.
2. **Confirm the path is sensitive in context.** `/etc/` reads can be benign (libc reading `/etc/nsswitch.conf`) or critical (`/etc/shadow`, `/etc/cron.d/*`). Use the path semantics, not just the prefix, to triage.
3. **Look at the parent process and user.** A `nobody`-uid process spawned by a web server reading `/var/www/` configuration is essentially never legitimate.
4. **Pull surrounding events** for the same container in the same window: exec events, network connections, or other R0002 hits in adjacent directories indicate active enumeration.
5. **Decide: legitimate change or attack.** If legitimate, suppress the specific file or workload via a per-rule allowlist. If suspicious, isolate and begin incident response.
## Remediation

**If the access is malicious:** isolate the container (network policy or seccomp profile), preserve disk and memory for forensics, rotate any credentials reachable from the file content (see "blast radius"), and begin standard incident response. The opening process's ancestry usually identifies the ingress vector.

**If the access is legitimate:** suppress the specific path or process for the workload via a per-rule allowlist policy. Do not retrain the profile as a remediation step.

Some workloads are by definition not fit for application-profile anomaly detection: software orchestrators, CI/CD tools, runners — anything where the process and file-access cycle is detached from the container or host run cycle. R0002 is best disabled on such workloads rather than allowlisted item-by-item.

## False Positives

- **Periodic jobs that did not run during learning.** Weekly cron tasks, monthly batch jobs, ad-hoc admin scripts, and infrequent maintenance tools may open files under the watched prefixes for the first time after the profile is finalized. Lengthen the learning window or pre-warm the profile.
- **Self-updating runtimes and package managers.** Some interpreters and language toolchains touch `/etc/` or `/usr/local/` files on first invocation; these may not appear in the recorded baseline.
- **Build orchestrators and runners.** Workloads like Apache Spark or CI runners that ship per-job binaries and config will read configuration files the profile never saw.
# R0003 — Syscall Anomalies in Container

| Field | Value |
|-------|-------|
| Severity | Low |
| MITRE Tactic | Execution (TA0002) |
| MITRE Technique | Command and Scripting Interpreter (T1059) |
| Platforms | Host, Kubernetes, ECS |
| Requires Application Profile | Yes |
## Description

Detects any system call invoked inside a host or container that was not observed during the learning window. The rule fires the first time a given syscall appears that is not part of the application profile's recorded syscall set. It is a very broad anomaly signal: the syscall surface a real workload uses is typically narrow and predictable, so deviations strongly indicate code outside the workload's normal behavior.

## Attack Technique

Mapped to **MITRE T1059 — Command and Scripting Interpreter** under **TA0002 — Execution**. Adversaries who land code in a container almost always reach for syscalls the workload itself does not need: namespace manipulation for escape, raw socket operations for tunneling, ptrace for injection, mount syscalls for filesystem games, or kernel-instrumentation primitives. Because the baseline is built from actual workload behavior, anything the workload never demonstrated surfaces here.
## How It Works

The node agent records every distinct syscall observed during the learning window. After the profile is finalized, every syscall event is checked against the recorded set:

```cel
!ap.was_syscall_used(containerId, syscallName)
```
The rule fires on any miss. Because most workloads use only a small subset of the ~350 available syscalls, the baseline absorbs steady-state activity while novel syscalls stand out.
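The learn-then-detect lifecycle can be sketched as a small state machine (illustrative Python, not the agent's implementation):

```python
class SyscallProfile:
    """Record syscalls during the learning window; after finalization,
    flag any syscall not in the recorded set."""

    def __init__(self) -> None:
        self.seen = set()
        self.finalized = False

    def observe(self, syscall: str) -> bool:
        """Return True when the event should raise an R0003 alert."""
        if not self.finalized:
            self.seen.add(syscall)  # learning window: record, never alert
            return False
        return syscall not in self.seen


p = SyscallProfile()
for s in ("read", "write", "futex"):
    p.observe(s)                 # learning window
p.finalized = True
assert not p.observe("read")     # in the baseline
assert p.observe("ptrace")       # novel syscall: rule fires
```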
## Investigation Steps

1. **Identify the syscall and the process.** `event.syscallName`, `event.comm`, and `event.pid` are the starting point. Some syscalls (`ptrace`, `mount`, `unshare`, `keyctl`, `bpf`) are diagnostic on their own; others (`fchmod`, `fchown`) need context.
2. **Map the syscall to a technique.** Many security-relevant syscalls have a specific abuse pattern: `ptrace` for process injection, `unshare`/`setns` for container escape, `bpf` for kernel rootkits, `mount` for filesystem masquerade. The technique narrows the investigation immediately.
3. **Look at the process ancestry.** The parent process and its parent often reveal whether the syscall is benign (a new dependency the workload now uses) or hostile (a planted binary).
4. **Pull surrounding events.** Other R0003 hits, exec events, or file/network anomalies on the same container in the same window indicate active tradecraft rather than a quiet workload change.
5. **Decide: legitimate change or attack.** If legitimate, suppress the specific syscall via a per-rule allowlist. If suspicious, isolate and begin incident response.
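The syscall-to-technique mapping in step 2 can live in a small lookup table; the entries below restate the pairs from the text (extend as needed for your environment):

```python
SYSCALL_TECHNIQUES = {
    "ptrace":  "process injection",
    "unshare": "container escape (new namespaces)",
    "setns":   "container escape (namespace entry)",
    "bpf":     "kernel rootkit / instrumentation",
    "mount":   "filesystem masquerade",
    "keyctl":  "kernel keyring access",
}

def triage_hint(syscall: str) -> str:
    """Return a first-pass investigation hint for a flagged syscall."""
    return SYSCALL_TECHNIQUES.get(
        syscall, "no known abuse pattern; check process ancestry")
```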
## Remediation

**If the syscall is malicious:** isolate the container (network policy or seccomp profile), preserve disk and memory for forensics, rotate credentials reachable from the workload (see "blast radius"), and begin incident response. The use of an unusual syscall usually indicates the attacker has already executed payload code, so investigate that ingress.

**If the syscall is legitimate:** allowlist it via a per-rule policy for the affected workload. Prefer a tight allowlist (one syscall, one workload) over broad exceptions. Do not retrain the profile as a remediation step.

Some workloads are by definition not fit for syscall anomaly detection: software orchestrators, CI/CD tools, runners — anything where the workload runs arbitrary user-supplied code and the syscall surface is unbounded by design.

## False Positives

- **Periodic operations that did not run during learning.** A nightly backup that calls `fdatasync` or a weekly maintenance task that calls `setrlimit` may not appear in the baseline if those syscalls were not exercised during the learning window.
- **New library or runtime versions.** A dependency upgrade can introduce syscalls (e.g. `io_uring_*` after a libc bump) that the baseline does not know about.
- **Runners and execution orchestrators.** Workloads like Apache Spark, build runners, or function-as-a-service containers execute user code whose syscall set cannot be predicted from learning.
# R0004 — Linux Capability Anomalies in Container

| Field | Value |
|-------|-------|
| Severity | Low |
| MITRE Tactic | Execution (TA0002) |
| MITRE Technique | Command and Scripting Interpreter (T1059) |
| Platforms | Host, Kubernetes, ECS |
| Requires Application Profile | Yes |
## Description

Detects Linux capabilities exercised inside a host or container that were not observed during the learning window. Capabilities are the kernel's way of breaking root privileges into smaller, individually grantable pieces (`CAP_NET_RAW`, `CAP_SYS_ADMIN`, `CAP_DAC_OVERRIDE`, and ~40 others). A workload's real capability set is usually narrow; an attacker exercising a capability the workload never needed indicates either successful exploitation or a misconfigured container with too many capabilities available to abuse.

## Attack Technique

Mapped to **MITRE T1059 — Command and Scripting Interpreter** under **TA0002 — Execution**. Adversaries who land code in a container often reach for capabilities the workload itself doesn't need: `CAP_NET_RAW` for crafted-packet operations, `CAP_SYS_PTRACE` for process injection, `CAP_SYS_MODULE` for kernel modules, `CAP_DAC_READ_SEARCH` for bypassing filesystem permissions. Detecting on the deviation from baseline catches the attacker even when the rule library has no specific signature for the technique.
## How It Works

The node agent records every distinct capability exercised during the learning window. After the profile is finalized, every capability check is matched against the recorded set:

```cel
!ap.was_capability_used(containerId, capName)
```
The rule fires the first time a capability is exercised that the profile did not see during learning.
## Investigation Steps

1. **Identify the capability and the process.** `event.capName`, `event.syscallName`, `event.comm`, and `event.pid` together describe what was attempted. Some capabilities (`CAP_SYS_ADMIN`, `CAP_SYS_PTRACE`, `CAP_SYS_MODULE`) are essentially diagnostic of post-exploitation activity in most workloads.
2. **Check whether the container was even granted the capability.** If the pod spec or container runtime denied the capability, the syscall would have failed and the event indicates an attempted, not successful, use. Either way, the *attempt* is a strong signal.
3. **Map capability to attack pattern.** `CAP_NET_BIND_SERVICE` on a non-privileged port is benign; `CAP_NET_RAW` from a workload that never raw-sockets points at scanning or packet crafting.
4. **Pull surrounding events.** Capability use rarely happens in isolation: an exec event right before, a syscall anomaly right after, or a network anomaly nearby usually points at the broader attack.
5. **Decide: legitimate change or attack.** If legitimate, suppress the specific capability for the workload. If suspicious, isolate and begin incident response.
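For step 2, a rough check of whether the pod spec even grants the capability can be scripted against the container's `securityContext` (a simplified model — effective-capability semantics depend on the runtime, and `DEFAULT_CAPS` below is only a subset of the usual runtime default set):

```python
# Subset of the capabilities most container runtimes grant by default.
DEFAULT_CAPS = {"CHOWN", "DAC_OVERRIDE", "FOWNER", "KILL",
                "SETGID", "SETUID", "NET_BIND_SERVICE"}

def capability_granted(security_context: dict, cap: str) -> bool:
    """Approximate whether a container's securityContext grants `cap`.
    Kubernetes spells capabilities without the CAP_ prefix."""
    caps = security_context.get("capabilities") or {}
    name = cap.removeprefix("CAP_")
    add = set(caps.get("add") or [])
    drop = set(caps.get("drop") or [])
    if name in drop or "ALL" in drop:
        return name in add  # in Kubernetes, add is applied after drop
    return name in add or name in DEFAULT_CAPS


hardened = {"capabilities": {"drop": ["ALL"], "add": ["NET_BIND_SERVICE"]}}
assert capability_granted(hardened, "CAP_NET_BIND_SERVICE")
assert not capability_granted(hardened, "CAP_SYS_ADMIN")
assert not capability_granted({}, "CAP_SYS_PTRACE")  # not in the default set
```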
## Remediation

**If the capability use is malicious:** isolate the container (network policy or seccomp profile), preserve disk and memory for forensics, rotate credentials reachable from the workload (see "blast radius"), and begin incident response. As a hardening follow-up, drop the capability from the container spec so a future intruder cannot exercise it at all.

**If the capability use is legitimate:** allowlist it via a per-rule policy. The better long-term fix is usually to reduce the container's granted capabilities to the actual minimum, not to allow the workload to use anything it pleases.

Some workloads (orchestrators, CI/CD runners, debugging tools) by design exercise a broad capability set and are unsuited for this anomaly detection.

## False Positives

- **Periodic privileged operations.** A monthly admin task that needs `CAP_SYS_TIME` or `CAP_SETPCAP` may not have run during the learning window.
- **Container runtime helpers.** A few sidecar and init-container patterns briefly exercise capabilities not seen in steady state. These are best allowlisted by process name.
- **Newly deployed features.** A code path added after profile finalization that requires a capability the workload previously did not use.