From 61938c499ad25547df2b6ef7496a519e5b5fd9b6 Mon Sep 17 00:00:00 2001
From: Jiahui Wu
Date: Sat, 6 Jun 2026 18:57:44 +0800
Subject: [PATCH] docs: add ephemeral cloud forensic evidence gates

---
 .../forensics-checklist/SKILL.md                   |  45 ++++++++
 .../ephemeral-cloud-workload-edge-cases.md         | 108 ++++++++++++++++++
 2 files changed, 153 insertions(+)
 create mode 100644 skills/incident-response/forensics-checklist/tests/ephemeral-cloud-workload-edge-cases.md

diff --git a/skills/incident-response/forensics-checklist/SKILL.md b/skills/incident-response/forensics-checklist/SKILL.md
index f8556322..2b6d4444 100644
--- a/skills/incident-response/forensics-checklist/SKILL.md
+++ b/skills/incident-response/forensics-checklist/SKILL.md
@@ -339,6 +339,46 @@ gcloud logging read 'timestamp>="YYYY-MM-DDT00:00:00Z" AND timestamp<="YYYY-MM-D
 - Multi-region deployments require evidence collection across all regions
 - Serverless environments (Lambda, Cloud Functions) produce only invocation logs -- there is no disk to image
 
+### 3.5 Ephemeral Cloud Workload Evidence
+
+For Kubernetes, managed containers, and serverless functions, a cloud disk snapshot or provider activity log is not sufficient by itself. Preserve immutable workload identity, runtime configuration, and short-retention logs before pods are evicted, tasks are replaced, aliases move, or log retention expires.
+
+**Kubernetes evidence to preserve:**
+- Pod YAML/spec, namespace, node name, service account, owner references, labels, annotations, priority class, tolerations, and scheduler events
+- Container statuses, restart counts, current and previous container logs, init containers, sidecars, command/entrypoint, args, and image digests
+- Volume mounts, PVC references, `emptyDir` usage, projected config/secret references (record references and hashes where possible, not plaintext secrets)
+- Network policies, ingress/egress controls, service bindings, admission controller decisions, and Kubernetes audit events for the incident window
+- Deployment, ReplicaSet, StatefulSet, Job, or CronJob manifests that created the affected pod
+
+**Managed container service evidence to preserve:**
+- ECS task definition revision, Fargate task metadata, Cloud Run revision, AKS/EKS/GKE workload manifest, or equivalent immutable service revision
+- Image digest, registry repository metadata, signature or provenance status, pull history, and deployment timestamp
+- Runtime command, environment and secret references, execution role or service account, network configuration, security group/firewall policy, and volume mounts
+- Centralized logs, sidecar logs, service mesh telemetry, container runtime events, and cloud audit events for create/update/delete actions
+
+**Serverless evidence to preserve:**
+- Function version or immutable revision, alias mapping at incident time, deployment package or code hash, runtime, layers, and last modified timestamp
+- Environment variable and secret references (redacted), execution role/service account, IAM policy snapshot, VPC/network configuration, and concurrency settings
+- Trigger or event source mapping, queue/topic/subscription configuration, invocation logs, platform audit logs, and deployment history
+- Cold start/runtime errors, timeout and memory settings, dead-letter queue records, retry configuration, and failed invocation payload references where legally authorized
+
+**Ephemeral workload evidence gates:**
+
+| Evidence Item | What to Preserve | Common Gap |
+|---|---|---|
+| Kubernetes pod identity | Pod YAML, owner refs, namespace, node, service account, labels/annotations | Pod is evicted and only the backing node snapshot remains |
+| Container runtime state | Current/previous logs, container statuses, restart counts, command/args, image digest | Report captures `latest` tag but no immutable digest or runtime config |
+| Workload configuration | Deployment/task/function revision, env/secret refs, volume mounts, network policy | Mutable service configuration changed after containment |
+| Serverless identity | Function version/revision, alias mapping, code hash, layers, execution role | Report names only `prod` alias or `$LATEST` |
+| Provider audit trail | CloudTrail, Azure Activity Log, GCP Audit Logs, Kubernetes audit events | Audit logs are not exported before retention or region/account gaps |
+| Log retention | Invocation logs, pod logs, previous container logs, service mesh telemetry | Logs rotate before acquisition or were never enabled |
+
+**Findings conditions:**
+- Classify as P1 when a container, Kubernetes, or serverless incident cannot be tied to an immutable workload version, image digest, or function revision.
+- Classify as P1 when volatile workload logs were available but not preserved before eviction, task replacement, alias movement, or log retention expiry.
+- Classify as P2 when evidence uses mutable identifiers such as `latest`, `$LATEST`, or an alias without an incident-time mapping to immutable artifacts.
+- Classify as P2 when cloud evidence is collected from only one region, account, subscription, project, namespace, or cluster despite indicators spanning more than one scope.
+
 ---
 
 ## 4. Findings Classification
@@ -401,6 +441,11 @@ the order of collection, and any evidence that could not be obtained.]
 | Cloud Provider | Resource | Evidence Type | Collected | Notes |
 |---|---|---|---|---|
 | [AWS/Azure/GCP] | [Resource ID] | [Snapshot/Logs/Config] | [Yes/No] | [Notes] |
+
+### Ephemeral Workload Evidence (if applicable)
+| Platform | Workload | Immutable Identifier | Logs Preserved | Runtime Config Preserved | Gaps |
+|---|---|---|---|---|---|
+| [Kubernetes/ECS/Fargate/Cloud Run/Lambda/Cloud Functions] | [Name/ID] | [Image digest/function version/revision] | [Yes/No] | [Yes/No] | [Notes] |
 ```
 
 ---
diff --git a/skills/incident-response/forensics-checklist/tests/ephemeral-cloud-workload-edge-cases.md b/skills/incident-response/forensics-checklist/tests/ephemeral-cloud-workload-edge-cases.md
new file mode 100644
index 00000000..b2d92fc1
--- /dev/null
+++ b/skills/incident-response/forensics-checklist/tests/ephemeral-cloud-workload-edge-cases.md
@@ -0,0 +1,108 @@
+# Ephemeral Cloud Workload Edge Cases
+
+Use these cases to validate that `forensics-checklist` does not treat cloud snapshots or activity logs as complete forensic evidence for short-lived workloads.
+
+## Case 1: Kubernetes pod evicted before workload evidence capture
+
+**Input**
+
+```yaml
+incident: suspicious outbound traffic from checkout pod
+cloud_evidence:
+  node_snapshot: captured
+  cloud_audit_logs: exported
+kubernetes:
+  pod_name: checkout-7c9f5d6b8c-f2x4q
+  pod_status: evicted
+  pod_yaml: missing
+  pod_events: missing
+  current_logs: missing
+  previous_logs: missing
+  image_digest: missing
+  service_account: unknown
+  owner_refs: unknown
+```
+
+**Expected result**
+
+Fail the evidence completeness check and classify as P1. The node snapshot and provider logs do not preserve pod identity, container runtime state, previous logs, image digest, service account, or controller ownership.
+
+## Case 2: Serverless alias captured without immutable version
+
+**Input**
+
+```yaml
+incident: suspicious S3 writes from Lambda function
+cloud_evidence:
+  cloudtrail: exported
+serverless:
+  provider: aws
+  function_name: invoice-processor
+  alias: prod
+  function_version: missing
+  code_sha256: missing
+  runtime_layers: missing
+  environment_refs: missing
+  execution_role_policy_snapshot: missing
+  event_source_mapping: missing
+  invocation_logs: partial
+```
+
+**Expected result**
+
+Classify as P1 if the incident cannot be tied to an immutable function version or package hash. A mutable alias is not enough to prove what code and configuration executed during the incident window.
+
+## Case 3: Container image tag recorded without digest
+
+**Input**
+
+```yaml
+incident: suspicious process in managed container task
+managed_container:
+  platform: fargate
+  service: public-api
+  task_definition_revision: captured
+  image: registry.example.com/api:latest
+  image_digest: missing
+  registry_metadata: missing
+  command: missing
+  environment_refs: partial
+  runtime_logs: captured
+```
+
+**Expected result**
+
+Classify as P2 or higher depending on impact. The report must identify that `latest` is mutable and request image digest, registry metadata, runtime command, and complete environment or secret references.
+
+## Case 4: Complete ephemeral workload evidence record
+
+**Input**
+
+```yaml
+incident: confirmed credential misuse from containerized worker
+kubernetes:
+  pod_yaml: captured
+  pod_events: captured
+  current_logs: captured
+  previous_logs: captured
+  image_digest: sha256:111122223333444455556666777788889999aaaabbbbccccddddeeeeffff0000
+  service_account: worker-prod
+  owner_refs: deployment/worker
+  network_policy: captured
+  audit_events: captured
+serverless:
+  function_version: "42"
+  alias_mapping_at_incident: prod -> 42
+  code_sha256: abc123
+  layers: captured
+  execution_role_policy_snapshot: captured
+  event_source_mapping: captured
+  invocation_logs: captured
+cloud_scope:
+  accounts: all affected accounts checked
+  regions: all affected regions checked
+```
+
+**Expected result**
+
+Pass the ephemeral workload evidence gate. The report should still list any legal, retention, or access limitations, but it has immutable workload identity, logs, runtime configuration, and provider audit context.
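
Review note (not part of the patch): the findings conditions in section 3.5 and the four edge cases could be exercised with a small checker. The following is a minimal Python sketch under stated assumptions, not the skill's actual implementation. The function `classify`, the constant names, and the field groupings are hypothetical; the field names are borrowed from the YAML inputs above. It assumes that `missing` and `unknown` mean absent, that `partial` counts as preserved, and that P1 takes precedence over P2. The multi-scope P2 condition (single region, account, or cluster) is not modeled.

```python
# Hypothetical sketch of the ephemeral workload evidence gate.
# Assumption: each workload section of a test case is a flat dict of strings.

ABSENT = (None, "", "missing", "unknown")

# Assumption: any one of these fields ties the workload to an immutable artifact.
IMMUTABLE_FIELDS = (
    "image_digest",
    "task_definition_revision",
    "function_version",
    "code_sha256",
)
LOG_FIELDS = ("current_logs", "previous_logs", "runtime_logs", "invocation_logs")
MUTABLE_TAGS = ("latest", "$LATEST")


def present(value):
    """True when a field holds real evidence rather than a gap marker."""
    return value not in ABSENT


def classify(workload):
    """Return 'P1', 'P2', or 'PASS' for one workload record."""
    # P1: nothing ties the incident to an immutable version, digest, or revision.
    if not any(present(workload.get(f)) for f in IMMUTABLE_FIELDS):
        return "P1"
    # P1: log fields are listed in the record but none were preserved.
    logs = [workload[f] for f in LOG_FIELDS if f in workload]
    if logs and not any(present(v) for v in logs):
        return "P1"
    # P2: a mutable image tag is recorded without a digest.
    image = str(workload.get("image", ""))
    tag = image.rsplit(":", 1)[-1] if ":" in image else ""
    if tag in MUTABLE_TAGS and not present(workload.get("image_digest")):
        return "P2"
    # P2: an alias is recorded without an incident-time mapping.
    if present(workload.get("alias")) and not present(
        workload.get("alias_mapping_at_incident")
    ):
        return "P2"
    return "PASS"


# Usage with the Case 3 input (managed container, `latest` tag, no digest).
case_3 = {
    "platform": "fargate",
    "task_definition_revision": "captured",
    "image": "registry.example.com/api:latest",
    "image_digest": "missing",
    "runtime_logs": "captured",
}
print(classify(case_3))  # P2
```

The sketch returns the minimum severity only. Case 3 allows "P2 or higher depending on impact", so impact-based escalation would still need analyst judgment.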