45 changes: 45 additions & 0 deletions skills/incident-response/forensics-checklist/SKILL.md
@@ -339,6 +339,46 @@ gcloud logging read 'timestamp>="YYYY-MM-DDT00:00:00Z" AND timestamp<="YYYY-MM-D
- Multi-region deployments require evidence collection across all regions
- Serverless environments (Lambda, Cloud Functions) produce only invocation logs -- there is no disk to image

### 3.5 Ephemeral Cloud Workload Evidence

For Kubernetes, managed containers, and serverless functions, a cloud disk snapshot or provider activity log is not sufficient by itself. Preserve immutable workload identity, runtime configuration, and short-retention logs before pods are evicted, tasks are replaced, aliases move, or log retention expires.

**Kubernetes evidence to preserve:**
- Pod YAML/spec, namespace, node name, service account, owner references, labels, annotations, priority class, tolerations, and scheduler events
- Container statuses, restart counts, current and previous container logs, init containers, sidecars, command/entrypoint, args, and image digests
- Volume mounts, PVC references, `emptyDir` usage, projected config/secret references (record references and hashes where possible, not plaintext secrets)
- Network policies, ingress/egress controls, service bindings, admission controller decisions, and Kubernetes audit events for the incident window
- Deployment, ReplicaSet, StatefulSet, Job, or CronJob manifests that created the affected pod
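
A minimal sketch of a collection plan for the pod-level items above, assuming `kubectl` access while the pod object still exists. The helper name `pod_evidence_commands` and the `shop` namespace are hypothetical. The function only builds argument lists, so each command can be reviewed and logged before it is run.

```python
from typing import List


def pod_evidence_commands(namespace: str, pod: str) -> List[List[str]]:
    """Build kubectl argument lists that capture pod evidence.

    Hypothetical helper: it returns the commands and does not execute them.
    """
    base = ["kubectl", "--namespace", namespace]
    return [
        # Full pod object: spec, status, owner references, labels, annotations.
        base + ["get", "pod", pod, "-o", "yaml"],
        # Events that reference the pod (scheduling, eviction, restarts).
        base + ["get", "events",
                "--field-selector", f"involvedObject.name={pod}",
                "-o", "yaml"],
        # Logs from the current container instances, with timestamps.
        base + ["logs", pod, "--all-containers=true", "--timestamps"],
        # Logs from the previous container instances, if any restarted.
        base + ["logs", pod, "--all-containers=true", "--timestamps",
                "--previous"],
    ]


if __name__ == "__main__":
    for argv in pod_evidence_commands("shop", "checkout-7c9f5d6b8c-f2x4q"):
        print(" ".join(argv))
```

The `--previous` command can fail when no container has restarted. Record that outcome as a documented gap rather than treating it as a collection error.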

**Managed container service evidence to preserve:**
- ECS task definition revision, Fargate task metadata, Cloud Run revision, AKS/EKS/GKE workload manifest, or equivalent immutable service revision
- Image digest, registry repository metadata, signature or provenance status, pull history, and deployment timestamp
- Runtime command, environment and secret references, execution role or service account, network configuration, security group/firewall policy, and volume mounts
- Centralized logs, sidecar logs, service mesh telemetry, container runtime events, and cloud audit events for create/update/delete actions

**Serverless evidence to preserve:**
- Function version or immutable revision, alias mapping at incident time, deployment package or code hash, runtime, layers, and last modified timestamp
- Environment variable and secret references (redacted), execution role/service account, IAM policy snapshot, VPC/network configuration, and concurrency settings
- Trigger or event source mapping, queue/topic/subscription configuration, invocation logs, platform audit logs, and deployment history
- Cold start/runtime errors, timeout and memory settings, dead-letter queue records, retry configuration, and failed invocation payload references where legally authorized
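
As an illustration of the "deployment package or code hash" item, the sketch below hashes a downloaded deployment package so the report can cite an immutable value. It assumes the package bytes are already in hand; `package_hashes` is a hypothetical name. AWS Lambda reports `CodeSha256` base64-encoded, so both encodings are recorded; confirm which encoding your platform uses before comparing values.

```python
import base64
import hashlib
from typing import Dict


def package_hashes(package: bytes) -> Dict[str, str]:
    """Return the SHA-256 of a deployment package in hex and base64 form."""
    digest = hashlib.sha256(package).digest()
    return {
        "sha256_hex": digest.hex(),
        # Base64 form, for comparison with platforms that report it that way.
        "sha256_base64": base64.b64encode(digest).decode("ascii"),
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real deployment archive.
    print(package_hashes(b"example package bytes"))
```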

**Ephemeral workload evidence gates:**

| Evidence Item | What to Preserve | Common Gap |
|---|---|---|
| Kubernetes pod identity | Pod YAML, owner refs, namespace, node, service account, labels/annotations | Pod is evicted and only the backing node snapshot remains |
| Container runtime state | Current/previous logs, container statuses, restart counts, command/args, image digest | Report captures `latest` tag but no immutable digest or runtime config |
| Workload configuration | Deployment/task/function revision, env/secret refs, volume mounts, network policy | Mutable service configuration changed after containment |
| Serverless identity | Function version/revision, alias mapping, code hash, layers, execution role | Report names only `prod` alias or `$LATEST` |
| Provider audit trail | CloudTrail, Azure Activity Log, GCP Audit Logs, Kubernetes audit events | Audit logs are not exported before retention expires, or some regions/accounts are never collected |
| Log retention | Invocation logs, pod logs, previous container logs, service mesh telemetry | Logs rotate before acquisition or were never enabled |

**Findings conditions:**
- Classify as P1 when a container, Kubernetes, or serverless incident cannot be tied to an immutable workload version, image digest, or function revision.
- Classify as P1 when volatile workload logs were available but not preserved before eviction, task replacement, alias movement, or log retention expiry.
- Classify as P2 when evidence uses mutable identifiers such as `latest`, `$LATEST`, or an alias without an incident-time mapping to immutable artifacts.
- Classify as P2 when cloud evidence is collected from only one region, account, subscription, project, namespace, or cluster despite indicators spanning more than one scope.
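
The four conditions above can be sketched as a small rule check. This is an illustrative encoding, not part of the skill itself: the `WorkloadEvidence` record and its field names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class WorkloadEvidence:
    """Hypothetical summary of what was collected for one workload."""
    immutable_id: Optional[str] = None   # image digest, function version, revision
    mutable_id: Optional[str] = None     # e.g. "latest", "$LATEST", or an alias
    incident_time_mapping: bool = False  # mutable id mapped to an immutable artifact
    volatile_logs_available: bool = False
    volatile_logs_preserved: bool = False
    scopes_with_indicators: Set[str] = field(default_factory=set)
    scopes_collected: Set[str] = field(default_factory=set)


def findings(ev: WorkloadEvidence) -> List[Tuple[str, str]]:
    """Return (severity, reason) pairs for every condition that applies."""
    out: List[Tuple[str, str]] = []
    if not ev.immutable_id:
        out.append(("P1", "incident not tied to an immutable version, digest, or revision"))
    if ev.volatile_logs_available and not ev.volatile_logs_preserved:
        out.append(("P1", "volatile workload logs were available but not preserved"))
    if ev.mutable_id and not ev.incident_time_mapping:
        out.append(("P2", f"mutable identifier '{ev.mutable_id}' has no incident-time mapping"))
    if len(ev.scopes_collected) <= 1 and len(ev.scopes_with_indicators) > 1:
        out.append(("P2", "evidence collected from one scope; indicators span several"))
    return out


if __name__ == "__main__":
    record = WorkloadEvidence(mutable_id="latest", volatile_logs_available=True)
    for severity, reason in findings(record):
        print(severity, reason)
```

Returning every matching condition, rather than only the highest severity, keeps each gap visible in the report.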

---

## 4. Findings Classification
@@ -401,6 +441,11 @@ the order of collection, and any evidence that could not be obtained.]
| Cloud Provider | Resource | Evidence Type | Collected | Notes |
|---|---|---|---|---|
| [AWS/Azure/GCP] | [Resource ID] | [Snapshot/Logs/Config] | [Yes/No] | [Notes] |

### Ephemeral Workload Evidence (if applicable)
| Platform | Workload | Immutable Identifier | Logs Preserved | Runtime Config Preserved | Gaps |
|---|---|---|---|---|---|
| [Kubernetes/ECS/Fargate/Cloud Run/Lambda/Cloud Functions] | [Name/ID] | [Image digest/function version/revision] | [Yes/No] | [Yes/No] | [Notes] |
```

---
@@ -0,0 +1,108 @@
# Ephemeral Cloud Workload Edge Cases

Use these cases to validate that `forensics-checklist` does not treat cloud snapshots or activity logs as complete forensic evidence for short-lived workloads.

## Case 1: Kubernetes pod evicted before workload evidence capture

**Input**

```yaml
incident: suspicious outbound traffic from checkout pod
cloud_evidence:
  node_snapshot: captured
  cloud_audit_logs: exported
kubernetes:
  pod_name: checkout-7c9f5d6b8c-f2x4q
  pod_status: evicted
  pod_yaml: missing
  pod_events: missing
  current_logs: missing
  previous_logs: missing
  image_digest: missing
  service_account: unknown
  owner_refs: unknown
```

**Expected result**

Fail the evidence completeness check and classify as P1. The node snapshot and provider logs do not preserve pod identity, container runtime state, previous logs, image digest, service account, or controller ownership.
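
One way to exercise this case is to walk the input record and list every field whose value signals a gap. The sketch below assumes the YAML has already been loaded into nested dictionaries and treats `missing`, `unknown`, and `partial` as gap markers. `evidence_gaps` is a hypothetical helper, not part of the skill.

```python
from typing import Any, Dict, List

# Values used in these edge cases to mark evidence that was not fully captured.
GAP_VALUES = {"missing", "unknown", "partial"}


def evidence_gaps(record: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of all fields whose value marks a gap."""
    gaps: List[str] = []
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            gaps.extend(evidence_gaps(value, path + "."))
        elif isinstance(value, str) and value in GAP_VALUES:
            gaps.append(path)
    return gaps


if __name__ == "__main__":
    case_1 = {
        "cloud_evidence": {"node_snapshot": "captured", "cloud_audit_logs": "exported"},
        "kubernetes": {
            "pod_status": "evicted",
            "pod_yaml": "missing",
            "previous_logs": "missing",
            "image_digest": "missing",
            "service_account": "unknown",
        },
    }
    for path in evidence_gaps(case_1):
        print(path)
```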

## Case 2: Serverless alias captured without immutable version

**Input**

```yaml
incident: suspicious S3 writes from Lambda function
cloud_evidence:
  cloudtrail: exported
serverless:
  provider: aws
  function_name: invoice-processor
  alias: prod
  function_version: missing
  code_sha256: missing
  runtime_layers: missing
  environment_refs: missing
  execution_role_policy_snapshot: missing
  event_source_mapping: missing
  invocation_logs: partial
```

**Expected result**

Classify as P1: with `function_version` and `code_sha256` missing, the incident cannot be tied to an immutable function version or package hash. A mutable alias is not enough to prove what code and configuration executed during the incident window.
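
To show what an incident-time alias mapping involves, the sketch below resolves which version an alias pointed to at a given moment. It assumes the alias change history has already been reconstructed, for example from audit events, as `(effective_from, version)` pairs. The function name, timestamps, and version numbers are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def version_at(history: List[Tuple[datetime, str]],
               incident_time: datetime) -> Optional[str]:
    """Return the version the alias pointed to at incident_time.

    Returns None when the incident predates the first recorded alias change,
    which is itself an evidence gap worth reporting.
    """
    ordered = sorted(history)
    times = [effective_from for effective_from, _ in ordered]
    idx = bisect_right(times, incident_time)
    return ordered[idx - 1][1] if idx else None


if __name__ == "__main__":
    utc = timezone.utc
    # Hypothetical alias history for illustration only.
    history = [
        (datetime(2024, 1, 1, tzinfo=utc), "41"),
        (datetime(2024, 2, 1, tzinfo=utc), "42"),
    ]
    print(version_at(history, datetime(2024, 1, 15, tzinfo=utc)))
```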

## Case 3: Container image tag recorded without digest

**Input**

```yaml
incident: suspicious process in managed container task
managed_container:
  platform: fargate
  service: public-api
  task_definition_revision: captured
  image: registry.example.com/api:latest
  image_digest: missing
  registry_metadata: missing
  command: missing
  environment_refs: partial
  runtime_logs: captured
```

**Expected result**

Classify as P2 or higher depending on impact. The report must flag that `latest` is a mutable tag and request the image digest, registry metadata, runtime command, and complete environment or secret references.
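
A quick check for this case is whether the recorded image reference is pinned by digest. The sketch below is a simplified test that assumes `sha256` digests only; `is_digest_pinned` is a hypothetical helper, and other digest algorithms would need a broader pattern.

```python
import re

# Simplified pattern: a reference counts as pinned only if it ends in @sha256:<64 hex>.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True when the image reference carries a sha256 digest."""
    return bool(_DIGEST_RE.search(image_ref))


if __name__ == "__main__":
    for ref in (
        "registry.example.com/api:latest",
        "registry.example.com/api@sha256:" + "0" * 64,  # placeholder digest
    ):
        print(ref, is_digest_pinned(ref))
```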

## Case 4: Complete ephemeral workload evidence record

**Input**

```yaml
incident: confirmed credential misuse from containerized worker
kubernetes:
  pod_yaml: captured
  pod_events: captured
  current_logs: captured
  previous_logs: captured
  image_digest: sha256:111122223333444455556666777788889999aaaabbbbccccddddeeeeffff0000
  service_account: worker-prod
  owner_refs: deployment/worker
  network_policy: captured
  audit_events: captured
serverless:
  function_version: "42"
  alias_mapping_at_incident: prod -> 42
  code_sha256: abc123
  layers: captured
  execution_role_policy_snapshot: captured
  event_source_mapping: captured
  invocation_logs: captured
cloud_scope:
  accounts: all affected accounts checked
  regions: all affected regions checked
```

**Expected result**

Pass the ephemeral workload evidence gate. The report should still list any legal, retention, or access limitations, but it has immutable workload identity, logs, runtime configuration, and provider audit context.