
abyo-software/s4-logs

S4 Logs


Cut the CloudWatch Logs bill by moving logs to zstd-compressed S3, without changing your applications. CloudWatch charges $0.50/GB to ingest and $0.03/GB-month to store (storage is billed on gzip-compressed bytes). Most logs are written once and almost never read. S4 Logs archives them to S3 as standard zstd (readable by zstd -dc and Athena, no S4 tooling required) at up to an order of magnitude lower storage cost.

Honest framing: the "70–90% off the bill" number applies to Mode B (bypassing ingest, the dominant cost for most accounts). Mode A alone only cuts the storage line — by ~50–70% on S3 Standard, ~90% with --storage-class glacier-ir — the ingest you already paid is gone either way. The break-even table below tells you whether either mode is worth your time.

S4 Logs is the second product in the S4 family (S4 does the same thing to your S3 bill; S4 Logs does it to your CloudWatch Logs bill) and reuses s4-codec's S4IX range-read index format.

Install

Linux (x86_64 / aarch64) — one-line installer (downloads the latest release tarball, verifies its sha256, drops the static binary in ~/.local/bin):

curl -fsSL https://raw.githubusercontent.com/abyo-software/s4-logs/main/scripts/install.sh | sh

Knobs (all optional):

  • S4LOGS_VERSION=v0.3.0: pin a release tag instead of latest.
  • S4LOGS_INSTALL_DIR=/usr/local/bin: install elsewhere (default ~/.local/bin). The script creates the directory with mkdir -p and is safe to re-run.

The installer is POSIX sh, set -eu, checks for curl/tar/sha256sum (or shasum), verifies the checksum before extracting, and prints a PATH hint if the install dir isn't on PATH. Read it first if you'd rather not pipe to a shell: scripts/install.sh.

Releases ship static musl binaries for x86_64-unknown-linux-musl and aarch64-unknown-linux-musl (no glibc dependency). Prefer to do it by hand? Grab the matching s4logs-<version>-<target>.tar.gz from the Releases page, verify its .sha256, and extract s4logs.

macOS / other — no prebuilt binary yet; build from source (the installer will tell you this and stop):

cargo install --git https://github.com/abyo-software/s4-logs s4logs-cli

Docker: docker build -t s4logs . (see Dockerfile; the runtime image is ~176 MB and defaults to the Mode B gateway).

How it works

Two independent modes; run either or both.

Mode A — Drain (archive what's already in CloudWatch):

CloudWatch Logs ──FilterLogEvents──▶ s4logs drain ──zstd──▶ S3 (data + index sidecars)
                                          │
                                          └─▶ PutRetentionPolicy (e.g. 90d → 7d)
                                              only AFTER every affected window has
                                              a verified manifest (fail-closed)
  • Work unit = (log group, UTC-aligned 1h window). Each window's manifest proves it is fully archived; re-runs skip manifested windows (idempotent).
  • Archive first, shrink retention after. PutRetentionPolicy is gated on complete manifest coverage of everything older than the proposed cutoff; any gap means nothing happens. Default is report-only (--apply-retention opt-in).
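Below is a minimal Rust sketch of the work unit and the fail-closed gate. It makes several assumptions:

* Windows are half-open [start, end) in epoch milliseconds.
* A set of manifested window starts stands in for the manifest objects.
* The function names are hypothetical and are not the s4logs-drain API.

A real drain would also have to map these bounds onto FilterLogEvents, whose endTime is inclusive.

```rust
use std::collections::BTreeSet;

const HOUR_MS: i64 = 3_600_000;

/// Floor an epoch-millisecond timestamp to the start of its UTC hour
/// (epoch time has no leap seconds, so hour starts are exact multiples).
fn align_down(ts_ms: i64) -> i64 {
    ts_ms.div_euclid(HOUR_MS) * HOUR_MS
}

/// Split [from_ms, to_ms) into UTC-aligned 1 h windows as half-open (start, end) pairs.
/// The first window may begin before `from_ms` because it is aligned down.
pub fn hour_windows(from_ms: i64, to_ms: i64) -> Vec<(i64, i64)> {
    let mut out = Vec::new();
    let mut start = align_down(from_ms);
    while start < to_ms {
        out.push((start, start + HOUR_MS));
        start += HOUR_MS;
    }
    out
}

/// Fail-closed gate (hypothetical): shrinking retention to `cutoff_ms` deletes everything
/// older than the cutoff, so every window from the oldest event up to the cutoff must
/// already be manifested. A single missing window blocks the change.
pub fn retention_gate_ok(manifested_starts: &BTreeSet<i64>, oldest_event_ms: i64, cutoff_ms: i64) -> bool {
    hour_windows(oldest_event_ms, cutoff_ms)
        .iter()
        .all(|(start, _)| manifested_starts.contains(start))
}
```

A window that straddles the cutoff still holds events older than the cutoff. For that reason, the gate also requires such a window to be covered.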

Mode B — Bypass (avoid CloudWatch ingest entirely):

Fluent Bit / CW Agent / SDK ──PutLogEvents (same wire protocol)──▶ s4logs serve
                                                                      │
                              first-match TOML routing per group/stream:
                                  ├── "s3"          → zstd → S3   ($0 ingest)
                                  ├── "cloudwatch"  → passthrough to real CW
                                  ├── "both"        → both
                                  └── "drop"
  • The gateway speaks the CloudWatch Logs AWS JSON 1.1 API subset (PutLogEvents, CreateLogGroup, CreateLogStream, DescribeLogGroups, DescribeLogStreams) — agents migrate with an endpoint override only, the same customer experience as S4's S3-compatible endpoint.
  • Keep the streams you alert on in CloudWatch via routing rules; everything else skips the $0.50/GB toll.
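AWS JSON 1.1 requests name the operation in the X-Amz-Target header. For CloudWatch Logs, the header value is Logs_20140328.<Operation>, and this is why an endpoint override can be the whole migration.

Below is a minimal Rust sketch of dispatching on that header for the documented subset. The Op enum and the dispatch function are hypothetical and are not the s4logs-gateway internals.

```rust
/// The documented CloudWatch Logs subset the gateway accepts.
#[derive(Debug, PartialEq)]
pub enum Op {
    PutLogEvents,
    CreateLogGroup,
    CreateLogStream,
    DescribeLogGroups,
    DescribeLogStreams,
}

/// Map an X-Amz-Target header value (e.g. "Logs_20140328.PutLogEvents") to an operation.
pub fn dispatch(x_amz_target: &str) -> Result<Op, String> {
    let op = x_amz_target
        .strip_prefix("Logs_20140328.")
        .ok_or_else(|| format!("unknown service target: {x_amz_target}"))?;
    match op {
        "PutLogEvents" => Ok(Op::PutLogEvents),
        "CreateLogGroup" => Ok(Op::CreateLogGroup),
        "CreateLogStream" => Ok(Op::CreateLogStream),
        "DescribeLogGroups" => Ok(Op::DescribeLogGroups),
        "DescribeLogStreams" => Ok(Op::DescribeLogStreams),
        // Anything outside the subset is rejected rather than silently accepted.
        other => Err(format!("unsupported operation: {other}")),
    }
}
```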

S3 layout (open, query-engine-friendly)

{prefix}data/account={acct}/loggroup={g}/dt=YYYY-MM-DD/{name}.jsonl.zst      ← standard zstd, JSONL inside
{prefix}index/account={acct}/loggroup={g}/dt=YYYY-MM-DD/{name}.jsonl.zst.s4index   ← byte-range index (S4IX)
{prefix}index/...same.../{name}.jsonl.zst.s4lts                              ← per-frame timestamp ranges
{prefix}manifest/account={acct}/loggroup={g}/window={start}-{end}.json       ← drain idempotency + retention gate

Index sidecars live under a separate prefix so Athena/Spark pointed at data/ never see them. Each JSONL line is one event: {"timestamp":…,"stream":"…","message":"…","ingestion_time":…,"event_id":"…"} (epoch milliseconds; optional fields omitted when absent).
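Below is a minimal Rust sketch that builds the data and index keys for one object. The following assumptions are hypothetical and for illustration only:

* The log group is percent-encoded over everything outside the RFC 3986 unreserved set.
* dt= is the UTC date of the object's first event.
* `prefix` carries its own trailing slash.

The date conversion uses Howard Hinnant's civil_from_days algorithm, so no date crate is needed.

```rust
/// Percent-encode every byte outside the RFC 3986 unreserved set ("/" becomes "%2F").
/// Assumption: the exact set S4 Logs encodes is defined in DESIGN.md; this is illustrative.
fn pct_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{b:02X}"));
        }
    }
    out
}

/// Days since 1970-01-01 to a proleptic Gregorian (year, month, day).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, counted from March 1
    let mp = (5 * doy + 2) / 153;
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Hypothetical helper: (data key, .s4index key) for one object, following the documented layout.
pub fn object_keys(prefix: &str, account: &str, log_group: &str, first_ts_ms: i64, name: &str) -> (String, String) {
    let (y, m, d) = civil_from_days(first_ts_ms.div_euclid(86_400_000));
    let part = format!("account={account}/loggroup={}/dt={y:04}-{m:02}-{d:02}", pct_encode(log_group));
    let data = format!("{prefix}data/{part}/{name}.jsonl.zst");
    let index = format!("{prefix}index/{part}/{name}.jsonl.zst.s4index");
    (data, index)
}
```

For example, an object whose first event is at 1781042400000 (2026-06-09 22:00 UTC) in /aws/lambda/payments lands under loggroup=%2Faws%2Flambda%2Fpayments/dt=2026-06-09/.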

Quickstart

The s4logs CLI (crate s4logs-cli) wraps the library crates below. Flags shown are the documented interface (DESIGN.md §9) — run s4logs --help for the authoritative list.

Start here: see your savings (read-only, no charges)

# DescribeLogGroups + GetMetricData only — no writes, no bucket, read-only
# credentials suffice. storedBytes IS CloudWatch's storage billing basis,
# so the current-cost column needs no compression assumptions.
s4logs plan --all

Prints your log groups sorted by current monthly cost (storage on CW's gzip-billed bytes + ingest projected from IncomingBytes), with projected Mode A (S3 Standard and Glacier IR columns) and Mode B savings. Every assumption behind the projections is printed in the footer.

Drain a log group (Mode A)

# Report-only first: what would be archived, what would it save?
s4logs drain --log-group /aws/lambda/payments \
  --bucket my-archive-bucket --prefix s4logs \
  --from 2026-01-01T00:00:00Z --to 2026-06-01T00:00:00Z \
  --window 1h --concurrency 2 --dry-run

# Archive for real. Re-running is safe (manifested windows are skipped).
s4logs drain --log-group /aws/lambda/payments \
  --bucket my-archive-bucket --prefix s4logs

# Archive straight to Glacier Instant Retrieval ($0.004/GB·mo, list price
# as of 2026-06, us-east-1). Applies to data objects only — index sidecars
# and manifests stay S3 Standard (see Cost model for why).
s4logs drain --log-group /aws/lambda/payments \
  --bucket my-archive-bucket --prefix s4logs --storage-class glacier-ir

# Shrink CW retention — only succeeds if every older window is archived.
s4logs drain --log-group /aws/lambda/payments \
  --bucket my-archive-bucket --retention-days 7 --apply-retention

# Whole-account sweeps: --log-group takes a glob, or use --all. Per-group
# failures are reported and skipped; exit code 1 if any group failed.
s4logs drain --all --bucket my-archive-bucket --group-concurrency 2

# Big backlogs: page each window's streams in parallel shards and watch
# progress. Drain speed is bound by FilterLogEvents latency (a real 5 GiB
# drain ran 94.6 min unsharded with zero throttling), so throughput should
# scale roughly linearly with the shard count.
s4logs drain --log-group /app/api --bucket my-archive-bucket \
  --shard-streams 8 --progress

# Late-arrival repair: re-page manifested windows, dedup against the
# archive by event id, append only what is missing (see Limitations).
s4logs drain --log-group /app/api --bucket my-archive-bucket --reconcile

# What have I archived so far, and what is it saving? Reads manifests only —
# zero CloudWatch API calls, zero S3 data reads.
s4logs report --all --bucket my-archive-bucket
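Below is reconcile's core step, sketched in Rust. It assumes the archive's event ids for the window have already been loaded into a set. Event and missing_events are hypothetical names.

The sketch also drops duplicates inside the re-paged batch itself. Without that, the same event could be appended twice.

```rust
use std::collections::HashSet;

/// Hypothetical re-paged event carrying only the fields reconcile needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub event_id: String,
    pub timestamp: i64,
}

/// Return the re-paged events that the archive does not yet contain, in input order.
pub fn missing_events(archived_ids: &HashSet<String>, repaged: Vec<Event>) -> Vec<Event> {
    let mut seen: HashSet<String> = HashSet::new(); // guards against duplicates within this batch
    repaged
        .into_iter()
        .filter(|e| !archived_ids.contains(&e.event_id) && seen.insert(e.event_id.clone()))
        .collect()
}
```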

Run the bypass gateway (Mode B)

s4logs serve --listen 0.0.0.0:8080 \
  --bucket my-archive-bucket --prefix s4logs --account 123456789012 \
  --routing-config routing.toml --flush-bytes 8MiB --flush-interval 60s \
  --wal-dir /var/lib/s4logs/wal

--wal-dir makes acknowledged events crash-durable (fsync before the ack, replay on restart — at-least-once). To require signed requests, add --auth-mode sigv4 with S4LOGS_AUTH_ACCESS_KEY / S4LOGS_AUTH_SECRET; agents and SDKs already sign, so no client change is needed beyond matching credentials.
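Below is a minimal, Unix-only Rust sketch of the fsync-before-ack pattern:

* Append a batch as JSON lines, call sync_data, and only then acknowledge.
* On restart, replay the complete lines.
* Count an unterminated trailing line as torn. Its batch never finished appending, so it was never acknowledged and is safe to drop.

The parent-directory fsync on create and delete mirrors the behavior described under Limitations. Wal, replay and remove_segment are hypothetical names, and the one-event-per-line format is an assumption.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// Hypothetical WAL segment: one serialized event per line.
pub struct Wal {
    file: File,
}

impl Wal {
    /// Create (or reopen) a segment, then fsync the file and its parent directory
    /// so the segment's existence survives power loss (Unix allows fsync on a directory fd).
    pub fn open(dir: &Path, name: &str) -> io::Result<Wal> {
        fs::create_dir_all(dir)?;
        let file = OpenOptions::new().create(true).append(true).open(dir.join(name))?;
        file.sync_all()?;
        File::open(dir)?.sync_all()?;
        Ok(Wal { file })
    }

    /// Append a batch and fsync it; the caller may ack only after this returns Ok.
    pub fn append_batch(&mut self, lines: &[&str]) -> io::Result<()> {
        let mut buf = String::new();
        for line in lines {
            buf.push_str(line);
            buf.push('\n');
        }
        self.file.write_all(buf.as_bytes())?;
        self.file.sync_data()
    }
}

/// Delete a fully flushed segment and fsync the directory so the removal is durable too.
pub fn remove_segment(dir: &Path, name: &str) -> io::Result<()> {
    fs::remove_file(dir.join(name))?;
    File::open(dir)?.sync_all()
}

/// Replay after a restart: complete lines, plus the number of torn (unterminated) lines skipped.
pub fn replay(path: &Path) -> io::Result<(Vec<String>, usize)> {
    let mut reader = BufReader::new(File::open(path)?);
    let (mut events, mut torn) = (Vec::new(), 0);
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break;
        }
        match line.strip_suffix('\n') {
            Some(complete) => events.push(complete.to_string()),
            None => torn += 1, // crash mid-append: this batch was never acknowledged
        }
    }
    Ok((events, torn))
}
```

Replaying every complete line after a crash is what makes delivery at-least-once. A batch that was fsynced but not yet uploaded when the process died gets flushed again.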

routing.toml (first match wins):

default_action = "s3"            # s3 | cloudwatch | both | drop

[[rule]]
log_group = "/aws/lambda/payments-*"   # glob
action = "cloudwatch"                  # keep alert-critical streams in CW
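Below is first-match-wins routing, sketched in Rust. It makes two simplifying assumptions:

* Rules match on the log group only. The gateway also supports stream-level rules.
* The glob supports just `*`, meaning any run of characters. The tool itself uses globset, which handles far more syntax.

Action, Rule, glob_match and route are hypothetical names.

```rust
/// The four documented routing actions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    S3,
    CloudWatch,
    Both,
    Drop,
}

pub struct Rule {
    pub log_group: String, // glob pattern
    pub action: Action,
}

/// Minimal glob matcher where only `*` is special (matches any run, including '/').
fn glob_match(pat: &str, s: &str) -> bool {
    let (p, t): (Vec<char>, Vec<char>) = (pat.chars().collect(), s.chars().collect());
    let (mut pi, mut ti) = (0, 0);
    let (mut star, mut mark) = (None, 0);
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some(pi); // remember the star; first try matching it against nothing
            mark = ti;
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some(sp) = star {
            pi = sp + 1; // backtrack: let the last star swallow one more character
            mark += 1;
            ti = mark;
        } else {
            return false;
        }
    }
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// First matching rule wins; otherwise fall back to default_action.
pub fn route(rules: &[Rule], default_action: Action, log_group: &str) -> Action {
    rules
        .iter()
        .find(|r| glob_match(&r.log_group, log_group))
        .map(|r| r.action)
        .unwrap_or(default_action)
}
```

With the routing.toml above, /aws/lambda/payments-api routes to CloudWatch. Every other group falls through to s3.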

Point Fluent Bit at it (endpoint override is the entire migration):

[OUTPUT]
    Name              cloudwatch_logs
    Match             *
    region            us-east-1
    log_group_name    /app/api
    log_stream_prefix node-
    auto_create_group On
    endpoint          http://s4logs-gateway.internal:8080

or the CloudWatch Agent:

{ "logs": { "endpoint_override": "http://s4logs-gateway.internal:8080" } }

Search and restore

# grep without downloading whole objects: timestamp sidecar prunes frames,
# byte-range sidecar turns the survivors into S3 Range GETs. Output is
# timestamp-ordered across objects (streaming k-way merge, bounded memory).
s4logs grep 'ERROR.*timeout' --log-group /aws/lambda/payments \
  --from 2026-03-01T00:00:00Z --to 2026-03-02T00:00:00Z --output text

# Restore raw JSONL locally (the primary restore path):
s4logs restore --log-group /aws/lambda/payments \
  --from 2026-03-01T00:00:00Z --to 2026-03-02T00:00:00Z --to-file out.jsonl
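The sketch below shows, in Rust, how the two sidecars cooperate and how results are merged. It assumes two things:

* Each FrameEntry pairs a frame's timestamp range (from .s4lts) with its byte range (from .s4index).
* Every per-object source already yields events in timestamp order.

The types and functions are hypothetical. The merge heap holds one head event per source, which is what bounds memory. For brevity, the sketch collects into a Vec; a streaming version would print each event as it is popped.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical per-frame entry joining both sidecars.
#[derive(Clone, Debug, PartialEq)]
pub struct FrameEntry {
    pub min_ts: i64,
    pub max_ts: i64,
    pub offset: u64,
    pub len: u64, // assumed > 0
}

/// Keep frames whose [min_ts, max_ts] overlaps [from, to]; emit inclusive HTTP Range values.
pub fn ranges_to_fetch(frames: &[FrameEntry], from: i64, to: i64) -> Vec<String> {
    frames
        .iter()
        .filter(|f| f.max_ts >= from && f.min_ts <= to)
        .map(|f| format!("bytes={}-{}", f.offset, f.offset + f.len - 1))
        .collect()
}

/// k-way merge of timestamp-sorted sources into one timestamp-sorted output.
pub fn kway_merge(sources: Vec<Vec<(i64, String)>>) -> Vec<(i64, String)> {
    let mut iters: Vec<_> = sources.into_iter().map(|s| s.into_iter()).collect();
    let mut heap = BinaryHeap::new(); // min-heap via Reverse; (ts, source index, message)
    for (i, it) in iters.iter_mut().enumerate() {
        if let Some((ts, msg)) = it.next() {
            heap.push(Reverse((ts, i, msg)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((ts, i, msg))) = heap.pop() {
        out.push((ts, msg));
        // Refill from the source we just consumed, keeping at most one entry per source.
        if let Some((nts, nmsg)) = iters[i].next() {
            heap.push(Reverse((nts, i, nmsg)));
        }
    }
    out
}
```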

Local E2E

./scripts/e2e.sh    # LocalStack up → full #[ignore] suite → down

Cost model

Per-GB economics (AWS list prices as of 2026-06, us-east-1 throughout this section). One subtlety that most cost write-ups miss, and that we initially got wrong too: CloudWatch bills archived storage on gzip-level-6 compressed bytes, not raw (AWS pricing page footnote). Typical text logs gzip ~3–5×, so the honest comparison is CW-gzip vs S3-zstd:

Where the archive lives (--storage-class)

Drained data objects can go to any instant-access S3 tier (--storage-class standard | standard-ia | glacier-ir, default standard):

| Tier | $/GB·mo (list, 2026-06, us-east-1) | ≈ $ per raw GB·mo (S3 rows at 6.2× zstd) |
|---|---|---|
| CloudWatch Logs storage | $0.03 on gzip-6 bytes | ~$0.006–0.010 (gzip 3–5×) |
| S3 Standard (default) | $0.023 | ~$0.0037 |
| S3 Standard-IA | $0.0125 | ~$0.0020 |
| S3 Glacier Instant Retrieval | $0.004 | ~$0.0007 |

Glacier Instant Retrieval keeps millisecond first-byte latency — it is not the hours-long restore-request Glacier — so s4logs grep, restore and Athena work unchanged against it. The catches, each stated plainly:

  • 90-day minimum storage charge. A GIR object deleted (or transitioned) before day 90 is billed for the remainder of the 90 days anyway. A log archive satisfies this trivially — you drained the data precisely because you intend to keep it for months or years; if you'd delete it inside a quarter, you should be shortening retention instead of archiving.
  • Retrieval costs $0.03/GB, plus pricier GET requests. Every data byte read back out of GIR — by grep, restore, or Athena — pays it. This is where the sidecar design earns its keep: the timestamp sidecar prunes frames before any data byte is fetched, and the byte-range sidecar turns survivors into Range GETs. In the real-AWS experiment below, a time-pruned query touched 78.6 MB of a 1.6 GiB archive — about $0.002 of GIR retrieval instead of ~$0.05 for a full scan. If you grep the archive heavily and broadly, Standard-IA ($0.01/GB retrieval) or Standard may price better; GIR is the write-once-read-rarely default.
  • 128 KiB minimum billable object size. GIR bills any smaller object as 128 KiB. This is why S4 Logs applies the storage class to data objects only: the .s4index/.s4lts sidecars and window manifests are KiB-scale and read on every query plan, so they always stay S3 Standard — they'd cost more in GIR and would put retrieval pricing on the hot planning path. Their footprint is a rounding error next to the data.

The bill, by mode

| | CloudWatch as-is | Mode A (drained) | Mode B (bypassed) |
|---|---|---|---|
| ingest | $0.50/GB (raw bytes + 26 B/event) | $0.50/GB (already paid, not recoverable) | $0 (avoided) + negligible S3 PUTs |
| storage | $0.03/GB·mo on gzip-6 bytes ≈ $0.006–0.010/GB-raw·mo | S3 Standard ≈ $0.0037/GB-raw·mo; Glacier IR ≈ $0.0007/GB-raw·mo (6.2× zstd) | same as Mode A |
| reduction | | ~50–70% of the storage line on Standard, ~90% on Glacier IR; retention shortening then removes the CW line entirely for drained data | ~90%+ of the whole bill (ingest dominates) |

Worked example, 1 TiB of raw logs: CloudWatch storage ≈ $7.7/mo (at gzip 4×) → S3 Standard (at our measured 6.2× zstd) ≈ $3.8/mo → Glacier Instant Retrieval ≈ $0.66/mo (~91% off the storage line). Shrink CW retention after draining and the CW line goes to ~0 for the drained range. In Mode B the same TiB skips $512 of one-time ingest: storage-class tuning moves single dollars, while ingest is where the bill lives.
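The worked example's arithmetic can be written as a small Rust function. It uses the list prices quoted above and treats 1 TiB as 1024 GB, as the example does. The 26 B/event ingest overhead is ignored. Bill and monthly_bill are hypothetical names.

```rust
// List prices from the cost model (USD, us-east-1, 2026-06).
const CW_STORAGE_PER_GB_MO: f64 = 0.03; // billed on gzip-6 bytes
const CW_INGEST_PER_GB: f64 = 0.50; // billed on raw bytes (+26 B/event, ignored here)
const S3_STANDARD_PER_GB_MO: f64 = 0.023;
const S3_GIR_PER_GB_MO: f64 = 0.004;

pub struct Bill {
    pub cw_storage: f64,     // $/mo to keep the data in CloudWatch
    pub s3_standard: f64,    // $/mo for the zstd archive on S3 Standard
    pub s3_gir: f64,         // $/mo for the zstd archive on Glacier Instant Retrieval
    pub ingest_avoided: f64, // one-time $ that Mode B does not pay
}

pub fn monthly_bill(raw_gb: f64, gzip_ratio: f64, zstd_ratio: f64) -> Bill {
    let cw_billed_gb = raw_gb / gzip_ratio; // CloudWatch's storage billing basis
    let archived_gb = raw_gb / zstd_ratio; // bytes actually stored in S3
    Bill {
        cw_storage: cw_billed_gb * CW_STORAGE_PER_GB_MO,
        s3_standard: archived_gb * S3_STANDARD_PER_GB_MO,
        s3_gir: archived_gb * S3_GIR_PER_GB_MO,
        ingest_avoided: raw_gb * CW_INGEST_PER_GB,
    }
}
```

monthly_bill(1024.0, 4.0, 6.2) gives roughly $7.68, $3.80, $0.66 and $512, matching the figures above. Swapping in your own gzip ratio (3 to 5×) shows how sensitive the storage comparison is to it.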

Break-even honesty (same policy as S4's README): if your CloudWatch Logs bill is under $500/month, the OSS version is all you need — and even that may be more moving parts than your problem deserves. Mode A on its own is a storage optimization — ~90% with glacier-ir, but of what is usually the smaller line; the compelling moves remain retention shortening (Mode A's gate makes it safe) and Mode B ingest avoidance. Draining itself costs ~$0 in API charges: FilterLogEvents has no per-call price (verified on a real 5 GiB drain — see below), only time and account quota.

Measured compression (synthetic corpora)

Single-threaded ChunkWriter (zstd-3, 4 MiB frames, content checksum on), ~64–78 MiB synthetic corpora, AMD Ryzen 9 9950X, 2026-06-10 (cargo test -p s4logs-e2e --release -- --ignored bench --nocapture to reproduce):

| Shape | Input | Ratio | Throughput |
|---|---|---|---|
| nginx access log | 75 MiB | 8.3× | 546 MiB/s |
| JSON app logs | 78 MiB | 9.1× | 604 MiB/s |
| java app + stacktraces | 71 MiB | 11.0× | 742 MiB/s |

Synthetic generators underestimate real-log redundancy (real fleets repeat themselves far more): the S4 family reference on a real corpus is s4's measured 155× on 256 MiB of nginx logs at 3.7 GB/s (cpu-zstd-3, 2026-05-13). Treat 8–11× as the floor and 155× as what repetitive access logs actually do.

Verified against real AWS (controlled experiment, 2026-06-10)

We ran the full Mode A pipeline against a real us-east-1 account — controlled and synthetic (we seeded the data ourselves; labeled as such, not passed off as an organic workload):

| Step | Measured |
|---|---|
| Seed: PutLogEvents, 16 streams | 5.00 GiB message bytes, 33,163,647 events, 0 rejections, 592 s (8.7 MiB/s aggregate) |
| Backdated-event visibility | Events ingested with past timestamps took 3–5.5 min to appear in FilterLogEvents (see Limitations; this interacts with drain manifests) |
| Drain: 5 windows, --concurrency 4 | 94.6 min wall, 0 ThrottlingExceptions, $0 API charges (FilterLogEvents is unmetered; per-page latency is the bottleneck) |
| Archive | 9.7 GiB JSONL → 1.6 GiB zstd (6.2×), 41 objects |
| Fidelity | Spot 60 s slice: CW 160,000 = archive 160,000; Athena full count = drain count = 33,163,613 (the 34-event, 0.0001% gap vs the seeder's own count remains unattributed; see Limitations) |
| FilterLogEvents semantics | endTime verified inclusive with a live probe (the drain's window math depends on it) |
| 14-day PutLogEvents rejection | Confirmed live (tooOldLogEventEndIndex); the restore design constraint is real |
| Retention gate | PutRetentionPolicy(1 day) applied through the coverage gate on the real API |
| Athena | DDL + count(*) + LIKE query ran against the real archive; count(*) scanned 1.68 GB, the partition-pruned LIKE query 78.6 MB |

Total experiment cost: ~$2.60 (5 GiB × $0.50 ingest + cents of S3/Athena).

Mode B + restore against real AWS (2026-06-12)

The gateway and restore paths — previously LocalStack-only — were validated against a real us-east-1 account (KB-scale, ~cents):

| Step | Result |
|---|---|
| Gateway /health + /ready | 200; /ready succeeded against the real S3 ListObjectsV2 probe |
| PutLogEvents via the gateway (aws CLI, endpoint override) | Events landed in real S3 under the correct dt= layout; objects decode with plain zstd -dc |
| both routing | The routed event reached real CloudWatch (passthrough) and S3 |
| s3-only routing | No log group was created in real CloudWatch (verified absent); routing isolates correctly |
| grep over the gateway-written archive | Matched via real-S3 range reads (1 frame fetched, no full-object fallback) |
| restore --to-log-group to real CW | 2 events re-ingested at current time with the {original_timestamp, original_stream, message} wrap; the 14-day-constraint design works live |
| Graceful shutdown | Flushed all buffers on SIGTERM |

All resources were deleted afterward (bucket, log groups, IAM detached).

2-hour sustained soak (2026-06-12)

A 2 h LocalStack soak at 100 req/s × 10 events across 3 log groups: 715,817 requests / 7,158,170 events acked / 0 failures, all 7,158,170 events durable (0 loss), 3,878 flushes, RSS delta 2.3 MiB over the run (no leak). The 24 h Marketplace-gate soak uses the same harness (soak.yml, S4LOGS_SOAK_SECONDS=86400) and needs a self-hosted runner (GitHub-hosted jobs cap at 6 h).

No lock-in: your data is plain zstd

Format stability: from v1.0 the on-disk formats — the standard-zstd data objects and their JSONL schema, the .s4index / .s4lts sidecars, the manifest JSON, and the S3 key layout — are frozen for the 1.x series. Any 1.x release reads data any other 1.x release wrote; new fields are only ever added as optional. Breaking changes wait for 2.0 and keep a 1.x read path. Full contract: DESIGN.md §14. (The CLI flags, output text, and metric names are implementation detail and may evolve under semver-minor; only the persisted formats are frozen.)

Data objects are concatenated standard RFC 8878 zstd frames — not a custom container. If S4 Logs disappears tomorrow:

aws s3 cp s3://bucket/s4logs/data/.../1781042400000-000000.jsonl.zst - | zstd -dc | head

works, today (this exact property is asserted in the E2E suite, including after deliberately deleting the index sidecar — sidecars only make reads fast, they are never required). This is one notch stronger than S4 proper, whose S4F2 container needs the ~1k-LOC Apache-2.0 s4-codec decoder; S4 Logs objects need nothing at all.

Query the archive in place with Athena (Hive-style partitions, zstd detected by the .zst extension). The catch: log group names are percent-encoded in the loggroup= partition values (/aws/lambda/foo becomes %2Faws%2Flambda%2Ffoo), which trips MSCK REPAIR TABLE on some Athena versions. The clean fix is partition projection. Athena computes partitions from TBLPROPERTIES, so there is no crawler, no MSCK, and no encoding ambiguity:

CREATE EXTERNAL TABLE s4logs_archive (
  `timestamp` bigint,
  stream string,
  message string,
  ingestion_time bigint,
  event_id string
)
PARTITIONED BY (account string, loggroup string, dt string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://YOUR_BUCKET/s4logs/data/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.account.type' = 'injected',   -- query by exact 12-digit id
  'projection.loggroup.type' = 'injected',  -- query by the %-encoded value
  'projection.dt.type' = 'date',
  'projection.dt.format' = 'yyyy-MM-dd',
  'projection.dt.range' = '2024-01-01,NOW',
  'projection.dt.interval' = '1',
  'projection.dt.interval.unit' = 'DAYS',
  'storage.location.template' =
    's3://YOUR_BUCKET/s4logs/data/account=${account}/loggroup=${loggroup}/dt=${dt}/'
);

SELECT from_unixtime(`timestamp` / 1000) AS t, stream, message
FROM s4logs_archive
WHERE account = '123456789012'
  AND loggroup = '%2Faws%2Flambda%2Fpayments'   -- percent-encoded
  AND dt BETWEEN '2026-06-01' AND '2026-06-09'
  AND message LIKE '%ERROR%'
LIMIT 100;

account and loggroup use the injected projection type: their value sets are sparse and user-specific (and loggroup is percent-encoded), so rather than enumerate them in DDL you pass the exact literal in WHERE — which also means a query must filter on both (you query one log group at a time anyway). dt is a real date range so BETWEEN/>= prune to just the days touched. Full DDL, the readable-loggroup view trick, and the explicit-ADD PARTITION alternative are in docs/athena.md.

What's verified, honestly: the explicit-ADD PARTITION form (table LOCATION at the loggroup= level, one ADD PARTITION per dt= directory — see docs/athena.md §2) is what we ran end-to-end on real Athena (2026-06-10): against a real 41-object / 1.6 GiB archive, count(*) returned exactly the drained record count and a LIKE query with partition pruning scanned 78.6 MB. The partition-projection config above is standard Athena syntax that we validated only for parse-correctness — it has not been run against a live Athena endpoint, so sanity-check a count(*) against a known window before relying on it.

Restore and the 14-day PutLogEvents constraint

The CloudWatch PutLogEvents API rejects events older than 14 days (or older than the group's retention). Restoring 90-day-old logs to CloudWatch with their original timestamps is therefore impossible — for anyone, not just us. S4 Logs handles this honestly:

  • Primary restore path is local: s4logs restore --to-stdout / --to-file emits raw JSONL. Combined with s4logs grep, this covers most investigations.
  • --to-log-group is provided but ingests events at the current time, wrapping each message as {"original_timestamp":…,"original_stream":"…","message":"…"} so Logs Insights can still filter on original_timestamp. --raw disables the wrap and then only events newer than 14 days will be accepted.
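The wrap and the 14-day rule can be sketched in Rust with hand-rolled JSON escaping, so the block has no dependencies. The field order follows the documented wrap. Two parts are assumptions: where original_log_group sits in multi-group restores, and the function names.

raw_accepted is only an approximation. The group's retention can reject events sooner than 14 days.

```rust
const FOURTEEN_DAYS_MS: i64 = 14 * 24 * 3_600_000;

/// Quote and escape a string for JSON (quotes, backslash, control characters).
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical wrap for --to-log-group; original_log_group only for multi-group restores.
pub fn wrap(original_timestamp: i64, original_stream: &str, message: &str, original_log_group: Option<&str>) -> String {
    let mut s = format!(
        "{{\"original_timestamp\":{original_timestamp},\"original_stream\":{}",
        json_str(original_stream)
    );
    if let Some(g) = original_log_group {
        s.push_str(&format!(",\"original_log_group\":{}", json_str(g)));
    }
    s.push_str(&format!(",\"message\":{}}}", json_str(message)));
    s
}

/// With --raw, events keep their original timestamps; roughly, only those within 14 days are accepted.
pub fn raw_accepted(ts_ms: i64, now_ms: i64) -> bool {
    now_ms - ts_ms <= FOURTEEN_DAYS_MS
}
```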

Limitations (read before deploying)

  • SigV4 verification is opt-in and single-key. --auth-mode sigv4 verifies incoming signatures against one static key pair — enough for agents/SDKs, but there is no IAM integration, no session tokens, no presigned URLs. Default remains no verification: then run it behind TLS and a network boundary (security group / private subnet / mTLS mesh).
  • Durability is opt-in via --wal-dir. With it, events are fsynced before the ack and replayed on restart (at-least-once: duplicates are possible after a crash, never silent loss). The segment file and its parent directory are fsynced on create and delete, so the lifecycle is power-loss-safe on ext4/xfs (FUSE/NFS mounts that reject directory fsync degrade to best-effort, counted in s4logs_wal_dir_fsync_errors_total). Without a WAL, a crash loses up to one flush window per (group, day) buffer. both routing keeps a CW copy if you need belt and braces.
  • Late-arriving events can be missed by a too-eager drain. CloudWatch indexes backdated events with a lag we measured at 3–5.5 minutes (and agents may deliver much later). A window drained before its stragglers arrive gets a manifest, and manifests are skipped on re-runs — so those events never reach the archive. Mitigations: drain data that is at least hours old (the normal archival pattern satisfies this trivially), and run s4logs drain --reconcile over suspect ranges — it re-pages manifested windows, dedups against the archive by event identity, and appends only what is missing. Reconcile re-pages the full window (same FilterLogEvents time cost as draining it), so it is a repair tool, not a cheap verification sweep.
  • Drain speed is latency-bound, not quota-bound. A real 5 GiB drain took 94.6 min at --concurrency 4 with zero throttling and $0 in API charges; FilterLogEvents page latency is the bottleneck, so TB-scale initial drains are a time budget (raise --concurrency), not a money one.
  • One unattributed 0.0001% count gap. In the controlled experiment the archive matched CloudWatch exactly on every cross-check we ran (slice-level FilterLogEvents counts, Athena vs drain totals), but 34 of 33,163,647 seeder-counted events (1×10⁻⁶) were never observed in CloudWatch reads. We could not attribute them (seeder accounting vs CW ingestion); recorded here rather than rounded away.
  • Storage class is a per-deployment write-time choice; report now prices per class. --storage-class only affects new data objects; archives with mixed classes read fine (grep/restore/Athena are class-agnostic for the instant-access tiers). s4logs report prices each object by its recorded class — STANDARD $0.023, STANDARD_IA $0.0125, GLACIER_IR $0.004 per GiB·mo (us-east-1 list, storage only) — and shows a per-class breakdown when a group mixes classes. Objects from pre-5L manifests (and any drained without --storage-class) have no recorded class and are billed at Standard, with the count noted as "N object(s) assume Standard (pre-storage-class manifest)". Retrieval / request / per-object-minimum charges are still out of scope.
  • grep / restore accept a glob or --all, like drain/report. --log-group takes an exact name or a globset glob, and --all sweeps the whole account; results stay globally timestamp-ordered across groups (one shared k-way merge). An exact name reads S3 only — no CloudWatch call — so the common single-group case keeps the read-only-S3 property; a glob or --all first enumerates groups via DescribeLogGroups. With multiple source groups, restore --to-log-group funnels every event into the single target and records original_log_group in the wrap JSON (single-group restores wrap byte-identically to before).
  • Single account per deployment (P1). AWS Organizations multi-account drain is part of the planned commercial tier, not the OSS core.
  • Compression numbers above are synthetic except where explicitly labeled (s4's real nginx corpus; the real-AWS experiment used synthetic data too and is labeled as such).
  • Bill-line confirmation, partial. Cost Explorer (captured 2026-06-11) confirmed the experiment's Athena line exactly ($0.008 for the 1.68 GB scan at $5/TB) and the S3 request pennies; the CloudWatch ingest line (list-price math: $2.50) had not yet posted when we tore the experiment down — CW billing lines can lag 24–48 h and we deleted the resources rather than keep them alive for a screenshot. storedBytes (the gzip billing basis) never materialized before the 1-day retention expired the test data, so the gzip-billed-storage statement in the cost model rests on the AWS pricing page footnote, not on our own measurement.
  • Restore to CloudWatch cannot reproduce original timestamps older than 14 days (AWS API constraint — see previous section).
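Below is a Rust sketch of the per-class pricing that s4logs report is described as doing, using the quoted prices. Classes are S3's storage-class strings. An object with no recorded class (None) is priced as Standard and counted. ObjectInfo and monthly_storage are hypothetical names. Like the report, the sketch covers storage only.

```rust
/// USD per GiB-month, us-east-1 list, storage only.
fn price_per_gib_mo(class: Option<&str>) -> f64 {
    match class {
        Some("STANDARD_IA") => 0.0125,
        Some("GLACIER_IR") => 0.004,
        // STANDARD, and objects whose manifest recorded no class, are priced as Standard.
        _ => 0.023,
    }
}

pub struct ObjectInfo<'a> {
    pub bytes: u64,
    pub class: Option<&'a str>,
}

/// Monthly storage estimate plus the number of objects that had to assume Standard.
pub fn monthly_storage(objects: &[ObjectInfo<'_>]) -> (f64, usize) {
    const GIB: f64 = 1024.0 * 1024.0 * 1024.0;
    let mut total = 0.0;
    let mut assumed_standard = 0;
    for o in objects {
        if o.class.is_none() {
            assumed_standard += 1;
        }
        total += o.bytes as f64 / GIB * price_per_gib_mo(o.class);
    }
    (total, assumed_standard)
}
```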

IAM

Minimal-permission policy documents ship in docs/, one per role (replace YOUR_BUCKET / YOUR_ACCOUNT_ID, and scope log-group:* down to the groups you actually drain):

| File | Grants | Used by |
|---|---|---|
| docs/iam-policy-drain.json | logs:FilterLogEvents, logs:DescribeLogGroups, logs:PutRetentionPolicy; S3 Put/Get + prefix-scoped List | s4logs drain |
| docs/iam-policy-gateway.json | S3 Put/Get + prefix-scoped List; optional logs:PutLogEvents / logs:CreateLogGroup / logs:CreateLogStream (delete that statement if no route targets CloudWatch) | s4logs serve |
| docs/iam-policy-restore.json | S3 Get + prefix-scoped List; logs:PutLogEvents / logs:Create* for --to-log-group | s4logs restore |

Observability

The gateway serves /health (unconditional 200), /ready (503 until the sink probe — a cached ListObjectsV2 max-keys=1 — succeeds and the last flush did) and Prometheus /metrics: s4logs_events_total{action=}, s4logs_flush_total, s4logs_flush_bytes_total{kind=raw|compressed}, s4logs_cw_passthrough_errors_total, s4logs_backpressure_total, and the WAL family (s4logs_wal_appends_total, s4logs_wal_replayed_events_total, s4logs_wal_torn_lines_total, s4logs_wal_fsync_errors_total, s4logs_wal_dir_fsync_errors_total). Logs via tracing (--log-format json|pretty). A ready-to-import Grafana dashboard ships in contrib/grafana/.
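Below is a minimal Rust sketch of the /ready rule as a small state machine. The ListObjectsV2 probe result is cached for a TTL. The status is 200 only when that probe succeeded and the last flush did.

Two details are hypothetical and not documented behavior: the TTL value, and the assumption that readiness does not wait for a first flush.

```rust
use std::time::{Duration, Instant};

/// Hypothetical readiness state mirroring the documented /ready rule.
pub struct Readiness {
    ttl: Duration,                       // how long a cached probe result is reused
    last_probe: Option<(Instant, bool)>, // when the sink probe last ran and whether it succeeded
    last_flush_ok: bool,                 // assumption: true until a flush actually fails
}

impl Readiness {
    pub fn new(ttl: Duration) -> Self {
        Readiness { ttl, last_probe: None, last_flush_ok: true }
    }

    /// True when there is no cached probe result or it is older than the TTL.
    pub fn probe_due(&self, now: Instant) -> bool {
        match self.last_probe {
            None => true,
            Some((at, _)) => now.duration_since(at) >= self.ttl,
        }
    }

    pub fn record_probe(&mut self, now: Instant, ok: bool) {
        self.last_probe = Some((now, ok));
    }

    pub fn record_flush(&mut self, ok: bool) {
        self.last_flush_ok = ok;
    }

    /// HTTP status for /ready: 200 only if the cached probe succeeded and the last flush did.
    pub fn status(&self) -> u16 {
        match self.last_probe {
            Some((_, true)) if self.last_flush_ok => 200,
            _ => 503,
        }
    }
}
```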

Development

cargo test --workspace          # unit + proptest (no network)
./scripts/e2e.sh                # LocalStack E2E (docker compose)
./scripts/soak.sh               # sustained-load soak (S4LOGS_SOAK_SECONDS, default 60)
cargo test -p s4logs-e2e --release -- --ignored bench --nocapture   # bench table
cargo test -p s4logs-core --test fuzz_bolero   # fuzz targets, test mode
cargo bench -p s4logs-core -- --quick          # criterion benches
docker build -t s4logs .                        # 176 MB runtime image

Workspace crates: s4logs-core (format + S3 layout + read path), s4logs-drain (Mode A), s4logs-gateway (Mode B), s4logs-cli (binary), s4logs-e2e (LocalStack suites). Format and behavior contract: DESIGN.md.

License

Apache-2.0 — see LICENSE and NOTICE.

Amazon CloudWatch, Amazon S3 and AWS are trademarks of Amazon.com, Inc. or its affiliates. S4 Logs is not affiliated with, endorsed by, or sponsored by Amazon; CloudWatch is referenced solely to describe interoperability.

About

CloudWatch Logs cost offloader — drain or bypass log groups into zstd-compressed S3. OSS sibling of S4 (Squished S3).
