Skip to content

abyo-software/ferro-stash

FerroStash

CI License: Apache-2.0 Rust

Run your existing Logstash pipelines in Rust — no JVM, no config rewrite

FerroStash runs most production-common Logstash pipeline.conf files unchanged — parsing the Logstash DSL natively (same input → filter → output model, same event model: @timestamp, tags, [a][b] field references, %{field} interpolation) — but without a JVM. The result is a single ~14 MB static binary that starts in milliseconds and holds tens of MB of RAM instead of ~1 GB.

Low-risk to try: FerroStash reads the same config and emits the same events as Logstash, so you can run it beside your current pipeline and diff the output before trusting it with anything. Start there — see When NOT to use FerroStash for the honest caveats first.

☁️ Run it on AWS Marketplace — AWS-billed, nothing to self-manage: AMI for EC2 · Container for Amazon EKS (Helm) — both billed through your AWS account. The open-source build is free for local and self-managed use and is the full engine. Marketplace adds AWS procurement, consolidated billing, and a supported commercial path; the Marketplace container and AMI use the default no-Ruby build, while the repo-root Dockerfile includes the optional Ruby filter.

  inputs  ──▶   filters   ──▶   outputs
  stdin, file,      grok, mutate, json,     elasticsearch, kafka,
  tcp, http,        kv, dissect, date,      s3, http, file, datadog,
  syslog, kafka,    ruby (mruby) +          tcp, stdout, …
  beats, redis, …   native `script`, …
  • Logstash-config compatible — runs most pipeline.conf files unchanged; output verified field-for-field against Logstash 9.4.2 across 24 parity fixtures (runtime-only fields like @timestamp/host normalized).
  • A fraction of the footprint — ~8–13× lower RSS and ~700× faster cold start than Logstash in our benchmark (see Performance).
  • Keep your Ruby, or go fast — an embedded Artichoke (mruby) interpreter runs your ruby { } unchanged for migration; the native script { } filter (Painless subset) runs custom logic ~3.6× faster than JRuby.
  • No JVM, no GC pauses — one static binary, deterministic latency.

Status

v1.0.0 — first stable release, with a SemVer-stable surface (config DSL, event model, CLI flags, and plugin set are frozen for the 1.x line). This is a contract on API/behaviour stability, not a production track record: single-developer project, no public production deployments yet — run it beside your existing pipeline before trusting it with irreplaceable data. cargo test --workspace runs 1,400+ tests, 0 failing, with cargo clippy -D warnings, cargo fmt --check, and cargo deny check clean, and output verified against Logstash 9.4.2 fixture outputs (24/24 parity fixtures, runtime-only fields normalized; Docker side-by-side covers a 13-fixture subset).

The ten previously-stubbed connector plugins (input/output kafka, redis, s3; output datadog; filters geoip, dns, elasticsearch) now have real external integrations and are live-validated against real services (real Apache Kafka, Redis, AWS S3, Elasticsearch, and the DataDog intake) via #[ignore], env-gated smoke tests. Those smoke tests are run manually with the service available — they are not part of CI (which has no brokers or credentials) and verify reachability + a round-trip, not exhaustive conformance. See Honest limitations for exactly what was validated and the feature residuals per plugin; read those caveats before deploying any connector.

The OSS build (Apache-2.0) is the full engine. AWS Marketplace adds AWS-native procurement, billing, and commercial support; the Marketplace container and AMI intentionally follow the default no-Ruby feature set (the repo-root Dockerfile builds with the optional Ruby filter).

Why teams use FerroStash

Need What FerroStash gives you
Logstash's JVM holds ~1 GB RAM per pipeline A native binary that holds tens of MB — pack far more shippers per host
~8–30 s JVM cold start hurts sidecars, autoscaling, short-lived jobs Sub-second start (~10 ms in practice)
~350 MB install + a JDK to ship everywhere One ~14 MB static binary, no runtime to install
You can't rewrite hundreds of existing pipeline.conf files Most run unchanged — same DSL and event model, field-for-field verified against Logstash 9.4.2
Custom ruby { } logic is your escape hatch mruby runs it as-is for migration; rewrite the hot path in native script { } for ~3.6× JRuby
GC pauses jitter your tail latency No GC — deterministic latency

At a glance (vs Logstash on the JVM):

Property Logstash (JVM) FerroStash (native)
Migration cost — (already running it) Most existing .conf run unchanged (field-for-field verified)
Runtime JVM (Java) + JRuby Native Rust binary
Idle memory (RSS) ~0.5–1 GB+ ~10–50 MB
Cold start ~8–30 s (JVM warm-up) < 1 s (~10 ms)
Install / binary size ~300–400 MB (+JVM) ~12–15 MB stripped
GC pauses Yes (G1GC) None (deterministic)

Figures are measured on a single host; treat all comparative numbers as evidence from one environment (see Performance), not a universal guarantee.

Performance

Measured on a single dedicated host (AWS c7i.2xlarge, 8 vCPU, x86-64) against Logstash 9.4.2 — identical pipeline and byte-identical input on both engines, output to null (so the sink is never the bottleneck), 8 workers, throughput startup-subtracted and reported as the mean of 3 runs. Reproduce with bench/ (./bench/run_bench.sh). These are one-environment numbers, not a universal guarantee.

Throughput and memory (native filters, 5M events)

Filter Logstash 9.4.2 FerroStash Throughput Logstash RSS FerroStash RSS Memory
grok 193k ev/s 332k ev/s 1.7× 1,113 MB 98 MB 11× less
dissect 193k 318k 1.6× 1,098 MB 106 MB 10× less
json 174k 255k 1.5× 1,100 MB 133 MB 8× less
kv 171k 258k 1.5× 1,170 MB 126 MB 9× less
csv 25k 79k 3.2× 1,471 MB 117 MB 13× less
grok + mutate 186k 251k 1.35× 1,099 MB 120 MB 9× less

Cold start is ~0.01 s for FerroStash vs ~7 s (JVM warm-up) for Logstash on every workload. The throughput edge is modest-but-consistent; the decisive wins are ~8–13× lower memory and near-instant startup.

Custom logic: Painless vs Ruby

Logstash's only in-pipeline custom-logic path is ruby { }. FerroStash runs that same Ruby (Artichoke/mruby, for drop-in migration) and offers a native script { } filter (a Painless subset) for the hot path. Same transformation, same input:

Engine Throughput
FerroStash script (native Painless) 525k ev/s
Logstash ruby (JRuby) ~145k ev/s
FerroStash ruby (mruby) 11k ev/s
  • script is ~3.6× faster than Logstash's ruby/JRuby (and ~48× faster than FerroStash's own mruby filter) — native execution, no JVM, the script is parsed once and the cached AST is reused per event.
  • ruby (mruby) is ~13× slower than JRuby. It exists for migration compatibility — your existing ruby { } configs run unchanged — not for speed. Move hot-path logic to script { } to get native, JRuby-beating throughput. The mruby gap is inherent (no JIT + per-event marshalling); parallelizing it across workers narrows but does not close it.

The JRuby custom-logic figure is corroborated across runs (~143–147k); see bench/ for the full methodology and configs.

Smaller instance, lower bill: AWS Marketplace AMI vs OSS Logstash

TL;DR — drop an instance class, pay less, ship more events. FerroStash on c7g.medium outperforms OSS Logstash on c7g.large by 4.2× throughput at 33 % the cost (Marketplace software fee included), using 20× less RAM. With OpenSearch as the sink the ratio is 1.7× throughput / 2.5× events-per-dollar / 14× less RAM on the same smaller instance.

Real bench on freshly-launched AWS EC2 Graviton (us-east-1, 2026-06-24), 500 k Apache-combined-log lines through grok COMBINEDAPACHELOG → date → mutate → file_out, 3 iterations per cell, median reported, output verified byte-equivalent down to @version / @timestamp / all COMBINEDAPACHELOG sub-fields.

AMI (per-hour cost includes EC2 on-demand + Marketplace software)

Setup $/hr Throughput RSS Events per $ vs OSS Logstash baseline
OSS Logstash c7g.large (baseline) $0.0723 12.7 k ev/s 1 058 MB 632 M
FerroStash c7g.medium 🏆 $0.0481 52.8 k ev/s 49 MB 3 953 M 6.3× more / $
FerroStash t4g.small $0.0568 38.2 k ev/s 116 MB 2 420 M 3.8× more / $
FerroStash c7g.large $0.1323 61.1 k ev/s 61 MB 1 663 M 2.6× more / $

The 🏆 row is the operational sweet-spot: smaller machine, lower bill, 4.2× the work. The c7g.large Marketplace row exists to show the apples-to-apples ceiling (4.8× the OSS engine on identical hardware).

Container — same story, 6× smaller image

Published Marketplace ferro-stash-container:1.0.2 vs official docker.elastic.co/logstash/logstash:9.4.2 on the same c7g.large host:

Container Image Throughput RSS $/pod-hr (MP + node share) vs OSS Logstash
FerroStash 1.0.2 🏆 142 MB 54.9 k ev/s 51 MB $0.1123 3.2× more / $
Logstash 9.4.2 (Elastic OCI) 899 MB 11.0 k ev/s 1 044 MB $0.0723 1× baseline

5.0× throughput, 20× less RAM, 6.3× smaller image. Packing more FerroStash pods per node (~50 MB each) widens the dollar gap further; Logstash's ~1 GB / pod floor makes that impractical below c7g.2xlarge.

Output → OpenSearch (the real-life pipeline)

The file-out numbers above isolate the engine. Real Logstash deployments mostly write to Elasticsearch / OpenSearch — so we ran the same workload with each engine pushing into a shared OpenSearch 2.18 sink on a dedicated c7g.xlarge node (sink is fast enough that neither engine is sink-starved at this load). Result:

Client setup Sink Throughput (indexed in OpenSearch) RSS $/hr Events per $
FerroStash c7g.medium 🏆 OpenSearch 2.18 11 655 ev/s 79 MB $0.0481 872 M
Logstash c7g.large (logstash-output-opensearch) OpenSearch 2.18 6 922 ev/s 1 098 MB $0.0723 345 M

500 000 / 500 000 docs verified via _count on every iteration, both engines, no drops. Even with OpenSearch as the sink (where the JVM client should be at its most competitive), FerroStash on the smaller instance still delivers 1.7× more indexed events per second and 2.5× more indexed events per dollar, at 14× less RAM.

Full methodology, per-iteration numbers, t4g.nano row, CPU-credit caveats, OpenSearch sink details, harness scripts, reproduction guide — docs/aws-benchmarks.md.

Logstash compatibility scope

FerroStash targets the production-common subset of Logstash, not its full plugin catalogue. It implements ~88% of the plugins bundled with Logstash 9.4.2 (98 / 111) — codecs 100%, filters 97%, inputs 74%, outputs 87% — weighted toward the parse/filter hot path; the long tail of connectors (enterprise messaging / SNMP, …) is the main gap. A config that uses a missing plugin fails fast at load, so check the full compatibility matrix before migrating. The default event shape (@timestamp, tags, bracket-notation field references [a][b], %{field} interpolation) and the .conf DSL follow Logstash semantics; the docker-driven regression harness asserts field-by-field equality against Logstash 9.4.2 for a curated fixture set (see below). Throughput/memory benchmarks (see Performance) were also run against Logstash 9.4.2. There is no single pinned "target Logstash version" — the config language and event model are broadly stable across Logstash 5.x–9.x, and compatibility is asserted per-fixture rather than claimed wholesale.

Verified parity evidence

Two harnesses run the same fixtures. The in-process tests/e2e/logstash_compat_test.rs runs all 24 fixtures against committed golden files (each golden file is generated from the real Logstash oracle), covering stdin → stdout(json), grok, mutate (rename/case/gsub/convert/copy/strip), json, kv, dissect, fingerprint, date, clone, csv, truncate, translate, split, urldecode, drop, conditional if/else if/else, and unicode inputs. The Docker side-by-side harness tests/e2e/logstash_docker_compat_test.rs pipes the same pipeline.conf

  • input through both target/debug/ferro-stash and docker.elastic.co/logstash/logstash:9.4.2 and asserts each event payload equal field-by-field after stripping only runtime-only fields (@timestamp, @version, host, event.original); it currently wires a 13-fixture subset of the 24. The in-process harness runs in default CI; the Docker harness is #[ignore] (it requires Docker). Run the Docker harness with:
cargo build --bin ferro-stash
docker pull docker.elastic.co/logstash/logstash:9.4.2
cargo test -p ferro-stash-e2e --test logstash_docker_compat_test \
    -- --ignored --nocapture --test-threads=1

The docker-driven regression harness under tests/logstash-compat/ is the authoritative, runnable record of what this evidence does and does not substantiate.

How FerroStash compares to Vector

Vector (maintained by Datadog) is an excellent, mature, Rust-based observability pipeline — if you're starting fresh, it's a strong default. The difference is the migration path, not the runtime:

  • Vector asks you to rewrite your pipelines in VRL (Vector Remap Language) and its own config format. Great for greenfield; a real project if you already run Logstash.
  • FerroStash runs most of your existing Logstash pipeline.conf unchanged, with output verified field-for-field against Logstash 9.4.2. It's built for teams with an existing Logstash investment who don't want to rewrite it just to drop the JVM.

If you have no Logstash config to preserve, Vector is the more battle-tested, broader-ecosystem choice and you should probably use it. If you do, FerroStash lets you keep what you have.

When NOT to use FerroStash

Honest list of cases where FerroStash isn't the right call — better to know now:

  • You need the full Logstash plugin catalogue. FerroStash implements the production-common subset (see Plugins); the 200+ community plugins are out of scope. If your pipeline leans on a niche plugin, check the list first.
  • You need a battle-tested tool with a production track record today. This is a single-developer project with no public production deployments yet. For irreplaceable data, run it alongside your existing pipeline first.
  • Your hot path is heavy custom Ruby. The ruby { } filter (mruby) is for migration and is ~13× slower than Logstash's JRuby — port hot logic to the native script { } filter, or stay on Logstash if you can't.
  • You're already throughput-saturated and memory isn't a concern. The native throughput edge is modest (~1.4–1.7×); FerroStash's decisive wins are memory (~8–13× less) and cold start (~700×). If neither helps you, the upside is small.
  • You need SOC2 / ISO 27001 / FedRAMP evidence. Those reports don't exist yet.

If you run Logstash on the JVM, want your existing .conf to keep working, and care about memory or startup time, FerroStash is built for you — point one pipeline at it next to your current one and compare.

Plugins

Counts below reflect what is registered in the plugin factories (create_input / create_filter / create_output / create_codec), verified against source. These "registered" totals include FerroStash-only plugins and a few not bundled with Logstash, so they are higher than the "Logstash-9.4.2-bundled covered" figures in Logstash compatibility scope / COMPATIBILITY.md (codecs 19/19, filters 34/35) — the two count different things. The ten connector plugins that were formerly stubs now perform real external integrations and are live-validated (live-validated in the Status column) against real services via #[ignore], env-gated smoke tests — run manually with the service available, not in CI. See the Notes column and Honest limitations for exactly what each smoke test exercises and the per-plugin feature residuals.

Input plugins (25 registered)

Plugin Status Notes
stdin functional one event per line
file functional tailing, glob, sincedb, rotation detection
tcp functional TLS via rustls
udp functional datagram input
exec functional runs command via sh -c every intervals (or schedule => { every => "Ns" }); stdout → events (codec plain = one event per line, json = NDJSON). [@metadata][exec][command] + [@metadata][exec][duration]. stdout only
pipe functional runs a long-running command and streams its stdout line-by-line (codec plain/json), relaunching the child with a small backoff when it exits
unix (Unix-only) functional Unix domain socket path; mode server (default, accept + read lines) or client (connect + read); codec line/plain/json. The factory returns a clear error on non-Unix platforms
http functional HTTP POST (JSON / plain)
http_poller functional periodically requests configured urls (string or {url,method,headers}) on interval / schedule => { every }, decodes each response via codec (default json), tags events with http_poller_name. Per-request failures are logged + skipped (no synthetic failure event)
gelf functional Graylog GELF over UDP (default) or TCP (NUL-delimited frames); gzip/zlib auto-detect, short_messagemessage, _customcustom. Single-datagram only — chunked GELF reassembly is not implemented
graphite functional Carbon plaintext over TCP: metric value timestampmetric/value(float)/timestamp(int)
syslog functional RFC 3164 / RFC 5424, TCP + UDP
generator functional synthetic events for test/bench
heartbeat functional periodic events
beats functional Lumberjack v2 (Beats) protocol over TCP
elasticsearch functional search_after + Point-in-Time pagination (reqwest)
dead_letter_queue functional reads from the on-disk DLQ
pipeline functional pipeline-to-pipeline (multi-pipeline mode)
kafka real (live-validated) rdkafka async StreamConsumer: subscribe, recv loop, codec decode, auto offset commit. Live round-trip validated against real Apache Kafka 3.9.1 (and redpanda) via an #[ignore] smoke test (KAFKA_BROKERS); not run in CI. consumer_threads/max_poll_records parsed but not yet wired; no SASL/SSL passthrough; auto-commit only
redis real (live-validated) async client: BLPOP (list), SUBSCRIBE/PSUBSCRIBE (channel/pattern), AUTH + SELECT. Password-only AUTH (no username/ACL), no TLS (rediss://), pub/sub key is a single channel/pattern
s3 real (live-validated) aws-sdk-s3: paginated ListObjectsV2 + GetObject poll, in-memory seen-key dedup, optional delete_after_read. Seen-key set is not persisted (reprocesses non-deleted objects after restart — no sincedb); no SQS-notification mode
sqs real (compile/unit-validated) aws-sdk-sqs long-poll ReceiveMessage, codec-decode the body, emit, DeleteMessage to ack (delete_after_read, default true). queue (name → GetQueueUrl) or queue_url; endpoint for LocalStack. Unit-tested; #[ignore] LocalStack live smoke (SQS_TEST_QUEUE_URL)
jdbc real (sqlite-tested) native sqlx Any (PostgreSQL/MySQL/SQLite by URL scheme — no Java driver; JDBC URLs translated). Polls statement on interval / schedule, :sql_last_value incremental tracking (persisted), optional LIMIT/OFFSET paging, row→event column mapping. SQLite-tested in CI
rabbitmq real (compile/unit-validated) lapin AMQP: queue_declare (+ optional exchange/key bind), basic_consume, codec decode, basic_ack (when ack). host/port/vhost/user/password/durable. rustls TLS stance. No TLS (amqps)/x-args/prefetch tuning yet. Unit-tested; #[ignore] live smoke (RABBITMQ_URL)
cloudwatch real (compile/unit-validated) aws-sdk-cloudwatch GetMetricStatistics polled every intervals over [now-interval, now], one event per datapoint (metric/namespace/statistic values/unit). namespace (required), metric_names(alias metrics), period, statistics. No dimension filtering / GetMetricData discovery yet. Unit-tested; #[ignore] live smoke

Filter plugins (37 registered)

Plugin Status Notes
grok functional ~50 built-in patterns (IP, TIMESTAMP_ISO8601, COMBINEDAPACHELOG, …) via the regex crate
http functional one HTTP request per event (reqwest); url/headers/body with %{field} interpolation, response → target_body (default http_response, JSON parsed when possible) + optional target_headers; tags _httprequestfailure on transport error or non-2xx
mutate functional rename/replace/uppercase/lowercase/strip/gsub/convert/split/join/add/remove
json functional parse JSON strings into fields
date functional ISO8601, UNIX, UNIX_MS, custom formats
dissect functional delimiter-based extraction (no regex)
kv functional key=value extraction
drop functional drop events
clone functional duplicate events
ruby functional full Ruby via embedded Artichoke interpreter
script / painless functional native Painless-style DSL (ferro-script), parsed once + interpreted natively (a Cranelift JIT path exists for numeric scoring)
sleep functional rate limiting / delay
aggregate functional stateful cross-event aggregation
throttle functional rate-based throttling
translate functional dictionary / file-based lookup
fingerprint functional MD5, SHA1, SHA256, etc.
useragent functional UA parsing via built-in regex patterns (not the full uap database)
csv functional CSV field extraction
urldecode functional percent-decoding
split functional split a field into multiple events
truncate functional length capping
prune functional allowlist/denylist of fields
xml functional XML parsing into fields
metrics functional meter/counter events
de_dot functional replace . in field names
json_encode functional serialize a field to a JSON string
bytes functional parse human byte sizes (e.g. 1.5kB)
cidr functional match address(es) against CIDR network(s) (IPv4/IPv6); on match applies add_field / add_tag
uuid functional set a v4 UUID into target (with overwrite)
syslog_pri functional decode syslog PRI into facility/severity codes + labels (default PRI 13)
anonymize functional replace field values with a consistent hash (SHA1/256/384/512, MD5, MURMUR3; optional HMAC key)
geoip real (live-validated) maxminddb lookups against a configured .mmdb (database field), full Logstash-style subfields. Falls back to private/loopback/public classification when no database is set. Validated against a real GeoLite2-City database
dns real (live-validated) hickory-resolver forward (A/AAAA) and reverse (PTR) lookups, custom nameserver, Replace/Append action. Validated against 8.8.8.8
elasticsearch real (live-validated) reqwest _search with host failover, query-template %{field} sprintf, hits→field mapping. Live-validated against real Elasticsearch 8.15.3 (a seeded hit is mapped into the target field) via an #[ignore] smoke test (ES_URL); not run in CI
memcached real (compile/unit-validated) sync memcache client (multi-host hosts, consistent hashing) via tokio::task::spawn_blocking: get (key→field) and set (field→key) maps with %{field}-aware keys, namespace prefix, ttl. Plaintext only (OpenSSL tls feature disabled to stay rustls-only). Unit-tested; #[ignore] live smoke (MEMCACHED_HOST)
jdbc_streaming real (sqlite-tested) per-event SQL enrichment via native sqlx Any (Postgres/MySQL/SQLite, no Java driver); :param placeholders rewritten to positional binds, parameters resolved from event fields / %{}; matched rows → target array; bounded FIFO/TTL result cache (cache_size/cache_expiration); tags _jdbcstreamingfailure on error. SQLite-tested in CI
jdbc_static real (sqlite-tested) loads loaders reference tables into memory once (lazy refresh via refresh_interval), enriches events via in-memory keyed local_lookups (key_column + %{}/field key) → target array. Subset: no local in-memory SQL DB / joins (single keyed lookup per entry); tags _jdbcstaticfailure on load error. SQLite-tested in CI

Output plugins (21 registered)

Plugin Status Notes
stdout functional json, rubydebug, line, dots
elasticsearch (aliases ferrosearch, opensearch) functional Bulk _bulk API via reqwest
file functional JSON lines or custom format
graphite functional Carbon plaintext over TCP (metric value timestamp); metrics map (%{field}-aware) or fields_are_metrics for all numeric fields
http functional POST/PUT/PATCH
tcp functional TLS via rustls
udp functional codec-encoded datagrams via tokio::net::UdpSocket (best-effort, fire-and-forget)
csv functional append CSV rows to a file; fields define column order, csv_options (separator/quote)
null functional discard (benchmarking)
pipe functional writes each codec-encoded event (codec json/line, or a %{} message_format) to a long-lived sh -c <command> child's stdin, relaunching once on broken pipe
pipeline functional pipeline-to-pipeline (multi-pipeline mode)
kafka real (live-validated) rdkafka FutureProducer: codec serialize, key sprintf, compression/acks/retries, flush. Live round-trip validated against real Apache Kafka 3.9.1 (and redpanda) via an #[ignore] smoke test (KAFKA_BROKERS); not run in CI
redis real (live-validated) async ConnectionManager: RPUSH (list) / PUBLISH (channel). Password-only AUTH (no username/ACL), no TLS (rediss://), key is a single channel
s3 real (live-validated) aws-sdk-s3 PutObject on rotation/flush (+gzip when encoding => "gzip"). New endpoint / force_path_style fields for MinIO/LocalStack/S3-compatible stores. Single PutObject (no multipart upload) in v1. Live-validated against real AWS S3 (write/list/read-back) and MinIO via an #[ignore] smoke test
datadog real (live-validated) reqwest POST to /api/v2/logs (DD-API-KEY, batched, retry/backoff). Live-validated against the real DataDog Log Intake (AP1) via an #[ignore] smoke test; a site shorthand selects the region
rabbitmq real (compile/unit-validated) lapin AMQP basic_publish to exchange with a %{field}-aware routing key; codec-encoded body; persistent delivery mode; lazy connection/channel. rustls TLS stance. No TLS (amqps)/publisher-confirm batching tuning yet. Unit-tested; #[ignore] live smoke (RABBITMQ_URL)
email real (compile/unit-validated) lettre SMTP, one message per event; to/subject/body/htmlbody are %{field}-aware (both body + htmlbody → multipart/alternative). With username/password → STARTTLS + SMTP AUTH (rustls); otherwise plaintext. Default from logstash@ferro-stash. Unit-tested; #[ignore] live smoke (SMTP_HOST)
sqs real (compile/unit-validated) aws-sdk-sqs SendMessage per event (codec-encoded body); queue (name → GetQueueUrl) or queue_url, endpoint for LocalStack. One send per event (no batch). Unit-tested
sns real (compile/unit-validated) aws-sdk-sns Publish per event to topic_arn (codec-encoded message, optional subject); endpoint for LocalStack. Unit-tested
jdbc real (sqlite-tested) native sqlx Any (PostgreSQL/MySQL/SQLite, no Java driver); ["INSERT … VALUES (?,?)", "field_a", "field_b"] statement, per-field bind + execute per event. SQLite-tested in CI
cloudwatch real (compile/unit-validated) aws-sdk-cloudwatch PutMetricData (batched ≤20); metricname/value/unit/dimensions derived per event via %{field} lookups; events with empty/unresolved name or non-numeric value are skipped. Unit-tested; #[ignore] live smoke (LocalStack/AWS)

Codecs (21 registered)

plain/line, json/json_lines, multiline, csv, script/ruby, rubydebug, dots, bytes, es_bulk, msgpack, fluent, graphite, cef, netflow (v5/v9/IPFIX), collectd, avro, protobuf, cloudfront, cloudtrail, nmap, edn/edn_lines.

Configuration

  • Logstash DSL (.conf) — input/filter/output blocks, plugin options with =>, hash and array literals.
  • YAML — an alternative structured format.
  • Conditionalsif / else if / else chains with mutually exclusive branch semantics; operators ==, !=, <, >, >=, <=, =~, !~, in, not in, and, or, nand, xor.
  • Field references — bracket notation [a][b][c].
  • Interpolation%{field} in strings; ${ENV_VAR} / ${ENV_VAR:default} environment expansion.

Deploy

The same engine ships three ways; pick by how you want it billed and run, not by feature set:

  • Self-managed (free, OSS) — build the single static binary with cargo build --release (see Quick start), or docker build the included Dockerfile for a container image. Apache-2.0, no fee, no entitlement check — the full engine.
  • AWS Marketplace — Container on Amazon EKS (Helm) — the default no-Ruby image plus AWS Marketplace entitlement metering, billed through your AWS account per pod-hour, for teams that want it on their AWS bill with a supported commercial path → listing. The Marketplace container verifies entitlement once at startup (AWS RegisterUsage) and fails closed if the copy is not entitled; the OSS image has no such check and runs unrestricted. Use the OSS image/root Dockerfile or AMI if you need the optional ruby filter.
  • AWS Marketplace — AMI (EC2) — the default no-Ruby binary as a Graviton/arm64 AMI, metered by AWS per instance-hour (no entitlement code), for high-throughput or VM-based deployments → listing.

Quick start

# Build (requires a C compiler for the Artichoke/mruby FFI and cmake for
# the rdkafka-backed kafka plugins — see Prerequisites)
cargo build --release

# Run with a Logstash DSL config
./target/release/ferro-stash -f config/example.conf

# Run with a YAML config
./target/release/ferro-stash -f config/example.yml

# Inline pipeline
./target/release/ferro-stash -e 'input { stdin { } } output { stdout { } }'

# Validate a config without running it
./target/release/ferro-stash --config.test_and_exit -f config/example.conf

# Enable the metrics API on loopback
./target/release/ferro-stash -f config/example.conf --api.enabled --api.http.host 127.0.0.1:9600

The CLI mirrors Logstash flag names (-f/--path.config, -e/--config.string, -w/--pipeline.workers, -b/--pipeline.batch.size, --log.level, --config.reload.automatic, etc.). The monitoring API exposes unauthenticated read-only stats for Logstash compatibility; keep it on loopback or a trusted network. Runtime log-level changes via PUT /_node/logging are off by default and require --api.runtime_logging.enabled=true.

Logstash DSL example

input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
  mutate {
    convert => { "response" => "integer" }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logs-%{+%Y.%m.%d}"
  }
}

The Ruby / Artichoke compatibility story

Logstash's ruby { code => "..." } filter is used heavily in real deployments, so FerroStash embeds a Ruby interpreter to run that code unchanged. It does not shell out to CRuby or JRuby; instead it links Artichoke, an mruby-based Ruby implementation written in Rust, through the ferro-stash-ruby crate. Events are marshalled to a Ruby Hash at the FFI boundary and read back afterwards, so Ruby code cannot corrupt Rust memory.

Performance trade-off (honest): the Artichoke (mruby) interpreter has no JIT and pays a per-event Rust↔Ruby serialization cost, so the Ruby filter is measurably slower than Logstash's JRuby on the same code (~13× slower against Logstash 9.4.2 in our benchmark — see Performance). The Ruby filter exists for migration compatibility, not throughput. For custom logic that needs to be fast, prefer the native script (Painless-style) filter: it is executed natively (parsed once, no JVM) and in our benchmark ran ~3.6× faster than Logstash's JRuby and ~48× faster than the mruby filter.

Optional, off by default. The Ruby filter lives behind the ruby cargo feature and is not built by default, so the common build is light and needs no extra toolchain:

cargo build                                        # default — no Ruby/Artichoke
cargo build -p ferro-stash --features ruby         # CLI with the Ruby filter

A pipeline that uses ruby { ... } in a binary built without the feature fails fast with a clear "rebuild with --features ruby" error rather than silently dropping the filter.

Fork dependency (maintenance note): ferro-stash-ruby depends on a fork of Artichoke pulled as a rev-pinned git dependency, so a fresh clone builds the Ruby feature with no sibling checkout:

# crates/ferro-stash-ruby/Cargo.toml
artichoke-backend = { git = "https://github.com/abyo-software/artichoke-extended", rev = "245b894...", ... }
artichoke-core    = { git = "https://github.com/abyo-software/artichoke-extended", rev = "245b894..." }

The fork (branch extended) carries local patches needed for Logstash Ruby-filter compatibility. Notes:

  • A fresh clone buildscargo build --features ruby fetches the pinned fork revision automatically; there is no submodule or sibling-checkout requirement. The default build doesn't fetch it at all.
  • Reproducible pin. The exact rev is recorded in Cargo.toml / Cargo.lock; bump it deliberately to adopt fork updates.
  • Bus-factor and upstream risk. The Ruby filter's long-term viability is tied to maintaining this fork. It is deliberately isolated in its own crate (and behind a feature) so the rest of the pipeline is unaffected if Ruby support is dropped or reworked.

Architecture

┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐
│  Input  │──▶│  Codec  │──▶│ Filter  │──▶│ Output  │
│ plugins │   │ decode  │   │ plugins │   │ plugins │
└─────────┘   └─────────┘   └─────────┘   └─────────┘
        tokio async runtime (mpsc channels + backpressure)
Crate Responsibility
ferro-stash-core Event model, plugin traits, pipeline engine, conditions, buffering, metrics, DLQ
ferro-stash-config Logstash DSL parser + YAML config parser
ferro-stash-codec Codecs (21 registered)
ferro-stash-input Input plugins
ferro-stash-filter Filter plugins
ferro-stash-output Output plugins
ferro-stash-ruby Artichoke (mruby) Ruby interpreter bridge for the ruby filter
ferro-script Native Painless-style scripting engine — tree-walking interpreter (parsed once, reused per event); a Cranelift JIT path exists for numeric scoring. Powers the script filter/codec
ferro-stash-cli ferro-stash binary: CLI, signal handling, metrics API
ferro-stash-e2e Integration / Logstash-parity test harness (no library code)

More detail: docs/ARCHITECTURE.md.

Build, test, run

# Default build: light, no Ruby/Artichoke (operates on default-members)
cargo build
cargo test
cargo clippy --all-targets
cargo fmt --all -- --check
cargo deny check

# Optional Ruby filter (pulls Artichoke from its git dependency; needs clang/gcc + cmake)
cargo build -p ferro-stash --features ruby
cargo test  -p ferro-stash-filter --features ruby

The default build excludes ferro-stash-ruby (it is not a default-member), so it is fast and toolchain-light. cargo build --workspace additionally compiles the Ruby crate.

Prerequisites

  • Rust stable — the default build needs 1.75; building cargo build --workspace or --features ruby (the ferro-stash-ruby / Artichoke crate, edition 2024) needs 1.88. (The rust-1.75 badge reflects the default build.)
  • cmake — required by the kafka plugins, which pull rdkafka and build a vendored librdkafka via CMake. (TLS in the connectors uses rustls, so no system OpenSSL is needed.)
  • A C compiler (clang or gcc) — only for the optional ruby feature (Artichoke/mruby FFI). The default build does not need it.
  • Runtime, not build-time: the geoip filter needs a user-supplied .mmdb (GeoLite2/GeoIP2) database file at the configured database path; it is not vendored.

Fuzzing

Four cargo-fuzz targets live under fuzz/: codec_decode, logstash_dsl_parse, netflow_decode, and cef_decode. The 2026-05-02 wave surfaced and fixed three production DoS panics (protobuf/avro offset overflow, DSL UTF-8 char-boundary panic); regression seeds are committed.

FerroSearch integration

The elasticsearch output (aliases ferrosearch / opensearch) speaks the Elasticsearch Bulk (_bulk) API and is the intended sink for FerroSearch:

Data sources → FerroStash → FerroSearch → Applications

Honest limitations

  • Connector plugins are live-validated, but via manual smoke tests (not continuous CI). The ten formerly-stub plugins (kafka, redis, s3 input and output; datadog output; geoip, dns, elasticsearch filters) perform real external integrations and are now live-validated against real services by #[ignore], env-gated smoke tests. Those tests are run manually with the service available (locally via Docker, or against a real cloud account for S3/DataDog) — they are not part of the automated CI run, which has no brokers or credentials. The smoke tests verify reachability + a real round-trip (and, for the ES filter, that a seeded hit is mapped); they are not exhaustive conformance suites. What was validated, and the feature residuals that remain regardless of validation:
    • kafka in/out — produce/consume round-trip against real Apache Kafka 3.9.1 (and redpanda). Residuals: consumer_threads/max_poll_records parsed but not yet wired; no SASL/SSL security.protocol passthrough; auto-commit only.
    • redis in/out — against real Redis. Residuals: password-only AUTH (no username/ACL); no TLS (rediss://); a pub/sub key is a single channel/pattern (no comma-split list).
    • s3 in/out — against real AWS S3 (object written, listed, and read back) and MinIO (via endpoint/force_path_style, supported on both input and output). Residuals: input seen-key dedup is in-memory, so non-deleted objects are reprocessed after a restart (no sincedb); no SQS-notification mode; delete_after_read deletes immediately after emit; output is a single PutObject (no multipart) in v1.
    • datadog output — against the real DataDog Log Intake (AP1 account). A site shorthand (us1/us3/us5/eu/ap1/us1-fed) selects the intake host; host overrides it for proxies.
    • elasticsearch filter + output — against real Elasticsearch 8.15.3: the filter maps a seeded hit into the target field; the output bulk-indexes an event and the document is counted back.
    • geoip filter — maxminddb lookups, against a real GeoLite2-City database (user-supplied .mmdb at database, not vendored; falls back to private/loopback/public classification when unset).
    • dns filter — hickory-resolver forward (A/AAAA) and reverse (PTR) lookups, against 8.8.8.8.
  • Plugin catalogue is scoped. The registered surface covers production-common Logstash usage, not Logstash's full 200+ plugin ecosystem. There is no dynamic plugin loading; everything is compiled in.
  • Logstash DSL coverage is a subset. Common syntax (plugin blocks, conditionals, hash/array literals, interpolation, field references) is supported; exotic operators and unusual array-indexing forms are not exhaustively covered.
  • Ruby filter is slower than JRuby and depends on a maintained fork. See the Ruby/Artichoke section. The fork is pulled as a rev-pinned git dependency and is optional/off by default.
  • Enterprise features absent. No centralized/Kibana management, no X-Pack security, no keystore. A persistent queue and DLQ exist in the core crate but are not a full Logstash-parity feature set.
  • Persistent queue provides at-least-once delivery (duplicates possible, not exactly-once). queue.type: persisted advances its durable cursor only after an event reaches a terminal state — not when it is dequeued for processing. An entry is terminal once every matching output's output() returned Ok for it (the delivery point), OR it was intentionally dropped by a filter, OR its delivery/filter failure was durably captured in the DLQ. An entry that is read but not yet terminal when the process crashes is replayed on restart. Consequences to know:
    • Duplicates. An event delivered in the window after delivery but before its ack is checkpointed re-delivers on restart; with multiple outputs, replaying an entry that one output already took re-delivers to that output. Make outputs idempotent (e.g. document IDs) where it matters — exactly-once is not provided.
    • Buffering outputs are flushed before their entries are acked. Most outputs deliver synchronously within output(); s3 buffers and uploads on rotation. To keep the guarantee for s3, the pipeline flushes every output to durability before each durable ack (periodic and at shutdown) and only advances the cursor when the flush succeeds — so an entry is acked only after the output it was delivered to has durably persisted it. The cost: with a persistent queue, s3 uploads on the ack cadence (pipeline.batch.delay) rather than only on its own rotation, so set pipeline.batch.delay high enough to keep object sizes reasonable. A crash between a successful flush and the checkpoint just re-delivers (duplicate), never loses.
    • Durability scope: process crash by default; power loss with fsync. By default the PQ segments/checkpoint and the DLQ are flushed to the OS (page cache), not fsync'd — so the at-least-once guarantee covers a process crash/restart but not a power loss / kernel panic. Set queue.fsync: true (and dead_letter_queue.fsync: true) to fsync every append and an atomic, fsync'd checkpoint (temp→fsync→rename→dir-fsync), at a significant throughput cost (a disk sync per append) — use it when the host can lose power and committed events must survive.
    • Failure handling. A failed delivery (or filter error) is acked only if it is durably captured in the DLQ; if there is no DLQ, the DLQ is full, or the DLQ write fails, the entry is left un-acked and replays. A persistently failing output with no DLQ therefore backs the queue up (the durable buffer) rather than dropping. Enable the dead-letter queue to capture failures (with the real event payload) for replay via the dead_letter_queue input.
    • Sizing. Because entries are retained until terminal (not just until read), size queue.max_bytes for the in-flight/undelivered window: if the queue reaches max_bytes while delivery lags, new events fall back to the non-durable in-memory path (best-effort, not replayable), re-opening a durability gap. The duplicate/replay window is otherwise bounded by the output flush interval (pipeline.batch.delay).
  • Single developer; no production deployments. Bus factor 1; no operational history. Performance numbers come from one benchmark environment.
  • Parity evidence is per-fixture. The 24 field-for-field fixtures (run in-process by logstash_compat_test and end-to-end by runner.py) cover ~17 filters and the stdin/stdout path against Logstash 9.4.2; they do not cover every implemented plugin, codec, or edge case. Each golden file is generated from the real Logstash oracle via tests/logstash-compat/gen_expected.py. See the compatibility matrix for the explicit scope.
  • Dotted JSON keys are auto-nested. The json filter expands a key containing dots (e.g. "app.name") into a nested object (app: { name }), whereas Logstash keeps it as a single literal field name. Consequently the de_dot filter — whose purpose is to flatten such keys — is a no-op on keys that arrived through json, since they are already nested by the time it runs. de_dot still works on genuinely flat dotted field names. Aligning the json filter's dotted-key handling with Logstash is tracked as future work.

Documentation

Area Docs
Get started Quick start · onboarding / build · configuration
Reference architecture · plugins · Logstash compatibility scope · compatibility matrix
Proof & trust Performance · parity harness · benchmarks · honest limitations
Project CHANGELOG · release notes · security · contributing

More from abyo software

FerroStash is part of a family of Rust infrastructure tools from abyo software; several ship on AWS Marketplace under one seller account — browse the catalog at abyo software on AWS Marketplace.

Product What it does
FerroStash This project: a Logstash-compatible data pipeline in Rust.
S4 — Squished S3 Transparent GPU/CPU compression gateway in front of S3 — cut storage 50–80%.
S4 Logs CloudWatch Logs → S3 archiver that cuts log-storage cost.
S4 Scan Amazon Athena scan-cost reducer.
S4 NAT Cost-optimized NAT for Amazon VPC.
S4 MockAPI Security API simulator for testing and demos.

Contributing

Pull requests welcome — see docs/CONTRIBUTING.md for setup, conventions, and the test/fuzz protocol. Contributions are licensed under Apache-2.0 (no separate CLA).

Security

Found a vulnerability? Please do not open a public issue — follow docs/SECURITY.md for coordinated disclosure.

License

Apache-2.0 — see LICENSE. Third-party license summary: LICENSES.md. The optional ruby feature pulls a fork of the Artichoke (mruby) interpreter at build time (Apache-2.0/MIT; see its repo). Changelog: CHANGELOG.md; GA release notes: RELEASE_NOTES_1.0.0.md.

"FerroStash" is an unregistered trademark of abyo software 合同会社. "Logstash", "Elasticsearch", and "Elastic" are trademarks of Elasticsearch B.V.; FerroStash is an independent reimplementation and is not affiliated with, endorsed by, or sponsored by Elastic.

Authors

  • abyo software 合同会社 — sponsoring organization, commercial distribution
  • masumi-ryugo — original author / maintainer

About

FerroStash — a Logstash-compatible data pipeline written in Rust

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors