spec: plugin protocol for out-of-tree backends (RFC — v2.0) #26

@awbx

Deliverable: an RFC Q-002 entry in spec/OPEN_QUESTIONS.md, with a fleshed-out design, defining an out-of-tree backend protocol so the community can ship backends without bloating cronix core.

Why v2.0: companion to multi-strategy fire (#25). Together they answer "support every cron in existence" — plugins for backends (where the scheduler entry lives), multi-strategy for fire mechanisms (how the job is invoked). Independent of #25; neither blocks the other.

Why this is the right strategic move: a single maintainer cannot sustain a fast-growing in-tree backend list. Every new backend in core brings its own API drift, auth quirks, rate limits, error messages, and edge-case PR review. Terraform proved that a stable plugin protocol plus a community registry scales to 4,000+ providers while HashiCorp maintains under 30 of them. The same pattern fits cronix: core ships the spec, the trigger shim, the SDK, and a curated set of reference backends; everything else lives out-of-tree.

Design principles

  • Stable contract. The plugin protocol gets its own semver, separate from cronix core. A v1 plugin runs against any v2.x.y cronix without recompiling.
  • Language-neutral. A plugin can be written in Go, Rust, Python, anything — the protocol is gRPC, not a Go interface.
  • Local-only. The plugin binary runs on the same host as cronix apply and communicates with it over a Unix domain socket or pipe. No network, no daemon, no remote plugin server.
  • Out-of-tree means out-of-conformance until proven. Plugin backends pass the same spec/conformance/ vectors as in-tree backends before they can claim "cronix-compatible."

Protocol shape (proposal — Terraform-plugin-shaped)

  • Discovery: cronix looks for plugin binaries in (1) $CRONIX_PLUGIN_DIR if set, (2) ~/.cronix/plugins/, (3) /usr/local/share/cronix/plugins/. Binary naming: cronix-backend-<name> (matches the Terraform terraform-provider-<name> shape).
  • Handshake: cronix execs the plugin binary; the plugin prints a versioned handshake line to stdout (CRONIX-PLUGIN-V1 <socket-path>), then listens on that socket. cronix dials in.
  • gRPC service: BackendPlugin with the same RPC set as the in-tree Backend interface: Read, Plan, Apply, Drift, Prune, Adopt, VerifyOwnership (see #20, "reconciler: cronix verify-ownership", the operator-side audit of owned entries).
  • Lifecycle: one plugin process per cronix apply invocation. cronix kills the plugin on exit. No long-lived plugin daemons.
  • Errors: gRPC status codes mapped to cronix error categories. Plugin panics propagate as Internal with the panic message + plugin binary version for bug reports.
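
To make the discovery and handshake bullets concrete, here is a minimal host-side sketch in Python. It is illustrative only: the function names (candidate_paths, find_plugin, parse_handshake) are hypothetical, and it assumes that $CRONIX_PLUGIN_DIR is searched first rather than replacing the other two directories, and that the handshake is a single line with a decimal version number. The RFC would need to settle both points.

```python
import os
from pathlib import Path

HANDSHAKE_PREFIX = "CRONIX-PLUGIN-V"  # taken from the proposal's example line


def candidate_paths(name, env=None, home=None):
    """Return the locations to probe for a backend plugin, in priority order."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    binary = f"cronix-backend-{name}"
    dirs = []
    # Assumption: the env var adds a first search dir; it does not replace the rest.
    if env.get("CRONIX_PLUGIN_DIR"):
        dirs.append(Path(env["CRONIX_PLUGIN_DIR"]))
    dirs.append(home / ".cronix" / "plugins")
    dirs.append(Path("/usr/local/share/cronix/plugins"))
    return [d / binary for d in dirs]


def find_plugin(name, env=None, home=None):
    """Return the first candidate that is an executable file, or None."""
    for path in candidate_paths(name, env, home):
        if path.is_file() and os.access(path, os.X_OK):
            return path
    return None


def parse_handshake(line):
    """Split 'CRONIX-PLUGIN-V1 <socket-path>' into (version, socket_path)."""
    token, sep, socket_path = line.strip().partition(" ")
    if not sep or not token.startswith(HANDSHAKE_PREFIX):
        raise ValueError(f"malformed handshake: {line!r}")
    version = token[len(HANDSHAKE_PREFIX):]
    if not version.isdigit() or not socket_path:
        raise ValueError(f"malformed handshake: {line!r}")
    return int(version), socket_path
```

A host would call find_plugin("echo"), exec the result, read one line from the child's stdout, and hand it to parse_handshake before dialing the socket. Timeouts and signal handling are deliberately left out, since the handshake spec has yet to define them.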

Required content for the Q-002 entry

  • .proto file with the BackendPlugin service and message shapes
  • Handshake spec (stdout format, signal handling, timeout)
  • Discovery spec (search paths, naming, override env vars)
  • Protocol version negotiation (plugin advertises supported protocol versions; cronix picks the highest both speak)
  • Conformance: plugin backends pass spec/conformance/ vectors via the same runner (#6, "spec: extract spec/conformance/ runner", the language-neutral harness); the runner gets a plugin-target mode in addition to its existing in-tree mode
  • Trust model — see below
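
The version-negotiation rule in the list above (the plugin advertises the protocol versions it supports, and cronix picks the highest one both sides speak) can be sketched in a few lines of Python. The function name negotiate and the use of plain integers for protocol versions are assumptions made for illustration; the RFC may choose a different representation.

```python
def negotiate(host_versions, plugin_versions):
    """Return the highest protocol version supported by both sides.

    Versions are modeled as plain integers (an assumption for this sketch).
    Raises ValueError when the two sides share no version.
    """
    common = set(host_versions) & set(plugin_versions)
    if not common:
        raise ValueError(
            f"no common protocol version: host={sorted(host_versions)}, "
            f"plugin={sorted(plugin_versions)}"
        )
    return max(common)
```

For example, a host speaking versions 1 and 2 and a plugin advertising 2 and 3 would settle on 2, while disjoint sets fail loudly instead of falling back silently.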

Trust and security (this is the hard part)

A plugin binary runs with the same privileges as cronix apply. That's full access to the host's filesystem, network, and credentials. Today the in-tree backends have this access too — but you trust them because they live in internal/backend/ and ship via signed releases. Plugins don't.

Three viable models, ordered by complexity:

  1. Checksums in manifest. The cronix manifest gains an optional plugins: block declaring expected SHA-256 hashes of backend binaries. cronix refuses to load a plugin whose hash doesn't match. Operators are responsible for pinning the version they trust. Simple, mechanical, no new infrastructure. Recommended starting point.
  2. Cosign-signed plugin releases. Plugin authors sign their releases with cosign, mirroring cronix's own supply-chain work (#2, "supply-chain: cosign-sign release artifacts via GoReleaser"). cronix verifies the signature against a key declared in the manifest, or against a community-maintained trusted-keys file.
  3. Sandboxed execution. Plugin runs in a seccomp / Landlock / cgroup-restricted environment. cronix declares the syscalls / paths / network endpoints the plugin needs at handshake; the kernel enforces. Much harder, much later. Out of scope for v2.0.

The RFC should pick model 1 for v2.0 and explicitly defer models 2 and 3 as follow-ups, with a written commitment that the plugins: checksum block is added to the manifest as a purely additive schema change.
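
As a rough illustration of model 1, the check could look like the Python sketch below. The names sha256_file and verify_plugin are hypothetical, and the pinned argument stands in for the parsed plugins: block as a mapping from plugin name to hex digest. Refusing a plugin that has no pinned hash is an assumption of this sketch; since the block is described as optional, the RFC still has to decide how unpinned plugins are treated.

```python
import hashlib
import hmac


def sha256_file(path, chunk_size=1 << 16):
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_plugin(path, name, pinned):
    """Return the binary's SHA-256 if it matches the pinned value, else raise.

    `pinned` models the manifest's plugins: block as {name: hex_sha256}.
    Assumption: an unpinned plugin is refused rather than loaded unchecked.
    """
    expected = pinned.get(name)
    if expected is None:
        raise PermissionError(f"plugin {name!r} has no pinned checksum")
    actual = sha256_file(path)
    # Normalize case, then compare in constant time.
    if not hmac.compare_digest(actual, expected.lower()):
        raise PermissionError(
            f"plugin {name!r} checksum mismatch: expected {expected}, got {actual}"
        )
    return actual
```

The check has to run against the exact file that is subsequently executed; a real loader would also need to think about the window between hashing and exec, which this sketch does not address.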

Registry / discovery (the "where do I find plugins" question)

  • v2.0: no registry. Plugins are GitHub repos with the topic cronix-backend. cronix docs maintain a curated list under docs-site/content/community/plugins.md, gated on conformance-vector passing + reasonable security hygiene.
  • Later (v2.x): optional centralized index at plugins.cronix.dev if community traction warrants it. No commitment to build this in v2.0.

Open questions to surface in Q-002

  • gRPC vs alternatives (HTTP, stdio JSON-RPC, native FFI). gRPC is the obvious pick but worth recording why we rejected alternatives.
  • Should the plugin protocol number be tied to the cronix major version (plugin-v2 = cronix-v2.x) or independent (plugin-v1 stays valid across cronix v2, v3, and v4)? Independent is the Terraform model and likely correct, but should be argued explicitly.
  • Configuration plumbing: how does a plugin backend receive its config (API keys, region, etc.)? Probably: cronix CLI accepts --backend-config <name>=<value> flags, passes them via gRPC at session open. Manifest does NOT carry plugin credentials.
  • Reentrant cronix adopt (#11, "reconciler: cronix adopt", which takes ownership of pre-existing entries): a plugin must implement Adopt for users to migrate to it. Should this be part of the conformance bar?
  • Plugin failure modes: what happens if the plugin binary crashes mid-apply? Probably: cronix aborts the apply, no entry is left partially modified, and the operator gets the panic message + plugin version in the error.
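
For the configuration-plumbing question, one possible reading of the --backend-config <name>=<value> idea is sketched below in Python. The helper name parse_backend_config is hypothetical, and the choices to allow the flag to repeat, to split on the first "=" only, and to let a later flag override an earlier one with the same name are all assumptions. How the resulting mapping travels over gRPC at session open is left out.

```python
import argparse


def parse_backend_config(argv):
    """Collect repeated --backend-config NAME=VALUE flags into a dict.

    Only the first '=' splits name from value, so values may contain '='.
    Other arguments are ignored here; a real CLI would parse them elsewhere.
    """
    parser = argparse.ArgumentParser(prog="cronix", add_help=False)
    parser.add_argument(
        "--backend-config", action="append", default=[], metavar="NAME=VALUE"
    )
    args, _unknown = parser.parse_known_args(argv)

    config = {}
    for item in args.backend_config:
        name, sep, value = item.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {item!r}")
        config[name] = value  # assumption: the last occurrence wins
    return config
```

Keeping credentials on the command line or in the environment, rather than in the manifest, matches the constraint that the manifest does NOT carry plugin credentials. Command-line values are visible in process listings, though, which the RFC may want to weigh.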

Acceptance criteria for this issue (the RFC, not the implementation)

  • Q-002 entry committed to spec/OPEN_QUESTIONS.md
  • .proto draft committed to spec/plugin/backend.proto (does NOT make plugin protocol "official" until D-NNN promotion)
  • Trust model decision documented (model 1 recommended, 2/3 deferred with rationale)
  • Discovery + handshake + lifecycle + error mapping all spelled out
  • Public RFC discussion happens via this issue's comments
  • When resolved: promote to D-NNN in spec/DECISIONS.md. Implementation work spawns child issues (host-side plugin loader, conformance runner plugin mode, reference "echo" plugin, etc.).

Stretch: reference plugin for testing the protocol

A trivial cronix-backend-echo reference plugin that logs every RPC and reconciles to an in-memory map is the right way to validate the protocol design before depending on it. Could be filed as a follow-up child issue once Q-002 lands.
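
To give a feel for how small such a plugin could be, here is a Python sketch of the core state only, without gRPC transport. Everything in it is an assumption for illustration: the class name EchoBackend, modeling an entry as a name mapped to a schedule string, the ("create" | "update", name) change tuples, and the simplified semantics of Read, Plan, and Apply. The remaining RPCs (Drift, Prune, Adopt, VerifyOwnership) are omitted.

```python
class EchoBackend:
    """In-memory stand-in for a backend: logs each call, stores entries in a dict."""

    def __init__(self):
        self.entries = {}  # entry name -> schedule string (hypothetical shape)
        self.calls = []    # names of the RPCs received, in order

    def _diff(self, desired):
        """Compute create/update changes needed to reach `desired`."""
        changes = []
        for name in sorted(desired):
            if name not in self.entries:
                changes.append(("create", name))
            elif self.entries[name] != desired[name]:
                changes.append(("update", name))
        return changes

    def read(self):
        self.calls.append("Read")
        return dict(self.entries)  # copy, so callers cannot mutate our state

    def plan(self, desired):
        self.calls.append("Plan")
        return self._diff(desired)

    def apply(self, desired):
        self.calls.append("Apply")
        changes = self._diff(desired)
        self.entries.update(desired)
        return changes
```

Because the state is a plain dict and every call is recorded, a conformance runner in plugin-target mode could assert on both the resulting entries and the exact RPC sequence.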

Metadata

    Labels

    area/reconciler: cronix apply / plan / drift / prune
    area/spec: RFC, decisions, conformance vectors
    kind/spec: Spec change (manifest, header, signing, backend semantics)
    needs-rfc: Spec change pending Q-NNN entry
    triage: Awaiting maintainer review