Add opt-in MariaDB-compatible services #138
c765f98 to d00e0ed
FYI I also built + tested this on Amazon Linux
Thanks for the contribution @bherila! Let me review it
Thanks. I would really like to make Temps my standard devprod stack, so I appreciate everything you have built so far, and it's always nice not to have to maintain a disparate fork :) Open to switching to MySQL if that is what you prefer; MariaDB is really just for nicer licensing. I have a few more coming too.
8c045b7 to 8a580b1
A few points before this PR is considered ready:
No WAL-G equivalent exists for MariaDB. The faithful analog of "base backup + WAL archive" is mariadb-backup (physical base) + binary-log archiving, with binlog replay for PITR. However, the mariadb:lts stock image already bundles mariadb-backup/mysqlbinlog, so we can use that.
Wire MariaDB into the canonical V2 BackupEngine system. Previously
`resolve_engine_key("mariadb")` returned Unsupported, so scheduled
MariaDB backups were silently skipped by enqueue_scheduled_run.
- mariadb_physical: physical `mariadb-backup --stream=mbstream | gzip`
base backup to S3 (PITR engine; analog of postgres_walg). Records
binlog coordinates (file/position/GTID) parsed from mariadb-backup
stderr into the metadata.json companion as the PITR replay anchor.
Success is verified via the "completed OK!" marker since the
container's dash shell has no pipefail.
- mariadb_dump: logical `mariadb-dump` fallback (analog of
postgres_pgdump), no PITR.
- mariadb_exec: shared standalone docker-exec helpers (stream stdout to
file, capture stderr) + binlog-position parser. Kept in temps-backup
to avoid a circular dep on temps-providers.
- dispatch: new "mariadb" arm probing both mariadb-backup and
mariadb-binlog on container `mariadb-{name}`; falls back to
mariadb_dump when the PITR tools are absent.
- plugin: register both engines (7 -> 9).
Credentials flow via MYSQL_PWD env, never argv (cf. PR gotempsh#149); unit
tests pin that invariant plus binlog-coordinate parsing and dispatch
fallback.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
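The stderr handling described in this commit (success marker plus binlog coordinates) can be sketched roughly as below. This is a minimal illustration, not the PR's code: the type and function names are hypothetical, and the shape of the coordinate line (`filename '...', position '...', GTID of the last change '...'`) is an assumption about mariadb-backup's output rather than something quoted in this thread.

```rust
/// Binlog coordinates kept as the PITR replay anchor (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BinlogCoordinates {
    pub file: String,
    pub position: u64,
    pub gtid: Option<String>,
}

/// Returns the text between `key` (which must end with an opening quote)
/// and the next single quote on the same line.
fn quoted_value<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let start = line.find(key)? + key.len();
    let rest = &line[start..];
    let end = rest.find('\'')?;
    Some(&rest[..end])
}

/// Finds the last "binlog position:" line in stderr and parses it.
/// The line format is assumed, see the note above the block.
pub fn parse_binlog_coordinates(stderr: &str) -> Option<BinlogCoordinates> {
    let line = stderr
        .lines()
        .rev()
        .find(|l| l.contains("binlog position:"))?;
    let file = quoted_value(line, "filename '")?.to_string();
    let position = quoted_value(line, "position '")?.parse::<u64>().ok()?;
    let gtid = quoted_value(line, "GTID of the last change '")
        .filter(|g| !g.is_empty())
        .map(str::to_string);
    Some(BinlogCoordinates { file, position, gtid })
}

/// Without pipefail, the exit code of `a | gzip` only reflects gzip,
/// so success is decided by the "completed OK!" marker instead.
pub fn backup_succeeded(stderr: &str) -> bool {
    stderr
        .lines()
        .rev()
        .any(|l| l.trim_end().ends_with("completed OK!"))
}
```

A caller would treat a missing marker as a failed backup even when the pipeline exited with status 0, and would refuse to record a PITR anchor when no coordinates parse.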
…tting
Make Temps-managed MariaDB containers PITR-capable by enabling binary
logging, the MariaDB analog of Postgres WAL archiving.
- New BinlogArchiveInterval setting (1m/5m/15m/60m, default 5m) exposed
in the parameter schema as an editable dropdown: the user controls
PITR granularity (recovery-point objective) per service at runtime.
- binlog_expire_logs_seconds is derived from the interval (>= 6x, floor
1h) so a binlog segment is never purged locally before the archiver
ships it (the PITR continuity invariant).
- Container cmd now appends --log-bin, --binlog-format=ROW, a stable
non-zero --server-id derived from the service name, --sync-binlog=1
(durable binlog tail), and the derived retention.
- Container env pins TZ=UTC so binlog timestamps, and therefore
`mysqlbinlog --stop-datetime` PITR targets (RecoveryTarget::Time is
UTC), are unambiguous.
Design note: binlog is enabled by default at (re)create rather than
lazily on first backup. This avoids a cross-crate enable trigger (the
backup runs in temps-backup; the container lifecycle lives here) and
the silent-drop risk of marker-driven args. Existing containers adopt
binlog on their next recreate (e.g. image upgrade); a healthy running
container is not force-recreated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
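The retention rule in this commit (at least 6x the archive interval, with a one-hour floor) is simple arithmetic, sketched below along with the container arguments it feeds. The function names are hypothetical, the FNV-1a hash is only one possible way to get a stable non-zero server id (the commit does not say how the id is derived), and the spelling of the retention flag is assumed from the variable name.

```rust
/// Retention derived from the archive interval: >= 6x, floor of one hour.
pub fn binlog_expire_logs_seconds(interval_secs: u64) -> u64 {
    const MULTIPLIER: u64 = 6;
    const FLOOR_SECS: u64 = 3600;
    interval_secs.saturating_mul(MULTIPLIER).max(FLOOR_SECS)
}

/// Stable, non-zero id from the service name.
/// Assumption: 32-bit FNV-1a; the real derivation is not shown in the PR.
pub fn server_id_for(service_name: &str) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for byte in service_name.bytes() {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    // Zero is remapped so the id is never 0, as the commit requires.
    if hash == 0 { 1 } else { hash }
}

/// Arguments appended to the container command (flag spelling assumed).
pub fn binlog_args(service_name: &str, interval_secs: u64) -> Vec<String> {
    vec![
        "--log-bin".to_string(),
        "--binlog-format=ROW".to_string(),
        format!("--server-id={}", server_id_for(service_name)),
        "--sync-binlog=1".to_string(),
        format!(
            "--binlog-expire-logs-seconds={}",
            binlog_expire_logs_seconds(interval_secs)
        ),
    ]
}
```

With the four dropdown values, the 1m and 5m intervals both land on the one-hour floor, 15m yields 5400 seconds, and 60m yields 21600 seconds.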
Add the "frequent scheduled ship" half of MariaDB PITR: a per-service
background task that archives closed binary-log segments to S3 so
point-in-time recovery has the logs to replay.
- MariaDbService::archive_binlogs: one idempotent run — FLUSH BINARY
LOGS, SHOW BINARY LOGS, ship each closed segment (download via
bollard tar stream -> gzip -> S3 PUT), update a manifest. Advances
the manifest's last_shipped_file only past segments that actually
uploaded, stopping at the first failure so the replay chain stays
contiguous.
- S3 layout (the contract the restore path consumes):
{prefix}/external_services/mariadb/{svc}/binlog/{file}.gz and
.../binlog/manifest.json (BinlogManifest).
- Pure, unit-tested helpers reused by restore: parse_show_binary_logs,
closed_binlog_files, binlogs_to_ship, binlog_object_key,
binlog_manifest_key.
- Trigger: ExternalServiceHealthMonitor runs the archiver for running
standalone mariadb services on their binlog_archive_interval cadence
(cheap in-memory interval gate checked before the backup-schedule DB
scan). S3 destination is discovered from a covering enabled backup
schedule; no schedule => skip. All archiver failures are swallowed so
health monitoring is never disrupted.
Credentials flow only via MYSQL_PWD/MARIADB_PWD exec env, never argv.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
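The "ship closed segments, stop at the first failure" rule from this commit could look roughly like the sketch below. The helper names echo the ones listed in the commit but the signatures are guesses, the upload step is a caller-supplied closure standing in for the gzip + S3 PUT, and treating an unknown `last_shipped` as "ship everything still listed" is an assumption.

```rust
/// Every listed segment except the last is closed; the last one is
/// still being written by the server.
pub fn closed_segments(listed: &[String]) -> &[String] {
    match listed.split_last() {
        Some((_active, closed)) => closed,
        None => &[],
    }
}

/// Closed segments that come after the last one recorded in the manifest.
/// Assumption: if the recorded file is no longer listed, ship all of them.
pub fn segments_to_ship<'a>(closed: &'a [String], last_shipped: Option<&str>) -> &'a [String] {
    let found = last_shipped.and_then(|last| closed.iter().position(|f| f.as_str() == last));
    match found {
        Some(idx) => &closed[idx + 1..],
        None => closed,
    }
}

/// S3 object key, following the layout quoted in the commit message.
pub fn binlog_object_key(prefix: &str, service: &str, file: &str) -> String {
    format!("{prefix}/external_services/mariadb/{service}/binlog/{file}.gz")
}

/// Uploads in order and stops at the first failure, so the manifest's
/// last_shipped_file never skips over a segment that did not upload.
pub fn ship_in_order<F>(
    to_ship: &[String],
    last_shipped: Option<String>,
    mut upload: F,
) -> Option<String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut latest = last_shipped;
    for file in to_ship {
        if upload(file.as_str()).is_err() {
            break; // keep the replay chain contiguous
        }
        latest = Some(file.clone());
    }
    latest
}
```

Because the run only ever advances past uploaded segments, repeating it after a failure simply retries from the first segment that is still missing, which is what makes the run idempotent.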
Implement the restore side of MariaDB PITR on the ExternalService trait.
- restore_capabilities: pitr=true, in-place + to-new-service.
- restore_from_s3: detects a physical base.mbstream.gz vs a logical
.sql.gz and dispatches; logical path refactored into a shared helper.
- restore_to_new_service: clones the source config onto a fresh port,
creates a new container, restores the base into it.
- restore_pitr (core): guards that the base is a physical
mariadb-backup with binlog coordinates (logical-only backups are
rejected with a greppable "PITR ... physical" error); validates the
recovery target BEFORE destroying data; restores the base; then
forward-rolls.
Physical base restore mirrors the postgres swap: disable restart policy
-> stop -> ephemeral helper container (volumes_from, root) that
mbstream-extracts, `mariadb-backup --prepare`, wipes the datadir,
`--copy-back`, and `chown -R mysql:mysql` -> re-enable restart policy
(always, even on failure) -> start -> wait healthy. The gunzipped
mbstream is uploaded onto the created-but-not-started helper's writable
layer (volumes_from can't add binds; avoids binary-over-exec
corruption).
Forward-roll reads the binlog manifest, downloads every shipped segment
>= the base's binlog_file, and replays them in a SINGLE
`mysqlbinlog --disable-log-bin --start-position=<base pos> [--stop-datetime|--stop-position] file1 file2 ... | mariadb`
invocation (replayed events are not re-logged; a non-zero exit surfaces
as a failure, never a silent half-apply).
Recovery-target mapping: Time -> --stop-datetime (UTC); Lsn -> file:pos
via --stop-position on the final segment; Xid (GTID) and Name fail fast
with typed errors pending the Docker E2E hardening pass.
Credentials flow only via MYSQL_PWD/MARIADB_PWD exec env, never argv.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
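The recovery-target mapping and segment selection from this commit can be sketched as a pure function that runs before any data is touched. The enum below is a local stand-in (the payload types of the real `RecoveryTarget` are not shown in this thread), and comparing file names lexicographically assumes equal-width numeric suffixes.

```rust
/// Local stand-in for the recovery target; payload shapes are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RecoveryTarget {
    /// UTC timestamp, e.g. "2026-01-01 12:00:00".
    Time(String),
    /// Binlog file and byte position.
    Lsn { file: String, position: u64 },
    Xid(String),
    Name(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplayError {
    UnsupportedTarget(&'static str),
    NoSegments,
}

/// Builds the decoder arguments for one replay invocation.
/// Calling this first lets an invalid target fail before the restore
/// destroys anything.
pub fn replay_args(
    shipped: &[String],
    base_file: &str,
    base_position: u64,
    target: &RecoveryTarget,
) -> Result<Vec<String>, ReplayError> {
    // Segments at or after the base backup's binlog file.
    let mut files: Vec<&str> = shipped
        .iter()
        .map(String::as_str)
        .filter(|f| *f >= base_file)
        .collect();

    let stop = match target {
        RecoveryTarget::Time(utc) => format!("--stop-datetime={utc}"),
        RecoveryTarget::Lsn { file, position } => {
            // The stop position applies to the last file given, so the
            // list is cut at the target file.
            files.retain(|f| *f <= file.as_str());
            format!("--stop-position={position}")
        }
        RecoveryTarget::Xid(_) => return Err(ReplayError::UnsupportedTarget("xid")),
        RecoveryTarget::Name(_) => return Err(ReplayError::UnsupportedTarget("name")),
    };
    if files.is_empty() {
        return Err(ReplayError::NoSegments);
    }

    let mut args = vec![
        "--disable-log-bin".to_string(),
        format!("--start-position={base_position}"),
        stop,
    ];
    args.extend(files.into_iter().map(str::to_string));
    Ok(args)
}
```

Each element is one argv entry, so the space inside the datetime value needs quoting only if the arguments are later joined into a shell string.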
Add a binary-log health probe mirroring postgres_wal_health, surfacing
operational PITR risks to the UI/health metadata:
- BinlogDisabled (Critical) — log_bin off, PITR impossible.
- NonRowBinlogFormat — binlog_format != ROW, PITR fidelity risk.
- LargeBinlogBacklog { segment_count, total_bytes } — local binlogs
accumulating (> 50 segments or > 10 GiB), e.g. archiver behind or
retention too high / disk pressure.
probe_binlog_health connects via sqlx MySQL (root), reads log_bin /
binlog_format / binlog_expire_logs_seconds / gtid_strict_mode and
SHOW BINARY LOGS (count + total bytes), returns None on any connection
failure. Pure compute_warnings is unit-tested (14 tests); a
testcontainers integration suite (3 tests) covers binlog-off, binlog-on,
and bad-connection against real mariadb:lts. Credentials never logged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
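A pure warning computation along the lines of the probe described above might look like this. It is a sketch with hypothetical type names; the thresholds (more than 50 segments or more than 10 GiB) come from the commit message, while returning only `BinlogDisabled` when `log_bin` is off is an assumption about how the checks combine.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BinlogWarning {
    BinlogDisabled,
    NonRowBinlogFormat,
    LargeBinlogBacklog { segment_count: u64, total_bytes: u64 },
}

/// Values the probe would read from the server (hypothetical struct).
pub struct BinlogStatus {
    pub log_bin: bool,
    pub binlog_format: String,
    pub segment_count: u64,
    pub total_bytes: u64,
}

const MAX_SEGMENTS: u64 = 50;
const MAX_BACKLOG_BYTES: u64 = 10 * 1024 * 1024 * 1024; // 10 GiB

/// Pure function: no I/O, so every branch is easy to unit-test.
pub fn binlog_warnings(status: &BinlogStatus) -> Vec<BinlogWarning> {
    if !status.log_bin {
        // Assumption: with binlog off, the other checks are moot.
        return vec![BinlogWarning::BinlogDisabled];
    }
    let mut warnings = Vec::new();
    if !status.binlog_format.eq_ignore_ascii_case("ROW") {
        warnings.push(BinlogWarning::NonRowBinlogFormat);
    }
    if status.segment_count > MAX_SEGMENTS || status.total_bytes > MAX_BACKLOG_BYTES {
        warnings.push(BinlogWarning::LargeBinlogBacklog {
            segment_count: status.segment_count,
            total_bytes: status.total_bytes,
        });
    }
    warnings
}
```

Keeping the connection handling separate from this function is what lets the thresholds be tested without a running MariaDB container.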
Restore-validation rehearsal against real mariadb:lts revealed the
replay path hardcoded `mysqlbinlog`, which does NOT exist in the
official image (only `mariadb-binlog`/`mariadb` are present;
`mysqlbinlog`/`mysql` are not); PITR replay would have failed on every
real deployment.
- replay_binlogs now resolves the tool at run time: prefer
mariadb-binlog (and mariadb client), fall back to mysqlbinlog/mysql
for non-MariaDB images.
- Replay decodes to an intermediate file before feeding the client, so
under `set -e` a failed decode aborts and surfaces as an error instead
of being masked by the client's exit code in a pipe (no silent
half-apply). The prior comment claimed this guard but the code piped.
Verified end-to-end with a shell rehearsal: physical base backup +
binlog archive + restore (mbstream extract -> --prepare -> copy-back ->
chown) + mariadb-binlog --stop-datetime replay correctly recovers to a
point in time (rows before T kept, after T dropped); the mariadb-backup
binlog-position stderr format matches parse_binlog_position.
Also adds 25 unit tests bringing MariaDB to parity with the Postgres
provider: full database-name validation matrix, config/schema defaults
and editability, address/env routing (baremetal + docker), import
parsing, and the upgrade-is-not-implemented (MARIADB_AUTO_UPGRADE)
decision. 74 mariadb tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
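The two fixes in this commit, run-time tool resolution and decoding to an intermediate file instead of piping, can be sketched as below. The function names and the `/tmp/replay.sql` path are hypothetical, and the presence check is passed in as a closure so the sketch stays free of Docker calls.

```rust
/// Returns the first candidate the probe reports as present.
/// Candidates are listed in preference order.
pub fn resolve_tool<'a>(
    candidates: &[&'a str],
    is_present: impl Fn(&str) -> bool,
) -> Option<&'a str> {
    candidates.iter().copied().find(|tool| is_present(*tool))
}

/// Wraps an argument in single quotes for a POSIX shell.
fn shell_quote(arg: &str) -> String {
    format!("'{}'", arg.replace('\'', "'\\''"))
}

/// Builds a script that decodes first and applies second. With `set -e`
/// a failed decode aborts before the client runs, which a pipe cannot
/// guarantee in a shell without pipefail.
/// The password is expected in the exec environment, never in argv.
pub fn replay_script(decoder: &str, client: &str, decode_args: &[String]) -> String {
    let quoted: Vec<String> = decode_args.iter().map(|a| shell_quote(a)).collect();
    format!(
        "set -e\n{decoder} {} > /tmp/replay.sql\n{client} < /tmp/replay.sql\nrm -f /tmp/replay.sql\n",
        quoted.join(" ")
    )
}
```

A caller would resolve the decoder from `["mariadb-binlog", "mysqlbinlog"]` and the client from `["mariadb", "mysql"]`, then hand the resulting script to `sh -c` inside the container.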
MyISAM/Aria tables are not crash-safe and don't recover consistently
under point-in-time recovery, a documented cause of PITR failure. The
binlog-health probe now counts non-InnoDB user tables (excluding system
schemas) and emits a NonInnodbTables warning so operators can convert
them to InnoDB before relying on PITR.
Complements the other PITR-failure guards already in place:
sync_binlog=1 and binlog_format=ROW (commit 489f344) address
lost-binlog-on-crash and non-deterministic
temp/in-memory/distributed-transaction replay; the LargeBinlogBacklog
warning flags binlog disk pressure.
+2 unit tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
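The table count behind the NonInnodbTables warning could be computed along these lines. This is a sketch over already-fetched `(schema, engine)` pairs for base tables; the function name is hypothetical and the list of system schemas is an assumption, since the commit only says system schemas are excluded.

```rust
/// Schemas treated as system schemas (assumed list).
const SYSTEM_SCHEMAS: [&str; 4] = ["mysql", "information_schema", "performance_schema", "sys"];

/// Counts user tables whose storage engine is not InnoDB.
/// `tables` holds one (schema, engine) pair per base table.
pub fn count_non_innodb_user_tables(tables: &[(&str, &str)]) -> usize {
    tables
        .iter()
        .filter(|row| {
            let (schema, engine) = **row;
            let is_system = SYSTEM_SCHEMAS
                .iter()
                .any(|s| s.eq_ignore_ascii_case(schema));
            !is_system && !engine.eq_ignore_ascii_case("InnoDB")
        })
        .count()
}
```

A count above zero would produce the warning; system tables that use Aria or MyISAM do not contribute to it.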
…PITR
Make MariaDB point-in-time recovery work out of the box. Previously a
MariaDB service had no base backup unless an operator manually created
a schedule, so binlog archiving had nothing to anchor to.
A reconcile loop (5-min tick in the backup plugin) ensures every
MariaDB service has a covering daily full-backup schedule once a
default S3 source is configured:
- New `external_services.default_backup_provisioned` boolean (migration
m20260623_000001, NOT NULL DEFAULT false, idempotent IF NOT EXISTS) is
a one-shot latch: reconcile only provisions services where it's false
and sets it true after creating the schedule, so a schedule the
operator later deletes is never recreated.
- reconcile_default_external_service_schedules: resolves the default S3
source (skips quietly + retries next tick if none exists yet; handles
storage configured after the service), then for each unprovisioned
MariaDB service creates a daily `0 0 3 * * *` (03:00 UTC) full schedule
via the existing create_backup_schedule + attach_services_to_schedule
(retention 14d, target_all=false, control_plane=false, default source).
Per-service failures are logged and skipped so one can't block others;
each creation is logged for operator visibility.
Pairs with the 5-minute binlog ship default: daily physical base +
continuous binlogs = restore to any point in time, with no manual
setup.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
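The one-shot latch described in this commit can be illustrated with an in-memory version of a single reconcile tick. Everything here is a stand-in: `ServiceRow` models only the latch column, schedule creation is a caller-supplied closure, and leaving the latch unset after a failed creation (so the next tick retries) is an assumption drawn from "sets it true after creating the schedule".

```rust
/// In-memory stand-in for a service row; only the latch column matters.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ServiceRow {
    pub name: String,
    pub default_backup_provisioned: bool,
}

/// One reconcile tick. Returns how many schedules were created.
/// `create_schedule(service, source)` stands in for the real
/// schedule-creation calls.
pub fn reconcile_once<F>(
    services: &mut [ServiceRow],
    default_source: Option<&str>,
    mut create_schedule: F,
) -> usize
where
    F: FnMut(&str, &str) -> Result<(), String>,
{
    // No default S3 source yet: skip quietly, the next tick retries.
    let source = match default_source {
        Some(source) => source,
        None => return 0,
    };
    let mut created = 0;
    for svc in services.iter_mut().filter(|s| !s.default_backup_provisioned) {
        match create_schedule(svc.name.as_str(), source) {
            Ok(()) => {
                // Latch flips only after the schedule exists, so a
                // schedule deleted later is never recreated.
                svc.default_backup_provisioned = true;
                created += 1;
            }
            // One failing service must not block the others.
            Err(err) => eprintln!("default schedule for {} failed: {err}", svc.name),
        }
    }
    created
}
```

Because the decision is keyed on the latch rather than on whether a schedule currently exists, an operator who deletes the default schedule is not overridden on the next tick.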
30f9816 to 54f3cff
I'm still testing/debugging this fyi
54f3cff to eec11f5
Sounds good. If you can try it on a real instance with a lot of data, it would be much better to harden it further before it reaches the main repo.
I don't actually have such a site to test with real data. Right now I only have a very small database with a handful of daily users (<10), which is why I've been using full snapshots for daily backups. So I'd welcome any volunteer help testing with a larger database.
39f6dc0 to 23d5333
23d5333 to e3fa423
…vices-upstream-pr
# Conflicts:
# CHANGELOG.md
# crates/temps-migrations/src/migration/mod.rs
…vices-upstream-pr
# Conflicts:
# CHANGELOG.md
Update from this pass:
Design summary for review:
Verification on GitHub Actions:
Local checks also passed on both updated branches:
I saw the Postgres updates; let's leave fixes for other databases out of this PR. Also, the decision not to wait until a backup is completed is on purpose. Small databases have no issues, but when backing up 500-1000 GB we can't run the backups synchronously. Also, imagine if the server is restarted: the state will be inconsistent. That's why backups need to run in Docker containers in detached mode, with a way to check the state of a database.
Please keep this PR scoped to MariaDB, @bherila.
I will update again; I don't have a way to reset back to draft status. However, the Postgres fix is required first, as it blocks the MariaDB e2e test suite from working correctly.
I'm going to open a separate PR for the Postgres fix, and after that's merged I will rebase it out of this PR.
Moved the Postgres major-upgrade hardening out into its own focused PR: #151
That PR contains only the Postgres upgrade fix set:
I also kicked off a fork
Once #151 lands, I’ll rebase this MariaDB PR and drop those Postgres commits from #138 so this PR stays scoped to MariaDB/PITR.
Summary
Adds an opt-in MariaDB-compatible external service for hosted projects while leaving Temps' own internal PostgreSQL database unchanged.
This includes:
- `mariadb-dump` with `mysqldump` fallback
Notes
The service uses MariaDB images by default while exposing MySQL-compatible connection URLs and environment variables, since most frameworks and drivers connect to MariaDB through MySQL-compatible drivers.
MariaDB remains opt-in. Installing or starting Temps does not pull or run a MariaDB container; a container is created only when a user creates or imports a MariaDB external service. Linked projects receive separate per-project/per-environment databases inside that shared service.
Validation
Ran on the remote Ubuntu build host as bwh from ~/proj/temps:
- `cargo fmt --all -- --check`
- `CARGO_BUILD_JOBS=2 cargo check -p temps-providers --lib`
- `CARGO_BUILD_JOBS=2 cargo test -p temps-providers mariadb --lib`
- `cd web && bun install && TEMPS_VERSION=pr-validation bun run build`
The MariaDB-focused provider test run passed 13 tests.
For the docs-only follow-up commit:
git diff --check