Add opt-in MariaDB-compatible services #138

Closed
bherila wants to merge 22 commits into gotempsh:main from bherila:feature/mariadb-services-upstream-pr

Conversation

@bherila

@bherila bherila commented Jun 19, 2026

Contributor

Summary

Adds an opt-in MariaDB-compatible external service for hosted projects while leaving Temps' own internal PostgreSQL database unchanged.

This includes:

  • a new MariaDB service implementation with create/import/update/delete, credentials, environment bindings, and backup/restore support via mariadb-dump with mysqldump fallback
  • MariaDB query explorer support, including schema/table discovery and guarded SQL execution helpers
  • frontend service presets/forms/icons so users can choose MariaDB from the existing service workflows
  • conservative small-host defaults for MariaDB containers, plus standard/dedicated size profiles for larger hosts
  • MDX docs covering MariaDB managed services, project linking, Laravel usage, backup coverage, and opt-in service startup behavior

Notes

The service uses MariaDB images by default while exposing MySQL-compatible connection URLs and environment variables, since most frameworks and drivers connect to MariaDB through MySQL-compatible drivers.

MariaDB remains opt-in. Installing or starting Temps does not pull or run a MariaDB container; a container is created only when a user creates or imports a MariaDB external service. Linked projects receive separate per-project/per-environment databases inside that shared service.

Validation

Ran on the remote Ubuntu build host as bwh from ~/proj/temps:

  • cargo fmt --all -- --check
  • CARGO_BUILD_JOBS=2 cargo check -p temps-providers --lib
  • CARGO_BUILD_JOBS=2 cargo test -p temps-providers mariadb --lib
  • cd web && bun install && TEMPS_VERSION=pr-validation bun run build

The MariaDB-focused provider test run passed 13 tests.

For the docs-only follow-up commit:

  • git diff --check

@bherila bherila force-pushed the feature/mariadb-services-upstream-pr branch from c765f98 to d00e0ed Compare June 19, 2026 16:43
@bherila

bherila commented Jun 19, 2026

Contributor Author

FYI I also built + tested this on Amazon Linux

@dviejokfs

Contributor

Thanks for the contribution @bherila!

Let me review it

@bherila

bherila commented Jun 19, 2026

Contributor Author

Thanks. I would really like to make Temps my standard devprod stack, so I appreciate everything you have built so far, and it's always nice not to have to maintain a separate fork :)

I'm open to switching to MySQL if that is what you prefer; MariaDB is really just for the nicer licensing.

I have a few more coming too.

@bherila bherila force-pushed the feature/mariadb-services-upstream-pr branch from 8c045b7 to 8a580b1 Compare June 19, 2026 20:26
@dviejokfs

dviejokfs commented Jun 22, 2026

Contributor

A few points before this PR is considered ready:

  • Is there any way to have PITR backups for MariaDB? That would be preferable to a daily full backup, because for large databases the time each full backup takes only grows.
  • MariaDB needs more tests in general, especially around backup and restore (see postgres.rs).

@bherila

bherila commented Jun 23, 2026

Contributor Author

No WAL-G equivalent exists for MariaDB. The faithful analog of "base backup + WAL archive" is mariadb-backup (physical base) plus binary-log archiving, with binlog replay for PITR. The stock mariadb:lts image already bundles mariadb-backup and a binlog decoder (mariadb-binlog), so we can use those.

bherila and others added 8 commits June 23, 2026 02:06
Wire MariaDB into the canonical V2 BackupEngine system. Previously
`resolve_engine_key("mariadb")` returned Unsupported, so scheduled
MariaDB backups were silently skipped by enqueue_scheduled_run.

- mariadb_physical: physical `mariadb-backup --stream=mbstream | gzip`
  base backup to S3 (PITR engine; analog of postgres_walg). Records
  binlog coordinates (file/position/GTID) parsed from mariadb-backup
  stderr into the metadata.json companion as the PITR replay anchor.
  Success is verified via the "completed OK!" marker since the
  container's dash shell has no pipefail.
- mariadb_dump: logical `mariadb-dump` fallback (analog of
  postgres_pgdump), no PITR.
- mariadb_exec: shared standalone docker-exec helpers (stream stdout to
  file, capture stderr) + binlog-position parser. Kept in temps-backup
  to avoid a circular dep on temps-providers.
- dispatch: new "mariadb" arm probing both mariadb-backup and
  mariadb-binlog on container `mariadb-{name}`; falls back to
  mariadb_dump when the PITR tools are absent.
- plugin: register both engines (7 -> 9).
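
The binlog-coordinate capture and the "completed OK!" check described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `parse_binlog_position`: the struct, the function names, and in particular the exact stderr line format (`filename '...', position '...', GTID of the last change '...'`) are assumptions.

```rust
/// Binlog coordinates recorded as the PITR replay anchor (hypothetical shape).
#[derive(Debug, PartialEq)]
pub struct BinlogCoords {
    pub file: String,
    pub position: u64,
    pub gtid: Option<String>,
}

/// Return the single-quoted value that follows `<label> '` in `line`, if any.
fn quoted_after<'a>(line: &'a str, label: &str) -> Option<&'a str> {
    let needle = format!("{label} '");
    let start = line.find(&needle)? + needle.len();
    let rest = &line[start..];
    let end = rest.find('\'')?;
    Some(&rest[..end])
}

/// Scan captured stderr for a binlog-position line and parse it.
/// The line format matched here is an assumption for illustration.
pub fn parse_binlog_coords(stderr: &str) -> Option<BinlogCoords> {
    let line = stderr.lines().find(|l| l.contains("binlog position:"))?;
    let file = quoted_after(line, "filename")?.to_string();
    let position = quoted_after(line, "position")?.parse().ok()?;
    let gtid = quoted_after(line, "GTID of the last change")
        .filter(|g| !g.is_empty())
        .map(str::to_string);
    Some(BinlogCoords { file, position, gtid })
}

/// Without pipefail the pipeline's exit code cannot be trusted, so success is
/// decided by the marker on the last non-empty stderr line.
pub fn backup_succeeded(stderr: &str) -> bool {
    stderr
        .lines()
        .rev()
        .find(|l| !l.trim().is_empty())
        .map_or(false, |l| l.contains("completed OK!"))
}
```

Checking the marker rather than the exit status matters because, in a `a | gzip` pipeline without pipefail, only gzip's exit code is visible to the caller.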

Credentials flow via MYSQL_PWD env, never argv (cf. PR gotempsh#149); unit
tests pin that invariant plus binlog-coordinate parsing and dispatch
fallback.
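
The "env, never argv" invariant can be pinned with a test along these lines. The `ExecSpec` type and the `dump_exec` and `password_in_argv` helpers are hypothetical stand-ins for whatever the PR's exec helpers look like; only `MYSQL_PWD` and the `mariadb-dump` binary come from the commit message.

```rust
/// Minimal description of a container exec: argv plus environment
/// (hypothetical type, for illustration only).
#[derive(Debug)]
pub struct ExecSpec {
    pub cmd: Vec<String>,
    pub env: Vec<String>,
}

/// Build a logical-dump exec. The password travels only in the exec
/// environment, so it never shows up in the process argument list.
pub fn dump_exec(user: &str, password: &str, database: &str) -> ExecSpec {
    ExecSpec {
        cmd: vec![
            "mariadb-dump".to_string(),
            format!("--user={user}"),
            "--single-transaction".to_string(),
            database.to_string(),
        ],
        env: vec![format!("MYSQL_PWD={password}")],
    }
}

/// True if any argv element contains the password (the condition a
/// unit test would assert is false).
pub fn password_in_argv(spec: &ExecSpec, password: &str) -> bool {
    spec.cmd.iter().any(|arg| arg.contains(password))
}
```

A unit test would then assert `!password_in_argv(&dump_exec("root", "s3cret", "app"), "s3cret")`.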

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
…tting

Make Temps-managed MariaDB containers PITR-capable by enabling binary
logging, the MariaDB analog of Postgres WAL archiving.

- New BinlogArchiveInterval setting (1m/5m/15m/60m, default 5m) exposed
  in the parameter schema as an editable dropdown — the user controls
  PITR granularity (recovery-point objective) per service at runtime.
- binlog_expire_logs_seconds is derived from the interval (>= 6x, floor
  1h) so a binlog segment is never purged locally before the archiver
  ships it (the PITR continuity invariant).
- Container cmd now appends --log-bin, --binlog-format=ROW, a stable
  non-zero --server-id derived from the service name, --sync-binlog=1
  (durable binlog tail), and the derived retention.
- Container env pins TZ=UTC so binlog timestamps — and therefore
  `mysqlbinlog --stop-datetime` PITR targets (RecoveryTarget::Time is
  UTC) — are unambiguous.
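
The derivation described in these bullets can be sketched as below. The enum shape, the function names, and the FNV-1a hash used for the server id are assumptions for illustration; the commit message only states the ">= 6x, floor 1h" rule and that the id is stable and non-zero.

```rust
/// Archive cadence options named in the commit message (shape assumed).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BinlogArchiveInterval {
    OneMinute,
    FiveMinutes,
    FifteenMinutes,
    SixtyMinutes,
}

impl BinlogArchiveInterval {
    pub fn seconds(self) -> u64 {
        match self {
            Self::OneMinute => 60,
            Self::FiveMinutes => 300,
            Self::FifteenMinutes => 900,
            Self::SixtyMinutes => 3600,
        }
    }
}

/// Local retention: six archive intervals, but never less than one hour,
/// so a segment is not purged before the archiver has had time to ship it.
pub fn binlog_expire_logs_seconds(interval: BinlogArchiveInterval) -> u64 {
    (interval.seconds() * 6).max(3600)
}

/// Stable non-zero server id from the service name.
/// FNV-1a is an assumed choice; the PR may derive the id differently.
pub fn server_id(service_name: &str) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for byte in service_name.bytes() {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash.max(1) // 0 is not a usable server id
}

/// Extra container arguments that turn binary logging on.
pub fn binlog_args(service_name: &str, interval: BinlogArchiveInterval) -> Vec<String> {
    vec![
        "--log-bin".to_string(),
        "--binlog-format=ROW".to_string(),
        format!("--server-id={}", server_id(service_name)),
        "--sync-binlog=1".to_string(),
        format!("--binlog-expire-logs-seconds={}", binlog_expire_logs_seconds(interval)),
    ]
}
```

With the default 5-minute interval, six intervals is only 30 minutes, so the one-hour floor is what actually applies.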

Design note: binlog is enabled by default at (re)create rather than
lazily on first backup. This avoids a cross-crate enable trigger (the
backup runs in temps-backup; the container lifecycle lives here) and the
silent-drop risk of marker-driven args. Existing containers adopt binlog
on their next recreate (e.g. image upgrade); a healthy running container
is not force-recreated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
Add the "frequent scheduled ship" half of MariaDB PITR: a per-service
background task that archives closed binary-log segments to S3 so
point-in-time recovery has the logs to replay.

- MariaDbService::archive_binlogs: one idempotent run — FLUSH BINARY
  LOGS, SHOW BINARY LOGS, ship each closed segment (download via
  bollard tar stream -> gzip -> S3 PUT), update a manifest. Advances
  the manifest's last_shipped_file only past segments that actually
  uploaded, stopping at the first failure so the replay chain stays
  contiguous.
- S3 layout (the contract the restore path consumes):
  {prefix}/external_services/mariadb/{svc}/binlog/{file}.gz and
  .../binlog/manifest.json (BinlogManifest).
- Pure, unit-tested helpers reused by restore: parse_show_binary_logs,
  closed_binlog_files, binlogs_to_ship, binlog_object_key,
  binlog_manifest_key.
- Trigger: ExternalServiceHealthMonitor runs the archiver for running
  standalone mariadb services on their binlog_archive_interval cadence
  (cheap in-memory interval gate checked before the backup-schedule DB
  scan). S3 destination is discovered from a covering enabled backup
  schedule; no schedule => skip. All archiver failures are swallowed so
  health monitoring is never disrupted.
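
A rough sketch of the pure helpers and the "stop at the first failure" rule follows. The helper names are taken from the commit message, but their signatures, the tab-separated input format, and the `ship_contiguous` function are assumptions; lexicographic comparison of file names assumes the usual zero-padded numeric suffix.

```rust
/// Parse tab-separated `SHOW BINARY LOGS` rows ("<Log_name>\t<File_size>").
/// Assumes batch-mode client output with the header row suppressed.
pub fn parse_show_binary_logs(output: &str) -> Vec<(String, u64)> {
    output
        .lines()
        .filter_map(|line| {
            let mut cols = line.split('\t');
            let name = cols.next()?.trim();
            let size = cols.next()?.trim().parse().ok()?;
            if name.is_empty() { None } else { Some((name.to_string(), size)) }
        })
        .collect()
}

/// Every segment except the newest one, which the server is still writing.
pub fn closed_binlog_files(logs: &[(String, u64)]) -> Vec<String> {
    let closed = logs.len().saturating_sub(1);
    logs[..closed].iter().map(|(name, _)| name.clone()).collect()
}

/// Closed segments that sort after the manifest's last shipped file.
pub fn binlogs_to_ship(closed: &[String], last_shipped: Option<&str>) -> Vec<String> {
    closed
        .iter()
        .filter(|file| last_shipped.map_or(true, |last| file.as_str() > last))
        .cloned()
        .collect()
}

/// Upload in order and stop at the first failure, returning the new
/// `last_shipped_file` (None if nothing uploaded). Hypothetical helper.
pub fn ship_contiguous(
    to_ship: &[String],
    mut upload: impl FnMut(&str) -> bool,
) -> Option<String> {
    let mut last = None;
    for file in to_ship {
        if !upload(file) {
            break; // keep the replay chain contiguous
        }
        last = Some(file.clone());
    }
    last
}

pub fn binlog_object_key(prefix: &str, svc: &str, file: &str) -> String {
    format!("{prefix}/external_services/mariadb/{svc}/binlog/{file}.gz")
}

pub fn binlog_manifest_key(prefix: &str, svc: &str) -> String {
    format!("{prefix}/external_services/mariadb/{svc}/binlog/manifest.json")
}
```

Because the manifest only ever advances past segments that uploaded, a later run retries from the first failed segment rather than leaving a hole in the chain.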

Credentials flow only via MYSQL_PWD/MARIADB_PWD exec env, never argv.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
Implement the restore side of MariaDB PITR on the ExternalService trait.

- restore_capabilities: pitr=true, in-place + to-new-service.
- restore_from_s3: detects a physical base.mbstream.gz vs a logical
  .sql.gz and dispatches; logical path refactored into a shared helper.
- restore_to_new_service: clones the source config onto a fresh port,
  creates a new container, restores the base into it.
- restore_pitr (core): guards that the base is a physical mariadb-backup
  with binlog coordinates (logical-only backups are rejected with a
  greppable "PITR ... physical" error); validates the recovery target
  BEFORE destroying data; restores the base; then forward-rolls.

Physical base restore mirrors the postgres swap: disable restart policy
-> stop -> ephemeral helper container (volumes_from, root) that
mbstream-extracts, `mariadb-backup --prepare`, wipes the datadir,
`--copy-back`, and `chown -R mysql:mysql` -> re-enable restart policy
(always, even on failure) -> start -> wait healthy. The gunzipped
mbstream is uploaded onto the created-but-not-started helper's writable
layer (volumes_from can't add binds; avoids binary-over-exec corruption).

Forward-roll reads the binlog manifest, downloads every shipped segment
>= the base's binlog_file, and replays them in a SINGLE `mysqlbinlog
--disable-log-bin --start-position=<base pos> [--stop-datetime|
--stop-position] file1 file2 ... | mariadb` invocation (replayed events
are not re-logged; a non-zero exit surfaces as a failure, never a silent
half-apply). Recovery-target mapping: Time -> --stop-datetime (UTC);
Lsn -> file:pos via --stop-position on the final segment; Xid (GTID) and
Name fail fast with typed errors pending the Docker E2E hardening pass.
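
The recovery-target mapping above could look roughly like this. The variant names follow the commit message; the payload types, the error enum, and the `stop_args` function are assumptions for illustration.

```rust
/// Recovery targets named in the commit message (payload types assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum RecoveryTarget {
    /// UTC timestamp, e.g. "2026-06-23 12:00:00".
    Time(String),
    /// Binlog file and byte position.
    Lsn { file: String, position: u64 },
    /// GTID target, not yet supported.
    Xid(String),
    /// Named restore point, not yet supported.
    Name(String),
}

/// Typed failures for targets the replay path rejects up front (hypothetical).
#[derive(Debug, PartialEq)]
pub enum TargetError {
    GtidUnsupported,
    NamedUnsupported,
}

/// Translate a target into the stop argument for the binlog decoder.
/// Meant to run BEFORE any data is destroyed, so unsupported targets
/// fail fast instead of leaving a half-restored service.
pub fn stop_args(target: &RecoveryTarget) -> Result<Vec<String>, TargetError> {
    match target {
        RecoveryTarget::Time(ts) => Ok(vec![format!("--stop-datetime={ts}")]),
        RecoveryTarget::Lsn { position, .. } => {
            Ok(vec![format!("--stop-position={position}")])
        }
        RecoveryTarget::Xid(_) => Err(TargetError::GtidUnsupported),
        RecoveryTarget::Name(_) => Err(TargetError::NamedUnsupported),
    }
}
```

The arguments are returned as separate argv elements, so the space inside the timestamp needs no shell quoting when they are passed to an exec directly.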

Credentials flow only via MYSQL_PWD/MARIADB_PWD exec env, never argv.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
Add a binary-log health probe mirroring postgres_wal_health, surfacing
operational PITR risks to the UI/health metadata:

- BinlogDisabled (Critical) — log_bin off, PITR impossible.
- NonRowBinlogFormat — binlog_format != ROW, PITR fidelity risk.
- LargeBinlogBacklog { segment_count, total_bytes } — local binlogs
  accumulating (> 50 segments or > 10 GiB), e.g. archiver behind or
  retention too high / disk pressure.

probe_binlog_health connects via sqlx MySQL (root), reads log_bin /
binlog_format / binlog_expire_logs_seconds / gtid_strict_mode and
SHOW BINARY LOGS (count + total bytes), returns None on any connection
failure. Pure compute_warnings is unit-tested (14 tests); a
testcontainers integration suite (3 tests) covers binlog-off, binlog-on,
and bad-connection against real mariadb:lts. Credentials never logged.
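
A pure warning computation in the spirit of `compute_warnings` might look like this. The warning names and thresholds come from the commit message; the `BinlogStatus` input struct and the choice to report only `BinlogDisabled` when logging is off are assumptions.

```rust
const MAX_SEGMENTS: u64 = 50;
const MAX_BACKLOG_BYTES: u64 = 10 * 1024 * 1024 * 1024; // 10 GiB

#[derive(Debug, PartialEq)]
pub enum BinlogWarning {
    BinlogDisabled,
    NonRowBinlogFormat,
    LargeBinlogBacklog { segment_count: u64, total_bytes: u64 },
}

/// Values read from the server by the probe (hypothetical input shape).
pub struct BinlogStatus {
    pub log_bin: bool,
    pub binlog_format: String,
    pub segment_count: u64,
    pub total_bytes: u64,
}

/// Pure function: no I/O, so every branch is trivially unit-testable.
pub fn compute_warnings(status: &BinlogStatus) -> Vec<BinlogWarning> {
    // With binary logging off, the other checks are meaningless.
    if !status.log_bin {
        return vec![BinlogWarning::BinlogDisabled];
    }
    let mut warnings = Vec::new();
    if !status.binlog_format.eq_ignore_ascii_case("ROW") {
        warnings.push(BinlogWarning::NonRowBinlogFormat);
    }
    if status.segment_count > MAX_SEGMENTS || status.total_bytes > MAX_BACKLOG_BYTES {
        warnings.push(BinlogWarning::LargeBinlogBacklog {
            segment_count: status.segment_count,
            total_bytes: status.total_bytes,
        });
    }
    warnings
}
```

Keeping the connection logic in the probe and the thresholds in a pure function is what lets the unit tests run without a database.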

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
Restore-validation rehearsal against real mariadb:lts revealed the replay
path hardcoded `mysqlbinlog`, which does NOT exist in the official image
(only `mariadb-binlog`/`mariadb` are present; `mysqlbinlog`/`mysql` are
not) — PITR replay would have failed on every real deployment.

- replay_binlogs now resolves the tool at run time: prefer mariadb-binlog
  (and mariadb client), fall back to mysqlbinlog/mysql for non-MariaDB
  images.
- Replay decodes to an intermediate file before feeding the client, so
  under `set -e` a failed decode aborts and surfaces as an error instead
  of being masked by the client's exit code in a pipe (no silent
  half-apply). The prior comment claimed this guard but the code piped.
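
The two fixes above (run-time tool resolution and decode-then-apply) can be sketched as follows. Both function names, the `/tmp/replay.sql` path, and the way availability is passed in are assumptions; binlog file names are assumed to be shell-safe.

```rust
/// Pick the (decoder, client) pair: MariaDB-native names first, then the
/// MySQL-compatible names. `available` lists binaries found in the container.
pub fn resolve_replay_tools(available: &[&str]) -> Option<(&'static str, &'static str)> {
    const CANDIDATES: [(&str, &str); 2] =
        [("mariadb-binlog", "mariadb"), ("mysqlbinlog", "mysql")];
    CANDIDATES
        .iter()
        .copied()
        .find(|(decoder, client)| available.contains(decoder) && available.contains(client))
}

/// Build a replay script that decodes to an intermediate file first.
/// Under `set -e` a failed decode aborts before the client ever runs,
/// which a plain `decoder | client` pipe would not guarantee.
pub fn replay_script(decoder: &str, client: &str, start_position: u64, files: &[&str]) -> String {
    let joined = files.join(" ");
    format!(
        "set -e\n\
         {decoder} --disable-log-bin --start-position={start_position} {joined} > /tmp/replay.sql\n\
         {client} < /tmp/replay.sql\n"
    )
}
```

In a pipe, the shell reports only the last command's status, so a decoder crash halfway through would still let the client apply whatever partial output it received.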

Verified end-to-end with a shell rehearsal: physical base backup +
binlog archive + restore (mbstream extract -> --prepare -> copy-back ->
chown) + mariadb-binlog --stop-datetime replay correctly recovers to a
point in time (rows before T kept, after T dropped); the
mariadb-backup binlog-position stderr format matches parse_binlog_position.

Also adds 25 unit tests bringing MariaDB to parity with the Postgres
provider: full database-name validation matrix, config/schema defaults
and editability, address/env routing (baremetal + docker), import
parsing, and the upgrade-is-not-implemented (MARIADB_AUTO_UPGRADE)
decision. 74 mariadb tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
MyISAM/Aria tables are not crash-safe and don't recover consistently
under point-in-time recovery — a documented cause of PITR failure. The
binlog-health probe now counts non-InnoDB user tables (excluding system
schemas) and emits a NonInnodbTables warning so operators can convert
them to InnoDB before relying on PITR.

Complements the other PITR-failure guards already in place: sync_binlog=1
and binlog_format=ROW (commit 489f344) address lost-binlog-on-crash and
non-deterministic temp/in-memory/distributed-transaction replay; the
LargeBinlogBacklog warning flags binlog disk pressure. +2 unit tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
…PITR

Make MariaDB point-in-time recovery work out of the box. Previously a
MariaDB service had no base backup unless an operator manually created a
schedule, so binlog archiving had nothing to anchor to.

A reconcile loop (5-min tick in the backup plugin) ensures every MariaDB
service has a covering daily full-backup schedule once a default S3
source is configured:
- New `external_services.default_backup_provisioned` boolean (migration
  m20260623_000001, NOT NULL DEFAULT false, idempotent IF NOT EXISTS) is
  a one-shot latch: reconcile only provisions services where it's false
  and sets it true after creating the schedule, so a schedule the
  operator later deletes is never recreated.
- reconcile_default_external_service_schedules: resolves the default S3
  source (skips quietly + retries next tick if none exists yet — handles
  storage configured after the service), then for each unprovisioned
  MariaDB service creates a daily `0 0 3 * * *` (03:00 UTC) full schedule
  via the existing create_backup_schedule + attach_services_to_schedule
  (retention 14d, target_all=false, control_plane=false, default source).
  Per-service failures are logged and skipped so one can't block others;
  each creation is logged for operator visibility.
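
The one-shot latch behaviour can be illustrated with an in-memory sketch. The `Service` struct, the `reconcile` signature, and the `create_schedule` callback are hypothetical; the cron expression and the latch semantics are from the commit message.

```rust
/// Daily 03:00 UTC schedule from the commit message.
pub const DEFAULT_CRON: &str = "0 0 3 * * *";

/// In-memory stand-in for an `external_services` row (hypothetical).
pub struct Service {
    pub name: String,
    pub default_backup_provisioned: bool,
}

/// One reconcile tick. Returns the names of services provisioned this tick.
/// `create_schedule(service, source, cron)` stands in for the real
/// schedule-creation calls.
pub fn reconcile(
    services: &mut [Service],
    default_source: Option<&str>,
    mut create_schedule: impl FnMut(&str, &str, &str) -> Result<(), String>,
) -> Vec<String> {
    // No default S3 source yet: skip quietly and retry on the next tick.
    let Some(source) = default_source else {
        return Vec::new();
    };
    let mut provisioned = Vec::new();
    for svc in services.iter_mut().filter(|s| !s.default_backup_provisioned) {
        match create_schedule(&svc.name, source, DEFAULT_CRON) {
            Ok(()) => {
                // Latch: never provision this service again, even if the
                // operator later deletes the schedule.
                svc.default_backup_provisioned = true;
                provisioned.push(svc.name.clone());
            }
            // A per-service failure is skipped so it cannot block the others.
            Err(_) => continue,
        }
    }
    provisioned
}
```

Because the flag is only set after a successful creation, a failed service is retried on the next tick, while a successfully provisioned one is never touched again.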

Pairs with the 5-minute binlog ship default: daily physical base +
continuous binlogs = restore to any point in time, with no manual setup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
@bherila bherila force-pushed the feature/mariadb-services-upstream-pr branch 8 times, most recently from 30f9816 to 54f3cff Compare June 23, 2026 17:30
@bherila

bherila commented Jun 23, 2026

Contributor Author

I'm still testing/debugging this fyi

@bherila bherila force-pushed the feature/mariadb-services-upstream-pr branch from 54f3cff to eec11f5 Compare June 23, 2026 18:17
@dviejokfs

Contributor

> I'm still testing/debugging this fyi

Sounds good. If you can try it on a real instance with a lot of data, that would be much better: it would harden the feature further before it reaches the main repo.

@bherila

bherila commented Jun 23, 2026

Contributor Author

I don't actually have a site with that much real data to test against. Right now I only have a very small database with a handful of daily users (<10), which is why I've been using full snapshots for daily backups. I'd welcome any volunteer help testing with a larger database.

@bherila bherila force-pushed the feature/mariadb-services-upstream-pr branch 2 times, most recently from 39f6dc0 to 23d5333 Compare June 23, 2026 19:21
@bherila bherila force-pushed the feature/mariadb-services-upstream-pr branch from 23d5333 to e3fa423 Compare June 23, 2026 19:57
bherila added 2 commits June 23, 2026 20:29
…vices-upstream-pr

# Conflicts:
#	CHANGELOG.md
#	crates/temps-migrations/src/migration/mod.rs
…vices-upstream-pr

# Conflicts:
#	CHANGELOG.md
@bherila

bherila commented Jun 23, 2026

Contributor Author

Update from this pass:

  • Brought the PR branch up to current upstream/main (latest head ee2b84a7) and mirrored the same fix/merge chain onto release/bherila-amzn2023.
  • Fixed the two restore bugs found by the full-chain MariaDB PITR E2E:
    • mariadb_physical now pipes mariadb-backup --stream=mbstream directly into gzip, instead of gzipping the trailing cleanup command and uploading raw mbstream as .gz.
    • restore-to-new-service now reads archived binlogs from the source service prefix, not the restored service name, so PITR replay can find the source binlog manifest/objects.
  • Also hardened the restore/E2E path while debugging CI: Docker exec polling now stops polling closed stdout/stderr streams and has bounded Docker API calls, PITR binlog upload/replay phases have explicit timeouts/phase logs, and the heavyweight MariaDB PITR Docker E2E now runs as its own GitHub Actions job with explicit setup, image-pull, build, test, diagnostics, and log-artifact steps.

Design summary for review:

  • MariaDB support stays engine-per-source for this PR: mariadb_dump for logical backups and mariadb_physical plus binlogs for PITR. The broader Source/Sink split and Restic integration are intentionally deferred follow-ups rather than part of this change.
  • PITR is the daily-base plus short-binlog-cadence model: a default daily physical base schedule is auto-provisioned for managed MariaDB services, and archived binlogs are intended to run on the 5-minute interval path.
  • The out-of-box schedule uses the default S3 source. The Docker E2E backs that with MinIO as the S3-compatible target, not Garage.

Verification on GitHub Actions:

Local checks also passed on both updated branches:

  • cargo fmt --all -- --check
  • cargo check -p temps-providers --lib
  • cargo test -p temps-backup --features docker-tests mariadb_pitr_full_chain_e2e --no-run

@dviejokfs

dviejokfs commented Jun 24, 2026

Contributor

I saw the Postgres updates; let's leave fixes for other databases out of this PR.

Also, the decision to not wait until a backup is completed is on purpose.

Small databases have no issues, but when backing up 500-1000 GB we can't run the backup synchronously. Also, if the server is restarted mid-backup, the state will be inconsistent. That's why backups need to run in Docker containers in detached mode, with a separate way to check the state of the backup.

@dviejokfs

Contributor

Please keep this PR scoped to MariaDB, @bherila.

@bherila

bherila commented Jun 24, 2026

Contributor Author

I will update again; I don't have a way to reset this back to draft status. However, the Postgres fix is required first, because it blocks the MariaDB E2E test suite from working correctly.

@bherila

bherila commented Jun 24, 2026

Contributor Author

I'm going to open a separate PR for the Postgres fix, and once that's merged I will rebase those commits out of this PR.

@bherila

bherila commented Jun 24, 2026

Contributor Author

Moved the Postgres major-upgrade hardening out into its own focused PR: #151

That PR contains only the Postgres upgrade fix set:

  • real backup/S3 fixture rows for the upgrade tests so the pre_upgrade_backup_id FK is valid;
  • stronger Postgres readiness that waits for psql SELECT 1 against the configured database, not just pg_isready/Docker health;
  • clean live-volume handoff for snapshot/rollback so pg18 cannot reopen stale pg17 PGDATA;
  • explicit copy-sidecar cleanup and wait logic before deleting volumes.

I also kicked off a fork Rust Tests workflow for that branch here as validation proof: https://github.com/bherila/temps/actions/runs/28072935670

Once #151 lands, I’ll rebase this MariaDB PR and drop those Postgres commits from #138 so this PR stays scoped to MariaDB/PITR.

@bherila

bherila commented Jun 24, 2026

Contributor Author

After #151 and #152 are reviewed and ideally merged, I will pick this PR back up again.

@bherila bherila closed this Jun 24, 2026