
fix: DKIM keys bind-mount + config apply writes canonical compose#72

Merged
oakshape merged 3 commits into main from fix/dkim-bind-mount-and-compose-path on Jun 13, 2026


@oakshape

Contributor

Fixes the two self-host ops bugs tracked as gaps #3 and #4 in feedback_vectis_cli_db_and_sidecar_compose_gaps, both of which surfaced repeatedly on the mx1 portfolio box.

1. DKIM keys: named volume → host bind mount

The dkim dir was a named volume (vectis_dkim-keys) mounted into rspamd + api. The host CLI `vectis domain add` writes keys to the host's /var/vectis/dkim, which the named volume hid from rspamd → outbound mail for host-CLI-added domains went unsigned (dkim=none) despite a correct dkim_signing.conf and passing SPF/DMARC. The api and the host CLI were writing keys to two different places.

  • Both mounts now bind /var/vectis/dkim from the host → one shared directory, no copy step.
  • Upgrade migration: existing installs have api-written keys only in the named volume. The orchestrator self-heal now runs a one-time, idempotent, best-effort MigrateLegacyDKIMVolume before recreating rspamd: a one-shot container runs `cp -an` to copy the legacy volume into the host path. No-op on fresh installs (gated on the volume existing); never blocks the upgrade.
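The migration step amounts to a single `docker run`. The sketch below only builds its argument list. The volume name, host path, and the `/from` and `/to` mount points come from this PR; the `alpine:3` image, the `--rm` and `:ro` flags, and the `migrationArgs` helper are assumptions for illustration. It also leaves out a `|| true` suffix, so a failed copy exits non-zero.

```go
package main

import (
	"fmt"
	"strings"
)

// Volume name, host path, and /from,/to mount points appear in the PR.
// The image tag, --rm, :ro, and migrationArgs itself are hypothetical.
const (
	legacyDKIMVolume = "vectis_dkim-keys"
	hostDKIMDir      = "/var/vectis/dkim"
	migrationImage   = "alpine:3" // hypothetical stand-in for dkimMigrationImage
)

// migrationArgs builds the argv (minus the leading "docker") for a one-shot
// container that copies the legacy named volume into the host directory.
// cp -a preserves modes, ownership, and timestamps; -n never overwrites an
// existing file, so re-running after a partial or completed copy is harmless.
func migrationArgs() []string {
	return []string{
		"run", "--rm",
		"-v", legacyDKIMVolume + ":/from:ro", // legacy keys, mounted read-only
		"-v", hostDKIMDir + ":/to", // the directory api + rspamd now share
		migrationImage,
		"sh", "-c", "cp -an /from/. /to/", // no "|| true": a failed copy exits non-zero
	}
}

func main() {
	fmt.Println("docker " + strings.Join(migrationArgs(), " "))
}
```

The no-clobber flag is what makes the step safe to retry on every self-heal: keys already present on the host are never replaced by older copies from the volume.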

2. config apply writes the canonical /etc/vectis compose

`vectis install` writes the live compose to /etc/vectis/docker-compose.yml (what the orchestrator and the stack read), but `config apply` generated, diffed, and wrote it under /var/vectis/generated, a copy nothing reads. So apply's compose changes diverged from the running project (the exact split-brain that left mx1's generated compose stuck at an old version while the live stack ran a newer one).

  • docker-compose.yml is now diffed + written against configDir (/etc/vectis); per-service configs still go to the bind-mounted genDir. A back-compat copy of the compose is also written to genDir.
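A minimal sketch of that split, assuming generated files are identified by name. `planWrites` is a hypothetical helper, not the PR's `splitCompose`, whose signature isn't shown here; the first path in each entry is the one a diff would compare against.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

const composeName = "docker-compose.yml"

// planWrites maps each generated file name to the paths it is written to.
// The compose file goes to configDir (the canonical copy install and the
// orchestrator read) plus a back-compat copy in genDir; every per-service
// config goes only to genDir.
func planWrites(names []string, configDir, genDir string) map[string][]string {
	plan := make(map[string][]string, len(names))
	for _, name := range names {
		if name == composeName {
			plan[name] = []string{
				filepath.Join(configDir, name), // canonical: diffed + written
				filepath.Join(genDir, name),    // back-compat copy
			}
			continue
		}
		plan[name] = []string{filepath.Join(genDir, name)}
	}
	return plan
}

func main() {
	// "dkim_signing.conf" stands in for any per-service config.
	plan := planWrites([]string{composeName, "dkim_signing.conf"}, "/etc/vectis", "/var/vectis/generated")
	names := make([]string, 0, len(plan))
	for n := range plan {
		names = append(names, n)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	for _, n := range names {
		fmt.Println(n, "->", plan[n])
	}
}
```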

Testing (sa1001, Go 1.26.4)

  • go build ./... clean
  • go test ./internal/cli/ ./internal/engine/ ./internal/orchestrator/ — pass (incl. new splitCompose / writeAllFiles tests + a new engine assertion locking in the /var/vectis/dkim bind mount)
  • go vet -tags integration ./internal/... — compiles clean
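The engine assertion on the bind mount could look roughly like the following. `checkDKIMBindMount` is hypothetical, and the container-side path `/var/lib/rspamd/dkim` in the samples is an assumption; only the host path and the legacy volume name come from this PR. It scans text instead of parsing YAML, so it is a coarse check.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkDKIMBindMount requires a host bind mount of /var/vectis/dkim in the
// rendered compose and rejects any leftover reference to the legacy
// dkim-keys named volume (as a mount source or a top-level declaration).
func checkDKIMBindMount(compose string) error {
	if !strings.Contains(compose, "/var/vectis/dkim:") {
		return errors.New("compose does not bind-mount /var/vectis/dkim")
	}
	if strings.Contains(compose, "dkim-keys") {
		return errors.New("compose still references the legacy dkim-keys volume")
	}
	return nil
}

func main() {
	fixed := "services:\n  rspamd:\n    volumes:\n      - /var/vectis/dkim:/var/lib/rspamd/dkim\n"
	legacy := "services:\n  rspamd:\n    volumes:\n      - dkim-keys:/var/lib/rspamd/dkim\nvolumes:\n  dkim-keys:\n"
	fmt.Println("fixed:", checkDKIMBindMount(fixed))   // expect a nil error
	fmt.Println("legacy:", checkDKIMBindMount(legacy)) // expect an error
}
```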

Note: the two existing installs (prod, mx1) will be migrated/verified manually on deploy as belt-and-suspenders alongside the automated self-heal migration.

Vectis Build added 2 commits June 13, 2026 16:28
The dkim dir was a named volume (vectis_dkim-keys) mounted into rspamd + api.
But the host CLI `vectis domain add` writes keys to the HOST /var/vectis/dkim,
which the named volume hid from rspamd — so outbound mail for any host-CLI-
added domain went unsigned (dkim=none), even though dkim_signing.conf was
correct and SPF/DMARC passed. (api-added domains worked because the api writes
into the volume.) The two write paths diverged.

Switch both mounts to a host bind mount of /var/vectis/dkim so the host CLI,
the api, and rspamd all share one directory — no copy step, no divergence.

Existing installs have api-written keys living only in the named volume. On
the upgrade that flips the mount, the orchestrator self-heal now runs a one-
time best-effort migration (MigrateLegacyDKIMVolume) BEFORE recreating rspamd:
a one-shot container copies the legacy volume into the host path with `cp -an`. Idempotent,
gated on the volume existing (no-op on fresh installs), never blocks the
upgrade. The legacy volume is left in place for the operator to remove.

`vectis install` writes the live docker-compose.yml to /etc/vectis (where the
orchestrator + running stack read it), but `vectis config apply` generated,
diffed, and wrote the compose under /var/vectis/generated — a copy the stack
never reads. So apply's compose changes (and its diff) silently diverged from
the running project: structural edits never landed, and on boxes the
orchestrator had bumped, apply diffed against a stale compose.

Split the generated files: per-service configs still go to the bind-mounted
genDir, but docker-compose.yml is now diffed + written against configDir
(/etc/vectis), the same compose install and the orchestrator use. A back-compat
copy is still written to genDir. Adds unit tests for the split.

Copilot AI review requested due to automatic review settings June 13, 2026 08:28

Copilot AI left a comment


Pull request overview

Fixes two operational gaps affecting self-hosted installs: DKIM key visibility between host CLI and containers, and vectis config apply updating the wrong docker-compose location.

Changes:

  • Switch DKIM keys from a named volume to a host bind mount (/var/vectis/dkim) for api + rspamd, and add an orchestrator self-heal migration to copy legacy volume keys to the host path.
  • Make config diff/apply treat /etc/vectis/docker-compose.yml as canonical (while still writing a back-compat copy under the generated dir).
  • Add/extend tests to lock in the compose mount + compose/config split behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| internal/orchestrator/docker.go | Adds legacy DKIM volume migration via one-shot container copy. |
| internal/orchestrator/compose_selfheal.go | Hooks DKIM migration into self-heal before service recreation. |
| internal/engine/templates/docker-compose.yml.tmpl | Changes DKIM mount to host bind mount; removes legacy volume declaration. |
| internal/engine/engine_test.go | Asserts DKIM keys are bind-mounted in generated compose. |
| internal/cli/config_cmd.go | Splits compose vs service configs so apply/diff operate on canonical compose path. |
| internal/cli/config_cmd_test.go | Adds unit tests for compose splitting and dual-write behavior. |

Comment on lines +239 to +244
```go
	// Gate on the volume existing — `docker run -v <name>:...` would otherwise
	// CREATE a stray empty volume on fresh installs.
	if err := exec.CommandContext(ctx, "docker", "volume", "inspect", legacyDKIMVolume).Run(); err != nil {
		dm.logger.Info("dkim migration: no legacy volume, nothing to migrate", "volume", legacyDKIMVolume)
		return nil
	}
```
Comment on lines +260 to +262
```go
		dkimMigrationImage,
		"sh", "-c", "cp -an /from/. /to/ 2>/dev/null || true",
	}
```
Comment on lines +243 to +248
```go
	if composeRewritten {
		if err := o.docker.MigrateLegacyDKIMVolume(ctx); err != nil {
			o.logger.Warn("self-heal: legacy DKIM volume migration failed — api-added domains may sign unsigned until keys are copied to /var/vectis/dkim manually",
				"error", err)
		}
	}
```
- volume inspect: only treat 'No such volume' as absent; return other
  errors (daemon down / permission denied) so the caller's warning fires
  instead of silently skipping the migration with a misleading log.
- drop 'cp ... || true' so a real copy failure surfaces.
- run the migration whenever self-heal recreates services, not only on
  composeRewritten — keys can be stuck in the legacy volume even when the
  compose already matches (e.g. an operator hand-edited it first).

Addresses Copilot review on #72.
oakshape merged commit e2e7205 into main on Jun 13, 2026
7 checks passed
oakshape deleted the fix/dkim-bind-mount-and-compose-path branch on June 13, 2026 08:35