fix: DKIM keys bind-mount + config apply writes canonical compose #72
Merged
added 2 commits
June 13, 2026 16:28
The dkim dir was a named volume (vectis_dkim-keys) mounted into rspamd + api. But the host CLI `vectis domain add` writes keys to the HOST /var/vectis/dkim, which the named volume hid from rspamd, so outbound mail for any host-CLI-added domain went unsigned (dkim=none), even though dkim_signing.conf was correct and SPF/DMARC passed. (api-added domains worked because the api writes into the volume.) The two write paths diverged.

Switch both mounts to a host bind mount of /var/vectis/dkim so the host CLI, the api, and rspamd all share one directory: no copy step, no divergence.

Existing installs have api-written keys living only in the named volume. On the upgrade that flips the mount, the orchestrator self-heal now runs a one-time best-effort migration (MigrateLegacyDKIMVolume) BEFORE recreating rspamd: a one-shot container runs `cp -an` to copy the legacy volume into the host path. Idempotent, gated on the volume existing (no-op on fresh installs), never blocks the upgrade. The legacy volume is left in place for the operator to remove.
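For illustration, the one-shot copy described above could be expressed as an argv builder along these lines. This is a minimal sketch, not the actual `MigrateLegacyDKIMVolume` code: the helper name `legacyDKIMMigrationArgs`, the read-only flag on the source mount, and the `alpine:3` image are assumptions; the `/from` and `/to` mount points and `cp -an` follow the commit message and the reviewed diff.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyDKIMMigrationArgs builds the `docker` argv for a one-shot container
// that copies keys from the legacy named volume into the host directory.
// Hypothetical helper: the real migration code may be structured differently.
func legacyDKIMMigrationArgs(volume, hostDir, image string) []string {
	return []string{
		"run", "--rm",
		"-v", volume + ":/from:ro", // legacy named volume, mounted read-only (assumption)
		"-v", hostDir + ":/to", // host directory that the api and rspamd now share
		image,
		// -a preserves mode and ownership; -n never overwrites a key that is
		// already on the host, so running the copy a second time changes nothing.
		"sh", "-c", "cp -an /from/. /to/",
	}
}

func main() {
	// The image name is an assumption; any image that ships sh and cp would do.
	args := legacyDKIMMigrationArgs("vectis_dkim-keys", "/var/vectis/dkim", "alpine:3")
	fmt.Println("docker " + strings.Join(args, " "))
}
```

Because `docker run -v <name>:...` creates a missing named volume, a builder like this is only safe to run after the caller has confirmed that the legacy volume exists.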
`vectis install` writes the live docker-compose.yml to /etc/vectis (where the orchestrator + running stack read it), but `vectis config apply` generated, diffed, and wrote the compose under /var/vectis/generated, a copy the stack never reads. So apply's compose changes (and its diff) silently diverged from the running project: structural edits never landed, and on boxes the orchestrator had bumped, apply diffed against a stale compose.

Split the generated files: per-service configs still go to the bind-mounted genDir, but docker-compose.yml is now diffed + written against configDir (/etc/vectis), the same compose install and the orchestrator use. A back-compat copy is still written to genDir. Adds unit tests for the split.
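The split and dual write described above might look roughly like the sketch below. The name `splitCompose` appears in the PR's test notes, but its signature here, the `writePlan` helper, the in-memory `map[string]string` representation, and the sample file name are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

const composeName = "docker-compose.yml"

// splitCompose separates the compose file from the per-service configs.
// Assumed signature; generated files are modelled as name -> content.
func splitCompose(files map[string]string) (compose string, found bool, services map[string]string) {
	services = make(map[string]string, len(files))
	for name, body := range files {
		if name == composeName {
			compose, found = body, true
			continue
		}
		services[name] = body
	}
	return compose, found, services
}

// writePlan maps each destination path to its content: service configs go to
// genDir, the compose goes to configDir, and a back-compat copy goes to genDir.
// Hypothetical helper standing in for the real write step.
func writePlan(files map[string]string, configDir, genDir string) map[string]string {
	compose, found, services := splitCompose(files)
	plan := make(map[string]string, len(files)+1)
	for name, body := range services {
		plan[filepath.Join(genDir, name)] = body
	}
	if found {
		plan[filepath.Join(configDir, composeName)] = compose // canonical copy the stack reads
		plan[filepath.Join(genDir, composeName)] = compose    // back-compat copy
	}
	return plan
}

func main() {
	plan := writePlan(map[string]string{
		"docker-compose.yml":       "services: {}\n",
		"rspamd/dkim_signing.conf": "# placeholder\n", // illustrative file name
	}, "/etc/vectis", "/var/vectis/generated")

	paths := make([]string, 0, len(plan))
	for p := range plan {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

Computing a plan before touching disk keeps diff and apply on the same set of destination paths, which is the property the fix is after.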
Pull request overview
Fixes two operational gaps affecting self-hosted installs: DKIM key visibility between host CLI and containers, and vectis config apply updating the wrong docker-compose location.
Changes:
- Switch DKIM keys from a named volume to a host bind mount (`/var/vectis/dkim`) for api + rspamd, and add an orchestrator self-heal migration to copy legacy volume keys to the host path.
- Make `config diff/apply` treat `/etc/vectis/docker-compose.yml` as canonical (while still writing a back-compat copy under the generated dir).
- Add/extend tests to lock in the compose mount + compose/config split behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| internal/orchestrator/docker.go | Adds legacy DKIM volume migration via one-shot container copy. |
| internal/orchestrator/compose_selfheal.go | Hooks DKIM migration into self-heal before service recreation. |
| internal/engine/templates/docker-compose.yml.tmpl | Changes DKIM mount to host bind mount; removes legacy volume declaration. |
| internal/engine/engine_test.go | Asserts DKIM keys are bind-mounted in generated compose. |
| internal/cli/config_cmd.go | Splits compose vs service configs so apply/diff operate on canonical compose path. |
| internal/cli/config_cmd_test.go | Adds unit tests for compose splitting and dual-write behavior. |
Comment on lines +239 to +244

    // Gate on the volume existing — `docker run -v <name>:...` would otherwise
    // CREATE a stray empty volume on fresh installs.
    if err := exec.CommandContext(ctx, "docker", "volume", "inspect", legacyDKIMVolume).Run(); err != nil {
        dm.logger.Info("dkim migration: no legacy volume, nothing to migrate", "volume", legacyDKIMVolume)
        return nil
    }
Comment on lines +260 to +262

        dkimMigrationImage,
        "sh", "-c", "cp -an /from/. /to/ 2>/dev/null || true",
    }
Comment on lines +243 to +248

    if composeRewritten {
        if err := o.docker.MigrateLegacyDKIMVolume(ctx); err != nil {
            o.logger.Warn("self-heal: legacy DKIM volume migration failed — api-added domains may sign unsigned until keys are copied to /var/vectis/dkim manually",
                "error", err)
        }
    }
- volume inspect: only treat 'No such volume' as absent; return other errors (daemon down / permission denied) so the caller's warning fires instead of silently skipping the migration with a misleading log.
- drop 'cp ... || true' so a real copy failure surfaces.
- run the migration whenever self-heal recreates services, not only on composeRewritten: keys can be stuck in the legacy volume even when the compose already matches (e.g. an operator hand-edited it first).

Addresses Copilot review on #72.
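The first item can be sketched as a small classifier over the result of `docker volume inspect`. This is an illustration under assumptions, not the merged code: the function name `legacyVolumeExists` and the idea of matching on captured stderr are hypothetical, and the exact wording of Docker's "no such volume" message can vary between versions, which is why the match is case-insensitive.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errExit stands in for the *exec.ExitError a failed `docker volume inspect`
// would return; it exists only so the sketch runs without Docker.
var errExit = errors.New("exit status 1")

// legacyVolumeExists interprets the outcome of `docker volume inspect <name>`.
// Only a "no such volume" failure means the volume is absent; any other
// failure is returned so the caller can log a warning.
func legacyVolumeExists(runErr error, stderr string) (bool, error) {
	if runErr == nil {
		return true, nil
	}
	if strings.Contains(strings.ToLower(stderr), "no such volume") {
		return false, nil // fresh install: nothing to migrate
	}
	return false, fmt.Errorf("docker volume inspect: %w: %s", runErr, strings.TrimSpace(stderr))
}

func main() {
	fmt.Println(legacyVolumeExists(nil, ""))
	fmt.Println(legacyVolumeExists(errExit, "Error response from daemon: get vectis_dkim-keys: no such volume"))
	fmt.Println(legacyVolumeExists(errExit, "permission denied while trying to connect to the Docker daemon socket"))
}
```

Returning the unexpected error rather than swallowing it means a daemon outage is no longer reported as "no legacy volume".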
Fixes the two self-host ops bugs from `feedback_vectis_cli_db_and_sidecar_compose_gaps` (gaps #3 + #4), surfaced repeatedly on the mx1 portfolio box.

1. DKIM keys: named volume → host bind mount

The dkim dir was a named volume (`vectis_dkim-keys`) on rspamd + api. The host CLI `vectis domain add` writes keys to the host `/var/vectis/dkim`, which the named volume hid from rspamd → outbound mail for host-CLI-added domains went unsigned (dkim=none) despite a correct `dkim_signing.conf` and passing SPF/DMARC. The api and host CLI wrote to two different places.

- Both mounts now bind `/var/vectis/dkim` from the host → one shared directory, no copy step.
- On upgrade, the orchestrator self-heal runs `MigrateLegacyDKIMVolume` before recreating rspamd: a one-shot container runs `cp -an` to copy the legacy volume into the host path. No-op on fresh installs (gated on the volume existing); never blocks the upgrade.

2. `config apply` writes the canonical `/etc/vectis` compose

`vectis install` writes the live compose to `/etc/vectis/docker-compose.yml` (what the orchestrator + stack read), but `config apply` generated/diffed/wrote it under `/var/vectis/generated`, a copy nothing reads. So apply's compose changes diverged from the running project (the exact split-brain that left mx1's generated compose stuck at an old version while the live stack ran a newer one).

`docker-compose.yml` is now diffed + written against `configDir` (`/etc/vectis`); per-service configs still go to the bind-mounted genDir. A back-compat copy is still written to genDir.

Testing (sa1001, Go 1.26.4)

- `go build ./...`: clean
- `go test ./internal/cli/ ./internal/engine/ ./internal/orchestrator/`: pass (incl. new `splitCompose`/`writeAllFiles` tests + a new engine assertion locking in the `/var/vectis/dkim` bind mount)
- `go vet -tags integration ./internal/...`: compiles clean

Note: the two existing installs (prod, mx1) will be migrated/verified manually on deploy as belt-and-suspenders alongside the automated self-heal migration.
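As a rough idea of what an assertion on the rendered compose could check, the sketch below tests plain text for the host bind mount and for the absence of the legacy volume. Everything here is assumed for illustration: the helper name `usesDKIMBindMount`, the `dkim-keys` volume key, and the container-side path in the sample are not taken from the actual engine test.

```go
package main

import (
	"fmt"
	"strings"
)

const dkimHostDir = "/var/vectis/dkim"

// usesDKIMBindMount reports whether a rendered compose file mounts the host
// DKIM directory and no longer mentions the legacy named volume.
// A string check is a deliberate simplification; a stricter test could parse the YAML.
func usesDKIMBindMount(compose, legacyVolume string) bool {
	return strings.Contains(compose, "- "+dkimHostDir+":") &&
		!strings.Contains(compose, legacyVolume)
}

func main() {
	// The container-side path below is a placeholder, not the real mount target.
	rendered := "services:\n  rspamd:\n    volumes:\n      - /var/vectis/dkim:/dkim\n"
	fmt.Println(usesDKIMBindMount(rendered, "dkim-keys"))
}
```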