Skip to content

refactor(backup): remove built-in WuKongIM backup module (#139)#167

Open
an9xyz wants to merge 2 commits into
Mininglamp-OSS:mainfrom
dmwork-org:remove-endpoint-issue-139
Open

refactor(backup): remove built-in WuKongIM backup module (#139)#167
an9xyz wants to merge 2 commits into
Mininglamp-OSS:mainfrom
dmwork-org:remove-endpoint-issue-139

Conversation

@an9xyz
Copy link
Copy Markdown
Contributor

@an9xyz an9xyz commented May 26, 2026

Summary

Closes #139.

Remove the built-in modules/backup module — a cron job that tar+uploaded the WuKongIM data directory to COS, plus a /v1/manager/backup/* superAdmin API surface (~1.6k LoC). Infrastructure-layer backup (volume snapshots, sidecar sync, the service's own tooling) is the right home for this, not the business service. See the issue for the full rationale (scope mismatch, deployment fragility, live-tar consistency risk, maintenance cost).

Removed

  • modules/backup/{api_manager,db,diskcheck,model,scheduler,service,storage,storage_cos}.go and diskcheck_test.go
  • /v1/manager/backup/* route group (config / trigger / history / download / status)
  • Cron scheduler started in NewManager

Kept (rollback-safe tombstone)

  • modules/backup/sql/20260331000001_backup_legacy01.sql and 20260401000001_backup_legacy01.sql — unchanged contents
  • A minimal modules/backup/1module.go that registers only the embedded SQLDir (SetupAPI=nil, no HTTP routes, no scheduler, no services)
  • Both backup_config and backup_history tables on disk — no drop migration — so existing production rows are not lost

Why tombstones instead of deleting the SQL files

Deleting the SQL files would leave the corresponding rows in gorp_migrations orphaned, and sql-migrate's PlanMigration stage panics with unknown migration in database on the next startup of an existing deployment. Purging those rows on startup is also unsafe: rolling back to a pre-removal release would let sql-migrate re-run the original CREATE TABLE backup_config / backup_history (no IF NOT EXISTS) against the still-present tables and fail at startup with MySQL 1050.

The tombstone keeps both IDs recognised by sql-migrate, so:

  • Existing deployments: IDs already in gorp_migrations → skipped as applied.
  • Fresh installs: migrations run as historical no-ops creating the (unused) empty tables; rollback to an older release remains safe.
  • Rollback after this release: old module's CREATE TABLE finds the ID already recorded and is skipped.

Migration guidance for operators (release notes)

  • Kubernetes deployments: use VolumeSnapshot against the WuKongIM PVC.
  • VM / single-node: schedule a host-level cron job or use cloud disk snapshots.
  • A follow-up PR may add a drop migration for backup_config / backup_history once we are confident no deployment depends on the data.

Test plan

  • go build ./...
  • go vet ./...
  • go test ./pkg/db/ -race
  • Smoke-deploy against a database whose gorp_migrations already contains the two backup IDs (post-rewrite form) and confirm sql-migrate plans cleanly with no panic.
  • Smoke-deploy against a fresh database and confirm the two tombstone migrations run successfully.
  • Confirm /v1/manager/backup/* returns 404 after deploy.

an9xyz added 2 commits May 26, 2026 15:39
…OSS#139)

Delete modules/backup (cron tar+COS uploader, /v1/manager/backup/* API,
~1.6k LoC). Infrastructure-layer backup (volume snapshots, sidecar sync)
is the right home for this, not the business service.

Keep backup_config / backup_history tables — no drop migration — so
existing production rows survive.

Add PurgeRemovedMigrationIDs startup shim to delete the now-orphaned
gorp_migrations rows (both legacy backup-YYYYMMDD-NN.sql and rewritten
*_backup_legacy01.sql forms). Without this, sql-migrate's PlanMigration
panics with "unknown migration in database" on existing deployments.
Wired into main.go after RewriteLegacyMigrationIDs / ReconcileThreadSchemaRecords.
Address review on Mininglamp-OSS#139: deleting the SQL files + purging gorp_migrations
rows would silently re-run the original CREATE TABLE on rollback to a
pre-removal release (the tables are intentionally left in place, so the
unconditional CREATE would fail at startup with MySQL 1050).

Restore the two backup SQL files with their original contents and a
minimal package backup/1module.go that registers only the SQLDir
(SetupAPI=nil — no HTTP surface). All Go business code (cron scheduler,
API, tar+COS storage, disk precheck, models, service) stays deleted.

Revert PurgeRemovedMigrationIDs + its tests + the main.go wire-up and
restore the two mapping JSON entries (total back to 124). sql-migrate
continues to recognise both backup IDs, fresh installs run them as
historical migrations, and a rollback finds the IDs already recorded in
gorp_migrations so the old module's CREATE TABLE is skipped as expected.
@an9xyz an9xyz requested a review from a team as a code owner May 26, 2026 08:42
@github-actions github-actions Bot added the size/XL PR size: XL label May 26, 2026
Copy link
Copy Markdown
Contributor

@lml2468 lml2468 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[APPROVE] Clean module removal with correct sql-migrate tombstone pattern. -1564 LoC, zero risk.

Verification

Check Status Evidence
All Go code deleted (8 files) api_manager, db, diskcheck, diskcheck_test, model, scheduler, service, storage, storage_cos — all removed
1module.go retains only SQLDir registration SetupAPI field omitted (nil), no HTTP routes, no scheduler, no service init
config.Context import removed No longer needed — factory param is interface{} and only register is used
SQL migration files kept as tombstones Not in diff — unchanged, IDs remain in embedded FS for sql-migrate
Tables backup_config / backup_history not dropped No drop migration — existing production data preserved
Rollback safety Pre-removal release finds IDs already in gorp_migrations → skips CREATE TABLE; tables still on disk
Fresh install safety Tombstone migrations run as historical no-ops creating unused empty tables
go build ./... clean (per PR) No dangling references to deleted types/functions

No findings. The tombstone pattern is exactly right for sql-migrate — the package-level godoc in 1module.go explains the "why" thoroughly enough that a future maintainer won't accidentally delete the SQL files.

Highlight: The godoc in 1module.go is exemplary for a tombstone — it explains why the package still exists, why the SQL files can't be deleted, and what would break if someone tried.

CI note: Build / Lint / Vet / Test pending at review time.

Copy link
Copy Markdown
Contributor

@lml2468 lml2468 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: APPROVED ✅

PR #167 removes the built-in WuKongIM backup module (~1,564 LoC deleted), implementing issue #139's removal plan.

What remains (tombstone pattern)

File Purpose
1module.go Tombstone — registers module with SQLDir only, no SetupAPI (zero HTTP surface)
sql/20260331000001_backup_legacy01.sql Historical migration — keeps sql-migrate happy
sql/20260401000001_backup_legacy01.sql Same

What's deleted

9 Go files: api_manager.go, db.go, diskcheck.go, diskcheck_test.go, model.go, scheduler.go, service.go, storage.go, storage_cos.go.

Correctness

  • sql-migrate compat: Migration IDs stay in embedded FS → PlanMigration won't panic with "unknown migration in database" on existing deployments. ✅
  • Rollback safety: Keeping the IDs prevents a pre-removal release from re-running CREATE TABLE (no IF NOT EXISTS) and hitting MySQL 1050. ✅
  • Data preservation: backup_config / backup_history tables intentionally left in DB per issue #139. ✅
  • No dangling references: Module registers with Name: "backup" + SQLDir only. No other module imports backup. ✅
  • Package comment: Thorough explanation of the tombstone rationale. ✅

No blocking findings.

Copy link
Copy Markdown
Contributor

@Jerry-Xin Jerry-Xin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Project relevance gate passed. This PR is in scope: it removes an internal modules/backup API/scheduler implementation while preserving migration compatibility for octo-server.

💬 Non-blocking

  • 🟡 Warning: go.mod:25 and go.mod:37 still list github.com/google/uuid and github.com/robfig/cron/v3 as direct dependencies even though the direct imports were removed with the backup module. go mod tidy -diff moves both to the indirect block. This is cleanup-only, not a blocker.
  • 🔵 Suggestion: Add a small regression test around modules/backup/1module.go:35 confirming the backup module remains registered with SQLDir and no SetupAPI. The risk here is future cleanup accidentally deleting the tombstone and reintroducing sql-migrate startup failures.

✅ Highlights

  • modules/backup/1module.go:35 keeps the module registered, and modules/backup/1module.go:38 keeps the embedded SQL migrations discoverable.
  • modules/backup/1module.go:37 omits SetupAPI, which matches the intended route removal; the module setup path supports nil SetupAPI.
  • The historical SQL files are still present, preserving migration ID compatibility for existing deployments and rollback safety.
  • go build ./... passes. go test ./modules/backup ./pkg/db passes. Full go test ./... was attempted but this local environment lacks MySQL/Redis on 127.0.0.1, so unrelated integration-style packages failed with connection refused.

Copy link
Copy Markdown
Contributor

@yujiawei yujiawei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #167 (octo-server)

Verdict: APPROVED

The PR is a careful, well-reasoned removal. The tombstone approach correctly addresses the upgrade/rollback hazard surfaced in the prior review round. No P0/P1 issues found.

1. Verification

Check Result Evidence
go build ./... ✅ pass local build clean
go vet ./... ✅ pass clean
go test ./pkg/db/ -race ✅ pass ok github.com/Mininglamp-OSS/octo-server/pkg/db 1.077s
External symbol references to modules/backup ✅ none only _ blank imports remain (internal/modules.go:24, modules/category/testinit_test.go:6)
BackupConfig / BackupHistory / BackupService Go-symbol leakage ✅ none grep across repo returns 0 hits
/v1/manager/backup/* route bindings ✅ removed only doc-comment mention left in modules/backup/1module.go:2
register.Module{ SetupAPI: nil, SQLDir: …} is loader-safe ✅ confirmed octo-lib/module/module.go:35 guards with if m.SetupAPI != nil (and the same nil-guard for Start/Stop/SetupTask)
Migration ID mapping consistency ✅ consistent both pkg/db/migration_id_mapping.json and tools/migrate-rename/mapping.json restored to total: 124 with the two backup-* entries (verified vs. the prior-commit revert)

2. Tombstone approach — correctness analysis

The core tradeoff (delete SQL + purge gorp_migrations rows vs. keep SQL as no-op tombstone) is correctly resolved in favour of the tombstone. The four scenarios all behave safely:

  1. Existing deployment, post-rename IDs (20260331000001_backup_legacy01.sql already in gorp_migrations) — sql-migrate finds the embedded file, marks already-applied, skips. ✅
  2. Existing deployment, legacy IDs (backup-20260331-01.sql) — RewriteLegacyMigrationIDs (pkg/db/migrate_compat.go) renames the row first, then case 1 applies. ✅
  3. Fresh install — both migrations run as no-ops, creating the (unused) backup_config / backup_history tables. Inelegant but harmless. ✅
  4. Rollback to pre-removal release — the old module's CREATE TABLE (no IF NOT EXISTS, see modules/backup/sql/20260331000001_backup_legacy01.sql:4) finds the ID already recorded → skipped → no MySQL 1050. ✅

The reverted approach (PurgeRemovedMigrationIDs + delete SQL) would have broken case 4 silently, so reverting it is the right call.

3. Security review (PR is security_sensitive)

This is an attack-surface reduction PR — net positive for security:

  • ✅ Removes the /v1/manager/backup/* superAdmin API surface (config/trigger/history/download/status). Any client request to those paths now 404s at the router (no module registers them).
  • ✅ Removes the cron scheduler that could trigger arbitrary tar+upload jobs against the WuKongIM data directory.
  • ✅ Removes COS credential handling code (storage_cos.go used credentials.NewStaticV4(cfg.AccessKey, cfg.SecretKey, "")). Fewer in-process places where COS access keys are materialised.
  • ✅ Removes filesystem read of data_dir (the WuKongIM data directory) — historically a path-traversal-shaped feature that's now simply gone.
  • ⚠️ Data residue (acknowledged, not a blocker): backup_config and backup_history rows are intentionally kept on disk per issue #139. backup_history.error_message is TEXT and could in principle contain operational details from prior runs. There's no API surface to read them anymore, so the residue is dormant — but worth flagging for the human reviewer if the deployment has any compliance constraints around stale operational logs. The follow-up drop migration mentioned in the PR description is the right place to address this.
  • ⚠️ Operational regression (documented): Operators previously relying on the built-in backup lose it. The release notes section in the PR body documents the migration paths (k8s VolumeSnapshot, host-level cron, cloud disk snapshot) — please verify the deployment/runbooks are updated downstream before/at release.

Nothing here justifies CHANGES_REQUESTED — the PR is a security improvement.

4. Findings

P0 / P1

None.

P2 (non-blocking, suggestions)

  • Smoke-deploy items in the test plan are still unchecked (Smoke-deploy against a database whose gorp_migrations already contains the two backup IDs, Smoke-deploy against a fresh database, Confirm /v1/manager/backup/* returns 404). These are exactly the cases the tombstone approach exists to make safe — they should be performed at least once before this lands in a release that ships to production.
  • Frontend audit: this PR removes a public superAdmin API. If dmwork-web (or any internal admin UI / runbook scripts) has a "Backup" page that hits /v1/manager/backup/config or /trigger, it will start showing 404 errors. Worth a quick grep on the web repo before release. Out of scope for this PR but worth coordinating in the release ticket.
  • Drop migration follow-up: as noted, please track the eventual DROP TABLE backup_config / backup_history migration in a follow-up so the schema doesn't carry dead tables forever. Tying it to an explicit operator opt-in window (e.g. "kept for 2 releases") would be reasonable.
  • Doc comment in modules/backup/1module.go is unusually long (24 lines) but it's load-bearing institutional knowledge about a non-obvious sql-migrate failure mode — keeping it is the right call. No change requested.

5. Additional observations

  • The two-commit history is well-shaped: commit 1 is the deletion + the (later reverted) PurgeRemovedMigrationIDs shim, commit 2 is the tombstone correction. The PR description accurately describes the final state. A squash merge would lose the failure-mode reasoning encoded in the second commit message; the maintainers should decide whether that's worth keeping.
  • check-sprint / check-sprint is failing on this PR but it's a project-board metadata check ("Sprint not set"), not a code or build issue.
  • backup_history schema includes started_at / finished_at migrated from TIMESTAMP to DATETIME in the second SQL file; both migrations are preserved verbatim, so no schema drift between the original module's expectations and the tombstones. ✅

Summary

Approved. The tombstone design is the right answer to the rollback-safety concern, and the security posture improves on net. Please complete the smoke-deploy checks and audit any frontend admin UI before release.

Copy link
Copy Markdown
Contributor

@yujiawei yujiawei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/codex Review — PR #167

Verdict: LGTM ✅

逐条对照派单的 6 个场景化检查点:


1. callers 全清理 ✅

grep -rn 确认仓库中仅剩两处 modules/backup 引用,均为 blank import(_ "...")保留 tombstone 注册:

  • internal/modules.go:24 — 保留,使 init() 触发 register.AddModule 注册 SQLDir
  • modules/category/testinit_test.go:6 — 测试 init 链保留

无任何 Go 代码 import backup 的类型/函数/变量。go build ./... clean(PR test plan 已验证)。

2. 1module.go +23/-5 残留 ✅ 设计正确

保留内容:

  • 24 行 godoc 注释解释 tombstone 存在理由
  • //go:embed sql + register.AddModule 只注册 Name: "backup" + SQLDir
  • SetupAPI 为 nil:零 HTTP surface,octo-lib/module/module.go:35 有 nil-guard 不会 panic

这不是"该删没删",而是 sql-migrate 兼容性要求的最小占位。删掉 SQL 文件 → PlanMigration panic "unknown migration in database"。设计合理。

3. API 兼容性 ✅ 无阻塞

/v1/manager/backup/* 路由组(config/trigger/history/download/status)全部随 api_manager.go 删除。SetupAPI=nil → 路由不注册 → 请求 404。

这是 superAdmin API(非用户端),issue #139 body 已明确讨论 deprecation vs outright removal。PR body release notes 段给出了三个替代方案(VolumeSnapshot / host cron / cloud snapshot)。未发现 dmwork-web 有 backup 管理页(grep repo 无命中),但 PR 描述 test plan 有 "confirm returns 404" 待勾。建议在 merge 前完成该 smoke check。非阻塞。

4. 数据库 schema ✅ 故意不 drop

backup_config / backup_history 两表 故意保留在 DB 中,无 drop migration。这与 issue #139 的两步方案一致(step 1 = 删代码保表,step 2 = 后续确认无依赖后再 drop)。

两个 SQL tombstone 文件内容与原始 migration 完全一致(CREATE TABLE + ALTER TIMESTAMP→DATETIME),ID 在 gorp_migrations 中已有 → 跳过。

四种部署场景验证(与 yujiawei review 一致):

  • 已有部署(post-rename ID)→ skip ✅
  • 已有部署(legacy ID)→ RewriteLegacyMigrationIDs 先重命名 → skip ✅
  • 全新部署 → 两个 migration 正常跑,建出空表,无害 ✅
  • 回滚到旧版本 → ID 已在 gorp_migrations → 旧 CREATE TABLE(无 IF NOT EXISTS)被 skip,不会 MySQL 1050 ✅

5. storage_cos.go 删除 ✅ 无生产影响

COS 集成(credentials.NewStaticV4 凭证处理)随模块一起删除。该功能仅在 backup cron 路径中使用,无其他模块 import COS storage 相关类型。删除 = 减少攻击面(COS access key 不再在进程内物化)。

6. 关联 issue #139 ✅ 替代方案明确

Issue #139 body 详细说明了五个移除理由(scope mismatch / deployment fragility / consistency risk / better alternatives / maintenance cost)+ 三个替代方案(K8s VolumeSnapshot / host cron / cloud snapshot)。PR body release notes 段复述了同样的 migration guidance。


汇总

分类 描述
✅ LGTM callers 清理完整,tombstone 设计正确,四种部署场景安全
🟡 建议 merge 前完成 PR test plan 中三项未勾 smoke check(特别是 "existing DB + tombstone IDs" 场景)
🟡 建议 Jerry-Xin 提的 go mod tidy 清理 google/uuid + robfig/cron/v3 可顺手做

无阻塞项。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove built-in WuKongIM data-directory backup module / 移除内置的 WuKongIM 数据目录备份模块

4 participants