refactor(create): catalog policy + local persistence (Phase 1) #20
Merged
Spec covers the full 5-phase rollout of incremental archive support plus the catalog-storage policy change. Phase 1 plan implements the catalog policy slice (Disc 1 only + local persistence). Subsequent phase plans will be added when each phase begins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously, the isolated dar catalog was duplicated into every disc's ISO. For archives with thousands of files (and growing over future incremental generations) this added unbounded per-disc overhead: 130 MB per disc in the user's photo archive, scaling with file count, not data size.

The dar slice on the last disc still embeds the master catalog at its end (dar default, unchanged), so we always have two spatially separated copies per archive set: the isolated copy on Disc 1, and the embedded master on the last disc. Discs 2..N-1 carry only their slice + par2.

The isolated catalog is now also persisted to <output>/<name>-catalog.*.dar alongside images/, so the user can keep it in their normal digital backup. Phase 3 will use this file as the --base reference for incremental generations.

No extract.py changes needed: its catalog-acquisition logic already copies the catalog only from the first intact disc carrying it.

Manual e2e verified: 4-disc set, catalog on Disc 1 only (confirmed via isoinfo), persisted to output_dir, all discs verify OK.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Phase 1 of a 5-phase rollout to add incremental archive support to bd-archiver. This PR ships the catalog-storage policy change on its own — it is independent of the incremental feature and brings immediate disc-overhead savings.
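- The isolated catalog is written to Disc 1 only instead of every disc.
- The isolated catalog is now also persisted to <output>/<name>-catalog.*.dar alongside images/, so the user can keep it in their regular digital backup (and so a future --base reference has a stable artifact).
- create's summary now reports the catalog path.
Why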
The current code replicates the catalog onto every disc. With thousands of files (and especially with incremental generations on the horizon, where catalogs describe cumulative state) the per-disc overhead grows with file count rather than data size, without bound. In the author's photo archive: ~130 MB catalog × 20 discs ≈ 2.6 GB of catalog copies, of which all but one was dead weight.
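To make the scaling argument concrete, here is a rough sketch of the arithmetic. The helper name and the figures in the example are illustrative assumptions, not code from bd-archiver and not measurements from the author's archive.

```python
def isolated_catalog_overhead_mb(catalog_mb: float, n_discs: int, every_disc: bool) -> float:
    """Total space spent on isolated catalog copies across one disc set.

    Hypothetical helper for illustration only.
    """
    if n_discs < 1:
        raise ValueError("a disc set needs at least one disc")
    # Old policy: one copy per disc. New policy: a single copy on Disc 1.
    copies = n_discs if every_disc else 1
    return catalog_mb * copies


# Assumed round numbers: a 100 MB catalog spread over a 10-disc set.
old = isolated_catalog_overhead_mb(100, 10, every_disc=True)
new = isolated_catalog_overhead_mb(100, 10, every_disc=False)
print(f"old policy: {old:.0f} MB, new policy: {new:.0f} MB, saved: {old - new:.0f} MB")
```

Under the old policy the total grows with both catalog size and disc count; under the new one it depends on catalog size alone.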
The dar slice on the last disc still embeds the master catalog at its end (dar default, unchanged). So every archive set still has two spatially separated catalog copies: isolated on Disc 1, embedded at the end of the last slice. Discs 2..N-1 now carry just slice + par2.
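The resulting layout can be modelled in a few lines. This is a simplified sketch of the policy as described here, not the implementation in this PR; DiscContents and plan_catalog_placement are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscContents:
    """What one disc carries in this simplified model (hypothetical type)."""
    number: int
    isolated_catalog: bool   # standalone catalog file on the ISO
    embedded_master: bool    # master catalog dar writes at the end of the last slice


def plan_catalog_placement(n_discs: int) -> list[DiscContents]:
    """Model the Phase 1 policy: isolated catalog on Disc 1 only."""
    if n_discs < 1:
        raise ValueError("a disc set needs at least one disc")
    return [
        DiscContents(
            number=i,
            isolated_catalog=(i == 1),
            embedded_master=(i == n_discs),
        )
        for i in range(1, n_discs + 1)
    ]


for disc in plan_catalog_placement(4):
    parts = ["slice", "par2"]  # every disc carries its slice and par2 data
    if disc.isolated_catalog:
        parts.append("isolated catalog")
    if disc.embedded_master:
        parts.append("embedded master catalog (inside the slice)")
    print(f"Disc {disc.number}: {', '.join(parts)}")
```

In this model a single-disc set places both copies on the same disc, so the spatial separation only holds from two discs upward.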
Design context
See docs/superpowers/specs/2026-05-12-incremental-backups-design.md for the full 5-phase plan; this PR is Phase 1. Implementation walk-through in docs/superpowers/plans/2026-05-12-incremental-phase1-catalog-policy.md.
Subsequent phases (separate PRs): naming scheme + volume label, --base for incrementals, auto-defer (--min-last-disc-fill), extract chain mode.
No extract.py changes
extract.py's catalog acquisition already consumes the catalog only from the first intact disc carrying it (_copy_disc_data skips when catalog_verified is True), and the final extract path tolerates catalog_base=None (passes no -A to dar, falls back to the in-slice master catalog via --sequential-read). Verified by manual e2e; see Test plan.
Test plan
- ruff check src/bd_archive/ clean
- Manual e2e, 4-disc set: only Disc 1 carries phase1test-catalog.0001.dar + sha512 (via isoinfo)
- <output>/phase1test-catalog.0001.dar + .sha512 persisted locally
- create's summary includes the Catalog: ... line
- bd-archive verify passes on all 4 ISOs (confirms extract path still happy)
🤖 Generated with Claude Code
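The extract-side behaviour that the last check relies on (described under "No extract.py changes") can be sketched roughly as follows. This is a simplified model, not the code in extract.py: Disc, pick_catalog, catalog_args and the mount paths are hypothetical, and only the -A and --sequential-read flags are taken from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disc:
    number: int
    intact: bool                  # disc passed verification
    catalog_base: Optional[str]   # dar basename of an isolated catalog, if the disc has one


def pick_catalog(discs: list[Disc]) -> Optional[str]:
    """Return the catalog from the first intact disc that carries one."""
    for disc in discs:
        if disc.intact and disc.catalog_base is not None:
            return disc.catalog_base  # later discs are not consulted
    return None


def catalog_args(catalog_base: Optional[str]) -> list[str]:
    """Catalog-related dar arguments only (sketch, not a full command line)."""
    if catalog_base is None:
        # No usable isolated catalog: rely on the in-slice master catalog.
        return ["--sequential-read"]
    return ["-A", catalog_base]


# Assumed scenario: Disc 1 (the only disc with the isolated catalog) is damaged.
discs = [
    Disc(1, intact=False, catalog_base="/mnt/disc1/phase1test-catalog"),
    Disc(2, intact=True, catalog_base=None),
    Disc(3, intact=True, catalog_base=None),
    Disc(4, intact=True, catalog_base=None),
]
print(catalog_args(pick_catalog(discs)))
```

With the catalog on Disc 1 only, a damaged Disc 1 sends this model down the fallback branch, which is the case the embedded master catalog on the last slice exists to cover.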