refactor(create): catalog policy + local persistence (Phase 1) #20

Merged
Xitee1 merged 3 commits into main from feat/incremental-phase1-catalog-policy
May 12, 2026

Owner

@Xitee1 Xitee1 commented May 12, 2026

Summary

Phase 1 of a 5-phase rollout to add incremental archive support to bd-archiver. This PR ships the catalog-storage policy change on its own — it is independent of the incremental feature and brings immediate disc-overhead savings.

  • The isolated dar catalog is now written only onto Disc 1 of an archive set, no longer onto every disc.
  • The catalog is also persisted locally to <output>/<name>-catalog.*.dar alongside images/, so the user can keep it in their regular digital backup (and so a future --base reference has a stable artifact).
  • create's summary now reports the catalog path.
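
The policy in the bullets above can be sketched as a per-disc file plan. This is a minimal illustration, not the project's code: `DiscPlan`, `plan_discs`, and the slice, par2, and `.sha512` file-name patterns are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DiscPlan:
    """Files staged for one disc image (hypothetical structure)."""
    number: int
    files: list[str] = field(default_factory=list)


def plan_discs(name: str, num_discs: int) -> list[DiscPlan]:
    """Assign slice, par2, and catalog files to each disc.

    File-name patterns are illustrative assumptions, not necessarily
    bd-archiver's actual naming scheme.
    """
    if num_discs < 1:
        raise ValueError("an archive set needs at least one disc")
    plans = []
    for n in range(1, num_discs + 1):
        # Every disc carries its own dar slice plus par2 recovery data.
        files = [f"{name}.{n:04d}.dar", f"{name}.{n:04d}.dar.par2"]
        if n == 1:
            # New policy: the isolated catalog goes onto Disc 1 only.
            files += [
                f"{name}-catalog.0001.dar",
                f"{name}-catalog.0001.dar.sha512",
            ]
        plans.append(DiscPlan(n, files))
    return plans
```

Calling `plan_discs("phase1test", 4)` yields four plans in which only the first lists catalog files; the local copy under `<output>/` is a separate write and is not modelled here.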

Why

The current code replicates the catalog onto every disc. With thousands of files (and especially with incremental generations on the horizon, where catalogs describe cumulative state), the per-disc overhead scales with file count rather than data size and has no upper bound. In the author's photo archive: ~130 MB catalog × 20 discs = 2.3 GB of redundant data.
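
As a rough check of the overhead argument, the space taken by isolated catalog copies under both policies can be computed directly. The function name is hypothetical, and the inputs are the rounded figures quoted above, so the result is only the same order as the reported ~2.3 GB; the exact number depends on the real catalog size.

```python
def catalog_overhead_mb(catalog_mb: float, num_discs: int) -> tuple[float, float]:
    """Return (old_policy_mb, new_policy_mb) used by isolated catalog copies."""
    old = catalog_mb * num_discs  # old policy: one copy on every disc
    new = catalog_mb              # new policy: a single copy on Disc 1
    return old, new


old, new = catalog_overhead_mb(130, 20)
saved = old - new
print(f"old={old:.0f} MB new={new:.0f} MB saved={saved:.0f} MB")
# prints: old=2600 MB new=130 MB saved=2470 MB
```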

The dar slice on the last disc still embeds the master catalog at its end (dar default, unchanged). So every archive set still has two spatially separated catalog copies: isolated on Disc 1, embedded at the end of the last slice. Discs 2..N-1 now carry just slice + par2.

Design context

See docs/superpowers/specs/2026-05-12-incremental-backups-design.md for the full 5-phase plan; this PR is Phase 1. Implementation walk-through in docs/superpowers/plans/2026-05-12-incremental-phase1-catalog-policy.md.

Subsequent phases (separate PRs): naming scheme + volume label, --base for incrementals, auto-defer (--min-last-disc-fill), extract chain mode.

No extract.py changes

extract.py's catalog acquisition already only consumes the catalog from the first intact disc carrying it (_copy_disc_data skips when catalog_verified is True) and the final extract path tolerates catalog_base=None (passes no -A to dar, falls back to in-slice master catalog via --sequential-read). Verified by manual e2e — see Test plan.
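
The fallback described above can be sketched as follows. This is an assumption-laden illustration, not the contents of extract.py: `build_dar_extract_argv` and `first_catalog_disc` are hypothetical helpers, and the real command line may carry more options. Only dar's `-x`, `-R`, `-A`, and `--sequential-read` options are taken as given.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_catalog_disc(discs: Iterable[tuple[int, bool, bool]]) -> Optional[int]:
    """Return the number of the first intact disc that carries a catalog.

    Each item is (disc_number, has_catalog, is_intact); hypothetical shape.
    """
    for number, has_catalog, is_intact in discs:
        if has_catalog and is_intact:
            return number
    return None


def build_dar_extract_argv(
    archive_base: Path, dest: Path, catalog_base: Optional[Path]
) -> list[str]:
    """Build a dar extraction command, with or without an isolated catalog."""
    argv = ["dar", "-x", str(archive_base), "-R", str(dest)]
    if catalog_base is not None:
        # Isolated catalog available: hand it to dar via -A.
        argv += ["-A", str(catalog_base)]
    else:
        # No isolated catalog: rely on the data embedded in the slices.
        argv.append("--sequential-read")
    return argv
```

Under the new policy, `first_catalog_disc` can only ever return Disc 1 or `None`, and the `None` case is the one covered by the `--sequential-read` branch.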

Test plan

  • ruff check src/bd_archive/ clean
  • Manual e2e on a 4-disc test archive (150 MiB source @ 50 MiB capacity)
    • Disc 1 ISO contains phase1test-catalog.0001.dar + sha512 (via isoinfo)
    • Discs 2, 3, 4 do not contain catalog files
    • <output>/phase1test-catalog.0001.dar + .sha512 persisted locally
    • Summary block shows the new Catalog: ... line
    • bd-archive verify passes on all 4 ISOs (confirms extract path still happy)
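
The isoinfo checks in the test plan could be scripted along these lines. This is a sketch under assumptions: the helper names are hypothetical, `isoinfo` must be installed, and the listing is assumed to show full file names (for example through Rock Ridge, requested here with `-R`).

```python
import subprocess
from pathlib import PurePosixPath


def list_iso_files(iso_path: str) -> str:
    """Run `isoinfo -f -R -i <iso>` and return its raw file listing."""
    result = subprocess.run(
        ["isoinfo", "-f", "-R", "-i", iso_path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def catalog_entries(listing: str, name: str) -> list[str]:
    """Return the file names in a listing that belong to the isolated catalog."""
    prefix = f"{name}-catalog.".lower()
    found = []
    for line in listing.splitlines():
        # Plain ISO 9660 names may carry a ";1" version suffix; drop it.
        base = PurePosixPath(line.strip()).name.split(";")[0]
        if base.lower().startswith(prefix):
            found.append(base)
    return found
```

For a set built under the new policy, `catalog_entries(list_iso_files(iso), "phase1test")` would be expected to be non-empty for the Disc 1 image and empty for the others.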

🤖 Generated with Claude Code

Xitee1 and others added 3 commits May 12, 2026 10:50
Spec covers the full 5-phase rollout of incremental archive support
plus the catalog-storage policy change. Phase 1 plan implements the
catalog policy slice (Disc 1 only + local persistence). Subsequent
phase plans will be added when each phase begins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously, the isolated dar catalog was duplicated into every disc's
ISO. For archives with thousands of files (and growing over future
incremental generations) this added unbounded per-disc overhead —
130 MB per disc in the user's photo archive, scaling with file count
not data size.

The dar slice on the last disc still embeds the master catalog at
its end (dar default, unchanged), so we always have two spatially
separated copies per archive set: the isolated copy on Disc 1, and
the embedded master on the last disc. Discs 2..N-1 carry only their
slice + par2.

The isolated catalog is now also persisted to
<output>/<name>-catalog.*.dar alongside images/, so the user can
keep it in their normal digital backup. Phase 3 will use this file
as the --base reference for incremental generations.

No extract.py changes needed: its catalog-acquisition logic already
copies the catalog only from the first intact disc carrying it.

Manual e2e verified: 4-disc set, catalog on Disc 1 only (confirmed
via isoinfo), persisted to output_dir, all discs verify OK.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Xitee1 Xitee1 merged commit 2b51766 into main May 12, 2026
1 check passed
@Xitee1 Xitee1 deleted the feat/incremental-phase1-catalog-policy branch May 12, 2026 18:04