18 changes: 10 additions & 8 deletions AGENTS.md
@@ -19,7 +19,7 @@ PYTHONPATH=src python3 -m bd_archive ...
```

```bash
bd-archive create -s <source> -n <name> -o <output> [-w <workdir>] [-D /dev/srN] [-b BYTES] [-r %] [-c zstd|lzma|...] [-l <level>] [--ratio <float> | --sample <path>] [-y]
bd-archive create -s <source> -n <name> -o <output> [-w <workdir>] [-D /dev/srN] [-b BYTES] [-r %] [-c zstd|lzma|...] [-l <level>] [--ratio <float> | --sample <path>] [--base <catalog.dar>] [--min-last-disc-fill PERCENT] [-y]
bd-archive burn -i <input> [-D /dev/srN] [--start N] [--no-verify] [--skip-fit-check] [-S <speed>]
bd-archive verify [<mountpoint|dir|/dev/srN|*.iso>]
bd-archive extract -o <output> [-D /dev/srN] [-w <workdir>]
@@ -47,11 +47,11 @@ src/bd_archive/
├── __main__.py # entry point for `python -m bd_archive`
├── _par2_helper.py # dar -E hook: invoked as `python -m bd_archive._par2_helper ...`
├── cli.py # argparse + dispatch + top-level exception handling (uniform cancel/error output)
├── constants.py # MiB, DISC_OVERSIZE_TOLERANCE, PAR2_AND_MISC_OVERHEAD, DISC_END_MARGIN, POST_BURN_MOUNT_TIMEOUT, ISO9660_VOLUME_LABEL_MAX, PAR2_RECOVERY_RE
├── constants.py # MiB, DISC_OVERSIZE_TOLERANCE, PAR2_AND_MISC_OVERHEAD, DISC_END_MARGIN, POST_BURN_MOUNT_TIMEOUT, ISO9660_VOLUME_LABEL_MAX, ISO9660_LABEL_NAME_MAX, ISO9660_LABEL_SUFFIX_LEN, PAR2_RECOVERY_RE
├── ui/ # logger, prompts (interactive), progress (byte-counted, TTY-aware)
├── shell/ # runner.py: run() (+ SIGINT handling); deps.py: check_deps(); format.py: human_bytes()
├── tools/ # one thin wrapper per external CLI
│ ├── dar.py # dar create_sliced/isolate_catalog/compress/extract_sequential (Bad-CRC parser)
│ ├── dar.py # dar create_sliced (incl. -A ref, -P excludes, -E hook) / isolate_catalog / compress / extract_sequential (-wa overwrite for chain restore, Bad-CRC parser) / list_catalog_paths (`dar -l` parse)
│ ├── par2.py # par2 create/verify/repair (+ VerifyResult, is_par2_index)
│ ├── mkisofs.py # ISO9660+UDF image build (`-iso-level 3 -udf -V -publisher -input-charset utf-8 -graft-points`)
│ ├── growisofs.py # burn (+ DeviceBusyError on sg lock, SIGINT double-press abort with BURN_ABORT_GRACE_S=5s)
@@ -63,11 +63,11 @@ src/bd_archive/
│ └── lsof.py # find_device_holders (optional — no-op if lsof absent)
├── archive/ # domain logic over tools/
│ ├── checksums.py # sha512 verify (verify_slice per-file, used by extract on staging)
│ ├── config.py # ArchiveConfig, write_readme
│ ├── dar_archive.py # DarArchive (slices, catalog, work-dir layout)
│ ├── config.py # ArchiveConfig (incl. generation, dar_name), write_readme
│ ├── dar_archive.py # DarArchive (slices, catalog, work-dir layout) + parse_dar_filename (chain/gen detection from filename)
│ ├── disc.py # DiscIO (mount/mount_with_retry/umount/eject/close_tray_if_open/burn) + find_sg_device
│ ├── sizing.py # compute_slice_bytes, measure_compression_ratio
│ ├── source_scan.py # SourceScan + scan_source
│ ├── source_scan.py # SourceScan + scan_source; SourceFile + list_source_files (auto-defer pool); scan_delta_bytes (incremental preview)
│ └── verify.py # verify_disc()
└── commands/ # one file per subcommand
├── create.py
@@ -82,10 +82,12 @@ Layering: `commands/` → `archive/` → `tools/` → `shell/`. Lower layers nev

Four subcommands form a pipeline. `create` previews disc count + last-disc fill before prompting for confirmation, so users can dry-run sizing without committing.

1. **`create`** (`commands/create.py`) reads disc capacity via `tools.mediainfo.detect_disc_capacity` (or `args.bytes`), scans the source, and computes slice sizing plus a disc-count estimate (optionally measuring the compression ratio via `--sample`). The user confirms via `prompt_yn` before any heavy work begins (skip with `-y`). Then runs `tools.dar.create_sliced` with `--hash sha512 --min-digits 4 -Q` (plus `-z<algo>[:level] -am` when compression is enabled) to slice the source into per-disc-sized `.dar` files in `<workdir>/tmp/`. par2 is generated **inline** via dar's `-E` hook (`bd_archive._par2_helper`) — the hook fires after each slice is fully written, so par2 reads the slice while it is still hot in the OS page cache, eliminating most SSD read traffic of the create phase. After dar completes, the catalog is isolated. For each slice in order: regenerate `README.txt` with the right disc number and call `tools.mkisofs.build` (mkisofs `-iso-level 3 -udf -V <label> -publisher "bd-archive v<ver>" -input-charset utf-8 -graft-points`) to assemble `<output>/images/disc_NNNN.iso` directly from in-place files (no staging copies). The ISO file size is checked against the format-aware writable capacity as a hard limit. Phase 3 also asserts par2 files are present on disk — a missing file means the `-E` helper silently failed during dar create. After each ISO is built, the slice + par2 are deleted from `tmp/`; once all slices are processed, `tmp/` is wiped entirely. If `-w` was not supplied, the default `<output>/.bd-archive-work/` is also removed, so `<output>` ends up containing only `images/disc_*.iso`.
1. **`create`** (`commands/create.py`) reads disc capacity via `tools.mediainfo.detect_disc_capacity` (or `args.bytes`), scans the source, and computes slice sizing plus a disc-count estimate (optionally measuring the compression ratio via `--sample`). The internal dar archive name is `<-n value>-gen<N>` where N is 1 for a full archive and `base_gen + 1` for an incremental against `--base <catalog.dar>` (base gen parsed from the catalog filename via `archive.dar_archive.parse_dar_filename`, which also handles legacy pre-`-gen<N>` filenames as gen 1). Volume labels are `<truncated_name>_G<NN>_<NNNN>` — names longer than `ISO9660_LABEL_NAME_MAX` (23) are truncated in the label only; filenames inside the ISO keep the full name. When `--base` is set, `tools.dar.list_catalog_paths` parses `dar -l` output to get the set of paths already in the predecessor, and `archive.source_scan.scan_delta_bytes` re-scans the source counting only new/modified files for the preview's archive-size estimate. The user confirms via `prompt_yn` before any heavy work begins (skip with `-y`). Then runs `tools.dar.create_sliced` with `--hash sha512 --min-digits 4 -Q` (plus `-z<algo>[:level] -am` when compression is enabled, `-A <ref_catalog>` for incrementals, `-P <path>` per excluded file from auto-defer) to slice the source into per-disc-sized `.dar` files in `<workdir>/tmp/`. par2 is generated **inline** via dar's `-E` hook (`bd_archive._par2_helper`) — the hook fires after each slice is fully written, so par2 reads the slice while it is still hot in the OS page cache, eliminating most SSD read traffic of the create phase. After dar completes, the catalog is isolated. For each slice in order: regenerate `README.txt` with the right disc number + generation and call `tools.mkisofs.build` (mkisofs `-iso-level 3 -udf -V <label> -publisher "bd-archive v<ver>" -input-charset utf-8 -graft-points`) to assemble `<output>/images/disc_NNNN.iso` directly from in-place files (no staging copies). 
**Catalog files go onto Disc 1 only** — discs 2..N carry only their slice + par2 + README; the dar slice on the last disc embeds the master catalog at its end (dar default), so every set still has two spatially separated catalog copies. After all ISOs are built, the isolated catalog is also copied to `<output>/<name>-gen<N>-catalog.*.dar` so the user can keep it in their regular backup and use it as `--base` for future generations. The ISO file size is checked against the format-aware writable capacity as a hard limit; a missing par2 file (helper silently failed) hard-errors. After each ISO is built, the slice + par2 are deleted from `tmp/`; once all slices are processed, `tmp/` is wiped entirely. If `-w` was not supplied, the default `<output>/.bd-archive-work/` is also removed, so `<output>` ends up containing only `images/disc_*.iso` and the persisted catalog.

**Auto-defer** (`--min-last-disc-fill PERCENT`): when the projected last-disc fill is below PERCENT, the newest-by-mtime files are pushed to a future generation until either the threshold is met or the candidate pool is exhausted. For incrementals (`--base` given), the pool is "files whose relative path is not in the base catalog" — strictly conservative, so an already-archived file whose mtime has drifted on disk is never lost. For full archives (no `--base`), the pool is "all source files" with a loud warning that deferred files won't be archived anywhere until a later incremental picks them up. Deferred files become `-P <relpath>` flags on dar. The preview block shows count, byte total, oldest mtime, and a sample of deferred paths before the confirm prompt.
2. **`burn`** (`commands/burn.py`) iterates `<input>/images/disc_*.iso` lexically and burns each via `growisofs -use-the-force-luke=notray -dvd-compat -Z dev=image.iso` — a byte-for-byte ISO write, no on-the-fly mkisofs, so what's in the ISO file is exactly what ends up on disc. Volume label, publisher, file layout are all already in the file. Pre-burn fit check is **two-sided**: rejects too-small discs AND discs more than `DISC_OVERSIZE_TOLERANCE` (= 5%) larger than the ISO, guarding against wasting a 50 GB BD-DL on a 25 GB-sized archive. `--skip-fit-check` disables both directions. SIGINT is trapped during the burn itself: a first `Ctrl+C` warns and is ignored (cancelling mid-burn coasters the disc), a second within `BURN_ABORT_GRACE_S` (= 5 s) terminates growisofs and bubbles up as `KeyboardInterrupt`. growisofs runs in its own session (`start_new_session=True`) so the tty's SIGINT does not reach it directly. After burn, `DiscIO.close_tray_if_open` pulls the tray back in on auto-ejecting drives so the post-burn verify can mount. The post-burn verify runs `verify_disc` and loops on any mount/verify failure with a `Re-insert the disc … press Enter to retry` prompt, since some drives need a manual re-insert. Resumable via `--start N`; per-disc resume hints are logged on every cancel/error path. Catches `DeviceBusyError` from `tools/growisofs.py` (sg device locked) and offers an interactive retry — `tools.lsof.find_device_holders` is consulted to name the holding processes when available.
3. **`verify`** (`commands/verify.py`) dispatches on target type: block device → mount; directory → check directly; **`.iso` file → loop-mount via `tools.udisks.loop_setup` + check + tear down**. The ISO branch makes pre-burn dry-run trivial: run `create`, then `verify images/disc_0001.iso` to confirm the image is internally consistent before touching media.
4. **`extract`** (`commands/extract.py`) auto-detects the archive name from the first disc's `*.dar` filenames (no `-n` flag), then iterates discs interactively. For each disc: copy slice + sha512 sidecar (and the catalog on its first arrival) to staging in a single disc-read pass — par2 is **not** copied — then eject. The catalog is verified separately via `_verify_catalog_on_staging`: any failing catalog slice is deleted from staging so the **next** disc that carries it can re-fetch it (multi-disc catalogs converge in fewer disc-iterations than a stop-at-first-failure pass). Each slice is then verified via SHA-512 on the local copy. On corruption, the disc is re-mounted, just the par2 files for the affected slice are fetched, `par2 repair` runs in staging, and the slice is re-verified. If par2 cannot recover, the slice is kept as-is and recorded in `unrepairable_slices` — no prompt, no abort. After each disc the user is asked "Insert another disc?", allowing an early stop for partial restores (e.g. one disc lost). Once the user is done, `tools.dar.extract_sequential` does the final pass with a background thread feeding ESC bytes on stdin so dar's "missing slice" prompts auto-skip — a partial slice set still restores ~95% of files. dar 2.7 exits 0 even when per-file CRC errors occur, so the wrapper parses `Error while restoring <path> : Bad CRC` lines into a list. If `corrupted_files` OR `unrepairable_slices` is non-empty, `<output>/corrupted-files.txt` is written (listing both groups, with unrepairable slices flagged as "files originating from these may be corrupt even if dar didn't report them"), and `cmd_extract` exits with code 1 so scripts can detect a non-clean restore. The output dir still contains whatever dar managed to extract — best-effort, never silently corrupt.
4. **`extract`** (`commands/extract.py`) is **chain-aware**: it restores all generations of a chain in a single invocation. It auto-detects the chain name from the first disc's filenames (via `parse_dar_filename`) and reads each disc's generation the same way. Discs from any gen, in any order, are accepted; the tool tracks per-generation state (`catalogs_verified: dict[int, bool]`, `gen_basenames: dict[int, str]`) because legacy pre-feature gen 1 archives have bare-name basenames (`<name>`) while new-format gens have `<name>-gen<N>`. For each disc: copy slice + sha512 sidecar (and that generation's catalog on its first intact arrival) to staging in a single disc-read pass (par2 is **not** copied), then eject. Per-gen catalog verification runs `_verify_catalog_on_staging` over slices matching `<gen_basename>-catalog.*.dar`; failing slices are deleted so the next disc of the same gen can refetch them. Each slice is then verified via SHA-512, with par2 fetch + repair on damage (same per-slice logic as before). After each disc the user is asked "Insert another disc?". Once the user stops, `tools.dar.extract_sequential` runs **once per generation in order**: Gen 1 into the empty output dir, Gen 2 with `overwrite=True` (passes `-wa` to dar) so its newer file versions replace Gen 1's, and so on. dar's chain-restore semantics handle deletions recorded in later generations. A background thread feeds ESC bytes on stdin so dar's "missing slice" prompts auto-skip; a partial slice set still restores ~95% of files. dar 2.7 exits 0 even when per-file CRC errors occur, so the wrapper parses `Error while restoring <path> : Bad CRC` lines into a list. If any gen's extract exits non-zero, `cmd_extract` aborts with a manual-retry hint. If `corrupted_files` OR `unrepairable_slices` is non-empty across all gens, `<output>/corrupted-files.txt` is written and `cmd_extract` exits 1. The output dir still contains whatever dar managed to extract: best-effort, never silently corrupt.
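
The naming rules described for `create` and `extract` (`<name>-gen<N>` basenames, bare legacy names treated as gen 1, and `<truncated_name>_G<NN>_<NNNN>` volume labels) can be sketched roughly as below. This is an illustration only, not the project's `parse_dar_filename`: the function names and regexes are hypothetical, the sketch relies on dar's `<basename>.<number>.dar` slice naming, and the value 32 for `ISO9660_VOLUME_LABEL_MAX` is an assumption based on the ISO 9660 volume-identifier length.

```python
import re

# 23 is the value stated for ISO9660_LABEL_NAME_MAX; 32 is an assumed value
# for ISO9660_VOLUME_LABEL_MAX (ISO 9660 volume identifiers are 32 characters).
ISO9660_LABEL_NAME_MAX = 23
ISO9660_VOLUME_LABEL_MAX = 32

# Hypothetical patterns: "<basename>[-catalog].<number>.dar" and "<name>-gen<N>".
_SLICE_RE = re.compile(r"^(?P<base>.+?)(?:-catalog)?\.(?P<num>\d+)\.dar$")
_GEN_RE = re.compile(r"^(?P<name>.+)-gen(?P<gen>\d+)$")


def parse_name_and_gen(filename: str) -> tuple[str, int]:
    """Return (chain name, generation) for a dar slice or catalog filename."""
    m = _SLICE_RE.match(filename)
    if m is None:
        raise ValueError(f"not a dar slice filename: {filename!r}")
    base = m.group("base")
    g = _GEN_RE.match(base)
    if g is None:
        # Legacy archives carry no "-gen<N>" suffix and count as generation 1.
        return base, 1
    return g.group("name"), int(g.group("gen"))


def volume_label(name: str, gen: int, disc: int) -> str:
    """Build "<truncated_name>_G<NN>_<NNNN>", truncating only the name part."""
    label = f"{name[:ISO9660_LABEL_NAME_MAX]}_G{gen:02d}_{disc:04d}"
    if len(label) > ISO9660_VOLUME_LABEL_MAX:
        raise ValueError(f"label too long for ISO 9660: {label!r}")
    return label
```

For example, `parse_name_and_gen("photos-gen2.0003.dar")` yields `("photos", 2)`, and `volume_label("photos", 2, 3)` yields `"photos_G02_0003"`. Truncation applies to the label only, so a long archive name still appears in full in the filenames inside the ISO.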

**SSD-friendly tip:** pass `-w /dev/shm/bd-extract` (or any tmpfs path) to keep the staging copy in RAM. On a 25 GB-slice + 32 GB-RAM box this means **zero SSD writes for slice payload** during extract. Falls back to SSD staging automatically if `-w` is not given.
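
The tip only pays off if the tmpfs can actually hold a full slice. A hypothetical pre-flight check, which the tool is not documented to perform, could compare free space against the slice size before pointing `-w` at `/dev/shm`. The helper name `staging_fits` and the 10% headroom are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def staging_fits(workdir: str, slice_bytes: int, headroom: float = 0.10) -> bool:
    """Return True if workdir's filesystem can hold one slice plus headroom."""
    probe = Path(workdir)
    # The staging directory may not exist yet, so walk up to the nearest
    # existing ancestor and measure the filesystem it lives on.
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    free = shutil.disk_usage(probe).free
    return free >= int(slice_bytes * (1 + headroom))
```

A caller might run `staging_fits("/dev/shm/bd-extract", slice_bytes)` and fall back to an SSD path when it returns `False`. Note that tmpfs free space is an upper bound backed by RAM and swap, so a passing check does not guarantee the machine will stay responsive.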
