eBook Image Extractor

Extract images from EPUB, MOBI, AZW, and AZW3 files, with reading-order support, filtering, manifest output, and export to comic archives or PDF.

Highlights

Reading-order-aware extraction for EPUB and MOBI
Image role classification (cover, page, thumbnail, decoration)
Filtering by size, width, height, and aspect ratio
Incremental deduplication with persistent hash cache
Optional manifest.json and debug-order output
Export formats: CBZ, CBR, and PDF (PDF requires Pillow)
Parallel extraction by file
Subcommand-based CLI: scan, extract, inspect, verify
Optional JSON logs
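
The incremental deduplication listed above can be pictured with a short sketch. It assumes the cache is a JSON list of SHA-256 hex digests; the real cache layout is not documented here, and the helper names (load_hash_cache, save_hash_cache, is_new_image) are hypothetical, not part of the project's API.

```python
import hashlib
import json
from pathlib import Path


def load_hash_cache(cache_path):
    """Return the set of digests stored in the cache file (empty if missing)."""
    path = Path(cache_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_hash_cache(cache_path, hashes):
    """Persist the digests as a sorted JSON list (assumed layout)."""
    path = Path(cache_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sorted(hashes)), encoding="utf-8")


def is_new_image(data, seen_hashes):
    """Record the image's SHA-256 digest; return False if it was already seen."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True
```

Loading the cache before a run and saving it afterwards is what would let a later run skip images that an earlier run already wrote.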

Installation

pip install -r requirements.txt
pip install -e .

CLI

Scan

ebook-extract scan /path/to/books -r --format auto

Extract

ebook-extract extract /path/to/books \
  --format auto \
  --recursive \
  --manifest \
  --debug-order \
  --archive-format cbz \
  --hash-cache .cache/image_hashes.json \
  --parallelism 4

Inspect

ebook-extract inspect /path/to/book.mobi --debug-order --verbose

Verify

ebook-extract verify /path/to/books -r --format auto

Common Extraction Flags

--min-size <bytes>
--min-width <px>
--min-height <px>
--max-aspect-ratio <float>
--no-dedup
--add-ignore-hash <sha256> (repeatable)
--all-images (EPUB only)
--manifest
--debug-order
--archive-format cbz|cbr|pdf
--hash-cache <path>
--parallelism <n>
--json-logs
--dry-run
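
As a rough illustration of how the size and dimension flags above might combine, here is a minimal sketch. The ImageFilter class is hypothetical and not part of the project's API. It also assumes the aspect ratio is measured as the longer side divided by the shorter side, which the flag list does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageFilter:
    min_size: int = 0                          # bytes, like --min-size
    min_width: int = 0                         # pixels, like --min-width
    min_height: int = 0                        # pixels, like --min-height
    max_aspect_ratio: Optional[float] = None   # like --max-aspect-ratio; None disables

    def accepts(self, size_bytes, width, height):
        """Return True if the image passes every configured threshold."""
        if size_bytes < self.min_size:
            return False
        if width < self.min_width or height < self.min_height:
            return False
        if self.max_aspect_ratio is not None and width > 0 and height > 0:
            # Assumption: ratio is long side over short side, so it is always >= 1.
            ratio = max(width, height) / min(width, height)
            if ratio > self.max_aspect_ratio:
                return False
        return True
```

With the values from the programmatic example (2048 bytes, 300 px, 300 px, 3.5), a 300x1500 banner would be rejected under this reading because its ratio is 5.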

Programmatic Usage

from src import EPUBImageExtractor, MobiImageExtractor

epub = EPUBImageExtractor(
    min_image_size=2048,
    min_width=300,
    min_height=300,
    max_aspect_ratio=3.5,
    write_manifest=True,
    write_debug_order=True,
    archive_format="cbz",
    hash_cache_path=".cache/hashes.json",
    parallelism=2,
)

epub.extract_from_directory("books", recursive=True)

mobi = MobiImageExtractor(write_manifest=True, archive_format="cbr")
mobi.extract_from_directory("books", recursive=True)

Output

For each book, the extractor creates a folder with files named like:

0000_cover.jpg
0001_page.jpg
0002_page.jpg
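
The names above follow a zero-padded index, a role, and the file extension. A minimal sketch of that pattern follows; output_name is a hypothetical helper, and the ".bin" fallback for sources without an extension is an assumption made for illustration.

```python
from pathlib import Path


def output_name(index, role, source_name):
    """Build a name such as 0001_page.jpg from a reading-order index and role."""
    # Assumption: keep the source extension, lowercased; fall back to .bin if absent.
    ext = Path(source_name).suffix.lower() or ".bin"
    return f"{index:04d}_{role}{ext}"
```

Zero-padding keeps a plain alphabetical sort of the folder in reading order for up to 10,000 images.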

When the corresponding options are enabled:

manifest.json is created in the output folder
The debug-order payload is embedded in manifest.json
An archive is generated next to the output folder (.cbz, .cbr, or .pdf)
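
A CBZ file is an ordinary ZIP archive of images, so the archive step can be approximated with the standard library. The sketch below is not the project's implementation. folder_to_cbz is a hypothetical helper, and the list of image extensions is an assumption.

```python
import zipfile
from pathlib import Path

# Assumption: only these extensions count as pages; manifest.json is left out.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def folder_to_cbz(folder):
    """Zip the images in `folder` into <folder name>.cbz next to the folder."""
    folder = Path(folder)
    archive = folder.parent / (folder.name + ".cbz")
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # Images are already compressed, so store them without recompression.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
        for image in images:
            zf.write(image, arcname=image.name)
    return archive
```

Because entries are written in sorted filename order, the zero-padded names keep the pages in reading order inside the archive.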

Notes

DRM-protected files are not supported.
PDF export requires Pillow:

pip install Pillow

License

MIT
