EpubChapterize

A tool to split ePub documents into chapters. It initially targets Project Gutenberg ePub3 files only.

Setup

To set up the project, follow these steps:

Clone the repository:

```
git clone https://github.com/yourusername/EpubChapterize.git
cd EpubChapterize
```

Create a virtual environment:
```
python -m venv venv
```
Activate the virtual environment:
- On macOS/Linux:
```
source venv/bin/activate
```
- On Windows:
```
venv\Scripts\activate
```
Install the required dependencies:
```
pip install -r requirements.txt
```
Install additional language models for spaCy (if needed):

Depending on the languages you plan to process, you may need to install specific spaCy language models. Use the following commands to install them:
- For English:
```
python -m spacy download en_core_web_trf
```
- For German:
```
python -m spacy download de_dep_news_trf
```
- For Italian:
```
python -m spacy download it_core_news_trf
```
- For Spanish:
```
python -m spacy download es_dep_news_trf
```
- For French:
```
python -m spacy download fr_dep_news_trf
```
If you are not using spaCy, skip this step.
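
To see which of the models listed above still need downloading, a small check along these lines may help. It is a sketch, not part of the tool: the MODELS mapping and the missing_models helper are hypothetical names, and the model names are copied from the commands above. It relies on the fact that the spaCy download command installs each model as an ordinary Python package.

```python
import importlib.util

# Model names copied from the download commands above; the language keys
# are an assumption made for this example.
MODELS = {
    "en": "en_core_web_trf",
    "de": "de_dep_news_trf",
    "it": "it_core_news_trf",
    "es": "es_dep_news_trf",
    "fr": "fr_dep_news_trf",
}


def missing_models(languages):
    """Return the model names for `languages` that cannot be imported."""
    missing = []
    for lang in languages:
        name = MODELS[lang]
        # An installed spaCy model is a regular top-level package, so
        # find_spec returns None when it has not been installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Print the download command for each model that is not installed yet.
    for name in missing_models(["en", "de"]):
        print(f"python -m spacy download {name}")
```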

Usage

This tool is primarily designed to extract chapters from Project Gutenberg ePub3 files. It works by analyzing the navigation structure, matching headers, and attempting to identify chapter divisions. Note that it may also include some preamble content, and its accuracy is not guaranteed.
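
As a rough illustration of the navigation-structure step, the sketch below reads the table of contents from an EPUB3 navigation document using only the standard library. It is not the tool's actual implementation: the toc_entries function and the SAMPLE document are made up for this example, and a real ePub would first need its navigation file located and read from the archive.

```python
import xml.etree.ElementTree as ET

# Namespaces used by EPUB3 navigation documents.
XHTML = "http://www.w3.org/1999/xhtml"
EPUB = "http://www.idpf.org/2007/ops"


def toc_entries(nav_xhtml):
    """Return (title, href) pairs from the toc nav element, in document order."""
    root = ET.fromstring(nav_xhtml)
    for nav in root.iter(f"{{{XHTML}}}nav"):
        # epub:type may hold several space-separated values.
        types = (nav.get(f"{{{EPUB}}}type") or "").split()
        if "toc" not in types:
            continue
        return [
            ("".join(a.itertext()).strip(), a.get("href"))
            for a in nav.iter(f"{{{XHTML}}}a")
        ]
    return []


# Hypothetical navigation document, for illustration only.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
<body><nav epub:type="toc"><ol>
<li><a href="ch1.xhtml#c1">CHAPTER I</a></li>
<li><a href="ch2.xhtml#c2">CHAPTER II</a></li>
</ol></nav></body></html>"""

if __name__ == "__main__":
    for title, href in toc_entries(SAMPLE):
        print(title, "->", href)
```

Each href points at the file and anchor where a chapter starts, which is the kind of location a header-matching step could then compare against the headings in the book's content files.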

To use the tool, run:

```
python chapterize.py /path/to/your/epub/files/
```
or
```
python chapterize.py
```
which uses the books directory by default.
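
One way a script can offer this kind of optional directory argument is shown below. This is a sketch under stated assumptions, not the code in chapterize.py: the parse_args and find_epubs helpers are hypothetical, and only the books default comes from the description above.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse an optional directory argument that falls back to 'books'."""
    parser = argparse.ArgumentParser(description="Split ePub files into chapters.")
    parser.add_argument(
        "directory",
        nargs="?",          # the positional argument may be omitted
        default="books",    # default named in the usage notes above
        help="directory containing .epub files",
    )
    return parser.parse_args(argv)


def find_epubs(directory):
    """Return the .epub files directly inside `directory`, sorted by name."""
    return sorted(Path(directory).glob("*.epub"))
```

Calling parse_args([]) yields a directory of books, while parse_args(["/path/to/your/epub/files/"]) yields the given path; either value can then be passed to find_epubs.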

Notes

- The tool is not perfect and may require manual adjustments to the output.
- It is currently a standalone script but may be packaged in the future.
- Feel free to fork the repository and modify it as needed.

Contributing

If you encounter any issues, please raise a ticket in the repository. Contributions are welcome!
