Simple OCR Watcher

License: MIT REUSE status

This is a simple OCR (optical character recognition) tool based on OCRmyPDF. It watches an input folder for PDFs and moves the processed PDFs (with OCR applied) to an output folder.
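The watch-and-move behaviour described above can be illustrated with a small shell sketch. It is not taken from this repository: the function name `process_once`, the `OCR_CMD` variable, the temporary file naming, and the choice to delete the input after a successful run are all assumptions made for illustration. The only real interface it relies on is the basic `ocrmypdf input.pdf output.pdf` invocation.

```shell
#!/bin/sh
# Illustrative sketch only; the actual watcher in this repository may work differently.
# OCR_CMD and process_once are hypothetical names.

# Command used for OCR; it is called as: $OCR_CMD <input.pdf> <output.pdf>
OCR_CMD="${OCR_CMD:-ocrmypdf}"

# Run one pass: OCR every PDF in the input folder and place the result
# in the output folder.
process_once() {
    in_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    for pdf in "$in_dir"/*.pdf; do
        # The glob stays unexpanded when the folder holds no PDF.
        [ -e "$pdf" ] || continue
        name=$(basename "$pdf")
        # Write to a hidden temporary name first, so a half-written file
        # never appears under its final name.
        if $OCR_CMD "$pdf" "$out_dir/.tmp-$name"; then
            mv "$out_dir/.tmp-$name" "$out_dir/$name"
            # Assumption: the input is removed once it was processed.
            rm -- "$pdf"
        else
            rm -f "$out_dir/.tmp-$name"
            echo "OCR failed for $pdf" >&2
        fi
    done
}
```

A simple polling loop around this function could look like `while true; do process_once /input /output; sleep 5; done`, where the folder paths and the interval are again placeholders.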

How to clone this repository

Clone this repository together with the tessdata_best submodule using the following command.

git clone --recurse-submodules https://github.com/mbr4cht/ocrmypdf-watcher.git

If you have already cloned the repository without the submodule, initialize tessdata_best with the following command.

git submodule update --init --recursive --remote

How to use this tool with Docker

This tool can be used as a Docker image using the following convenience scripts.

Build the Docker image

# Docker image based on Ubuntu 24.04
sudo .scripts/build_docker_image_ubuntu.sh
# Docker image based on Alpine
sudo .scripts/build_docker_image_alpine.sh

Run the Docker container

sudo .scripts/start_container.sh

Adapt .docker/docker-compose.yml to your specific needs, for example to set the input and output folders.
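As a rough sketch of what setting the folders might look like, the fragment below bind-mounts two host directories into the container using the standard Compose `volumes` key. The service name, the image name, and the container-side paths `/input` and `/output` are assumptions and may not match the compose file shipped with this repository, so check the existing entries before changing them.

```yaml
# Hypothetical fragment; names and container paths are placeholders.
services:
  ocr-watcher:                       # assumed service name
    image: ocrmypdf-watcher          # assumed image name
    volumes:
      # host folder : folder inside the container
      - /path/on/host/input:/input
      - /path/on/host/output:/output
```

Only the left-hand side of each volume entry (the host path) normally needs to be adapted.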

Licensing

Please see our LICENSE for copyright and license information.

This project follows the REUSE approach, so copyright and licensing information is available for every file (including third-party components), either in the file header, in an individual *.license file, or in a REUSE.toml file. All licenses can be found in the LICENSES folder.