PDF OCR

One canonical path:

cd /home/nharmon/git/diffio/diffio-tts
uv run python ocr_pdf.py

That defaults to the-preacher-and-his-preaching-ocr.pdf in this directory.

You can also pass a different PDF:

uv run python ocr_pdf.py some-book.pdf

Output is written to output/<pdf-stem>/:

<pdf-stem>.md: raw OCR markdown
<pdf-stem>.txt: cleaned text
<pdf-stem>.meta.json: run metadata

The script uses Hugging Face zai-org/GLM-OCR directly, with no local server. Model downloads are cached under ./models/.

TTS cleanup

After OCR, rewrite the text for TTS. This is now the canonical path for cleanup, and it uses OpenRouter only:

uv run python prepare_tts_text.py

That defaults to:

output/the-preacher-and-his-preaching-ocr/the-preacher-and-his-preaching-ocr.txt

and writes:

output/the-preacher-and-his-preaching-ocr/the-preacher-and-his-preaching-ocr.tts.txt
output/the-preacher-and-his-preaching-ocr/the-preacher-and-his-preaching-ocr.tts.txt.meta.json
output/the-preacher-and-his-preaching-ocr/the-preacher-and-his-preaching-ocr.tts.chunks/

This script calls OpenRouter using the key in ./openrouter.key. It is hardwired to use google/gemini-3.1-flash-lite-preview. There is no local LLM path in this script anymore. For testing, you can limit how many chunks are processed:

uv run python prepare_tts_text.py --max-chunks 2 --overwrite

It writes each cleaned chunk to disk as soon as that chunk finishes, and also appends incrementally to the final .tts.txt.

The cleanup pass now defaults to smaller chunks to reduce cross-paragraph mixing. You can still override that if needed:

uv run python prepare_tts_text.py --max-input-tokens 2500 --max-chunks 2 --overwrite

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
eval		eval
normalizer		normalizer
.gitignore		.gitignore
Christ-Our-High-Priest.pdf		Christ-Our-High-Priest.pdf
NORMALIZER_DESIGN.md		NORMALIZER_DESIGN.md
OnceForAll.pdf		OnceForAll.pdf
PIPELINE_DESIGN.md		PIPELINE_DESIGN.md
README.md		README.md
cain-his-world-and-his-worship.pdf		cain-his-world-and-his-worship.pdf
gibbs-alfred-worship-the-christians-highest-occupation.pdf		gibbs-alfred-worship-the-christians-highest-occupation.pdf
merge_narrations.py		merge_narrations.py
meyerpriestpriesthood.pdf		meyerpriestpriesthood.pdf
ocr_pdf.py		ocr_pdf.py
pipeline.py		pipeline.py
prepare_tts_text.py		prepare_tts_text.py
pyproject.toml		pyproject.toml
stitch_book.py		stitch_book.py
the-preacher-and-his-preaching-ocr.pdf		the-preacher-and-his-preaching-ocr.pdf
uv.lock		uv.lock
vibevoice7b_batch.py		vibevoice7b_batch.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

PDF OCR

TTS cleanup

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

PDF OCR

TTS cleanup

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages