One canonical path:
cd /home/nharmon/git/diffio/diffio-tts
uv run python ocr_pdf.pyThat defaults to the-preacher-and-his-preaching-ocr.pdf in this directory.
You can also pass a different PDF:
uv run python ocr_pdf.py some-book.pdfOutput is written to output/<pdf-stem>/:
<pdf-stem>.md: raw OCR markdown<pdf-stem>.txt: cleaned text<pdf-stem>.meta.json: run metadata
The script uses Hugging Face zai-org/GLM-OCR directly, with no local server.
Model downloads are cached under ./models/.
After OCR, rewrite the text for TTS. This is now the canonical path for cleanup, and it uses OpenRouter only:
uv run python prepare_tts_text.pyThat defaults to:
output/the-preacher-and-his-preaching-ocr/the-preacher-and-his-preaching-ocr.txt
and writes:
output/the-preacher-and-his-preaching-ocr/the-preacher-and-his-preaching-ocr.tts.txtoutput/the-preacher-and-his-preaching-ocr/the-preacher-and-his-preaching-ocr.tts.txt.meta.jsonoutput/the-preacher-and-his-preaching-ocr/the-preacher-and-his-preaching-ocr.tts.chunks/
This script calls OpenRouter using the key in ./openrouter.key.
It is hardwired to use google/gemini-3.1-flash-lite-preview.
There is no local LLM path in this script anymore.
For testing, you can limit how many chunks are processed:
uv run python prepare_tts_text.py --max-chunks 2 --overwriteIt writes each cleaned chunk to disk as soon as that chunk finishes, and also
appends incrementally to the final .tts.txt.
The cleanup pass now defaults to smaller chunks to reduce cross-paragraph mixing. You can still override that if needed:
uv run python prepare_tts_text.py --max-input-tokens 2500 --max-chunks 2 --overwrite