Skip to content

Diffio-AI/diffio-audio-books

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PDF OCR

One canonical path:

cd /home/nharmon/git/diffio/diffio-tts
uv run python ocr_pdf.py

That defaults to the-preacher-and-his-preaching-ocr.pdf in this directory.

You can also pass a different PDF:

uv run python ocr_pdf.py some-book.pdf

Output is written to output/<pdf-stem>/:

  • <pdf-stem>.md: raw OCR markdown
  • <pdf-stem>.txt: cleaned text
  • <pdf-stem>.meta.json: run metadata

The script uses Hugging Face zai-org/GLM-OCR directly, with no local server. Model downloads are cached under ./models/.

TTS cleanup

After OCR, rewrite the text for TTS. This is now the canonical path for cleanup, and it uses OpenRouter only:

uv run python prepare_tts_text.py

That defaults to:

output/the-preacher-and-his-preaching-ocr/the-preacher-and-his-preaching-ocr.txt

and writes:

  • output/the-preacher-and-his-preaching-ocr/the-preacher-and-his-preaching-ocr.tts.txt
  • output/the-preacher-and-his-preaching-ocr/the-preacher-and-his-preaching-ocr.tts.txt.meta.json
  • output/the-preacher-and-his-preaching-ocr/the-preacher-and-his-preaching-ocr.tts.chunks/

This script calls OpenRouter using the key in ./openrouter.key. It is hardwired to use google/gemini-3.1-flash-lite-preview. There is no local LLM path in this script anymore. For testing, you can limit how many chunks are processed:

uv run python prepare_tts_text.py --max-chunks 2 --overwrite

It writes each cleaned chunk to disk as soon as that chunk finishes, and also appends incrementally to the final .tts.txt.

The cleanup pass now defaults to smaller chunks to reduce cross-paragraph mixing. You can still override that if needed:

uv run python prepare_tts_text.py --max-input-tokens 2500 --max-chunks 2 --overwrite

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages