How to add languages for OCR #37

@githubber

After I got this up and running and indexed some TIF image files of a book in Kazakh, the resulting OCR was mostly gobbledygook. Clearly, to process Cyrillic or other scripts, it needs the matching language data. I looked through file-brain's directories and saw that it uses PIL, but found nothing about pytesseract.

Where would I need to add language files for file-brain to use? I already have Tesseract installed (Homebrew on macOS), with the language files located at /opt/homebrew/share/tessdata, which in turn points to /opt/homebrew/Cellar/tesseract-lang/4.1.0/share/tessdata, which contains the language files I need.

How do I get file-brain to use this? From what I have read online, PIL cannot do OCR on its own and needs pytesseract, which doesn't seem to be part of the installation.
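
For reference, here is a minimal sketch of how the Tesseract install itself can be checked for Kazakh support, independent of file-brain. It does not show how file-brain runs OCR, which is exactly what I don't know. It assumes the `tesseract` command-line tool is on the PATH and that the Kazakh data file is named `kaz.traineddata`; the function names and the image file name `page.tif` are hypothetical and only for illustration.

```python
import subprocess
from pathlib import Path


def available_languages(tessdata_dir):
    """Return the language codes that have a .traineddata file in tessdata_dir."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def build_tesseract_command(image_path, langs, tessdata_dir):
    """Build a tesseract CLI call that prints the recognized text to stdout.

    Several languages are joined with '+', e.g. 'kaz+rus'.
    """
    return [
        "tesseract",
        str(image_path),
        "stdout",                      # write the OCR result to standard output
        "--tessdata-dir", str(tessdata_dir),
        "-l", "+".join(langs),
    ]


def run_ocr(image_path, langs, tessdata_dir):
    """Run tesseract and return the recognized text (requires tesseract installed)."""
    cmd = build_tesseract_command(image_path, langs, tessdata_dir)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

With the paths above, `available_languages("/opt/homebrew/share/tessdata")` should list `kaz` if the language pack is picked up, and `run_ocr("page.tif", ["kaz"], "/opt/homebrew/share/tessdata")` should return readable Cyrillic text. If that works, the remaining question is only how to pass the same language setting to file-brain.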

Thank you!
