PDF Notes Searching

PDF- and OCR-based note search: index your notes, then search them by keyword.
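Keyword search over indexed notes rests on an inverted index: a map from each word to the pages that contain it. This project uses Whoosh for that (whoosh_search.py), and Whoosh also ranks results. The sketch below is a stdlib-only illustration of the idea, not the project's code. The names `tokenize`, `build_index` and `search`, and the "file#page" ids, are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase, then keep runs of letters/digits as tokens
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search(index, query):
    """Return sorted ids of pages containing every query keyword (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # .get() avoids inserting empty entries into the defaultdict
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)
```

For example, with `pages = {"calc.pdf#p1": "Markov chains", "calc.pdf#p2": "Markov transition matrix"}`, `search(build_index(pages), "markov matrix")` returns only `["calc.pdf#p2"]`, because both keywords must appear on the page.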

Quick Start

pip install -r requirements.txt
Install Tesseract per the instructions on this page (TODO: make a UI to supply the local PDF library path)
- for now, set TESSERACT_PATH in ocr.py to your Tesseract installation
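Until a UI exists, one way to avoid hard-coding TESSERACT_PATH is to look the executable up at runtime. A minimal sketch, assuming the path is read from an environment variable of the same name, then from `PATH`, then from a common Windows install location; the helper name `find_tesseract` and the default path are assumptions, not part of ocr.py.

```python
import os
import shutil
from pathlib import Path

# Assumed default location used by common Windows installers; adjust as needed
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(env_var="TESSERACT_PATH"):
    """Return a path to the tesseract executable, or None if none is found.

    Lookup order (hypothetical helper): environment variable, then PATH,
    then the assumed Windows default install directory.
    """
    candidate = os.environ.get(env_var)
    if candidate and Path(candidate).is_file():
        return candidate
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    if Path(DEFAULT_WINDOWS_PATH).is_file():
        return DEFAULT_WINDOWS_PATH
    return None
```

If ocr.py drives Tesseract through pytesseract (an assumption), the result could be assigned to `pytesseract.pytesseract.tesseract_cmd`, which is the attribute pytesseract uses to locate the binary.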
Prepare the 'database':
- TODO: make a UI
- For now:
  - in ocr.py, edit the call convert_pdf_in_directory("your folder full of PDF notes", "a tag") to use your folder and tag, then run ocr.py
  - run whoosh_search.py, which by default builds the index over all data
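The two preparation steps form a small pipeline: OCR turns each PDF page into text, and the indexer stores one record per page. A minimal sketch, assuming a per-page record with `tag`, `path`, `page` and `content` fields; these field names and the helper `prepare_records` are hypothetical, not taken from ocr.py or whoosh_search.py. The OCR step is passed in as a function so the sketch runs without Tesseract.

```python
from pathlib import Path
from typing import Callable, Dict, Iterator, List


def prepare_records(folder: str, tag: str,
                    extract_pages: Callable[[Path], List[str]]) -> Iterator[Dict]:
    """Yield one indexable record per page of every PDF under `folder`.

    `extract_pages` stands in for the OCR step: it takes a PDF path and
    returns the text of each page, in order.
    """
    for pdf in sorted(Path(folder).rglob("*")):
        # Skip directories and non-PDF files (extension check is case-insensitive)
        if not (pdf.is_file() and pdf.suffix.lower() == ".pdf"):
            continue
        for page_no, text in enumerate(extract_pages(pdf), start=1):
            yield {"tag": tag, "path": str(pdf), "page": page_no, "content": text}
```

One option for a real `extract_pages` is rendering pages with `pdf2image.convert_from_path` and passing each image to `pytesseract.image_to_string`; whether ocr.py works this way is not stated here. Keeping OCR separate from indexing means the slow OCR pass runs once, and the index can be rebuilt from the saved text.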
Web UI: streamlit run .\notes_search.py
- open http://localhost:8501 if the browser does not open it automatically
