Releases: thst71/pdfclassifier
v1.0.0
Release Notes - PDF Classifier
v1.0.0
Added
- New Feature: Implemented the `FileData` class to manage document metadata.
- New Feature: Added the `sanitize_filename_data` function to clean up metadata for filenames.
- New Feature: Implemented the `classify_pdf` function to rename PDF files based on metadata.
- New Feature: Added the `PdfProcessor` class to handle PDF processing, image extraction, and OCR.
- New Feature: Implemented the `LLMClassifier` class to use the Gemini LLM for document classification.
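As an illustration of how metadata cleanup for filenames might look, here is a minimal sketch. It is not the project's actual code: the `FileData` fields (`date`, `sender`, `title`), the set of replaced characters, and the `build_filename` helper are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-in for the project's FileData class.
# The real fields are not listed in these release notes.
@dataclass
class FileData:
    date: str
    sender: str
    title: str


# Characters that are invalid in filenames on Windows or POSIX systems,
# plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def _clean(value: str) -> str:
    """Replace unsafe characters, collapse whitespace, trim spaces and dots."""
    cleaned = _INVALID_CHARS.sub("_", value)
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return cleaned or "unknown"


def sanitize_filename_data(data: FileData) -> FileData:
    """Return a copy of the metadata that is safe to embed in a filename."""
    return FileData(
        date=_clean(data.date),
        sender=_clean(data.sender),
        title=_clean(data.title),
    )


def build_filename(data: FileData) -> str:
    """Hypothetical helper: join the sanitized fields into a PDF filename."""
    safe = sanitize_filename_data(data)
    return f"{safe.date}_{safe.sender}_{safe.title}.pdf"
```

Returning a new object rather than mutating the input keeps the original metadata available, for example for logging what the classifier actually produced.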
Fixed
- Bug: Fixed an issue where invalid characters in filenames were not being handled correctly.
- Bug: Fixed a bug where the `pdfprocessor.py` module would not reprocess files if they were modified.
- Bug: Fixed a bug where the `data_files` property was not correctly patched in the tests.
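One common way to decide whether a modified file needs reprocessing is to compare its modification time against the value recorded on the last run. The sketch below shows that general technique only. The function names and the in-memory `processed` dictionary are hypothetical, and these notes do not say how `pdfprocessor.py` actually tracks processed files.

```python
import os


def needs_processing(pdf_path: str, processed: dict[str, float]) -> bool:
    """Return True if the file is new or its mtime differs from the recorded one."""
    return processed.get(pdf_path) != os.path.getmtime(pdf_path)


def mark_processed(pdf_path: str, processed: dict[str, float]) -> None:
    """Record the file's current modification time after a successful run."""
    processed[pdf_path] = os.path.getmtime(pdf_path)
```

A content hash would be more robust than the modification time when files are copied or restored with altered timestamps, at the cost of reading every file on each run.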
Documentation
- Docs: Updated the `README.md` file with detailed installation and usage instructions.
Dependencies
- Chore: Updated dependencies in `requirements.txt` to the latest versions:
  - python-dotenv==1.0.1
  - pdf2image==1.17.0
  - pytesseract~=0.3.13
  - pandas~=2.0
  - Pillow~=10.0.1
  - google-generativeai~=0.8.4
Fixed
- Initial bug fixes.
Changed
- Initial refactoring.
Documentation
- Initial documentation.