30 Mar 23:06

thst71

v1.0.0 Pre-release

Release Notes - PDF Classifier

v1.0.0

Added

New Feature: Implemented the FileData class to manage document metadata.
New Feature: Added the sanitize_filename_data function to clean up metadata for filenames.
New Feature: Implemented the classify_pdf function to rename PDF files based on metadata.
New Feature: Added the PdfProcessor class to handle PDF processing, image extraction, and OCR.
New Feature: Implemented the LLMClassifier class to use the Gemini LLM for document classification.
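These notes do not show the implementation, so the following is only a minimal sketch of how metadata might be cleaned and turned into a filename. The fields on `FileData`, the signature of `sanitize_filename_data`, and the helper `build_filename` are all assumptions for illustration and may differ from the actual code.

```python
import re
from dataclasses import dataclass

# Characters that are invalid in Windows filenames, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


@dataclass
class FileData:
    # Hypothetical fields; the real FileData class may hold different metadata.
    date: str
    sender: str
    subject: str


def sanitize_filename_data(value: str, replacement: str = "_") -> str:
    """Replace invalid characters, collapse whitespace, trim spaces and dots."""
    cleaned = _INVALID_CHARS.sub(replacement, value)
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Fall back to a placeholder so a filename part is never empty.
    return cleaned or "unknown"


def build_filename(data: FileData) -> str:
    """Hypothetical helper: join the sanitized metadata into a PDF filename."""
    parts = [sanitize_filename_data(p) for p in (data.date, data.sender, data.subject)]
    return "_".join(parts) + ".pdf"
```

Under these assumptions, `sanitize_filename_data("a/b:c")` returns `"a_b_c"`, and an empty string falls back to `"unknown"`.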

Fixed

Bug: Fixed an issue where invalid characters in filenames were not being handled correctly.
Bug: Fixed a bug where the pdfprocessor.py module would not reprocess files if they were modified.
Bug: Fixed a bug where the data_files property was not correctly patched in the tests.
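The notes do not say how the reprocessing fix works. One common approach, sketched here purely as an assumption, is to remember each file's modification time and process it again when that time changes. The function `needs_processing` and the `seen` cache are hypothetical names, not part of pdfprocessor.py.

```python
import os
from typing import Dict


def needs_processing(path: str, seen: Dict[str, float]) -> bool:
    """Return True if the file is new or its modification time has changed.

    `seen` maps file paths to the modification time recorded when the file
    was last processed (a hypothetical in-memory cache).
    """
    mtime = os.path.getmtime(path)
    if seen.get(path) == mtime:
        # Unchanged since the last run, so it can be skipped.
        return False
    # New or modified: record the current mtime and ask for (re)processing.
    seen[path] = mtime
    return True
```

A persistent cache (for example a small JSON file) would be needed for this check to survive between runs; the in-memory dictionary keeps the sketch short.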

Documentation

Docs: Updated the README.md file with detailed installation and usage instructions.

Dependencies

Chore: Updated dependencies in requirements.txt to the following versions:
- python-dotenv==1.0.1
- pdf2image==1.17.0
- pytesseract~=0.3.13
- pandas~=2.0
- Pillow~=10.0.1
- google-generativeai~=0.8.4

Fixed

Initial bug fixes.

Changed

Initial refactoring.

Documentation

Initial documentation.
