
Releases: thst71/pdfclassifier

v1.0.0

30 Mar 23:06
a07c0ed


v1.0.0 Pre-release

Release Notes - PDF Classifier

v1.0.0

Added

  • New Feature: Implemented the FileData class to manage document metadata.
  • New Feature: Added the sanitize_filename_data function to clean metadata values for use in filenames.
  • New Feature: Implemented the classify_pdf function to rename PDF files based on metadata.
  • New Feature: Added the PdfProcessor class to handle PDF processing, image extraction, and OCR.
  • New Feature: Implemented the LLMClassifier class to use the Gemini LLM for document classification.
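Taken together, these entries describe a pipeline of OCR, LLM classification and renaming. The following is a minimal sketch of how such a pipeline could fit together; it is not the project's code. The `FileData` fields (`date`, `sender`, `doctype`), the helpers `sanitize_component`, `build_filename`, `parse_llm_response`, `rename_pdf`, `ocr_pdf_text` and `ask_gemini`, the prompt text, the `GEMINI_API_KEY` variable and the `gemini-1.5-flash` model name are all hypothetical. The OCR and Gemini helpers import their libraries lazily, so the pure-Python parts run without pdf2image, pytesseract or google-generativeai installed.

```python
import json
import re
from dataclasses import dataclass
from pathlib import Path

# Characters that are invalid in Windows filenames, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Hypothetical prompt; the requested keys are assumptions, not the project's schema.
PROMPT_TEMPLATE = (
    "Classify the following document. Reply with a JSON object with the keys "
    '"date" (YYYY-MM-DD), "sender" and "doctype".\n\n{text}'
)


@dataclass
class FileData:
    """Hypothetical metadata record; the real FileData fields may differ."""
    date: str
    sender: str
    doctype: str


def sanitize_component(value: str, max_len: int = 50) -> str:
    """Drop invalid characters, collapse whitespace to '_' and cap the length."""
    cleaned = _INVALID_CHARS.sub("", value).strip()
    cleaned = re.sub(r"\s+", "_", cleaned)
    cleaned = cleaned[:max_len].strip("._")  # no leading/trailing dots or underscores
    return cleaned or "unknown"


def build_filename(meta: FileData) -> str:
    parts = (meta.date, meta.sender, meta.doctype)
    return "_".join(sanitize_component(p) for p in parts) + ".pdf"


def parse_llm_response(raw: str) -> FileData:
    """Extract the JSON object from a model reply (which may be wrapped in ``` fences)."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model response")
    data = json.loads(raw[start:end + 1])
    return FileData(**{k: str(data.get(k, "")) for k in ("date", "sender", "doctype")})


def rename_pdf(pdf_path: Path, meta: FileData) -> Path:
    target = pdf_path.with_name(build_filename(meta))
    if target.exists() and target != pdf_path:
        raise FileExistsError(target)  # never overwrite an existing document
    return pdf_path.rename(target)


def ocr_pdf_text(pdf_path: str, dpi: int = 300) -> str:
    """Render each page with pdf2image, then OCR it with pytesseract.

    Requires the Poppler utilities and the Tesseract binary to be installed.
    """
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=dpi)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)


def ask_gemini(document_text: str, model_name: str = "gemini-1.5-flash") -> str:
    """Send the OCR text to Gemini; the model name and env var are assumptions."""
    import os

    import google.generativeai as genai
    from dotenv import load_dotenv

    load_dotenv()  # pick up GEMINI_API_KEY from a local .env file, if present
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(model_name)
    prompt = PROMPT_TEMPLATE.format(text=document_text[:20_000])  # truncate long documents
    return model.generate_content(prompt).text
```

A full run would chain the steps as `meta = parse_llm_response(ask_gemini(ocr_pdf_text(path)))` followed by `rename_pdf(Path(path), meta)`. Keeping sanitizing and parsing free of I/O makes them easy to unit-test without a network connection or OCR binaries.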

Fixed

  • Bug: Fixed an issue where invalid characters in filenames were not being handled correctly.
  • Bug: Fixed a bug where the pdfprocessor.py module did not reprocess files after they were modified.
  • Bug: Fixed a bug where the data_files property was not correctly patched in the tests.
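The last two fixes touch on common Python patterns: deciding whether a file changed since it was last processed, and patching a property in tests. The following is a minimal sketch assuming an mtime-based check; `ProcessingCache`, its JSON cache file and `patched_data_files` are hypothetical, and the project's actual pdfprocessor.py may detect changes differently (for example with content hashes).

```python
import json
import os
import tempfile
from pathlib import Path
from unittest import mock


class ProcessingCache:
    """Hypothetical helper: remembers each PDF's mtime from its last processing run."""

    def __init__(self, cache_file: Path):
        self.cache_file = cache_file
        self._seen = json.loads(cache_file.read_text()) if cache_file.exists() else {}

    @property
    def data_files(self) -> list:
        # Stand-in for a property that lists the input PDFs next to the cache file.
        return sorted(str(p) for p in self.cache_file.parent.glob("*.pdf"))

    def needs_processing(self, pdf_path: str) -> bool:
        # New files have no entry; modified files have a different mtime.
        return self._seen.get(pdf_path) != os.path.getmtime(pdf_path)

    def mark_processed(self, pdf_path: str) -> None:
        self._seen[pdf_path] = os.path.getmtime(pdf_path)
        self.cache_file.write_text(json.dumps(self._seen))

    def pending(self) -> list:
        return [p for p in self.data_files if self.needs_processing(p)]


def patched_data_files(cache: ProcessingCache, fake_files: list) -> list:
    # Properties live on the class, so tests patch the class with PropertyMock;
    # assigning cache.data_files = [...] would raise AttributeError (no setter).
    with mock.patch.object(
        ProcessingCache, "data_files", new_callable=mock.PropertyMock
    ) as prop:
        prop.return_value = fake_files
        return cache.data_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pdf = Path(tmp) / "scan.pdf"
        pdf.write_bytes(b"%PDF-1.4")
        cache = ProcessingCache(Path(tmp) / "cache.json")
        print(cache.pending())  # the new file is pending
        cache.mark_processed(str(pdf))
        print(cache.pending())  # []
        st = pdf.stat()
        os.utime(pdf, (st.st_atime, st.st_mtime + 10))  # simulate an edit
        print(cache.pending())  # the edited file is pending again
```

Storing the mtime as a JSON float is safe here because Python's float `repr` round-trips exactly, so an unchanged file compares equal after the cache is reloaded.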

Documentation

  • Docs: Updated the README.md file with detailed installation and usage instructions.

Dependencies

  • Chore: Updated the dependency versions in requirements.txt:
    • python-dotenv==1.0.1
    • pdf2image==1.17.0
    • pytesseract~=0.3.13
    • pandas~=2.0
    • Pillow~=10.0.1
    • google-generativeai~=0.8.4
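The list mixes exact pins (`==`) with compatible-release pins (`~=`, defined in PEP 440). The sketch below spells out which versions each specifier admits. It handles only plain numeric versions and is an illustration, not a replacement for pip or the `packaging` library; the requirement lines are copied from the list above, while `compatible_release_range` and `describe_pins` are hypothetical helpers.

```python
def compatible_release_range(version: str) -> tuple:
    """Return (lower, upper) bounds implied by '~=version' under PEP 440.

    Simplified: only plain numeric release segments (no pre/post/dev tags).
    """
    parts = version.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported version for ~=: {version!r}")
    prefix = parts[:-1]  # drop the last segment, then increment the new last one
    upper = prefix[:-1] + [str(int(prefix[-1]) + 1)]
    return version, ".".join(upper)


REQUIREMENTS = """\
python-dotenv==1.0.1
pdf2image==1.17.0
pytesseract~=0.3.13
pandas~=2.0
Pillow~=10.0.1
google-generativeai~=0.8.4
"""


def describe_pins(requirements: str) -> dict:
    """Map each package to the version range its specifier allows."""
    ranges = {}
    for line in requirements.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        if "~=" in line:
            name, version = (s.strip() for s in line.split("~=", 1))
            low, high = compatible_release_range(version)
            ranges[name] = f">={low}, <{high}"
        elif "==" in line:
            name, version = (s.strip() for s in line.split("==", 1))
            ranges[name] = f"=={version}"
    return ranges


for package, allowed in describe_pins(REQUIREMENTS).items():
    print(f"{package}: {allowed}")
```

So `pandas~=2.0` accepts any 2.x release, while `Pillow~=10.0.1` accepts only 10.0.x patch releases from 10.0.1 onward; the two `==` entries are fixed to a single version.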

Fixed

  • Initial bug fixes.

Changed

  • Initial refactoring.

Documentation

  • Initial documentation.