A powerful OCR (Optical Character Recognition) tool that uses the Google Drive API to extract text from images, with built-in features for cleaning and combining the extracted text.
Version 1.0.0: Modular architecture for better maintainability and extensibility!
- Multiple Image Format Support: Supports JPG, JPEG, PNG, GIF, BMP, TIFF formats
- Automatic Text Cleaning: Removes metadata and cleans extracted text
- Flexible Text Combination: Combine texts with or without file headers
- Comprehensive Logging: Detailed processing logs with colored output
- Error Handling: Robust error handling with detailed reporting
- Configurable Processing: Command-line options for customization
- Modular Architecture: Clean, maintainable code structure with 8 focused modules
- No Duplicate Logging: Clean output without repetitive messages
- Progress Tracking: Real-time progress indicators during processing
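The feature list does not say which metadata the cleaning step removes. The sketch below is a minimal illustration under stated assumptions: it treats a leading byte-order mark and lines made only of underscores as residue from the Drive export, trims trailing spaces, and collapses runs of blank lines. The function name `clean_ocr_text` is hypothetical and is not taken from `text_processor.py`.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Hypothetical cleaning step; every rule below is an assumption, not the tool's own."""
    # Drop a leading byte-order mark if the exported text has one.
    text = raw.lstrip("\ufeff")
    # Normalize Windows line endings so the rules below only see "\n".
    text = text.replace("\r\n", "\n")
    lines = []
    for line in text.split("\n"):
        stripped = line.rstrip()
        # Skip separator lines made only of underscores (assumed export residue).
        if stripped and set(stripped) == {"_"}:
            continue
        lines.append(stripped)
    # Collapse three or more consecutive newlines into a single blank line.
    cleaned = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return cleaned.strip()
```

Keeping each rule on its own line makes it easy to drop or adjust one assumption without touching the others.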
- Python 3.6+ (recommended: Python 3.8+)
- Google Drive API credentials
- Internet connection for API access
- Install Python Dependencies
  pip3 install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib oauth2client
- Get Google Drive API Credentials
  - Follow the Python Quickstart Guide
  - Download the credentials.json file
  - Place credentials.json in the project directory
- Prepare Images
  - Create an images folder in the project directory
  - Place your images in the images folder
  - Supported formats: JPG, JPEG, PNG, GIF, BMP, TIFF
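Finding the input files amounts to filtering the images folder by extension. A minimal sketch, assuming case-insensitive matching and a flat (non-recursive) folder; `find_images` and `DEFAULT_EXTENSIONS` are illustrative names, not the tool's own:

```python
from pathlib import Path

# Formats listed in this README; case-insensitive matching is an assumption.
DEFAULT_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff")


def find_images(folder="images", extensions=DEFAULT_EXTENSIONS):
    """Return files in `folder` whose suffix is in `extensions`, sorted by path."""
    allowed = {ext.lower() for ext in extensions}
    root = Path(folder)
    if not root.is_dir():
        # Missing folder: return nothing rather than raising.
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in allowed
    )
```

Passing a shorter list, for example `[".jpg", ".png"]`, mirrors what the `--extensions` option is described as doing.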
If you need to convert PDF documents to images before processing:
- PDF-XChange Editor (Free): Recommended free solution for converting PDF pages to supported image formats
  - Additional feature: Crop unwanted sections (headers, footers, page numbers) from all images
- Online PDF Converters: Various web-based conversion tools available
- Desktop Applications: Adobe Acrobat, other PDF utilities
- Alternative Tools: Any reliable PDF-to-image conversion software
After conversion, place all resulting images in the images folder within your project directory.
# Run with default settings: combines only processed texts (no raw texts)
python main.py
# Do not combine processed text files
python main.py --no-combine-texts
# Include raw text combination
python main.py --combine-raw
# Include file headers in combined files
python main.py --include-headers
# Combine both raw and processed texts with headers
python main.py --combine-raw --include-headers
# Specify custom credentials file
python main.py --credentials my_credentials.json
# Support only specific image formats
python main.py --extensions .jpg .jpeg .png
# Enable verbose output for detailed logging
python main.py --verbose
# Enable file logging (creates ocr_processing.log)
python main.py --enable-file-logging
# Check version information
python main.py --version
# Combination example: verbose mode with raw text combination and headers
python main.py --verbose --combine-raw --include-headers
Command-line options:
- --credentials PATH: Path to Google credentials JSON file (default: credentials.json)
- --no-combine-texts: Do not combine processed text files
- --combine-raw: Also combine raw text files
- --include-headers: Include file headers in combined files
- --extensions LIST: Supported image file extensions
- --verbose: Enable verbose logging output with detailed information
- --enable-file-logging: Enable logging to file (creates ocr_processing.log)
- --version: Show version information and exit
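The options above map naturally onto Python's standard `argparse` module. The following is a hypothetical reconstruction built only from the documented flags and defaults; the real `cli.py` may define them differently (the default extension list and help texts here are assumptions):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse setup mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(description="OCR images via the Google Drive API")
    parser.add_argument("--credentials", default="credentials.json",
                        help="Path to Google credentials JSON file")
    parser.add_argument("--no-combine-texts", action="store_true",
                        help="Do not combine processed text files")
    parser.add_argument("--combine-raw", action="store_true",
                        help="Also combine raw text files")
    parser.add_argument("--include-headers", action="store_true",
                        help="Include file headers in combined files")
    # Default list assumed from the formats named in this README.
    parser.add_argument("--extensions", nargs="+",
                        default=[".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff"],
                        help="Supported image file extensions")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose logging output")
    parser.add_argument("--enable-file-logging", action="store_true",
                        help="Enable logging to ocr_processing.log")
    # action="version" prints the string and exits.
    parser.add_argument("--version", action="version", version="%(prog)s 1.0.0")
    return parser
```

`argparse` converts dashes to underscores, so `--no-combine-texts` is read back as `args.no_combine_texts`.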
project/
├── images/ # Input images
├── raw_texts/ # Raw OCR output
├── texts/ # Cleaned OCR output
├── credentials.json # Google API credentials (user-provided)
├── token.json # OAuth token (auto-generated)
└── main.py # Main script
The tool offers two combination modes:
- Without Headers: Simple text concatenation with file separators
- With Headers: Detailed file information and structured output
Combined files are saved with a timestamp in their names: combined_cleaned_TIMESTAMP.txt for processed texts or combined_raw_TIMESTAMP.txt for raw texts.
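A minimal sketch of the two combination modes, assuming UTF-8 text files. The dashed separator, the `=== File: name ===` header layout, and the `%Y%m%d_%H%M%S` timestamp format are all assumptions; the README only names the output pattern. `combine_texts` and `combined_filename` are hypothetical helpers:

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional, Union

# Assumed separator placed between files in both modes.
SEPARATOR = "\n\n" + "-" * 40 + "\n\n"


def combine_texts(files: Iterable[Union[str, Path]], include_headers: bool = False) -> str:
    """Join text files in the given order, optionally with a header per file."""
    parts = []
    for file in files:
        path = Path(file)
        body = path.read_text(encoding="utf-8").strip()
        if include_headers:
            # Hypothetical header layout: file name on its own line above the text.
            parts.append(f"=== File: {path.name} ===\n{body}")
        else:
            parts.append(body)
    return SEPARATOR.join(parts)


def combined_filename(kind: str = "cleaned", now: Optional[datetime] = None) -> str:
    """Build a name like combined_<kind>_<timestamp>.txt (timestamp format assumed)."""
    now = now or datetime.now()
    return f"combined_{kind}_{now:%Y%m%d_%H%M%S}.txt"
```

Passing `now` explicitly keeps the filename deterministic in tests; in normal use it defaults to the current time.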
- Comprehensive error logging
- Graceful handling of API failures
- Automatic retry mechanisms
- Detailed error reporting
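The README does not describe how retries are scheduled. One common approach is exponential backoff with jitter, sketched below; `with_retries` is a hypothetical helper, and the attempt count and delays are illustrative. When using the Google API client, `googleapiclient.errors.HttpError` would be a natural value for `retry_on`, and the client's `execute()` method also accepts a `num_retries` argument of its own.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            # Wait base_delay * 2**attempt plus up to base_delay of random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable")
```

Injecting `sleep` lets the helper be tested without actually waiting.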
The application has been refactored into a modular architecture:
OCR/
├── images/ # Input images directory
├── raw_texts/ # Raw OCR output directory
├── texts/ # Cleaned OCR output directory
├── __init__.py # Package initialization
├── auth.py # Google Drive authentication and service setup
├── cli.py # Command line interface and argument parsing
├── config.py # Configuration classes and constants
├── credentials.json # Google API credentials (user-provided)
├── logger.py # Logging utilities with colored output
├── main.py # Main entry point
├── ocr_processor.py # Core OCR processing logic
├── PROJECT_STRUCTURE.md # Architecture documentation
├── README.md # User documentation
├── text_processor.py # Text cleaning and combination utilities
└── token.json # OAuth token (auto-generated)
For detailed information about the modular architecture, see PROJECT_STRUCTURE.md.
On first run, the tool will:
- Open a browser for Google OAuth
- Request permission to access Google Drive
- Save the authentication token (token.json) for future use
- Credentials Error: Ensure credentials.json is in the project directory
- No Images Found: Check image formats and file extensions
- API Quota Exceeded: Wait and retry, or check Google Cloud Console quotas
- Permission Denied: Re-run the OAuth flow by deleting token.json
- Processing time depends on image size and API response time
- Large images may take longer to process
- Multiple files are processed sequentially
- Progress tracking shows current file being processed
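Sequential processing with a progress indicator can be sketched as a plain loop. Here the `ocr` argument stands in for the actual Drive upload-and-export step, which is not shown; `process_sequentially` and its message format are hypothetical. A failure on one file is recorded and the loop moves on, in line with the graceful error handling described earlier:

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


def process_sequentially(
    images: Iterable[Path],
    ocr: Callable[[Path], str],
    report: Callable[[str], None] = print,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run `ocr` on each image in turn; return (texts, errors) keyed by file name."""
    images = list(images)
    total = len(images)
    texts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for index, image in enumerate(images, start=1):
        # Progress line showing which file is currently being processed.
        report(f"[{index}/{total}] Processing {image.name}")
        try:
            texts[image.name] = ocr(image)
        except Exception as exc:  # keep going; record the failure for the summary
            errors[image.name] = str(exc)
            report(f"[{index}/{total}] Failed {image.name}: {exc}")
    return texts, errors
```

Because files are handled one at a time, total runtime grows roughly with the number of images and the API latency per call.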