Quick Start Guide

Get started with SmartOrderReader in 5 minutes!

Step 1: Install Tesseract OCR

Ubuntu/Debian

sudo apt-get update
sudo apt-get install tesseract-ocr

macOS

brew install tesseract

Windows

Download from: https://github.com/UB-Mannheim/tesseract/wiki

Step 2: Install Python Dependencies

pip install -r requirements.txt

Step 3: Test the Installation

Run the example script to verify everything is working:

python example.py

If the installation is working, the script prints its example results to the terminal.
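If the example script fails, it can help to confirm separately that the tesseract binary is reachable from Python. The following is a minimal sketch using only the standard library; the helper name tesseract_version is hypothetical and is not part of SmartOrderReader.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if it is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    proc = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    # Depending on the build, the version banner may go to stdout or stderr.
    banner = (proc.stdout or proc.stderr).strip()
    return banner.splitlines()[0] if banner else None


if __name__ == "__main__":
    print(tesseract_version() or "Tesseract not found on PATH; see Step 1.")
```

A None result points to Step 1 (Tesseract missing or not on PATH) rather than to the Python dependencies.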

Step 4: Process Your First Invoice

Using the Command Line

python order_reader.py your_invoice.jpg

The script will:

  1. Extract text from the image using OCR
  2. Find the Order Number and Order Date
  3. Display results in a formatted table
  4. Save results to order_data.csv
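Step 2 in the list above (finding the Order Number and Order Date in the OCR text) can be sketched with regular expressions. The patterns and the function name extract_fields below are illustrative assumptions, not the ones actually used in order_reader.py.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for illustration; the real ones live in order_reader.py.
ORDER_NUMBER_RE = re.compile(
    r"(?:order|invoice)\s*(?:#|no\.?|number)\s*:?\s*([A-Z0-9][A-Z0-9-]*)",
    re.IGNORECASE,
)
ORDER_DATE_RE = re.compile(
    r"(?:order|invoice)?\s*date\s*:?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})",
    re.IGNORECASE,
)


def extract_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Pull the first labelled order number and date out of raw OCR text."""
    number = ORDER_NUMBER_RE.search(ocr_text)
    date = ORDER_DATE_RE.search(ocr_text)
    return {
        "Order Number": number.group(1) if number else None,
        "Order Date": date.group(1) if date else None,
    }


sample = "ACME Corp\nOrder #: ABC12345\nInvoice Date: 12/25/2023\nTotal: $99.00"
print(extract_fields(sample))
```

Label-anchored patterns like these only match when the invoice prints a recognizable label next to the value, which is why clear labels matter for extraction quality.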

Expected Output

================================================================================
EXTRACTED ORDER DATA
================================================================================

+-------------------+----------------+------------------+
| Image File Name   | Order Number   | Order Date       |
+===================+================+==================+
| your_invoice.jpg  | ABC12345       | 12/25/2023       |
+-------------------+----------------+------------------+

================================================================================
CSV FORMAT (Copy & Paste Ready)
================================================================================

Image File Name,Order Number,Order Date
your_invoice.jpg,ABC12345,12/25/2023

CSV file saved: order_data.csv
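The saved CSV can be read back with Python's standard csv module. This is a minimal sketch that assumes the file is UTF-8 encoded and uses the three column names shown above; load_orders is a hypothetical helper name.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_orders(csv_path: str) -> List[Dict[str, str]]:
    """Read the results CSV into a list of dicts keyed by column name."""
    # Assumption: the tool writes UTF-8; adjust the encoding if yours differs.
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, load_orders("order_data.csv") would return one dict per processed image, with the keys "Image File Name", "Order Number", and "Order Date".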

Step 5: Process Multiple Invoices

Process every .jpg and .png image in the current folder and write the results to my_results.csv:

python order_reader.py *.jpg *.png --output my_results.csv
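The wildcards in this command are expanded by the shell, which not every shell does (the Windows Command Prompt, for instance, passes them through literally). Under that assumption, the file list can be built in Python instead. The helper names find_images and build_command are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def find_images(folder: str, extensions: Iterable[str] = (".jpg", ".png")) -> List[str]:
    """Return sorted paths of files in `folder` whose suffix is in `extensions`."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        # Suffix comparison is case-insensitive, so IMG.JPG is included too.
        if path.is_file() and path.suffix.lower() in wanted
    )


def build_command(images: List[str], output: str = "my_results.csv") -> List[str]:
    """Assemble the same command line as above, with explicit file names."""
    return ["python", "order_reader.py", *images, "--output", output]
```

The resulting list can be passed to subprocess.run(build_command(find_images(".")), check=True) from the folder that contains order_reader.py.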

Common Issues & Solutions

"No module named 'pytesseract'"

Run: pip install -r requirements.txt

"TesseractNotFoundError"

Tesseract OCR is not installed. Follow Step 1 above.

"No text extracted from image"

  • Check image quality (should be clear and readable)
  • Try a higher resolution image
  • Ensure text is horizontal and not rotated

Wrong data extracted

  • Verify the invoice has clear labels like "Order #:" or "Invoice Date:"
  • Check that the image is not too blurry
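  • If results are still wrong, try preprocessing the image yourself, for example by cropping it or increasing contrast (the tool already applies its own preprocessing automatically)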
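One way to catch wrong extractions early is to check that an extracted date actually parses. The sketch below is an optional sanity check and not part of the tool; the function name normalize_date and the list of accepted formats are assumptions to adapt to your invoices.

```python
from datetime import datetime
from typing import Optional, Sequence


def normalize_date(
    value: str,
    formats: Sequence[str] = ("%m/%d/%Y", "%d-%m-%Y", "%Y-%m-%d"),
) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Try the next candidate format.
    return None


print(normalize_date("12/25/2023"))
print(normalize_date("l2/25/2023"))  # OCR misread "1" as "l"
```

A None result suggests the OCR misread a character or the pattern captured the wrong text.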

Next Steps

  • Read the full README.md for detailed documentation
  • Check out DEMO.md for example outputs
  • Run python test_order_reader.py to see all features tested
  • Customize extraction patterns in order_reader.py for your specific invoice format

Getting Help

If you encounter issues:

  1. Check that Tesseract is properly installed: tesseract --version
  2. Verify Python dependencies: pip list | grep -iE "(pytesseract|Pillow|opencv|pandas|tabulate)"
  3. Run the test suite: python test_order_reader.py
  4. Review error messages - they often indicate what's wrong
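On systems without grep, the dependency check can be done from Python with importlib.metadata (Python 3.8+). The distribution names below are assumptions based on the grep pattern above; requirements.txt has the exact names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed distribution names; confirm against requirements.txt.
PACKAGES = ("pytesseract", "Pillow", "opencv-python", "pandas", "tabulate")


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as missing can usually be fixed by rerunning pip install -r requirements.txt.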

Advanced Usage

Custom Language Support

# Process invoices with English and Urdu text
# (requires the Urdu language data for Tesseract to be installed)
python order_reader.py invoice.png --lang eng+urd

Programmatic Usage

from order_reader import OrderDataExtractor

extractor = OrderDataExtractor(lang='eng')
result = extractor.process_image('invoice.jpg')
print(f"Order: {result['Order Number']}")
print(f"Date: {result['Order Date']}")
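The same call can be looped over several images and written to a CSV. This is a sketch under assumptions: it treats the value returned by process_image as a dict with the 'Order Number' and 'Order Date' keys used above, and process_batch is a hypothetical helper, not a function provided by order_reader.

```python
import csv
from typing import Any, Dict, Iterable, List

FIELDS = ["Image File Name", "Order Number", "Order Date"]


def process_batch(extractor: Any, image_paths: Iterable[str], csv_path: str) -> List[Dict[str, str]]:
    """Run extractor.process_image on each path and save the rows to csv_path."""
    rows: List[Dict[str, str]] = []
    for path in image_paths:
        # Assumption: process_image returns a dict-like result.
        result = extractor.process_image(path)
        rows.append({
            "Image File Name": path,
            "Order Number": result.get("Order Number") or "",
            "Order Date": result.get("Order Date") or "",
        })
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

With the extractor created as shown above, a call might look like process_batch(extractor, ['a.jpg', 'b.png'], 'my_results.csv'). Missing fields are written as empty cells so every row keeps the same three columns.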

You're all set! Start extracting invoice data automatically. 🚀