Get started with SmartOrderReader in 5 minutes!
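Step 1: Install Tesseract OCR

Ubuntu/Debian:
```bash
sudo apt-get update
sudo apt-get install tesseract-ocr
```

macOS:
```bash
brew install tesseract
```

Windows: download the installer from https://github.com/UB-Mannheim/tesseract/wiki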
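Step 2: Install the Python dependencies
```bash
pip install -r requirements.txt
```

Step 3: Verify the installation

Run the example script to verify that everything is working:
```bash
python example.py
```

You should see the example results printed in the output.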
Step 4: Process your own invoice
```bash
python order_reader.py your_invoice.jpg
```

The script will:
- Extract text from the image using OCR
- Find the Order Number and Order Date
- Display the results in a formatted table
- Save the results to `order_data.csv`
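The "find" step can be pictured as searching the OCR text for labeled fields. Below is a minimal sketch using regular expressions. The label patterns (such as "Order #:", "Invoice No." and "Order Date:") and the helper names are assumptions for illustration only; they are not the actual patterns in `order_reader.py`. This sketch only recognizes numeric dates like 12/25/2023.

```python
import re
from typing import Dict, Optional

# Hypothetical label patterns for illustration only; the real patterns in
# order_reader.py may differ.
ORDER_NUMBER_RE = re.compile(
    r"(?:Order|Invoice)\s*(?:#|No\.?|Number)\s*:?\s*([A-Z0-9-]{4,})",
    re.IGNORECASE,
)
ORDER_DATE_RE = re.compile(
    r"(?:Order|Invoice)\s*Date\s*:?\s*(\d{1,2}/\d{1,2}/\d{2,4})",
    re.IGNORECASE,
)


def find_field(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    """Return the first captured value for `pattern`, or None if it is absent."""
    match = pattern.search(text)
    return match.group(1) if match else None


def extract_order_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Pull the two fields shown in the results table out of raw OCR text."""
    return {
        "Order Number": find_field(ORDER_NUMBER_RE, ocr_text),
        "Order Date": find_field(ORDER_DATE_RE, ocr_text),
    }


sample = "ACME Corp\nOrder #: ABC12345\nOrder Date: 12/25/2023\nTotal: $40.00"
print(extract_order_fields(sample))
# {'Order Number': 'ABC12345', 'Order Date': '12/25/2023'}
```

Returning `None` for a missing field, rather than raising, lets one unreadable invoice fail quietly without stopping a batch.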
Example output:
```text
================================================================================
EXTRACTED ORDER DATA
================================================================================
+-------------------+----------------+------------------+
| Image File Name   | Order Number   | Order Date       |
+===================+================+==================+
| your_invoice.jpg  | ABC12345       | 12/25/2023       |
+-------------------+----------------+------------------+
================================================================================
CSV FORMAT (Copy & Paste Ready)
================================================================================
Image File Name,Order Number,Order Date
your_invoice.jpg,ABC12345,12/25/2023
CSV file saved: order_data.csv
```
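The CSV output uses three fixed columns. The sketch below shows one way to produce and read that format from Python with the standard `csv` module, for example to batch-process images through the Python API shown at the end of this guide. `write_orders`, `read_orders` and `fake_extract` are hypothetical helpers, not part of `order_reader.py`. The extractor is passed in as a callable and is assumed to return a dict with "Order Number" and "Order Date" keys, as `process_image` does in the API example.

```python
import csv
import io
import os
from typing import Callable, Dict, Iterable, List, Optional, TextIO

# Column order copied from the CSV output above.
FIELDNAMES = ["Image File Name", "Order Number", "Order Date"]


def write_orders(
    image_paths: Iterable[str],
    extract: Callable[[str], Dict[str, Optional[str]]],
    out: TextIO,
) -> int:
    """Run `extract` on each image and write one CSV row per image.

    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    rows = 0
    for path in image_paths:
        result = extract(path)
        writer.writerow({
            # Only the file name is stored, as in the sample table.
            "Image File Name": os.path.basename(path),
            # Missing or None values become empty cells.
            "Order Number": result.get("Order Number") or "",
            "Order Date": result.get("Order Date") or "",
        })
        rows += 1
    return rows


def read_orders(stream: TextIO) -> List[Dict[str, str]]:
    """Load rows in the format above back into a list of dicts."""
    return list(csv.DictReader(stream))


# Demo with a stand-in extractor, so Tesseract is not needed to try it.
def fake_extract(path: str) -> Dict[str, Optional[str]]:
    return {"Order Number": "ABC12345", "Order Date": "12/25/2023"}


buffer = io.StringIO()
write_orders(["scans/your_invoice.jpg"], fake_extract, buffer)
buffer.seek(0)
for row in read_orders(buffer):
    print(row["Image File Name"], row["Order Number"], row["Order Date"])
# your_invoice.jpg ABC12345 12/25/2023

# With the real extractor (see the Python API example in this guide):
# from order_reader import OrderDataExtractor
# extractor = OrderDataExtractor(lang='eng')
# with open("my_results.csv", "w", newline="", encoding="utf-8") as f:
#     write_orders(["a.jpg", "b.png"], extractor.process_image, f)
```

Passing the extractor in as a function keeps the CSV logic testable without OCR. If one image fails, `write_orders` stops there. Wrap the `extract` call in `try`/`except` if you would rather record an empty row and continue.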
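Process all images in a folder, writing the results to a CSV file of your choice with `--output` (the default is `order_data.csv`):
```bash
python order_reader.py *.jpg *.png --output my_results.csv
```

Troubleshooting:

If a required Python module is missing, run:
```bash
pip install -r requirements.txt
```

If you get an error saying that Tesseract OCR is not installed, follow Step 1 above.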
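If the OCR results are poor:
- Check the image quality (the text should be clear and readable)
- Try a higher-resolution image
- Make sure the text is horizontal, not rotated

If the Order Number or Order Date is not found:
- Verify that the invoice has clear labels such as "Order #:" or "Invoice Date:"
- Check that the image is not too blurry
- Try preprocessing the image yourself before running the tool (it already applies automatic preprocessing, but difficult scans may need more)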
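This guide does not say which preprocessing steps the tool applies. Binarization is one common step: it turns a grayscale scan into pure black text on a white background. Below is a minimal pure-Python sketch of Otsu's method, which picks the gray level that best separates dark ink from a light background. It works on a flat list of 0-255 grayscale values, and `otsu_threshold` and `binarize` are hypothetical names. On real images you would more likely use a library, for example OpenCV's `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level t (0-255) that maximizes the between-class
    variance when pixels are split into <= t (dark) and > t (light)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue  # no dark pixels yet
        if weight_light == 0:
            break  # every pixel is already on the dark side
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between_var = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels: List[int], threshold: int) -> List[int]:
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]


# A toy "image": dark ink around 20-30 on a light background around 220-240.
gray = [20, 25, 30, 220, 230, 240]
t = otsu_threshold(gray)
print(t, binarize(gray, t))
# 30 [0, 0, 0, 255, 255, 255]
```

A single global threshold works best on evenly lit scans. Photos with shadows usually need adaptive (local) thresholding instead.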
Next steps:
- Read the full README.md for detailed documentation
- See DEMO.md for example outputs
- Run `python test_order_reader.py` to run the tests for all features
- Customize the extraction patterns in `order_reader.py` for your specific invoice format
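If your invoices print dates in a different style, one way to support them is to try a list of `datetime.strptime` formats in order. This sketch assumes you want the MM/DD/YYYY form seen in the sample table. `DATE_FORMATS` and `normalize_date` are hypothetical names, and this guide does not state whether `order_reader.py` normalizes dates at all. The `%b` and `%B` month names depend on the current locale.

```python
from datetime import datetime
from typing import Iterable, Optional

# Hypothetical, ordered list of accepted formats. The first format that
# parses wins, so ambiguous inputs such as 01/02/2023 follow the earliest entry.
DATE_FORMATS = [
    "%m/%d/%Y",   # 12/25/2023
    "%d.%m.%Y",   # 25.12.2023
    "%d %b %Y",   # 25 Dec 2023
    "%B %d, %Y",  # December 25, 2023
]


def normalize_date(raw: str, formats: Iterable[str] = DATE_FORMATS) -> Optional[str]:
    """Parse `raw` with each format in turn and return it as MM/DD/YYYY,
    or None if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    return None


for raw in ["25 Dec 2023", "December 25, 2023", "25.12.2023", "not a date"]:
    print(raw, "->", normalize_date(raw))
# 25 Dec 2023 -> 12/25/2023
# December 25, 2023 -> 12/25/2023
# 25.12.2023 -> 12/25/2023
# not a date -> None
```

Adding support for a new vendor then means adding one format string, for example `"%Y-%m-%d"` for ISO dates.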
If you encounter issues:
- Check that Tesseract is properly installed: `tesseract --version`
- Verify the Python dependencies: `pip list | grep -E "(pytesseract|Pillow|opencv|pandas|tabulate)"`
- Run the test suite: `python test_order_reader.py`
- Read the error messages; they often indicate what is wrong
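The first two checks can also be run from Python in one step, which is handy on systems without `grep`. Below is a minimal standard-library sketch. `importlib.util.find_spec` tests whether a module can be imported without actually importing it, and `shutil.which` looks for the `tesseract` executable on your PATH. The module list mirrors the `pip list` filter above (Pillow imports as `PIL`, OpenCV as `cv2`). `check_environment` is a hypothetical helper, not part of the project. If you point pytesseract at a custom `tesseract_cmd` path instead of PATH, the binary check will report it as missing.

```python
import importlib.util
import shutil

# Import names for the packages in the pip list check above.
REQUIRED_MODULES = ["pytesseract", "PIL", "cv2", "pandas", "tabulate"]


def check_environment(modules=REQUIRED_MODULES):
    """Report which modules are importable and whether tesseract is on PATH."""
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    report["tesseract binary"] = shutil.which("tesseract") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:18} {'OK' if ok else 'MISSING'}")
```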
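To process invoices that contain both English and Urdu text, pass several Tesseract language codes joined with `+`. Tesseract's Urdu language data must be installed for this to work, for example the `tesseract-ocr-urd` package on Ubuntu/Debian:
```bash
python order_reader.py invoice.png --lang eng+urd
```

You can also call the extractor from your own Python code:
```python
from order_reader import OrderDataExtractor

extractor = OrderDataExtractor(lang='eng')
result = extractor.process_image('invoice.jpg')
print(f"Order: {result['Order Number']}")
print(f"Date: {result['Order Date']}")
```

You're all set! Start extracting invoice data automatically. 🚀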