OCR Image Text Extraction API

A production-ready serverless OCR API deployed on Google Cloud Run. Extract text from images with intelligent caching and structured logging.

Live API: https://ocr-api-663394155406.asia-southeast1.run.app

Working Directory: All commands in this guide should be run from the project root: /Users/ehz/flexbone

Quick Start

Test the API

# Extract text from invoice
curl -X POST -F "image=@testimages/invoice.jpg" \
  https://ocr-api-663394155406.asia-southeast1.run.app/extract-text

# Extract from handwritten note
curl -X POST -F "image=@testimages/note.png" \
  https://ocr-api-663394155406.asia-southeast1.run.app/extract-text

# Check health
curl https://ocr-api-663394155406.asia-southeast1.run.app/health

Interactive Docs

Swagger UI: https://ocr-api-663394155406.asia-southeast1.run.app/docs

Features

OCR: Extract text from JPG, PNG, GIF, WebP, BMP images
Batch Processing: Process up to 10 images at once (CLI/API only - see note below)
Smart Caching: LRU cache with 1000 entries, 1-hour TTL (~50% cost savings)
Multi-Language: Auto-detects 50+ languages
Production-Ready: Serverless, auto-scales, structured logging

Note on Batch Processing: Batch processing is available via cURL command-line requests and the Swagger UI at /docs. The web interface frontend currently supports single-image extraction only.

API Endpoints

Endpoint	Method	Purpose	Availability
`/extract-text`	POST	Extract text from single image	Web UI, cURL, Swagger UI
`/batch-extract`	POST	Extract text from multiple images (10 max)	cURL, Swagger UI only
`/health`	GET	Health check	All

Single Image Extraction (available everywhere):

curl -X POST -F "image=@testimages/invoice.jpg" \
  https://ocr-api-663394155406.asia-southeast1.run.app/extract-text

Batch Processing (cURL & Swagger UI only):

curl -X POST \
  -F "images=@testimages/invoice.jpg" \
  -F "images=@testimages/receipt.jpg" \
  https://ocr-api-663394155406.asia-southeast1.run.app/batch-extract

Single Image Response:

{
  "success": true,
  "text": "INVOICE\nTotal: $125.00",
  "confidence": 0.98,
  "processing_time_ms": 45,
  "metadata": {
    "cached": true,
    "language": "en"
  }
}

Batch Processing Response:

{
  "success": true,
  "results": [
    {
      "index": 0,
      "filename": "invoice.jpg",
      "success": true,
      "text": "INVOICE\nTotal: $125.00",
      "confidence": 0.98,
      "metadata": {"cached": true, "language": "en"}
    },
    {
      "index": 1,
      "filename": "receipt.jpg",
      "success": true,
      "text": "RECEIPT\nTotal: $45.99",
      "confidence": 0.96,
      "metadata": {"cached": false, "language": "en"}
    }
  ],
  "total_images": 2,
  "processing_time_ms": 375,
  "failed_count": 0
}

Local Development

# Setup
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Authenticate
gcloud auth application-default login

# Run
uvicorn main:app --reload --port 8080

Visit: http://localhost:8080/docs

Deployment

Runs on Python 3.12 in Google Cloud Run Docker container:

# Deploy to Google Cloud Run
bash scripts/deploy.sh

# Test deployment (23 automated tests)
bash scripts/comprehensive_test.sh

Documentation

For API Users

Swagger UI: https://ocr-api-663394155406.asia-southeast1.run.app/docs (Interactive API documentation)
README.md - Quick start and examples

For Developers

PROJECT.md - Complete technical documentation (architecture, deployment, costs, testing)
Code: Clean, well-structured Python modules with clear comments

Architecture

Client → Cloud Run → Validators → OCR Service → Google Vision API
                         ↓             ↓
                    Middleware    LRU Cache (1000 entries, 1hr TTL)
                         ↓             
                    Rate Limiter  
                         ↓             
                    Logger (Cloud Logging)

Tech Stack: FastAPI, Google Cloud Vision API, Cloud Run, Python 3.12

Configuration

Key environment variables:

GOOGLE_CLOUD_PROJECT=your-project-id
MAX_FILE_SIZE_MB=10
MAX_CACHE_SIZE=1000
CACHE_TTL_SECONDS=3600
RATE_LIMIT_ENABLED=false

See PROJECT.md for complete configuration guide.

Performance & Costs

Response Times:

Cache hit: <50ms
Cache miss: 200-500ms

Costs (with 50% cache hit rate):

1,000 requests/month: $0 (free tier)
5,000 requests/month: ~$2.75
20,000 requests/month: ~$12.50

See PROJECT.md for detailed cost analysis.

Testing

# Run test suite (23 automated tests)
bash scripts/comprehensive_test.sh

Support

Issues: Check PROJECT.md for technical details and implementation information
API Docs: Swagger UI (Interactive documentation)
Project Details: PROJECT.md (Complete technical guide)

Status: Production Ready | Version: 1.0.0 | Updated: 2025-10-20

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
middleware		middleware
scripts		scripts
services		services
static		static
testimages		testimages
utils		utils
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
PROJECT.md		PROJECT.md
README.md		README.md
config.py		config.py
main.py		main.py
models.py		models.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

OCR Image Text Extraction API

Quick Start

Test the API

Interactive Docs

Features

API Endpoints

Local Development

Deployment

Documentation

For API Users

For Developers

Architecture

Configuration

Performance & Costs

Testing

Support

About

Uh oh!

Releases

Packages

Uh oh!

Languages

ehzawad/flexbone

Folders and files

Latest commit

History

Repository files navigation

OCR Image Text Extraction API

Quick Start

Test the API

Interactive Docs

Features

API Endpoints

Local Development

Deployment

Documentation

For API Users

For Developers

Architecture

Configuration

Performance & Costs

Testing

Support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages