Multi-Language Image Description API

A serverless API built with AWS SAM that generates AI-powered image descriptions in 20 languages, returned either as text or as synthesized audio.

🚀 Features

🌍 Multi-Language Support

20 Languages: English, Spanish, Japanese, French, German, Italian, Portuguese, Russian, Korean, Chinese, Arabic, Hindi, Turkish, Polish, Dutch, Swedish, Danish, Norwegian, Finnish, Icelandic
ISO 639-1 Codes: Standard two-letter language codes (en, es, ja, etc.)
Case-Insensitive: EN, en, and Es are all accepted
Smart Defaults: Automatically selects a default voice for each language
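Case-insensitive handling of language codes can be sketched as a small normalization step. This is a minimal sketch, not the project's actual code. The helper name `normalize_language` and the code set are assumptions. The set mirrors the 20 languages listed above, and Norwegian is assumed to use `no`. The error text follows the "Invalid language" message documented under Error Handling.

```python
from typing import Optional

# Hypothetical code set mirroring the 20 languages listed above;
# the real service may store (and spell) its mapping differently.
SUPPORTED_LANGUAGES = {
    "en", "es", "ja", "fr", "de", "it", "pt", "ru", "ko", "zh",
    "ar", "hi", "tr", "pl", "nl", "sv", "da", "no", "fi", "is",
}


def normalize_language(code: Optional[str], default: str = "en") -> str:
    """Return a lowercase ISO 639-1 code, or `default` when none is given."""
    if code is None or not code.strip():
        return default
    normalized = code.strip().lower()  # "EN", "en" and "En" all become "en"
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Invalid language code: {code}. "
            "Use ISO 639-1 two-letter codes (e.g., en, es, ja)"
        )
    return normalized
```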

🎵 Audio Descriptions

Amazon Polly Integration: Natural-sounding speech synthesis
50+ Voices: Multiple voice options for most languages
Neural Engine: High-quality voice synthesis for voices that support Polly's neural engine
MP3 Output: Base64-encoded audio data
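Producing base64-encoded MP3 output with Amazon Polly could look roughly like this. The sketch uses the boto3 Polly `synthesize_speech` call. Passing the client in as a parameter is an assumption made for testability, as is the `"neural"` default. Voices that only support the standard engine would need `engine="standard"`.

```python
import base64
from typing import Any


def synthesize_mp3_base64(text: str, voice_id: str, polly_client: Any,
                          engine: str = "neural") -> str:
    """Synthesize `text` with Amazon Polly and return the MP3 as base64 text.

    `polly_client` is expected to be a boto3 client, e.g. boto3.client("polly").
    The `engine` default is an assumption; use "standard" for standard-only voices.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine=engine,
    )
    # AudioStream is a streaming body; read all of it before encoding.
    audio_bytes = response["AudioStream"].read()
    return base64.b64encode(audio_bytes).decode("ascii")
```

Injecting the client keeps the function easy to unit-test with a stub. It also lets the Lambda handler create a single boto3 client per container and reuse it.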

🖼️ Image Processing

Multiple Formats: JPEG, PNG, GIF, WebP
Auto-Resizing: Large images are downscaled to reduce cost while preserving quality
Format Detection: Automatic image format recognition
Base64 Input: Easy integration with web applications
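Format detection for the four supported formats can be done from each format's leading "magic" bytes. This sketch shows that technique under stated assumptions. The helper name is hypothetical, and stripping a `data:` URL prefix is an illustrative convenience that the API may not offer.

```python
import base64
from typing import Optional


def detect_image_format(image_b64: str) -> Optional[str]:
    """Return "jpeg", "png", "gif", "webp", or None based on the file signature."""
    # Hypothetical convenience: accept "data:image/png;base64,..." strings too.
    if image_b64.startswith("data:") and "," in image_b64:
        image_b64 = image_b64.split(",", 1)[1]
    header = base64.b64decode(image_b64, validate=True)[:12]
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container: "RIFF" <4-byte size> "WEBP"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return None
```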

🏗️ Architecture

API Gateway: RESTful API with two specialized endpoints
Lambda: Serverless compute (1024 MB memory, 60 s timeout)
Amazon Bedrock: Hosts the AI model used for image analysis
Amazon Polly: Text-to-speech synthesis
CloudWatch: Comprehensive logging and monitoring

📋 API Endpoints

1. Text Descriptions

POST /describe/text

Get detailed text descriptions in any supported language.

Request

{
  "image": "base64_encoded_image_data",
  "language": "es"
}

"language" is optional and defaults to "en".

Response

{
  "description": "Una hermosa puesta de sol sobre un paisaje montañoso...",
  "format": "text",
  "language": "es"
}
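A client can call this endpoint with only the Python standard library. This is a minimal sketch. The function name and the example stage URL are hypothetical. The request shape follows the documented request body.

```python
import base64
import json
import urllib.request


def build_text_request(api_url: str, image_bytes: bytes,
                       language: str = "en") -> urllib.request.Request:
    """Build a POST /describe/text request; `api_url` is your deployed base URL."""
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "language": language,
    }
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/describe/text",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req, timeout=30)` and parse the reply with `json.load(response)`. The `description` field holds the text in the requested language.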

2. Audio Descriptions

POST /describe/audio

Get descriptions as natural-sounding audio in any supported language.

Request

{
  "image": "base64_encoded_image_data",
  "language": "ja",
  "voice": "Mizuki"
}

Both fields other than "image" are optional: "language" defaults to "en", and "voice" defaults to the language's default voice (see the table below).

Response

{
  "description": "この画像には美しい夕日が山の風景に映っています...",
  "audio": "base64_encoded_mp3_data",
  "format": "audio",
  "voice": "Mizuki",
  "language": "ja"
}
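Because the `audio` field is base64-encoded MP3 data, a client has to decode it before playback. This sketch shows one way to write it to disk. The helper name is hypothetical. Treating a missing `audio` field as an error, and surfacing the API's `error` message, is an assumption.

```python
import base64
import json
from pathlib import Path


def save_audio_response(response_body: str, output_path: str) -> Path:
    """Decode the `audio` field of a /describe/audio response into an MP3 file."""
    payload = json.loads(response_body)
    if "audio" not in payload:
        # Error responses carry an "error" message instead of audio.
        raise ValueError(payload.get("error", "Response has no 'audio' field"))
    path = Path(output_path)
    path.write_bytes(base64.b64decode(payload["audio"]))
    return path
```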

🌐 Supported Languages & Voices

Language	Code	Default Voice	Additional Voices
English	`en`	Joanna	Matthew, Amy, Brian, Emma, Olivia
Spanish	`es`	Lucia	Conchita, Enrique, Miguel
Japanese	`ja`	Mizuki	Takumi
French	`fr`	Celine	Lea, Mathieu
German	`de`	Marlene	Vicki, Hans
Italian	`it`	Carla	Bianca, Giorgio
Portuguese	`pt`	Camila	Vitoria, Ricardo
Russian	`ru`	Tatyana	Maxim
Korean	`ko`	Seoyeon	-
Chinese	`zh`	Zhiyu	-
Plus 10 more languages: Arabic, Hindi, Turkish, Polish, Dutch, Swedish, Danish, Norwegian, Finnish, and Icelandic
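The table suggests a simple lookup: use the default voice when none is given, otherwise check that the requested voice belongs to the language. This is a hypothetical sketch. It covers only the ten languages whose voices appear in the table. The case-insensitive voice matching is an assumption. The error text mirrors the documented "Voice ... is not available" message.

```python
from typing import Optional

# Hypothetical mapping copied from the table above; the default voice is listed first.
VOICES_BY_LANGUAGE = {
    "en": ["Joanna", "Matthew", "Amy", "Brian", "Emma", "Olivia"],
    "es": ["Lucia", "Conchita", "Enrique", "Miguel"],
    "ja": ["Mizuki", "Takumi"],
    "fr": ["Celine", "Lea", "Mathieu"],
    "de": ["Marlene", "Vicki", "Hans"],
    "it": ["Carla", "Bianca", "Giorgio"],
    "pt": ["Camila", "Vitoria", "Ricardo"],
    "ru": ["Tatyana", "Maxim"],
    "ko": ["Seoyeon"],
    "zh": ["Zhiyu"],
}


def select_voice(language: str, voice: Optional[str] = None) -> str:
    """Return the Polly VoiceId to use for `language`."""
    voices = VOICES_BY_LANGUAGE.get(language)
    if voices is None:
        raise ValueError(f"No voices configured for language {language}")
    if voice is None:
        return voices[0]  # default voice
    for candidate in voices:
        # Match case-insensitively, but return Polly's canonical spelling.
        if candidate.lower() == voice.lower():
            return candidate
    raise ValueError(f"Voice {voice} is not available for language {language}")
```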

🚀 Deployment

Prerequisites

AWS CLI configured with appropriate permissions
SAM CLI installed
Python 3.9+

Quick Deploy

# Build the application
sam build

# Deploy to AWS
sam deploy --stack-name image-description-api \
           --capabilities CAPABILITY_IAM \
           --region us-east-1 \
           --resolve-s3

Guided Deployment (First Time)

sam deploy --guided

🧪 Local Development

Start API Locally

sam local start-api

Test Individual Function

sam local invoke ImageDescriptionFunction --event events/event.json

Test with Sample Data

# Test text endpoint
curl -X POST http://localhost:3000/describe/text \
  -H "Content-Type: application/json" \
  -d '{"image": "base64_image_data", "language": "es"}'

# Test audio endpoint  
curl -X POST http://localhost:3000/describe/audio \
  -H "Content-Type: application/json" \
  -d '{"image": "base64_image_data", "language": "ja", "voice": "Mizuki"}'
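The curl examples use a placeholder string for the image. A real payload can be generated from a file with a small helper like this one. The name `make_payload` and the module name `make_payload.py` are hypothetical.

```python
import base64
import json
from typing import Optional


def make_payload(image_path: str, language: str = "en",
                 voice: Optional[str] = None) -> str:
    """Return a JSON request body with the image at `image_path` base64-encoded."""
    with open(image_path, "rb") as f:
        payload = {
            "image": base64.b64encode(f.read()).decode("ascii"),
            "language": language,
        }
    if voice is not None:
        payload["voice"] = voice  # only the audio endpoint uses this field
    return json.dumps(payload)
```

For example, save the helper as `make_payload.py` and run `python -c "from make_payload import make_payload; print(make_payload('photo.jpg', 'ja', 'Mizuki'))" > payload.json`. Then run `curl -X POST http://localhost:3000/describe/audio -H "Content-Type: application/json" -d @payload.json`.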

🔧 Configuration

Environment Variables

AWS_REGION: AWS region the function runs in (set automatically by the Lambda runtime)
LOG_LEVEL: Logging level (DEBUG, INFO, or ERROR)
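Here is a minimal sketch of how a handler might honor these variables. Setting the root logger's level and falling back to INFO for unknown values are assumptions. So is the `us-east-1` fallback for local runs outside Lambda, which matches the deploy command above.

```python
import logging
import os

ALLOWED_LEVELS = ("DEBUG", "INFO", "ERROR")  # the levels documented above


def configure_logging(default: str = "INFO") -> logging.Logger:
    """Set the root logger's level from LOG_LEVEL, falling back to `default`."""
    level_name = os.environ.get("LOG_LEVEL", default).strip().upper()
    if level_name not in ALLOWED_LEVELS:
        level_name = default
    root = logging.getLogger()
    root.setLevel(getattr(logging, level_name))
    return root


# Lambda sets AWS_REGION itself; the fallback only matters for local runs.
REGION = os.environ.get("AWS_REGION", "us-east-1")
```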

AWS Services Used

Amazon Bedrock: AI model for image analysis
Amazon Polly: Text-to-speech synthesis
CloudWatch Logs: Application logging
API Gateway: REST API management
Lambda: Serverless compute

Required IAM Permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow", 
      "Action": [
        "polly:SynthesizeSpeech"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream", 
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}

📊 Performance

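Response Times: Typically 1.5-4 seconds, varying by language
Concurrent Requests: Handled by Lambda's automatic scaling
Image Size Limit: 10MB (images are auto-resized for optimization). Base64 encoding adds roughly 33% to the request size, and API Gateway and Lambda enforce their own payload quotas.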
Timeout: 60 seconds for audio processing, 30 seconds for text
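Base64 encodes every 3 input bytes as 4 characters, so the request body is larger than the image itself. This sketch computes the encoded size and rejects oversized images before upload. It assumes the 10MB limit applies to the raw image and treats MB as MiB. Both are guesses, not documented behavior.

```python
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: 10MB means 10 MiB of raw image data


def base64_length(raw_size: int) -> int:
    """Length of the padded base64 encoding of `raw_size` bytes: 4 * ceil(n / 3)."""
    return 4 * ((raw_size + 2) // 3)


def check_image_size(raw_size: int, limit: int = MAX_IMAGE_BYTES) -> None:
    """Raise ValueError if the raw image exceeds `limit` bytes."""
    if raw_size > limit:
        raise ValueError(f"Image is {raw_size} bytes; the limit is {limit} bytes")
```

At the assumed limit, `base64_length(MAX_IMAGE_BYTES)` is 13,981,016 characters. Checking the size on the client side avoids uploading a request that will be rejected anyway.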

🔍 Error Handling

HTTP Status Codes

200: Success
400: Bad Request (invalid image, language, or voice)
404: Endpoint not found
405: Method not allowed
500: Internal server error
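A single Lambda function can map requests onto the status codes above. The sketch below assumes one function that receives API Gateway REST proxy events, which carry `path`, `httpMethod`, and `body` fields. The `describe_text` and `describe_audio` functions are hypothetical placeholders for the real Bedrock and Polly logic. The 404 and 405 error texts are also illustrative.

```python
import json
from typing import Any, Callable


def describe_text(body: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical placeholder for the Bedrock call.
    if not body.get("image"):
        raise ValueError("No image provided in the request")
    return {"description": "<model output>", "format": "text",
            "language": body.get("language", "en")}


def describe_audio(body: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical placeholder for the Bedrock + Polly calls.
    if not body.get("image"):
        raise ValueError("No image provided in the request")
    return {"description": "<model output>", "audio": "<base64 mp3>",
            "format": "audio", "language": body.get("language", "en")}


ROUTES: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "/describe/text": describe_text,
    "/describe/audio": describe_audio,
}


def json_response(status: int, payload: dict[str, Any]) -> dict[str, Any]:
    # ensure_ascii=False keeps Japanese, Arabic, etc. readable in the body.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json; charset=utf-8"},
        "body": json.dumps(payload, ensure_ascii=False),
    }


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    route = ROUTES.get(event.get("path", ""))
    if route is None:
        return json_response(404, {"error": "Endpoint not found"})
    if event.get("httpMethod") != "POST":
        return json_response(405, {"error": "Method not allowed"})
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return json_response(400, {"error": "Request body must be valid JSON"})
    if not isinstance(body, dict):
        return json_response(400, {"error": "Request body must be a JSON object"})
    try:
        return json_response(200, route(body))
    except ValueError as exc:  # validation problems -> 400
        return json_response(400, {"error": str(exc)})
    except Exception:  # anything unexpected -> 500 (log the details in real code)
        return json_response(500, {"error": "Internal server error"})
```

Depending on how the routes are declared in `template.yaml`, API Gateway may reject unmatched paths or methods before they reach the function. In that case the 404 and 405 branches act only as a fallback.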

Common Error Messages

Invalid language:

{
  "error": "Invalid language code: xyz. Use ISO 639-1 two-letter codes (e.g., en, es, ja)"
}

Missing image:

{
  "error": "No image provided in the request"
}

Voice not available for the requested language:

{
  "error": "Voice Matthew is not available for language ja"
}

🧪 Testing

Run Test Suite

# Install test dependencies
pip install pytest requests

# Run the test suite
python test_language_api.py

Manual Testing

# Test the deployed text endpoint (change "language" to try other languages)
curl -X POST https://your-api-url/describe/text \
  -H "Content-Type: application/json" \
  -d '{"image": "base64_data", "language": "es"}'

📈 Monitoring

CloudWatch Metrics

Function duration
Error rates
Concurrent executions
Memory utilization

Custom Logs

Request/response logging
Language detection
Voice selection
Error tracking
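One way to cover request logging and voice selection is to emit one JSON line per request. CloudWatch Logs Insights can then filter on the logged fields. This is a sketch only. The logger name, field names, and helper are hypothetical.

```python
import json
import logging
import time
from typing import Optional

logger = logging.getLogger("image_description")  # hypothetical logger name


def log_request(endpoint: str, language: str, voice: Optional[str],
                status: int, started: float) -> str:
    """Log one structured line per request and return it.

    `started` should come from time.monotonic() at the start of the handler.
    """
    record = {
        "endpoint": endpoint,
        "language": language,
        "voice": voice,  # None for /describe/text
        "status": status,
        "duration_ms": round((time.monotonic() - started) * 1000, 1),
    }
    line = json.dumps(record, ensure_ascii=False)
    logger.info(line)
    return line
```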

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

📄 License

MIT License

🆘 Support

Issues: Create GitHub issues for bugs
Features: Submit feature requests via GitHub
