Priority: High (for production)
Complexity: Medium
Estimated Effort: 4-6 hours
Status: Documented, Deferred for Demo Environment
Cost Impact: +$35-60/month
The EC2 instance currently receives a public IP address from the default VPC (which has `MapPublicIpOnLaunch = true`). While the security group restricts all inbound access except from the ALB, this configuration is not ideal for production environments.
The instance requires internet access for:
- Docker Hub: Pull Docker images (`ollama/ollama`, `python:3.11-slim`)
- S3: Download Ollama model files (6.7 GB)
- AWS SSM: Communicate with Systems Manager for remote access
- Package Updates: Install system packages via `yum`/`dnf`
The default VPC has:
- ❌ No NAT Gateway
- ❌ No VPC Endpoints (S3, SSM, EC2 Messages)
- ✅ Only public subnets with internet gateway
Mitigation in place:
- Security group only allows port 8000 inbound from ALB security group
- No SSH access configured
- All access via SSM Session Manager (which uses HTTPS)
- Public IP provides no actual access due to security group restrictions
Risk level: Low for demo/development, but not production-ready.
Option 1: VPC Endpoints + NAT Gateway

Changes required:
- Create private subnet in default VPC (or new VPC)
- Deploy NAT Gateway in public subnet (~$32/month + $0.045/GB)
- Add VPC Endpoints:
- S3 Gateway Endpoint (free, requires route table update)
- SSM VPC Endpoint (~$7.30/month)
- EC2 Messages VPC Endpoint (~$7.30/month)
- SSM Messages VPC Endpoint (~$7.30/month)
- Update `instance.tf`:

  ```hcl
  resource "aws_instance" "ttb" {
    subnet_id                   = aws_subnet.private.id
    associate_public_ip_address = false
    # ... rest of config
  }
  ```
- Update route tables for private subnet
Total cost: ~$50-60/month
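The ~$50-60 range can be sanity-checked against the per-resource prices listed above (the hourly rates below are assumptions based on typical us-east-1 pricing; verify against the current AWS price list):

```python
# Sanity-check for the Option 1 monthly estimate (assumed us-east-1 prices).
HOURS_PER_MONTH = 730

nat_gateway = 0.045 * HOURS_PER_MONTH        # ~$32.85/month, plus $0.045/GB processed
interface_endpoint = 0.01 * HOURS_PER_MONTH  # ~$7.30/month per endpoint per AZ
s3_gateway_endpoint = 0.0                    # Gateway endpoints are free

total = nat_gateway + 3 * interface_endpoint + s3_gateway_endpoint
print(round(total, 2))  # ~54.75 before data-processing charges, within the $50-60 range
```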
Benefits:
- No public IP exposure
- Reduced data transfer costs via S3 Gateway Endpoint
- Better security posture for production
- Still allows outbound internet access for Docker Hub
Option 2: NAT Gateway Only

Changes required:
- Create private subnet
- Deploy NAT Gateway in public subnet
- Update instance to use private subnet without public IP
- Update route tables
Total cost: ~$35-40/month
Benefits:
- Simpler than Option 1 (no VPC endpoints to manage)
- No public IP on instance
- All services remain reachable (outbound traffic routes through the NAT Gateway)
Drawbacks:
- Higher data transfer costs for S3 (no Gateway Endpoint)
- All SSM traffic goes through NAT instead of VPC Endpoints
Option 3: Custom VPC

Changes required:
- Create new VPC with CIDR block
- Create public and private subnets across 2+ AZs
- Deploy Internet Gateway for public subnets
- Deploy NAT Gateway in public subnet
- Add S3 Gateway Endpoint
- Add SSM VPC Endpoints
- Migrate all resources to new VPC
- Update all security group references
Total cost: ~$35-50/month
Effort: 8-12 hours
Benefits:
- Clean slate architecture
- Multi-AZ redundancy (if deployed with 2+ NAT Gateways)
- Full control over network topology
- Best practice architecture
Phase 1 (Current - Demo/Dev):
- ✅ Keep public IP with security group restrictions
- ✅ Document the limitation (this file + README.md + DEPLOYMENT_GUIDE.md)
- ✅ Accept the risk for non-production environments
Phase 2 (Pre-Production):
- Implement Option 1 (VPC Endpoints + NAT Gateway)
- Test all functionality still works (Docker, S3, SSM)
- Validate cost estimates
Phase 3 (Production):
- Consider Option 3 (Custom VPC) if scale/redundancy requirements grow
- Add multi-AZ architecture
- Implement VPC Flow Logs for network monitoring
Implementation Checklist:
- Create Terraform resources for VPC endpoints
- Create private subnet(s) in default VPC or new VPC
- Deploy NAT Gateway with Elastic IP
- Update route tables (private subnet → NAT Gateway, S3 → Gateway Endpoint)
- Update `instance.tf` with `associate_public_ip_address = false`
- Update `instance.tf` with `subnet_id` pointing to the private subnet
- Test deployment end-to-end
- Verify SSM access still works
- Verify S3 model download still works
- Verify Docker image pulls still work
- Update documentation to reflect new architecture
- Monitor costs for first month
| Configuration | Monthly Cost | Security | Complexity |
|---|---|---|---|
| Current (Public IP) | $0 | Medium | Low |
| NAT Gateway Only | ~$35-40 | High | Medium |
| VPC Endpoints + NAT | ~$50-60 | Very High | Medium-High |
| Custom VPC | ~$35-50 | Very High | High |
Decision for demo environment: Keep current configuration to minimize costs and complexity while documenting the limitation.
Decision for production: Implement Option 1 (VPC Endpoints + NAT Gateway) for optimal security/cost balance.
Priority: Medium
Complexity: Low
Estimated Effort: 2-4 hours
Status: Proposed
Add a `/model-status` endpoint that provides real-time visibility into Ollama model download progress during EC2 initialization.
- Model downloads in background during instance initialization
- No visibility into download progress from API
- Users must SSH or use SSM to check `/var/log/ollama-model-download.log`
- Health endpoint only shows "available" or "unavailable" (binary state)
Endpoint: `GET /model-status`

Response:

```json
{
  "model": "llama3.2-vision",
  "status": "downloading",
  "progress": {
    "downloaded_bytes": 3221225472,
    "total_bytes": 7201866752,
    "percentage": 44.7,
    "estimated_time_remaining_seconds": 120
  },
  "source": "s3",
  "started_at": "2026-02-17T20:15:45Z",
  "updated_at": "2026-02-17T20:16:30Z"
}
```

Status Values:
- `not_started`: Model download not yet initiated
- `downloading`: Download in progress
- `extracting`: Downloaded, currently extracting
- `ready`: Model fully available
- `error`: Download or extraction failed
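The `percentage` field is derived from the two byte counts; a small helper (a hypothetical name, not an existing function in the codebase) keeps the rounding consistent:

```python
def progress_percentage(downloaded_bytes: int, total_bytes: int) -> float:
    """Percentage complete, rounded to one decimal; 0.0 when total is unknown."""
    if total_bytes <= 0:
        return 0.0
    return round(100 * downloaded_bytes / total_bytes, 1)

# Values from the example response above:
print(progress_percentage(3221225472, 7201866752))  # 44.7
```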
- Progress Tracking:
  - Modify `instance_init.sh` to write progress to a shared file (e.g., `/var/run/model-status.json`)
  - Parse AWS CLI output or use callback hooks to track download progress
  - Update the timestamp on each progress update
- API Endpoint:
  - Read the progress file from `/var/run/model-status.json`
  - Return 404 if the file doesn't exist (model download never attempted)
  - Cache file reads (check every 5 seconds max)
- Background Process:
  - Modify the background download process in `instance_init.sh`:

    ```bash
    echo '{"status":"downloading","model":"llama3.2-vision","started_at":"'$(date -Iseconds)'"}' > /var/run/model-status.json

    # Download with progress callback
    aws s3 cp ... | while read line; do
      # Parse progress and update JSON file
    done

    echo '{"status":"ready","model":"llama3.2-vision","completed_at":"'$(date -Iseconds)'"}' > /var/run/model-status.json
    ```

- File Permissions:
  - Ensure the progress file is readable by the application user
  - Use a tmpfs mount for performance (in-memory file)
Benefits:
- Real-time visibility into model download during degraded mode
- Better user experience - know when full functionality will be available
- Operational insight for monitoring/debugging
- Could be displayed in future UI
Alternatives Considered:
- WebSocket Streaming: More complex, requires persistent connections
- Server-Sent Events (SSE): Good for real-time but adds complexity
- Simple Polling: Current proposal - simplest, good enough for this use case
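Since simple polling is the chosen approach, the client side reduces to a loop. A hedged sketch with an injectable `fetch` callable (a hypothetical helper, so it works with any HTTP client and can be tested without a server):

```python
import time
from typing import Callable

def wait_until_ready(fetch: Callable[[], dict], poll_seconds: float = 5.0,
                     timeout_seconds: float = 1800.0) -> dict:
    """Poll /model-status via `fetch` until the status is terminal (ready/error)."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch()  # e.g. requests.get(f"{base}/model-status").json()
        if status.get("status") in ("ready", "error"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("model download did not finish in time")
        time.sleep(poll_seconds)
```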
Dependencies:
- None (self-contained enhancement)
- Resolves lack of visibility during model download
- Complements the `/health` endpoint (which only shows binary available/unavailable)
Store request artifacts (images, extracted text, validation results) in S3 for:
- Debugging failed validations
- Reproducing issues
- Training data collection
- Compliance/audit trail
- Performance analysis and optimization
- Debugging: When a validation fails unexpectedly, retrieve the original image and validation results
- Compliance: Maintain audit trail of all label verifications for regulatory purposes
- Training: Collect real-world examples to improve OCR and validation accuracy
- Analytics: Analyze patterns in validation failures to identify common issues
S3 Bucket:

```hcl
resource "aws_s3_bucket" "artifacts" {
  bucket = "ttb-verifier-artifacts-{account-id}"
}
```

Bucket Configuration:
- Versioning: Enabled
- Encryption: AES256 (server-side)
- Public access: Blocked
- Lifecycle policy:
- S3 Standard: 0-30 days
- S3 Standard-IA: 30-90 days
- S3 Glacier: 90-365 days
- Delete: After 365 days
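If the lifecycle policy were ever managed outside Terraform, the same tiering can be expressed as the rule structure that boto3's `put_bucket_lifecycle_configuration` expects (a sketch; the rule ID and prefix are placeholders, and the actual call is commented out since it needs AWS credentials):

```python
# Lifecycle rules mirroring the tiering above (Standard -> Standard-IA -> Glacier -> delete).
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "artifact-tiering",            # placeholder rule ID
            "Status": "Enabled",
            "Filter": {"Prefix": "artifacts/"},  # placeholder prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

# Applying it would look like (not executed here):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="ttb-verifier-artifacts-<account-id>",
#     LifecycleConfiguration=lifecycle_configuration,
# )
print(lifecycle_configuration["Rules"][0]["Expiration"])  # {'Days': 365}
```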
IAM Permissions:
- EC2 role needs `s3:PutObject` and `s3:GetObject` on the artifacts bucket
- Optional: Separate IAM user/role for artifact retrieval/analysis
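For reference, the role permissions above expressed as an IAM policy document (sketched as a Python dict; the bucket ARN is a placeholder):

```python
# Minimal IAM policy for the EC2 role (bucket ARN is a placeholder).
artifact_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": "arn:aws:s3:::ttb-verifier-artifacts-<account-id>/*",
        }
    ],
}
```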
Optional Request Parameter:

```python
@app.post("/verify")
async def verify_label(
    file: UploadFile,
    save_artifact: bool = False,  # New parameter
    ...
):
    # Existing validation logic
    result = await validate_label(file)

    # Optionally save to S3
    if save_artifact:
        artifact_id = str(uuid.uuid4())
        await save_to_s3(file, result, artifact_id)

    return result
```

Artifact Structure:
```text
s3://bucket/artifacts/
├── YYYY/
│   ├── MM/
│   │   ├── DD/
│   │   │   ├── {request-id}/
│   │   │   │   ├── image.jpg                # Original uploaded image
│   │   │   │   ├── metadata.json            # Request metadata
│   │   │   │   ├── extracted_text.json      # OCR output
│   │   │   │   └── validation_result.json   # Final validation result
```
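A tiny helper (a hypothetical name, not existing code) that maps a timestamp and request ID onto this prefix layout keeps upload and retrieval code in agreement:

```python
from datetime import datetime

def artifact_prefix(request_id: str, ts: datetime) -> str:
    """S3 key prefix for one request: artifacts/YYYY/MM/DD/{request-id}/."""
    return ts.strftime("artifacts/%Y/%m/%d/") + f"{request_id}/"

print(artifact_prefix("req-123", datetime(2026, 2, 17)))  # artifacts/2026/02/17/req-123/
```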
Metadata JSON Structure:

```json
{
  "request_id": "uuid",
  "timestamp": "2026-02-17T10:30:00Z",
  "client_ip": "1.2.3.4",
  "image_size_bytes": 1234567,
  "image_format": "image/jpeg",
  "ocr_backend": "tesseract",
  "validation_passed": true,
  "processing_time_ms": 1523,
  "model_version": "1.0.0"
}
```

Artifact Retrieval Endpoint:
```python
@app.get("/artifacts/{artifact_id}")
async def get_artifact(artifact_id: str):
    """Retrieve stored artifact by ID"""
    # Download from S3 and return
    pass

@app.get("/artifacts/{artifact_id}/image")
async def get_artifact_image(artifact_id: str):
    """Get original image from artifact"""
    pass

@app.get("/artifacts")
async def list_artifacts(
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
    validation_status: Optional[str] = None
):
    """List artifacts with filters"""
    # Query S3 and return list
    pass
```

Terraform Changes: ~1 hour
- Create S3 bucket
- Configure lifecycle policies
- Update IAM permissions
- Add outputs for bucket name
API Changes: ~3-4 hours
- Add S3 client configuration
- Implement artifact upload logic
- Add metadata generation
- Implement retrieval endpoints
- Add error handling
Testing: ~1-2 hours
- Test artifact upload/download
- Verify lifecycle policies
- Test retrieval endpoints
- Load testing with artifacts enabled
Total Estimate: 5-7 hours
Storage:
- Average image size: 2 MB
- Requests per day: 100
- Storage per month: ~6 GB
- Cost: ~$0.14/month (Standard) + $0.05/month (Standard-IA) + Glacier costs
API Requests:
- PUT requests: ~$0.005 per 1000 requests
- GET requests: ~$0.0004 per 1000 requests
- Total API cost: <$1/month for moderate usage
Total Estimated Monthly Cost: $1-2/month
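The storage figures follow directly from the stated assumptions (using ~$0.023/GB-month for S3 Standard, an assumed us-east-1 price to verify against current pricing):

```python
# Back-of-envelope check of the storage estimate above (assumed S3 Standard price).
avg_image_mb = 2
requests_per_day = 100
standard_price_per_gb = 0.023  # assumed $/GB-month for S3 Standard

gb_per_month = avg_image_mb * requests_per_day * 30 / 1024  # ~5.9 GB
standard_cost = gb_per_month * standard_price_per_gb        # ~$0.13-0.14/month

print(round(gb_per_month, 1), round(standard_cost, 2))
```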
Environment Variables:

```yaml
environment:
  - ARTIFACT_STORAGE_ENABLED=true
  - ARTIFACT_S3_BUCKET=ttb-verifier-artifacts-{account-id}
  - ARTIFACT_DEFAULT_SAVE=false  # Only save when explicitly requested
  - ARTIFACT_RETENTION_DAYS=365
```

Security Considerations:
- PII: Label images may contain brand names but typically don't contain PII
- Access Control: Implement IAM policies to restrict who can access artifacts
- Encryption: Use S3 server-side encryption for all artifacts
- Compliance: Ensure retention policies meet regulatory requirements
- Data Minimization: Only store what's necessary for debugging/audit
- Right to Delete: Provide mechanism to delete artifacts on request
Future Enhancements:
- Batch Analysis: Analyze multiple artifacts to identify patterns
- ML Training Pipeline: Automatically extract training data from artifacts
- Comparison Tool: Compare OCR results across different backends
- Performance Dashboard: Visualize validation metrics over time
- Anomaly Detection: Alert on unusual validation patterns