Enterprise-grade Document Intelligence & Real-time Vision API
A production-ready REST API combining best-in-class document processing with low-latency robotics vision capabilities.
- Mistral OCR 3 - Industry-leading accuracy, outperforms GPT-4o, Google, and Azure
- Table Extraction - Preserves structure in HTML format
- Document Classification - Auto-detect invoices, contracts, forms, receipts
- Multi-page PDF - Process entire documents with page-by-page results
- ~50ms Latency - Powered by Groq's Llama 3.2 Vision
- Face Detection - MediaPipe integration for real-time detection
- Object Detection - Scene understanding for robotics
- Video Processing - Frame extraction and analysis
- JWT + API Key Authentication - Secure access control
- Health Checks - Kubernetes-ready liveness/readiness probes
- Swagger Documentation - Auto-generated interactive API docs
- Multi-Agent Architecture - Intelligent routing to specialist agents
# Clone the repository
git clone <repository-url>
cd "Project-4_Vision&Langchain"
# Create virtual environment
python -m venv .venv
.\.venv\Scripts\activate # Windows
source .venv/bin/activate # Linux/Mac
# Install dependencies
pip install -r requirements.txt

Create a .env file:
# Required
OPENAI_API_KEY=sk-proj-...
MISTRAL_API_KEY=your_mistral_key
# Optional (for robotics vision)
GROQ_API_KEY=gsk_...
# Authentication (change in production!)
SECRET_KEY=generate_with_openssl_rand_hex_32
ADMIN_USERNAME=admin
ADMIN_PASSWORD_HASH=$2b$12$...

uvicorn api.main:app --reload --port 8000

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- Health Check: http://localhost:8000/api/v1/health/ready
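A script that starts the server may want to wait for the readiness endpoint before sending requests. The following is a minimal sketch using only the standard library. It assumes that an HTTP 200 from `/api/v1/health/ready` means the service is ready; the response body format is not documented here, so it is ignored. The names `is_ready` and `wait_until_ready` are illustrative and not part of this project.

```python
import time
import urllib.error
import urllib.request


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 (assumed to mean 'ready')."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: treat as not ready.
        return False


def wait_until_ready(url: str, attempts: int = 10, delay: float = 1.0, probe=is_ready) -> bool:
    """Poll `url` until `probe` succeeds or `attempts` run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between attempts, but not after the last one
    return False


# Example call (requires the server to be running locally):
# wait_until_ready("http://localhost:8000/api/v1/health/ready")
```

The `probe` parameter exists so the polling loop can be exercised without a live server.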
curl -X POST http://localhost:8000/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"username": "admin", "password": "admin123"}'

Response:
{
"access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...",
"token_type": "bearer"
}

curl http://localhost:8000/api/v1/agent/chat \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"message": "Extract text from invoice.pdf"}'

For service-to-service communication:
curl http://localhost:8000/api/v1/documents/ocr \
-H "X-API-Key: your_api_key" \
-H "Content-Type: application/json" \
-d '{"file_path": "/path/to/document.pdf"}'

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | /api/v1/auth/login | Login with credentials | No |
| POST | /api/v1/auth/token | OAuth2 token endpoint | No |
| GET | /api/v1/auth/me | Get current user | Yes |
| GET | /api/v1/auth/status | Auth system status | No |

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | /api/v1/agent/chat | Chat with AI agents | Yes |
| POST | /api/v1/agent/stream | Stream responses (SSE) | Yes |
| GET | /api/v1/agent/status | Agent status & models | No |
| POST | /api/v1/agent/reset | Reset memory | Admin |

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | /api/v1/documents/ocr | Extract text | Yes |
| POST | /api/v1/documents/tables | Extract tables | Yes |
| POST | /api/v1/documents/classify | Classify document | Yes |
| POST | /api/v1/documents/analyze | Full analysis | Yes |
| POST | /api/v1/documents/pdf | Process PDF | Yes |
| POST | /api/v1/documents/upload | Upload file | Yes |
| GET | /api/v1/documents/status | Service status | No |

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| GET | /api/v1/health/ | Health status | No |
| GET | /api/v1/health/live | Liveness probe | No |
| GET | /api/v1/health/ready | Readiness probe | No |
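The curl calls for login, agent chat, and OCR can also be issued from Python. The sketch below builds the same requests with the standard library only. The endpoint paths, header names, and JSON fields (`username`, `password`, `message`, `file_base64`, `extract_tables`) are taken from the examples in this README; the helper names and the `BASE_URL` constant are illustrative assumptions, and the response schema beyond `access_token` is not assumed.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # assumed local development address


def _json_request(path: str, payload: dict, headers: dict) -> urllib.request.Request:
    """Build a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    all_headers = {"Content-Type": "application/json", **headers}
    return urllib.request.Request(BASE_URL + path, data=body, headers=all_headers, method="POST")


def build_login_request(username: str, password: str) -> urllib.request.Request:
    return _json_request("/auth/login", {"username": username, "password": password}, {})


def build_chat_request(token: str, message: str) -> urllib.request.Request:
    return _json_request("/agent/chat", {"message": message}, {"Authorization": f"Bearer {token}"})


def build_ocr_request(token: str, file_bytes: bytes, extract_tables: bool = True) -> urllib.request.Request:
    # The OCR example sends the file as a base64 string in "file_base64".
    encoded = base64.b64encode(file_bytes).decode("ascii")
    payload = {"file_base64": encoded, "extract_tables": extract_tables}
    return _json_request("/documents/ocr", payload, {"Authorization": f"Bearer {token}"})


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, a typical flow would be `token = send(build_login_request("admin", "..."))["access_token"]` followed by `send(build_chat_request(token, "Extract text from invoice.pdf"))`.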
curl -X POST http://localhost:8000/api/v1/documents/ocr \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"file_base64": "BASE64_ENCODED_IMAGE",
"extract_tables": true
}'

curl -X POST http://localhost:8000/api/v1/agent/chat \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"message": "What type of document is invoice.pdf and extract all the line items?"
}'

curl -X POST http://localhost:8000/api/v1/documents/upload \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@document.pdf"

                    Client Request
                           ↓
                  ┌─────────────────┐
                  │   FastAPI App   │
                  │   + JWT Auth    │
                  └────────┬────────┘
                           ↓
            ┌──────────────┴──────────────┐
            ↓                             ↓
    ┌────────────────┐            ┌────────────────┐
    │   /documents   │            │     /agent     │
    │   Direct OCR   │            │  Multi-Agent   │
    └────────────────┘            └────────────────┘
            ↓                             ↓
            │                    ┌─────────────────┐
            │                    │   SUPERVISOR    │
            │                    │  (gpt-4o-mini)  │
            │                    └────────┬────────┘
            │                             ↓
            │                   ┌─────────┴─────────┐
            ↓                   ↓                   ↓
    ┌────────────────┐  ┌────────────────┐  ┌────────────────┐
    │ Mistral OCR 3  │  │ Document Agent │  │  Video Agent   │
    │ $2/1000 pages  │  │ mistral-large  │  │  Groq (~50ms)  │
    └────────────────┘  └────────────────┘  └────────────────┘
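The diagram shows a supervisor choosing between a document agent and a video agent. The toy sketch below illustrates only that routing topology. In the system described here the decision is made by an LLM (gpt-4o-mini); the keyword match, the `Route` enum, and the `VIDEO_HINTS` tuple are hypothetical stand-ins and not this project's implementation.

```python
from enum import Enum


class Route(Enum):
    DOCUMENT_AGENT = "document_agent"
    VIDEO_AGENT = "video_agent"


# Hypothetical keywords; a real supervisor would ask the LLM to classify the request.
VIDEO_HINTS = ("video", "frame", "face", "camera")


def route_message(message: str) -> Route:
    """Pick a specialist agent for a chat message (keyword-based stand-in)."""
    text = message.lower()
    if any(hint in text for hint in VIDEO_HINTS):
        return Route.VIDEO_AGENT
    # Everything else falls through to document processing in this sketch.
    return Route.DOCUMENT_AGENT
```

For instance, `route_message("Extract text from invoice.pdf")` selects the document agent, while a message mentioning a video frame selects the video agent.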
| Component | Model | Provider | Latency | Cost |
|---|---|---|---|---|
| Supervisor | gpt-4o-mini | OpenAI | ~100ms | $0.15/1M tokens |
| Document Agent | mistral-large-latest | Mistral | ~500ms | ~$2/1M tokens |
| Video Agent | llama-3.2-11b-vision | Groq | ~50ms | $0.18/1M tokens |
| OCR | mistral-ocr-2512 | Mistral | ~1s | $2/1000 pages |
docker-compose up -d

| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | Yes | OpenAI API key for supervisor |
| MISTRAL_API_KEY | Yes | Mistral API key for OCR |
| GROQ_API_KEY | No | Groq API key for robotics vision |
| SECRET_KEY | Yes | JWT signing key (32+ bytes) |
| AUTH_ENABLED | No | Enable/disable auth (default: True) |
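A startup script could check that the required variables from the table are set and generate a suitable `SECRET_KEY`. This is a minimal sketch, not the project's own configuration code; `missing_required` and `generate_secret_key` are illustrative names. `secrets.token_hex(32)` produces 32 random bytes as 64 hex characters, comparable to `openssl rand -hex 32`.

```python
import secrets

# Variables marked "Yes" in the table above.
REQUIRED_VARS = ("OPENAI_API_KEY", "MISTRAL_API_KEY", "SECRET_KEY")


def missing_required(env: dict) -> list:
    """Return the required variable names that are absent or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def generate_secret_key(num_bytes: int = 32) -> str:
    """Generate a random hex string from `num_bytes` bytes of OS randomness."""
    return secrets.token_hex(num_bytes)
```

In practice `missing_required(os.environ)` would be called before the app starts, failing fast if the returned list is not empty.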
| Operation | Typical Latency | Throughput |
|---|---|---|
| OCR (single page) | ~1-2s | 30 pages/min |
| Document Classification | ~500ms | 120 req/min |
| Multi-Agent Chat | ~3-5s | 20 req/min |
| Video Frame Analysis | ~50ms | 1200 frames/min |
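The throughput column appears consistent with handling one request at a time at the listed latency (taking the upper OCR bound of 2s and the lower chat bound of 3s). The short sketch below makes that arithmetic explicit; it is an interpretation of the table, not a measured benchmark, and the function name is illustrative.

```python
def serial_throughput_per_minute(latency_ms: float) -> float:
    """Requests per minute when requests are processed one after another."""
    return 60_000 / latency_ms


# Latencies in milliseconds, taken from the table above.
for name, latency_ms in [
    ("OCR (single page)", 2000),
    ("Document Classification", 500),
    ("Multi-Agent Chat", 3000),
    ("Video Frame Analysis", 50),
]:
    print(f"{name}: {serial_throughput_per_minute(latency_ms):.0f} per minute")
```

Concurrent workers would raise these figures, subject to provider rate limits.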
- Issues: GitHub Issues
- Documentation: Full API Docs
- Enterprise Support: contact@example.com
MIT License - See LICENSE file for details.
- Document Intelligence (Mistral OCR 3)
- Multi-Agent System (LangGraph)
- Authentication (JWT + API Keys)
- Video/Robotics Vision (Groq)
- Async Job Processing (Celery)
- Docker Deployment
- Cloud Deployment (AWS/GCP/Azure)
- Usage Analytics & Billing
Built with FastAPI, LangChain, Mistral AI, and Groq