Vision Platform API

Enterprise-grade Document Intelligence & Real-time Vision API

A production-ready REST API combining best-in-class document processing with low-latency robotics vision capabilities.

Key Features

Document Intelligence

Mistral OCR 3 - Industry-leading accuracy, outperforms GPT-4o, Google, and Azure
Table Extraction - Preserves structure in HTML format
Document Classification - Auto-detect invoices, contracts, forms, receipts
Multi-page PDF - Process entire documents with page-by-page results

Real-time Vision (Coming Soon)

~50ms Latency - Powered by Groq's Llama 3.2 Vision
Face Detection - MediaPipe integration for real-time detection
Object Detection - Scene understanding for robotics
Video Processing - Frame extraction and analysis

Enterprise Ready

JWT + API Key Authentication - Secure access control
Health Checks - Kubernetes-ready liveness/readiness probes
Swagger Documentation - Auto-generated interactive API docs
Multi-Agent Architecture - Intelligent routing to specialist agents

Quick Start

1. Get API Access

# Clone the repository
git clone <repository-url>
cd Project-4_Vision&Langchain

# Create virtual environment
python -m venv .venv
.\.venv\Scripts\activate  # Windows
source .venv/bin/activate  # Linux/Mac

# Install dependencies
pip install -r requirements.txt

2. Configure API Keys

Create a .env file:

# Required
OPENAI_API_KEY=sk-proj-...
MISTRAL_API_KEY=your_mistral_key

# Optional (for robotics vision)
GROQ_API_KEY=gsk_...

# Authentication (change in production!)
SECRET_KEY=generate_with_openssl_rand_hex_32
ADMIN_USERNAME=admin
ADMIN_PASSWORD_HASH=$2b$12$...

3. Start the Server

uvicorn api.main:app --reload --port 8000

4. Access the API

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc
Health Check: http://localhost:8000/api/v1/health/ready

Authentication

Login

curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "admin123"}'

Response:

{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...",
  "token_type": "bearer"
}

Use Token in Requests

curl http://localhost:8000/api/v1/agent/chat \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "Extract text from invoice.pdf"}'

API Key Alternative

For service-to-service communication:

curl http://localhost:8000/api/v1/documents/ocr \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"file_path": "/path/to/document.pdf"}'

API Endpoints

Authentication

Method	Endpoint	Description	Auth
POST	`/api/v1/auth/login`	Login with credentials	No
POST	`/api/v1/auth/token`	OAuth2 token endpoint	No
GET	`/api/v1/auth/me`	Get current user	Yes
GET	`/api/v1/auth/status`	Auth system status	No

Multi-Agent Chat

Method	Endpoint	Description	Auth
POST	`/api/v1/agent/chat`	Chat with AI agents	Yes
POST	`/api/v1/agent/stream`	Stream responses (SSE)	Yes
GET	`/api/v1/agent/status`	Agent status & models	No
POST	`/api/v1/agent/reset`	Reset memory	Admin

Document Processing

Method	Endpoint	Description	Auth
POST	`/api/v1/documents/ocr`	Extract text	Yes
POST	`/api/v1/documents/tables`	Extract tables	Yes
POST	`/api/v1/documents/classify`	Classify document	Yes
POST	`/api/v1/documents/analyze`	Full analysis	Yes
POST	`/api/v1/documents/pdf`	Process PDF	Yes
POST	`/api/v1/documents/upload`	Upload file	Yes
GET	`/api/v1/documents/status`	Service status	No

Health

Method	Endpoint	Description	Auth
GET	`/api/v1/health/`	Health status	No
GET	`/api/v1/health/live`	Liveness probe	No
GET	`/api/v1/health/ready`	Readiness probe	No

Examples

Extract Text from Document

curl -X POST http://localhost:8000/api/v1/documents/ocr \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "file_base64": "BASE64_ENCODED_IMAGE",
    "extract_tables": true
  }'

Chat with Multi-Agent System

curl -X POST http://localhost:8000/api/v1/agent/chat \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What type of document is invoice.pdf and extract all the line items?"
  }'

Upload and Process File

curl -X POST http://localhost:8000/api/v1/documents/upload \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@document.pdf"

Architecture

                         Client Request
                              ↓
                    ┌─────────────────┐
                    │   FastAPI App   │
                    │   + JWT Auth    │
                    └────────┬────────┘
                             ↓
              ┌──────────────┴──────────────┐
              ↓                             ↓
     ┌────────────────┐           ┌────────────────┐
     │   /documents   │           │    /agent      │
     │   Direct OCR   │           │  Multi-Agent   │
     └────────────────┘           └────────────────┘
              ↓                             ↓
              │                   ┌─────────────────┐
              │                   │   SUPERVISOR    │
              │                   │  (gpt-4o-mini)  │
              │                   └────────┬────────┘
              │                            ↓
              │              ┌─────────────┴─────────────┐
              ↓              ↓                           ↓
     ┌────────────────┐  ┌────────────────┐   ┌────────────────┐
     │  Mistral OCR 3 │  │ Document Agent │   │  Video Agent   │
     │  $2/1000 pages │  │ mistral-large  │   │ Groq (~50ms)   │
     └────────────────┘  └────────────────┘   └────────────────┘

Model Configuration

Component	Model	Provider	Latency	Cost
Supervisor	gpt-4o-mini	OpenAI	~100ms	$0.15/1M tokens
Document Agent	mistral-large-latest	Mistral	~500ms	~$2/1M tokens
Video Agent	llama-3.2-11b-vision	Groq	~50ms	$0.18/1M tokens
OCR	mistral-ocr-2512	Mistral	~1s	$2/1000 pages

Deployment

Docker (Coming Soon)

docker-compose up -d

Environment Variables

Variable	Required	Description
`OPENAI_API_KEY`	Yes	OpenAI API key for supervisor
`MISTRAL_API_KEY`	Yes	Mistral API key for OCR
`GROQ_API_KEY`	No	Groq API key for robotics vision
`SECRET_KEY`	Yes	JWT signing key (32+ bytes)
`AUTH_ENABLED`	No	Enable/disable auth (default: True)

Performance

Operation	Typical Latency	Throughput
OCR (single page)	~1-2s	30 pages/min
Document Classification	~500ms	120 req/min
Multi-Agent Chat	~3-5s	20 req/min
Video Frame Analysis	~50ms	1200 frames/min

Support & Contact

Issues: GitHub Issues
Documentation: Full API Docs
Enterprise Support: contact@example.com

License

MIT License - See LICENSE file for details.

Roadmap

Document Intelligence (Mistral OCR 3)
Multi-Agent System (LangGraph)
Authentication (JWT + API Keys)
Video/Robotics Vision (Groq)
Async Job Processing (Celery)
Docker Deployment
Cloud Deployment (AWS/GCP/Azure)
Usage Analytics & Billing

Built with FastAPI, LangChain, Mistral AI, and Groq

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
api		api
config		config
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
test_authentication.py		test_authentication.py
test_document_intelligence.py		test_document_intelligence.py
test_image.png		test_image.png
test_mistral_ocr.py		test_mistral_ocr.py
test_video_agent.py		test_video_agent.py

Folders and files

Latest commit

History

Repository files navigation

Vision Platform API

Key Features

Document Intelligence

Real-time Vision (Coming Soon)

Enterprise Ready

Quick Start

1. Get API Access

2. Configure API Keys

3. Start the Server

4. Access the API

Authentication

Login

Use Token in Requests

API Key Alternative

API Endpoints

Authentication

Multi-Agent Chat

Document Processing

Health

Examples

Extract Text from Document

Chat with Multi-Agent System

Upload and Process File

Architecture

Model Configuration

Deployment

Docker (Coming Soon)

Environment Variables

Performance

Support & Contact

License

Roadmap

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages