🚀 geoNLI Backend

High-performance Flask API for geometric natural language interpretation over satellite imagery

Flask Python PyTorch Modal


🎯 Overview

geoNLI Backend is a REST API that runs satellite imagery through a multi-model pipeline to deliver:

  • 📝 Dense image captioning via Qwen2-VL
  • 🎯 Object detection & grounding with OWLv2
  • 🔄 Oriented bounding box (OBB) generation using rotated detectors
  • 📐 Geometric reasoning (areas, distances, spatial relationships)
  • 💬 Visual question answering with fact-augmented context

Built on Flask with CORS support, it's designed for seamless integration with the geoNLI frontend and supports both local deployment and cloud hosting (Modal).


✨ Key Features

🧠 Multi-Model AI Pipeline

  • Captioning: Qwen2-VL-7B
  • Grounding: OWLv2-style open-vocabulary detection
  • Segmentation: SAM-based mask generation
  • VQA: Fact-augmented question answering with geometric context using Qwen2-VL-7B

🌐 RESTful API

  • /api/upload - Image upload with UUID generation
  • /api/chat - Natural language query processing
  • /api/status - Model loading status monitoring
  • /api/evaluation - Batch evaluation endpoints
  • /uploads/<filename> - Static file serving

🔧 Production-Ready

  • CORS enabled for frontend integration
  • Robust path resolution for Docker/Modal environments
  • 50MB file size limit
  • Automatic upload folder creation
  • Comprehensive error handling & logging

Performance Optimizations

  • Singleton model loading (once at startup)
  • Efficient memory management for GPU constraints
  • Agent-based query routing for intelligent task dispatch
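
As an illustration of the singleton loading idea above, here is a minimal sketch of a thread-safe registry that calls each loader at most once. The `ModelRegistry` class and `registry` instance are hypothetical names chosen for this example; the project's actual manager in src/tools/manager.py may be structured differently.

```python
import threading
from typing import Any, Callable, Dict


class ModelRegistry:
    """Loads each model at most once and returns the cached instance."""

    def __init__(self) -> None:
        self._models: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        # Fast path: the model is already cached.
        if name in self._models:
            return self._models[name]
        with self._lock:
            # Re-check inside the lock so two threads never load twice.
            if name not in self._models:
                self._models[name] = loader()
            return self._models[name]

    def loaded(self) -> Dict[str, bool]:
        """Report which models are currently cached."""
        return {name: True for name in self._models}


# Module-level instance shared by every importer (hypothetical name).
registry = ModelRegistry()
```

Passing the loader as a callable keeps the registry independent of any particular model class, so heavy imports only happen when a model is first requested.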

🏗️ Architecture

Directory Structure

webapp/
├── 📄 app.py                       # Flask application entrypoint
├── 📄 __init__.py                  # App factory & configuration
│
├── 📁 api/                         # REST API blueprints
│   ├── upload.py                   # POST /api/upload - Image upload handler
│   ├── chat.py                     # POST /api/chat - Chat query processor
│   ├── status.py                   # GET /api/status - Model status check
│   ├── evaluation.py               # POST /api/evaluation - Batch eval
│   ├── caption.py                  # Direct captioning endpoint
│   ├── grounding.py                # Direct grounding endpoint
│   ├── vqa.py                      # Direct VQA endpoint
│   ├── general.py                  # General-purpose routes
│   └── __init__.py
│
├── 📁 utils/                       # Helper utilities
│   ├── file_handler.py             # File validation & processing
│   ├── response_formatter.py       # JSON response standardization
│   ├── validator.py                # Input validation logic
│   └── __init__.py
│
└── 📁 templates/                   # (Optional) HTML templates for web UI

Core Pipeline Integration

The backend delegates heavy ML processing to src/pipeline/:

src/
├── pipeline/
│   ├── inference_pipeline.py       # GeoNLIPipeline orchestrator
│   └── agent.py                    # GeoAgent (query router)
│
├── models/
│   ├── captioning/                 # Qwen2-VL / InternVL2 wrappers
│   ├── grounding/                  # Detection model wrappers
│   └── vqa/                        # VQA engine implementations
│
└── tools/
    ├── manager.py                  # Shared model singleton manager
    └── ...

🚀 Getting Started

Prerequisites

  • Python 3.10+
  • CUDA 11.8+ (for GPU acceleration)
  • 8GB+ VRAM (or 16GB+ for full pipeline)
  • Node.js 18+ (for frontend, optional)

Installation

  1. Clone the repository:

    git clone https://github.com/Leviethal/Inter-IIT-14.0.git
    cd Inter-IIT-14.0
  2. Create virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Download model weights (if not using Modal):

    # Qwen2-VL
    huggingface-cli download Qwen/Qwen2-VL-7B-Instruct
    
    # InternImage (for detection)
    # Follow InternImage repo instructions
  5. Configure paths:

    Edit config/inference_config.yaml to point to your model checkpoints.


🎮 Usage

Local Development

Start the Flask server:

python webapp/app.py

The API will be available at:

http://localhost:15200

Production (Modal)

Deploy to Modal cloud:

modal deploy modal_deployment.py

Your API will be accessible at:

https://<username>--geonli-backend-flask-app-dev.modal.run

🔌 API Reference

1. Upload Image

Endpoint: POST /api/upload

Description: Upload a satellite image for processing.

Request:

POST /api/upload
Content-Type: multipart/form-data

file: <image file>

Response:

{
  "message": "Upload successful",
  "file_id": "a1b2c3d4e5f6.jpg",
  "url": "/uploads/a1b2c3d4e5f6.jpg",
  "local_path": "/path/to/data/uploads/a1b2c3d4e5f6.jpg"
}

Supported Formats: PNG, JPG, JPEG, TIF, TIFF, WEBP
Max File Size: 50MB


2. Chat Query

Endpoint: POST /api/chat

Description: Process a natural language query about an uploaded image.

Request:

{
  "text": "How many buildings are visible?",
  "image_url": "http://localhost:15200/uploads/a1b2c3d4e5f6.jpg",
  "session_id": "user_session_123"
}

Response:

{
  "reply": "There are 5 buildings visible in the image.",
  "grounding": [
    {
      "bbox": [512, 384, 120, 80, 0.45],
      "score": 0.92,
      "label": "building",
      "object_id": "0"
    }
  ],
  "display_mode": "box",
  "debug_intent": "counting",
  "raw_results": {
    "connections": [
      {
        "from": 0,
        "to": 3,
        "label": "150 m"
      }
    ]
  }
}
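
The five-element `bbox` in the sample response looks like an oriented box. The sketch below converts such a box to its four corner points, assuming the layout is `[cx, cy, w, h, angle]` in pixels with the angle in radians. The README does not state the layout, so treat this as an assumption to verify against the pipeline output. The function name `obb_to_corners` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def obb_to_corners(bbox: Sequence[float]) -> List[Tuple[float, float]]:
    """Convert [cx, cy, w, h, angle] to four corner points.

    Assumes the angle is in radians; the bbox layout is an assumption.
    """
    cx, cy, w, h, angle = bbox
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    corners = []
    for dx, dy in offsets:
        # Rotate the offset about the centre, then translate to image space.
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        corners.append((x, y))
    return corners
```

A frontend could pass the returned points directly to a polygon-drawing routine when `display_mode` is `"box"`.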

Query Types:

  • Caption: "Describe the image"
  • Grounding: "Locate all houses"
  • Counting: "How many trees?"
  • Measurement: "What is the distance between A and B?"
  • Comparison: "Which building is larger?"
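
To make the query types above concrete, here is a minimal keyword-based sketch of intent classification. It is only an illustration: the keyword table, the `classify_intent` name, and the `"vqa"` fallback are assumptions, and the real GeoAgent may route queries in a different way.

```python
from typing import Tuple

# Hypothetical keyword table; earlier entries take priority.
INTENT_KEYWORDS: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
    ("counting", ("how many", "count", "number of")),
    ("measurement", ("distance", "area", "how far", "how long")),
    ("comparison", ("larger", "smaller", "bigger", "compare")),
    ("grounding", ("locate", "find", "where", "show")),
    ("caption", ("describe", "caption", "summarize")),
)


def classify_intent(text: str, default: str = "vqa") -> str:
    """Return the first intent whose keywords appear in the query."""
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return intent
    # No keyword matched: fall back to general question answering.
    return default
```

Ordering the table by priority means a query such as "How many trees?" resolves to counting before any later rule is consulted.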

3. Status Check

Endpoint: GET /api/status

Description: Check model loading status.

Response:

{
  "status": "ready",
  "models_loaded": {
    "vqa": true,
    "detector": true,
    "segmenter": true,
    "parser": true
  },
  "memory_usage_mb": 4832
}
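
Because models load at startup, a client may want to wait for the status payload above to report readiness before sending queries. The helper below is a sketch under that assumption. `wait_until_ready` is a hypothetical name, and the caller supplies `fetch_status`, any function that returns the parsed JSON payload.

```python
import time
from typing import Any, Callable, Dict


def wait_until_ready(
    fetch_status: Callable[[], Dict[str, Any]],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
) -> Dict[str, Any]:
    """Poll until status is "ready" and every model flag is true."""
    deadline = time.monotonic() + timeout_s
    while True:
        payload = fetch_status()
        models = payload.get("models_loaded", {})
        if payload.get("status") == "ready" and models and all(models.values()):
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"models not ready after {timeout_s}s: {payload}")
        time.sleep(interval_s)
```

With the `requests` library installed, `fetch_status` could be `lambda: requests.get("http://localhost:15200/api/status").json()`.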

4. Serve Uploaded Files

Endpoint: GET /uploads/<filename>

Description: Serve uploaded images for frontend display.

Example:

http://localhost:15200/uploads/a1b2c3d4e5f6.jpg

🧩 Key Components

app.py - Application Entrypoint

Initializes the Flask app and loads the ML pipeline on startup:

from webapp import create_app
from src.pipeline.inference_pipeline import GeoNLIPipeline

app = create_app()
app.pipeline = GeoNLIPipeline()

if __name__ == "__main__":
    app.logger.info("Loading ML Models...")
    app.pipeline.load_models()
    app.run(host="0.0.0.0", port=15200)

__init__.py - App Factory

Creates and configures the Flask application:

from flask import Flask
from flask_cors import CORS


def create_app():
    app = Flask(__name__)
    
    # Enable CORS for frontend (restrict origins in production)
    CORS(app, resources={r"/*": {"origins": "*"}})
    
    # Configure uploads: target folder and 50 MB request size cap
    app.config['UPLOAD_FOLDER'] = 'data/uploads'
    app.config['MAX_CONTENT_LENGTH'] = 50 * 1024 * 1024
    
    # Register blueprints
    from webapp.api.upload import upload_bp
    from webapp.api.chat import chat_bp
    app.register_blueprint(upload_bp)
    app.register_blueprint(chat_bp)
    
    return app

api/upload.py - File Upload Handler

Handles image uploads with UUID-based naming:

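import os
import uuid

from flask import current_app, jsonify, request

# upload_bp and allowed_file are defined elsewhere in the project


@upload_bp.route('/api/upload', methods=['POST'])
def upload_file():
    file = request.files.get('file')
    
    # Reject missing files and unsupported extensions explicitly
    if file is None or not allowed_file(file.filename):
        return jsonify({"error": "Missing or unsupported file"}), 400
    
    ext = file.filename.rsplit('.', 1)[1].lower()
    unique_name = f"{uuid.uuid4().hex}.{ext}"
    save_path = os.path.join(current_app.config['UPLOAD_FOLDER'], unique_name)
    
    file.save(save_path)
    
    return jsonify({
        "message": "Upload successful",
        "file_id": unique_name,
        "url": f"/uploads/{unique_name}",
        "local_path": save_path
    }), 201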
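
The upload handler calls `allowed_file`, which is not shown in this README. A minimal sketch of such a check, based on the supported formats listed in the API reference, might look like the following. The `ALLOWED_EXTENSIONS` name and the exact behaviour are assumptions.

```python
# Extensions taken from the "Supported Formats" list in the API reference.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "tif", "tiff", "webp"}


def allowed_file(filename: str) -> bool:
    """Return True when the filename ends in a supported extension."""
    if not filename or "." not in filename:
        return False
    # Only the final extension counts, compared case-insensitively.
    ext = filename.rsplit(".", 1)[1].lower()
    return ext in ALLOWED_EXTENSIONS
```

An extension check does not validate file contents; opening the image with an imaging library before processing is a sensible additional safeguard.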

api/chat.py - Query Processor

Routes queries through the GeoAgent with robust path resolution:

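import os

from flask import jsonify, request

# chat_bp, agent and find_uploaded_file are defined elsewhere in the project


@chat_bp.route('/api/chat', methods=['POST'])
def chat_endpoint():
    data = request.get_json(silent=True) or {}
    
    user_text = data.get("text")
    image_url = data.get("image_url")
    session_id = data.get("session_id")
    if not user_text or not image_url:
        return jsonify({"error": "text and image_url are required"}), 400
    
    # Resolve image path (handles Docker/Modal environments)
    filename = os.path.basename(image_url)
    local_path = find_uploaded_file(filename)
    if not local_path:
        return jsonify({"error": f"Image not found: {filename}"}), 404
    
    # Process query through agent
    result = agent.process_query(session_id, local_path, user_text)
    
    return jsonify(result), 200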

Path Resolution Strategy:

  1. Check configured UPLOAD_FOLDER
  2. Try Docker path /root/data/uploads/
  3. Check relative data/uploads/
  4. Fallback to uploads/
  5. Use URL as absolute path if exists
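
The resolution order above can be sketched as a single lookup function. This is an illustration, not the project's code: the signature, the `DEFAULT_SEARCH_DIRS` constant, and the choice to return `None` on a miss are assumptions.

```python
import os
from typing import Iterable, Optional

# Fallback directories in the order listed above (steps 2 to 4).
DEFAULT_SEARCH_DIRS = ("/root/data/uploads", "data/uploads", "uploads")


def find_uploaded_file(
    image_ref: str,
    upload_folder: Optional[str] = None,
    search_dirs: Iterable[str] = DEFAULT_SEARCH_DIRS,
) -> Optional[str]:
    """Return the first existing path for image_ref, or None."""
    # basename() drops the URL prefix and any "../" components.
    filename = os.path.basename(image_ref)
    candidates = []
    if filename:
        if upload_folder:
            candidates.append(os.path.join(upload_folder, filename))
        candidates.extend(os.path.join(d, filename) for d in search_dirs)
    # Step 5: treat the reference itself as a filesystem path.
    candidates.append(image_ref)
    for path in candidates:
        if os.path.isfile(path):
            return os.path.abspath(path)
    return None
```

Note that step 5 lets a client name any readable file on the server, so a deployment exposed to untrusted users may prefer to drop that fallback.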

🔧 Configuration

Environment Variables

# Flask settings
FLASK_ENV=production
FLASK_DEBUG=0

# Model paths
MODEL_CACHE_DIR=/path/to/models
QWEN_MODEL_PATH=/path/to/Qwen2-VL-7B-Instruct

# Upload settings
UPLOAD_FOLDER=data/uploads
MAX_FILE_SIZE_MB=50

# Server settings
HOST=0.0.0.0
PORT=15200
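
How the application reads these variables is not shown in this README. One possible approach, sketched below with the hypothetical names `Settings` and `load_settings`, is to parse them once into a typed object, using the values above as defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    upload_folder: str
    max_content_length: int  # bytes
    host: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables with documented defaults."""
    max_mb = int(env.get("MAX_FILE_SIZE_MB", "50"))
    return Settings(
        upload_folder=env.get("UPLOAD_FOLDER", "data/uploads"),
        # Flask's MAX_CONTENT_LENGTH expects bytes, so convert from MB.
        max_content_length=max_mb * 1024 * 1024,
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "15200")),
    )
```

Accepting the environment as a parameter makes the function easy to exercise in tests with a plain dictionary.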

YAML Configuration

Edit config/inference_config.yaml:

models:
  vqa:
    model_name: "Qwen/Qwen2-VL-7B-Instruct"
    device: "cuda:0"
  
  detector:
    config_path: "configs/internimage_det.py"
    checkpoint: "checkpoints/internimage_l.pth"
  
  segmenter:
    model_type: "vit_h"
    checkpoint: "checkpoints/sam_vit_h.pth"

pipeline:
  batch_size: 1
  enable_caching: true
  cleanup_memory: true

🐛 Troubleshooting

CORS Errors

Problem: Frontend can't access backend API.

Solution: Ensure CORS is enabled in __init__.py:

from flask_cors import CORS
CORS(app, resources={r"/*": {"origins": "*"}})

For production, restrict origins:

CORS(app, resources={r"/*": {"origins": ["https://your-frontend.com"]}})

File Not Found (404)

Problem: Uploaded images return 404.

Solution: Check path resolution in chat.py. Add debug logging:

current_app.logger.info(f"Looking for file: {filename}")
current_app.logger.info(f"Upload folder: {current_app.config['UPLOAD_FOLDER']}")

Out of Memory (OOM)

Problem: GPU runs out of VRAM.

Solution: Enable cleanup in pipeline:

# After each model inference
import gc
import torch

del model_instance
gc.collect()
torch.cuda.empty_cache()

Models Not Loading

Problem: Pipeline fails to initialize.

Solution: Check status endpoint:

curl http://localhost:15200/api/status

Verify model paths in config files and ensure weights are downloaded.
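
A quick way to verify the paths is to scan the parsed configuration for files that do not exist. The sketch below assumes the YAML has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`) and that paths sit under `config_path` and `checkpoint` keys, as in the sample config. The `missing_model_files` name is hypothetical.

```python
import os
from typing import Any, Dict, List

# Keys in each model section that are expected to hold file paths.
PATH_KEYS = ("config_path", "checkpoint")


def missing_model_files(config: Dict[str, Any], base_dir: str = ".") -> List[str]:
    """List configured model files that do not exist on disk."""
    missing = []
    for name, section in config.get("models", {}).items():
        for key in PATH_KEYS:
            value = section.get(key)
            # Relative paths are resolved against base_dir.
            if value and not os.path.exists(os.path.join(base_dir, value)):
                missing.append(f"models.{name}.{key}: {value}")
    return missing
```

Entries that reference a Hugging Face model name rather than a local file, such as `model_name`, are skipped because they are resolved at download time.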


📊 Performance

Inference Benchmarks

Task           Model          Resolution  Latency (GPU)  VRAM
Caption        Qwen2-VL-7B    1024×1024   ~2.5s          6GB
Detection      InternImage-L  1024×1024   ~1.8s          4GB
Segmentation   SAM ViT-H      1024×1024   ~0.8s          3GB
VQA (Fact)     Qwen2-VL-7B    1024×1024   ~2.2s          6GB
Full Pipeline  -              1024×1024   ~8-10s         8GB peak

Tested on NVIDIA RTX 3090 (24GB VRAM)

Optimization Tips

  1. Use smaller models for development:

    • Qwen2-VL-2B instead of 7B
    • SAM ViT-B instead of ViT-H
  2. Enable quantization:

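    from transformers import BitsAndBytesConfig, Qwen2VLForConditionalGeneration

    # 8-bit loading requires the bitsandbytes package
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2-VL-7B-Instruct",
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )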
  3. Batch processing (for evaluation):

    results = pipeline.batch_process(image_list, batch_size=4)
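
The internals of `batch_process` are not shown here. As a sketch of the batching step alone, a helper like the hypothetical `chunked` below splits an image list into groups of at most `batch_size` items.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        # The final batch may be shorter than batch_size.
        yield list(items[start:start + batch_size])
```

Smaller batches lower peak VRAM at the cost of throughput, so `batch_size` is worth tuning to the available GPU.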

🚢 Deployment

Docker

Build and run with Docker:

docker build -t geonli-backend .
docker run -p 15200:15200 -v $(pwd)/data:/app/data geonli-backend

Modal Cloud

Deploy to Modal:

modal deploy modal_deployment.py

Benefits:

  • Auto-scaling
  • GPU on-demand
  • Built-in load balancing
  • Zero DevOps

Production Checklist

  • Set FLASK_ENV=production
  • Disable debug mode
  • Configure CORS whitelist
  • Set up HTTPS (nginx/Let's Encrypt)
  • Enable request rate limiting
  • Add authentication middleware
  • Set up monitoring (Prometheus/Grafana)
  • Configure log rotation
  • Database for session persistence (optional)

🧪 Testing

Run API tests:

python -m pytest tests/test_api.py -v

Test individual endpoints:

# Upload
curl -X POST -F "file=@test_image.jpg" http://localhost:15200/api/upload

# Chat
curl -X POST http://localhost:15200/api/chat \
  -H "Content-Type: application/json" \
  -d '{"text":"Describe the image","image_url":"http://localhost:15200/uploads/test.jpg"}'

# Status
curl http://localhost:15200/api/status

📄 License

Distributed under the MIT License. See LICENSE for more information.

