🚀 geoNLI Backend

High-performance Flask API for geometric natural language interpretation over satellite imagery

🎯 Overview

geoNLI Backend is the AI-powered REST API that processes satellite imagery through a sophisticated multi-model pipeline to deliver:

📝 Dense image captioning via Qwen2-VL
🎯 Object detection & grounding with OWLv2
🔄 Oriented bounding box (OBB) generation using rotated detectors
📐 Geometric reasoning (areas, distances, spatial relationships)
💬 Visual question answering with fact-augmented context

Built on Flask with CORS support, it's designed for seamless integration with the geoNLI frontend and supports both local deployment and cloud hosting (Modal).

✨ Key Features

🧠 Multi-Model AI Pipeline

Captioning: Qwen2-VL-7B
Grounding: OWLv2-style open-vocabulary detection
Segmentation: SAM-based mask generation
VQA: Fact-augmented question answering with geometric context using Qwen2-VL-7B

🌐 RESTful API

/api/upload - Image upload with UUID generation
/api/chat - Natural language query processing
/api/status - Model loading status monitoring
/api/evaluation - Batch evaluation endpoints
/uploads/<filename> - Static file serving

🔧 Production-Ready

CORS enabled for frontend integration
Robust path resolution for Docker/Modal environments
50MB file size limit
Automatic upload folder creation
Comprehensive error handling & logging

⚡ Performance Optimizations

Singleton model loading (once at startup)
Efficient memory management for GPU constraints
Agent-based query routing for intelligent task dispatch
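
To illustrate the singleton loading idea, here is a minimal sketch of a registry that builds each model at most once and hands back the cached instance afterwards. `ModelRegistry`, `register`, `get`, and `loaded` are hypothetical names chosen for this example; they are not the API of `src/tools/manager.py`, and the loader is a stand-in for real weight loading.

```python
import threading
from typing import Any, Callable, Dict


class ModelRegistry:
    """Hypothetical registry that builds each model at most once."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._instances: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        """Associate a model name with a zero-argument loader function."""
        self._loaders[name] = loader

    def get(self, name: str) -> Any:
        """Return the cached model, loading it on first access."""
        with self._lock:
            if name not in self._instances:
                self._instances[name] = self._loaders[name]()
            return self._instances[name]

    def loaded(self) -> Dict[str, bool]:
        """Report which registered models are already in memory."""
        with self._lock:
            return {name: name in self._instances for name in self._loaders}


# Stand-in loader; a real one would read weights from disk.
load_calls = []


def fake_vqa_loader() -> dict:
    load_calls.append("vqa")
    return {"name": "vqa"}


registry = ModelRegistry()
registry.register("vqa", fake_vqa_loader)
first = registry.get("vqa")
second = registry.get("vqa")  # served from the cache, loader is not called again
```

The lock keeps two concurrent requests from loading the same weights twice, which matters when a single model already takes up most of the available VRAM.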

🏗️ Architecture

Directory Structure

webapp/
├── 📄 app.py                       # Flask application entrypoint
├── 📄 __init__.py                  # App factory & configuration
│
├── 📁 api/                         # REST API blueprints
│   ├── upload.py                   # POST /api/upload - Image upload handler
│   ├── chat.py                     # POST /api/chat - Chat query processor
│   ├── status.py                   # GET /api/status - Model status check
│   ├── evaluation.py               # POST /api/evaluation - Batch eval
│   ├── caption.py                  # Direct captioning endpoint
│   ├── grounding.py                # Direct grounding endpoint
│   ├── vqa.py                      # Direct VQA endpoint
│   ├── general.py                  # General-purpose routes
│   └── __init__.py
│
├── 📁 utils/                       # Helper utilities
│   ├── file_handler.py             # File validation & processing
│   ├── response_formatter.py       # JSON response standardization
│   ├── validator.py                # Input validation logic
│   └── __init__.py
│
└── 📁 templates/                   # (Optional) HTML templates for web UI

Core Pipeline Integration

The backend delegates heavy ML processing to src/pipeline/:

src/
├── pipeline/
│   ├── inference_pipeline.py       # GeoNLIPipeline orchestrator
│   └── agent.py                    # GeoAgent (query router)
│
├── models/
│   ├── captioning/                 # Qwen2-VL / InternVL2 wrappers
│   ├── grounding/                  # Detection model wrappers
│   └── vqa/                        # VQA engine implementations
│
└── tools/
    ├── manager.py                  # Shared model singleton manager
    └── ...

🚀 Getting Started

Prerequisites

Python 3.10+
CUDA 11.8+ (for GPU acceleration)
8GB+ VRAM (or 16GB+ for full pipeline)
Node.js 18+ (for frontend, optional)

Installation

Clone the repository:

git clone https://github.com/Leviethal/Inter-IIT-14.0.git
cd Inter-IIT-14.0

Create virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```

Download model weights (if not using Modal):

# Qwen2-VL
huggingface-cli download Qwen/Qwen2-VL-7B-Instruct

# InternImage (for detection)
# Follow InternImage repo instructions

Configure paths:

Edit config/inference_config.yaml to point to your model checkpoints.

🎮 Usage

Local Development

Start the Flask server:

python webapp/app.py

The API will be available at:

http://localhost:15200

Production (Modal)

Deploy to Modal cloud:

modal deploy modal_deployment.py

Your API will be accessible at:

https://<username>--geonli-backend-flask-app-dev.modal.run

🔌 API Reference

1. Upload Image

Endpoint: POST /api/upload

Description: Upload a satellite image for processing.

Request:

POST /api/upload
Content-Type: multipart/form-data

file: <image file>

Response:

{
  "message": "Upload successful",
  "file_id": "a1b2c3d4e5f6.jpg",
  "url": "/uploads/a1b2c3d4e5f6.jpg",
  "local_path": "/path/to/data/uploads/a1b2c3d4e5f6.jpg"
}

Supported Formats: PNG, JPG, JPEG, TIF, TIFF, WEBP
Max File Size: 50MB
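
The format check and UUID-based naming can be sketched as below. This is an illustrative version only: the extension set is copied from the list above, while `make_unique_name` is a hypothetical helper and the real `allowed_file` in the repository may differ.

```python
import uuid

# Extensions taken from the "Supported Formats" list above.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "tif", "tiff", "webp"}


def allowed_file(filename: str) -> bool:
    """Return True if the filename has one of the supported extensions."""
    if "." not in filename:
        return False
    ext = filename.rsplit(".", 1)[1].lower()
    return ext in ALLOWED_EXTENSIONS


def make_unique_name(filename: str) -> str:
    """Build a UUID-based name that keeps the original extension (lowercased)."""
    ext = filename.rsplit(".", 1)[1].lower()
    return f"{uuid.uuid4().hex}.{ext}"
```

Generating the stored name from a UUID, rather than reusing the client's filename, avoids collisions between uploads and keeps user-supplied path fragments out of the filesystem.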

2. Chat Query

Endpoint: POST /api/chat

Description: Process a natural language query about an uploaded image.

Request:

{
  "text": "How many buildings are visible?",
  "image_url": "http://localhost:15200/uploads/a1b2c3d4e5f6.jpg",
  "session_id": "user_session_123"
}

Response:

{
  "reply": "There are 5 buildings visible in the image.",
  "grounding": [
    {
      "bbox": [512, 384, 120, 80, 0.45],
      "score": 0.92,
      "label": "building",
      "object_id": "0"
    }
  ],
  "display_mode": "box",
  "debug_intent": "counting",
  "raw_results": {
    "connections": [
      {
        "from": 0,
        "to": 3,
        "label": "150 m"
      }
    ]
  }
}
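
The response does not spell out the `bbox` layout. The sketch below assumes it is `[cx, cy, w, h, angle]` in pixels with the angle in radians, and it introduces a hypothetical ground sample distance (`gsd_m_per_px`) to convert pixels to metres. Under those assumptions, it shows how corner points, areas, and center-to-center distances (such as the `"150 m"` connection label) could be derived on the client side.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def obb_corners(bbox: List[float]) -> List[Point]:
    """Corners of an oriented box, ASSUMING [cx, cy, w, h, angle_radians]."""
    cx, cy, w, h, angle = bbox
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    corners = []
    # Offsets of the four corners in the box's own (unrotated) frame.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        corners.append((x, y))
    return corners


def obb_area_m2(bbox: List[float], gsd_m_per_px: float) -> float:
    """Ground area of the box; rotation does not change the area."""
    _, _, w, h, _ = bbox
    return (w * gsd_m_per_px) * (h * gsd_m_per_px)


def center_distance_m(a: List[float], b: List[float], gsd_m_per_px: float) -> float:
    """Ground distance between the centers of two boxes."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) * gsd_m_per_px


box = [512, 384, 120, 80, 0.45]  # the bbox from the sample response
other = [812, 784, 60, 40, 0.0]  # made-up second box for illustration
area = obb_area_m2(box, gsd_m_per_px=0.3)   # (120 * 0.3) * (80 * 0.3) square metres
dist = center_distance_m(box, other, 0.3)   # hypot(300, 400) * 0.3 metres
```

If the backend uses degrees, or a corner-based layout, only `obb_corners` needs to change; the area and distance helpers depend solely on width, height, and center.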

Query Types:

Caption: "Describe the image"
Grounding: "Locate all houses"
Counting: "How many trees?"
Measurement: "What is the distance between A and B?"
Comparison: "Which building is larger?"
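
To make the routing idea concrete, here is a deliberately simple keyword-based classifier that maps the example queries above to intent labels like the `debug_intent` field. It is a hypothetical stand-in and not the logic GeoAgent uses; the patterns, the `classify_intent` name, and the `"vqa"` fallback label are all assumptions.

```python
import re

# Hypothetical keyword patterns, checked in order; the first match wins.
INTENT_PATTERNS = [
    ("counting", re.compile(r"\bhow many\b|\bcount\b", re.I)),
    ("measurement", re.compile(r"\bdistance\b|\barea\b|\bhow far\b", re.I)),
    ("comparison", re.compile(r"\b(larger|smaller|bigger|taller)\b", re.I)),
    ("grounding", re.compile(r"\b(locate|find|where)\b", re.I)),
    ("caption", re.compile(r"\bdescribe\b|\bcaption\b", re.I)),
]


def classify_intent(text: str) -> str:
    """Return the first matching intent label, or 'vqa' as a fallback."""
    for label, pattern in INTENT_PATTERNS:
        if pattern.search(text):
            return label
    return "vqa"
```

A rule-based router like this is easy to debug but brittle with paraphrases, which is one reason a production agent might route with a language model instead.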

3. Status Check

Endpoint: GET /api/status

Description: Check model loading status.

Response:

{
  "status": "ready",
  "models_loaded": {
    "vqa": true,
    "detector": true,
    "segmenter": true,
    "parser": true
  },
  "memory_usage_mb": 4832
}
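
A client can use this payload to wait for the models before sending queries. The sketch below takes a `fetch_status` callable (for example, a small wrapper that requests `/api/status` and parses the JSON) so that the polling logic stays independent of any HTTP library. The function names and the non-ready status value `"loading"` used in the usage example are assumptions.

```python
import time
from typing import Callable, Dict


def all_models_loaded(payload: Dict) -> bool:
    """True when status is 'ready' and every models_loaded flag is set."""
    flags = payload.get("models_loaded", {})
    return payload.get("status") == "ready" and bool(flags) and all(flags.values())


def wait_until_ready(fetch_status: Callable[[], Dict],
                     timeout_s: float = 300.0,
                     interval_s: float = 5.0) -> bool:
    """Poll fetch_status() until the models are loaded or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if all_models_loaded(fetch_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_ready(lambda: {"status": "loading"}, timeout_s=0, interval_s=0)` returns `False` after a single check, since the fake status never becomes ready.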

4. Serve Uploaded Files

Endpoint: GET /uploads/<filename>

Description: Serve uploaded images for frontend display.

Example:

http://localhost:15200/uploads/a1b2c3d4e5f6.jpg

🧩 Key Components

`app.py` - Application Entrypoint

Initializes the Flask app and loads the ML pipeline on startup:

from webapp import create_app
from src.pipeline.inference_pipeline import GeoNLIPipeline

app = create_app()
app.pipeline = GeoNLIPipeline()

if __name__ == "__main__":
    app.logger.info("Loading ML Models...")
    app.pipeline.load_models()
    app.run(host="0.0.0.0", port=15200)

`__init__.py` - App Factory

Creates and configures the Flask application:

from flask import Flask
from flask_cors import CORS


def create_app():
    app = Flask(__name__)

    # Enable CORS for frontend (all origins; restrict this in production)
    CORS(app, resources={r"/*": {"origins": "*"}})

    # Configure uploads
    app.config['UPLOAD_FOLDER'] = 'data/uploads'
    app.config['MAX_CONTENT_LENGTH'] = 50 * 1024 * 1024  # 50MB request limit

    # Register blueprints (imported here to avoid circular imports)
    from webapp.api.upload import upload_bp
    from webapp.api.chat import chat_bp
    app.register_blueprint(upload_bp)
    app.register_blueprint(chat_bp)

    return app

`api/upload.py` - File Upload Handler

Handles image uploads with UUID-based naming:

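import os
import uuid

from flask import Blueprint, current_app, jsonify, request

# allowed_file() is assumed to be defined elsewhere (e.g. in utils/file_handler.py)
upload_bp = Blueprint('upload', __name__)


@upload_bp.route('/api/upload', methods=['POST'])
def upload_file():
    file = request.files.get('file')

    # Reject requests with no file or an unsupported extension
    # (the error payload shape here is illustrative)
    if not file or not allowed_file(file.filename):
        return jsonify({"error": "Missing or unsupported file"}), 400

    ext = file.filename.rsplit('.', 1)[1].lower()
    unique_name = f"{uuid.uuid4().hex}.{ext}"
    save_path = os.path.join(current_app.config['UPLOAD_FOLDER'], unique_name)

    file.save(save_path)

    return jsonify({
        "file_id": unique_name,
        "url": f"/uploads/{unique_name}"
    }), 201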

`api/chat.py` - Query Processor

Routes queries through the GeoAgent with robust path resolution:

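import os

from flask import Blueprint, jsonify, request

# agent (a GeoAgent instance) and find_uploaded_file() are defined elsewhere in the module
chat_bp = Blueprint('chat', __name__)


@chat_bp.route('/api/chat', methods=['POST'])
def chat_endpoint():
    data = request.get_json()

    user_text = data.get("text")
    image_url = data.get("image_url")
    session_id = data.get("session_id")

    # Resolve image path (handles Docker/Modal environments)
    filename = os.path.basename(image_url)
    local_path = find_uploaded_file(filename)

    # Process query through agent
    result = agent.process_query(session_id, local_path, user_text)

    return jsonify(result), 200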

Path Resolution Strategy:

Check configured UPLOAD_FOLDER
Try Docker path /root/data/uploads/
Check relative data/uploads/
Fallback to uploads/
Use URL as absolute path if exists
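
The lookup order above can be sketched as a function that tries each candidate location in turn and returns the first file that exists. This is an approximation under stated assumptions: the real `find_uploaded_file` is called with only a filename, whereas this version takes the upload folder and the original URL as explicit parameters so that it runs outside a Flask app context.

```python
import os
import tempfile
from typing import Optional


def find_uploaded_file(filename: str, upload_folder: str,
                       image_url: Optional[str] = None) -> Optional[str]:
    """Return the first existing path for filename, or None if not found."""
    candidates = [
        os.path.join(upload_folder, filename),         # configured UPLOAD_FOLDER
        os.path.join("/root/data/uploads", filename),  # Docker path
        os.path.join("data", "uploads", filename),     # relative data/uploads/
        os.path.join("uploads", filename),             # fallback uploads/
    ]
    if image_url:
        candidates.append(image_url)                   # URL treated as a path
    for path in candidates:
        if os.path.isfile(path):
            return os.path.abspath(path)
    return None


# Small demonstration with a throwaway upload folder.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "a1b2c3d4e5f6.jpg")
    with open(sample, "wb") as fh:
        fh.write(b"")
    found = find_uploaded_file("a1b2c3d4e5f6.jpg", folder)
    found_matches = found == os.path.abspath(sample)
    missing = find_uploaded_file("no-such-file-0000.jpg", folder)
```

Returning `None` rather than raising lets the endpoint decide how to report a missing image, for example with a 404 response.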

🔧 Configuration

Environment Variables

# Flask settings
FLASK_ENV=production
FLASK_DEBUG=0

# Model paths
MODEL_CACHE_DIR=/path/to/models
QWEN_MODEL_PATH=/path/to/Qwen2-VL-7B-Instruct

# Upload settings
UPLOAD_FOLDER=data/uploads
MAX_FILE_SIZE_MB=50

# Server settings
HOST=0.0.0.0
PORT=15200
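
One way to consume these variables is a small settings loader with the documented values as defaults. The `Settings` class and `load_settings` function below are hypothetical; whether the application reads every one of these variables at runtime is not confirmed here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    upload_folder: str
    max_content_length: int  # bytes, as Flask's MAX_CONTENT_LENGTH expects
    host: str
    port: int
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, using the documented defaults."""
    max_mb = int(env.get("MAX_FILE_SIZE_MB", "50"))
    return Settings(
        upload_folder=env.get("UPLOAD_FOLDER", "data/uploads"),
        max_content_length=max_mb * 1024 * 1024,
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "15200")),
        debug=env.get("FLASK_DEBUG", "0") == "1",
    )
```

Passing the environment as a mapping keeps the loader easy to test: `load_settings({"PORT": "8080"})` overrides the port without touching the real process environment.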

YAML Configuration

Edit config/inference_config.yaml:

models:
  vqa:
    model_name: "Qwen/Qwen2-VL-7B-Instruct"
    device: "cuda:0"
  
  detector:
    config_path: "configs/internimage_det.py"
    checkpoint: "checkpoints/internimage_l.pth"
  
  segmenter:
    model_type: "vit_h"
    checkpoint: "checkpoints/sam_vit_h.pth"

pipeline:
  batch_size: 1
  enable_caching: true
  cleanup_memory: true

🐛 Troubleshooting

CORS Errors

Problem: Frontend can't access backend API.

Solution: Ensure CORS is enabled in __init__.py:

from flask_cors import CORS
CORS(app, resources={r"/*": {"origins": "*"}})

For production, restrict origins:

CORS(app, resources={r"/*": {"origins": ["https://your-frontend.com"]}})

File Not Found (404)

Problem: Uploaded images return 404.

Solution: Check path resolution in chat.py. Add debug logging:

current_app.logger.info(f"Looking for file: {filename}")
current_app.logger.info(f"Upload folder: {current_app.config['UPLOAD_FOLDER']}")

Out of Memory (OOM)

Problem: GPU runs out of VRAM.

Solution: Enable cleanup in pipeline:

# After each model inference
import gc
import torch

del model_instance
gc.collect()
torch.cuda.empty_cache()
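
The same cleanup can be wrapped in a context manager so that it also runs when inference raises. The `gpu_cleanup` name is hypothetical, and the sketch treats PyTorch as optional so it can be exercised on a machine without a GPU.

```python
import gc
from contextlib import contextmanager


def _empty_cuda_cache() -> None:
    """Release cached CUDA memory if PyTorch with CUDA is available."""
    try:
        import torch  # optional: skipped when PyTorch is not installed
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


@contextmanager
def gpu_cleanup():
    """Run garbage collection and clear the CUDA cache when the block exits."""
    try:
        yield
    finally:
        gc.collect()
        _empty_cuda_cache()
```

Note that `empty_cache()` only returns memory held by tensors that no longer have live references, so the model or its outputs still need to be deleted or go out of scope inside the `with` block.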

Models Not Loading

Problem: Pipeline fails to initialize.

Solution: Check status endpoint:

curl http://localhost:15200/api/status

Verify model paths in config files and ensure weights are downloaded.

📊 Performance

Inference Benchmarks

| Task | Model | Resolution | Latency (GPU) | VRAM |
|---|---|---|---|---|
| Caption | Qwen2-VL-7B | 1024×1024 | ~2.5s | 6GB |
| Detection | InternImage-L | 1024×1024 | ~1.8s | 4GB |
| Segmentation | SAM ViT-H | 1024×1024 | ~0.8s | 3GB |
| VQA (Fact) | Qwen2-VL-7B | 1024×1024 | ~2.2s | 6GB |
| Full Pipeline | - | 1024×1024 | ~8-10s | 8GB peak |

Tested on NVIDIA RTX 3090 (24GB VRAM)

Optimization Tips

Use smaller models for development:
- Qwen2-VL-2B instead of 7B
- SAM ViT-B instead of ViT-H

Enable quantization:

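# 8-bit loading requires the bitsandbytes package
from transformers import BitsAndBytesConfig, Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)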

Batch processing (for evaluation):

results = pipeline.batch_process(image_list, batch_size=4)
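
How `batch_process` splits its input is not shown here. As a rough illustration of batching in general, the hypothetical helper below yields consecutive chunks of at most `batch_size` items, with the last chunk possibly shorter.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

Larger batches improve throughput only while all images in a batch fit in VRAM at once, so the batch size is best tuned against the memory figures in the benchmark table.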

🚢 Deployment

Docker

Build and run with Docker:

docker build -t geonli-backend .
docker run -p 15200:15200 -v $(pwd)/data:/app/data geonli-backend

Modal Cloud

Deploy to Modal:

modal deploy modal_deployment.py

Benefits:

Auto-scaling
GPU on-demand
Built-in load balancing
Zero DevOps

Production Checklist

Set FLASK_ENV=production
Disable debug mode
Configure CORS whitelist
Set up HTTPS (nginx/Let's Encrypt)
Enable request rate limiting
Add authentication middleware
Set up monitoring (Prometheus/Grafana)
Configure log rotation
Add a database for session persistence (optional)

🧪 Testing

Run API tests:

python -m pytest tests/test_api.py -v

Test individual endpoints:

# Upload
curl -X POST -F "file=@test_image.jpg" http://localhost:15200/api/upload

# Chat
curl -X POST http://localhost:15200/api/chat \
  -H "Content-Type: application/json" \
  -d '{"text":"Describe the image","image_url":"http://localhost:15200/uploads/test.jpg"}'

# Status
curl http://localhost:15200/api/status

📄 License

Distributed under the MIT License. See LICENSE for more information.
