High-performance Flask API for geometric natural language interpretation over satellite imagery
geoNLI Backend is an AI-powered REST API that processes satellite imagery through a multi-model pipeline to deliver:
- 📝 Dense image captioning via Qwen2-VL
- 🎯 Object detection & grounding with OWLv2
- 🔄 Oriented bounding box (OBB) generation using rotated detectors
- 📐 Geometric reasoning (areas, distances, spatial relationships)
- 💬 Visual question answering with fact-augmented context
Built on Flask with CORS support, it's designed for seamless integration with the geoNLI frontend and supports both local deployment and cloud hosting (Modal).
- Captioning: Qwen2-VL-7B
- Grounding: OWLv2-style open-vocabulary detection
- Segmentation: SAM-based mask generation
- VQA: Fact-augmented question answering with geometric context using Qwen2-VL-7B
- /api/upload - Image upload with UUID generation
- /api/chat - Natural language query processing
- /api/status - Model loading status monitoring
- /api/evaluation - Batch evaluation endpoints
- /uploads/<filename> - Static file serving
- CORS enabled for frontend integration
- Robust path resolution for Docker/Modal environments
- 50MB file size limit
- Automatic upload folder creation
- Comprehensive error handling & logging
- Singleton model loading (once at startup)
- Efficient memory management for GPU constraints
- Agent-based query routing for intelligent task dispatch
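The singleton model loading listed above can be illustrated with a minimal sketch. `ModelRegistry`, `register`, `get`, and `fake_loader` are hypothetical names chosen for this example, not the project's actual manager API; the sketch loads lazily on first use, and calling `get` for every model at startup would give the "once at startup" behaviour.

```python
import threading
from typing import Any, Callable, Dict


class ModelRegistry:
    """Hypothetical registry that loads each model once and then reuses it."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._models: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        # Store the loader only; nothing heavy happens until get() is called.
        self._loaders[name] = loader

    def get(self, name: str) -> Any:
        # The lock keeps two concurrent requests from loading the same model twice.
        with self._lock:
            if name not in self._models:
                self._models[name] = self._loaders[name]()
            return self._models[name]

    def loaded(self) -> Dict[str, bool]:
        # Per-model flags, similar in spirit to a status report.
        return {name: name in self._models for name in self._loaders}


load_calls = []


def fake_loader() -> object:
    # Stand-in for an expensive checkpoint load (assumption for the demo).
    load_calls.append("vqa")
    return object()


registry = ModelRegistry()
registry.register("vqa", fake_loader)
first = registry.get("vqa")
second = registry.get("vqa")
```

Both calls return the same object and the loader runs only once, which is the property that keeps GPU memory from being allocated repeatedly.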
webapp/
├── 📄 app.py # Flask application entrypoint
├── 📄 __init__.py # App factory & configuration
│
├── 📁 api/ # REST API blueprints
│ ├── upload.py # POST /api/upload - Image upload handler
│ ├── chat.py # POST /api/chat - Chat query processor
│ ├── status.py # GET /api/status - Model status check
│ ├── evaluation.py # POST /api/evaluation - Batch eval
│ ├── caption.py # Direct captioning endpoint
│ ├── grounding.py # Direct grounding endpoint
│ ├── vqa.py # Direct VQA endpoint
│ ├── general.py # General-purpose routes
│ └── __init__.py
│
├── 📁 utils/ # Helper utilities
│ ├── file_handler.py # File validation & processing
│ ├── response_formatter.py # JSON response standardization
│ ├── validator.py # Input validation logic
│ └── __init__.py
│
└── 📁 templates/ # (Optional) HTML templates for web UI
The backend delegates heavy ML processing to src/pipeline/:
src/
├── pipeline/
│ ├── inference_pipeline.py # GeoNLIPipeline orchestrator
│ └── agent.py # GeoAgent (query router)
│
├── models/
│ ├── captioning/ # Qwen2-VL / InternVL2 wrappers
│ ├── grounding/ # Detection model wrappers
│ └── vqa/ # VQA engine implementations
│
└── tools/
├── manager.py # Shared model singleton manager
└── ...
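To give a feel for what a query router such as GeoAgent does, here is a small keyword-based sketch. It is an illustration only: the real agent may well use a model rather than regular expressions, and `INTENT_RULES`, `classify_intent`, and the `"vqa"` fallback are assumptions made for this example. The intent names follow the query types this API documents.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
INTENT_RULES = [
    ("counting", re.compile(r"\bhow many\b|\bcount\b")),
    ("measurement", re.compile(r"\bdistance\b|\barea\b|\bhow far\b")),
    ("comparison", re.compile(r"\bwhich\b.*\b(larger|smaller|bigger|closer)\b")),
    ("grounding", re.compile(r"\blocate\b|\bfind\b|\bwhere\b")),
    ("caption", re.compile(r"\bdescribe\b|\bcaption\b")),
]


def classify_intent(text: str, default: str = "vqa") -> str:
    """Return the first matching intent, or the default for open questions."""
    lowered = text.lower()
    for intent, pattern in INTENT_RULES:
        if pattern.search(lowered):
            return intent
    return default
```

A router built this way would then dispatch each intent to the matching model wrapper, for example sending `"grounding"` queries to the detector and `"caption"` queries to the captioning model.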
- Python 3.10+
- CUDA 11.8+ (for GPU acceleration)
- 8GB+ VRAM (or 16GB+ for full pipeline)
- Node.js 18+ (for frontend, optional)
- Clone the repository:
git clone https://github.com/Leviethal/Inter-IIT-14.0.git
cd Inter-IIT-14.0
- Create virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Download model weights (if not using Modal):
# Qwen2-VL
huggingface-cli download Qwen/Qwen2-VL-7B-Instruct
# InternImage (for detection)
# Follow InternImage repo instructions
- Configure paths:
Edit config/inference_config.yaml to point to your model checkpoints.
Start the Flask server:
python webapp/app.py
The API will be available at:
http://localhost:15200
Deploy to Modal cloud:
modal deploy modal_deployment.py
Your API will be accessible at:
https://<username>--geonli-backend-flask-app-dev.modal.run
Endpoint: POST /api/upload
Description: Upload a satellite image for processing.
Request:
POST /api/upload
Content-Type: multipart/form-data
file: <image file>
Response:
{
"message": "Upload successful",
"file_id": "a1b2c3d4e5f6.jpg",
"url": "/uploads/a1b2c3d4e5f6.jpg",
"local_path": "/path/to/data/uploads/a1b2c3d4e5f6.jpg"
}
Supported Formats: PNG, JPG, JPEG, TIF, TIFF, WEBP
Max File Size: 50MB
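The format check and UUID-based naming can be sketched as follows. The extension set mirrors the supported formats listed above; `ALLOWED_EXTENSIONS` and `make_unique_name` are names assumed for this example, and the project's own `allowed_file` helper may differ in detail.

```python
import uuid

# Extensions taken from the supported formats listed above.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "tif", "tiff", "webp"}


def allowed_file(filename: str) -> bool:
    """Accept only filenames whose extension is in the allowed set."""
    if "." not in filename:
        return False
    ext = filename.rsplit(".", 1)[1].lower()
    return ext in ALLOWED_EXTENSIONS


def make_unique_name(filename: str) -> str:
    """Replace the client-supplied name with a random hex ID, keeping the extension."""
    ext = filename.rsplit(".", 1)[1].lower()
    return f"{uuid.uuid4().hex}.{ext}"


unique_name = make_unique_name("Scene.JPG")
```

Discarding the original filename avoids collisions between uploads and keeps user-controlled strings out of filesystem paths.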
Endpoint: POST /api/chat
Description: Process a natural language query about an uploaded image.
Request:
{
"text": "How many buildings are visible?",
"image_url": "http://localhost:15200/uploads/a1b2c3d4e5f6.jpg",
"session_id": "user_session_123"
}
Response:
{
"reply": "There are 5 buildings visible in the image.",
"grounding": [
{
"bbox": [512, 384, 120, 80, 0.45],
"score": 0.92,
"label": "building",
"object_id": "0"
}
],
"display_mode": "box",
"debug_intent": "counting",
"raw_results": {
"connections": [
{
"from": 0,
"to": 3,
"label": "150 m"
}
]
}
}
Query Types:
- Caption: "Describe the image"
- Grounding: "Locate all houses"
- Counting: "How many trees?"
- Measurement: "What is the distance between A and B?"
- Comparison: "Which building is larger?"
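Measurement and comparison queries rely on geometry derived from the oriented boxes in the `grounding` field. The sketch below assumes each `bbox` is `[cx, cy, width, height, angle]` with the angle in radians, and that a ground sampling distance in metres per pixel is known; neither assumption is confirmed by this document, and the function names are hypothetical.

```python
import math


def obb_corners(bbox):
    """Corner points of an oriented box given as [cx, cy, w, h, angle_rad] (assumed layout)."""
    cx, cy, w, h, angle = bbox
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the corner offset by the box angle, then translate to the centre.
        corners.append((cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a))
    return corners


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula, in squared pixels."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def center_distance_m(bbox_a, bbox_b, metres_per_pixel):
    """Centre-to-centre distance, scaled by an assumed ground sampling distance."""
    return math.hypot(bbox_a[0] - bbox_b[0], bbox_a[1] - bbox_b[1]) * metres_per_pixel


corners = obb_corners([512, 384, 120, 80, 0.45])
area = polygon_area(corners)
```

Rotation preserves area, so `area` comes out at approximately 120 × 80 = 9600 square pixels regardless of the angle; multiplying by the squared ground sampling distance would convert it to square metres.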
Endpoint: GET /api/status
Description: Check model loading status.
Response:
{
"status": "ready",
"models_loaded": {
"vqa": true,
"detector": true,
"segmenter": true,
"parser": true
},
"memory_usage_mb": 4832
}
Endpoint: GET /uploads/<filename>
Description: Serve uploaded images for frontend display.
Example:
http://localhost:15200/uploads/a1b2c3d4e5f6.jpg
Initializes the Flask app and loads the ML pipeline on startup:
from webapp import create_app
from src.pipeline.inference_pipeline import GeoNLIPipeline
app = create_app()
app.pipeline = GeoNLIPipeline()
if __name__ == "__main__":
    app.logger.info("Loading ML Models...")
    app.pipeline.load_models()
    app.run(host="0.0.0.0", port=15200)
Creates and configures the Flask application:
from flask import Flask
from flask_cors import CORS

def create_app():
    app = Flask(__name__)
    # Enable CORS for frontend
    CORS(app, resources={r"/*": {"origins": "*"}})
    # Configure uploads
    app.config['UPLOAD_FOLDER'] = 'data/uploads'
    app.config['MAX_CONTENT_LENGTH'] = 50 * 1024 * 1024  # 50MB upload limit
    # Register blueprints
    from webapp.api.upload import upload_bp
    from webapp.api.chat import chat_bp
    app.register_blueprint(upload_bp)
    app.register_blueprint(chat_bp)
    return app
Handles image uploads with UUID-based naming:
@upload_bp.route('/api/upload', methods=['POST'])
def upload_file():
    file = request.files.get('file')
    # Reject missing files and unsupported extensions instead of falling through
    if not file or not allowed_file(file.filename):
        return jsonify({"error": "Missing or unsupported file"}), 400
    ext = file.filename.rsplit('.', 1)[1].lower()
    unique_name = f"{uuid.uuid4().hex}.{ext}"
    save_path = os.path.join(current_app.config['UPLOAD_FOLDER'], unique_name)
    file.save(save_path)
    return jsonify({
        "file_id": unique_name,
        "url": f"/uploads/{unique_name}"
    }), 201
Routes queries through the GeoAgent with robust path resolution:
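@chat_bp.route('/api/chat', methods=['POST'])
def chat_endpoint():
    data = request.get_json(silent=True) or {}
    user_text = data.get("text")
    image_url = data.get("image_url")
    session_id = data.get("session_id")
    if not user_text or not image_url:
        return jsonify({"error": "Both 'text' and 'image_url' are required"}), 400
    # Resolve image path (handles Docker/Modal environments)
    filename = os.path.basename(image_url)
    local_path = find_uploaded_file(filename)
    # Process query through agent (`agent` is the GeoAgent instance; its construction is not shown here)
    result = agent.process_query(session_id, local_path, user_text)
    return jsonify(result), 200
Path Resolution Strategy: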
- Check configured UPLOAD_FOLDER
- Try Docker path /root/data/uploads/
- Check relative data/uploads/
- Fallback to uploads/
- Use URL as absolute path if exists
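The lookup order above could be implemented along these lines. This is a sketch under assumptions: the signature shown here (with `upload_folder`, `image_url`, and `extra_dirs` parameters) is hypothetical and differs from the single-argument `find_uploaded_file(filename)` call in the handler, whose real implementation is not shown in this README.

```python
import os
import tempfile
from typing import Optional, Sequence


def find_uploaded_file(
    filename: str,
    upload_folder: str,
    image_url: Optional[str] = None,
    extra_dirs: Sequence[str] = ("/root/data/uploads", "data/uploads", "uploads"),
) -> Optional[str]:
    """Return the first existing path for filename, or None if nothing matches."""
    # basename() strips any directory or URL prefix, which also blocks path traversal.
    safe_name = os.path.basename(filename)
    for directory in (upload_folder, *extra_dirs):
        candidate = os.path.join(directory, safe_name)
        if os.path.isfile(candidate):
            return candidate
    # Last resort: the caller may have passed an absolute local path instead of a URL.
    if image_url and os.path.isabs(image_url) and os.path.isfile(image_url):
        return image_url
    return None


# Demo against a temporary upload folder.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a1b2c3d4e5f6.jpg"), "wb").close()
    found = find_uploaded_file("http://localhost:15200/uploads/a1b2c3d4e5f6.jpg", tmp)
    missing = find_uploaded_file("does-not-exist-1234.jpg", tmp)
```

Returning `None` rather than raising lets the endpoint decide how to respond, for example with a 404 when the image cannot be located.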
# Flask settings
FLASK_ENV=production
FLASK_DEBUG=0
# Model paths
MODEL_CACHE_DIR=/path/to/models
QWEN_MODEL_PATH=/path/to/Qwen2-VL-7B-Instruct
# Upload settings
UPLOAD_FOLDER=data/uploads
MAX_FILE_SIZE_MB=50
# Server settings
HOST=0.0.0.0
PORT=15200
Edit config/inference_config.yaml:
models:
  vqa:
    model_name: "Qwen/Qwen2-VL-7B-Instruct"
    device: "cuda:0"
  detector:
    config_path: "configs/internimage_det.py"
    checkpoint: "checkpoints/internimage_l.pth"
  segmenter:
    model_type: "vit_h"
    checkpoint: "checkpoints/sam_vit_h.pth"
pipeline:
  batch_size: 1
  enable_caching: true
  cleanup_memory: true
Problem: Frontend can't access backend API.
Solution: Ensure CORS is enabled in __init__.py:
from flask_cors import CORS
CORS(app, resources={r"/*": {"origins": "*"}})
For production, restrict origins:
CORS(app, resources={r"/*": {"origins": ["https://your-frontend.com"]}})
Problem: Uploaded images return 404.
Solution: Check path resolution in chat.py. Add debug logging:
current_app.logger.info(f"Looking for file: {filename}")
current_app.logger.info(f"Upload folder: {current_app.config['UPLOAD_FOLDER']}")
Problem: GPU runs out of VRAM.
Solution: Enable cleanup in pipeline:
# After each model inference
import gc
import torch
del model_instance
gc.collect()
torch.cuda.empty_cache()
Problem: Pipeline fails to initialize.
Solution: Check status endpoint:
curl http://localhost:15200/api/status
Verify model paths in config files and ensure weights are downloaded.
| Task | Model | Resolution | Latency (GPU) | VRAM |
|---|---|---|---|---|
| Caption | Qwen2-VL-7B | 1024×1024 | ~2.5s | 6GB |
| Detection | InternImage-L | 1024×1024 | ~1.8s | 4GB |
| Segmentation | SAM ViT-H | 1024×1024 | ~0.8s | 3GB |
| VQA (Fact) | Qwen2-VL-7B | 1024×1024 | ~2.2s | 6GB |
| Full Pipeline | - | 1024×1024 | ~8-10s | 8GB peak |
Tested on NVIDIA RTX 3090 (24GB VRAM)
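- Use smaller models for development:
  - Qwen2-VL-2B instead of 7B
  - SAM ViT-B instead of ViT-H
- Enable quantization:
from transformers import BitsAndBytesConfig, Qwen2VLForConditionalGeneration
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # requires the bitsandbytes package
)
- Batch processing (for evaluation):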
results = pipeline.batch_process(image_list, batch_size=4)
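How `batch_process` groups images internally is not documented here. As a rough illustration of the idea, the sketch below splits an image list into fixed-size batches; `chunked` is a hypothetical helper and the filenames are placeholders.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Five placeholder images with batch_size=4 give one full batch and one partial batch.
batches = list(chunked(["a.tif", "b.tif", "c.tif", "d.tif", "e.tif"], batch_size=4))
```

Larger batches improve throughput but raise peak VRAM, so the batch size is best tuned against the memory figures in the table above.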
Build and run with Docker:
docker build -t geonli-backend .
docker run -p 15200:15200 -v $(pwd)/data:/app/data geonli-backend
Deploy to Modal:
modal deploy modal_deployment.py
Benefits:
- Auto-scaling
- GPU on-demand
- Built-in load balancing
- Zero DevOps
- Set FLASK_ENV=production
- Disable debug mode
- Configure CORS whitelist
- Set up HTTPS (nginx/Let's Encrypt)
- Enable request rate limiting
- Add authentication middleware
- Set up monitoring (Prometheus/Grafana)
- Configure log rotation
- Database for session persistence (optional)
Run API tests:
python -m pytest tests/test_api.py -v
Test individual endpoints:
# Upload
curl -X POST -F "file=@test_image.jpg" http://localhost:15200/api/upload
# Chat
curl -X POST http://localhost:15200/api/chat \
-H "Content-Type: application/json" \
-d '{"text":"Describe the image","image_url":"http://localhost:15200/uploads/test.jpg"}'
# Status
curl http://localhost:15200/api/status
Distributed under the MIT License. See LICENSE for more information.