GeoScope is a deep learning framework for image geolocalization that employs CNN transfer learning and a CLIP-inspired vision-language model to classify street view images into geographic regions.
The project combines two complementary approaches:
- CNN-Based Geolocalization: Fine-tunes pretrained CNN architectures such as ResNet-18 and WideResNet on geotagged street view images, classifying each image into a discrete geographic class (e.g., a region or continent).
- Lite StreetCLIP-Inspired Model: Adapts a lightweight version of the CLIP model, trained with contrastive learning on image–caption pairs, including synthetic captions derived from geographic labels. It supports zero-shot or few-shot prediction, with the goal of improving generalization to unseen geographies.
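To make the contrastive objective concrete, here is a minimal pure-Python sketch of a CLIP-style symmetric loss over image–caption pairs, together with a synthetic caption built from a geographic label. It is an illustration under stated assumptions, not GeoScope's actual training code: the caption template, the function names `make_caption` and `contrastive_loss`, and the temperature of 0.07 are all hypothetical, and plain lists stand in for encoder outputs.

```python
import math


def make_caption(region: str) -> str:
    # Hypothetical template; the project's real caption wording is not specified.
    return f"A street view photo taken in {region}."


def _normalize(vec):
    # Scale to unit length so dot products equal cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _log_softmax(row):
    # Numerically stable log-softmax of one row of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities.

    Pair k (image k, caption k) is the positive; every other
    combination in the batch acts as a negative.
    """
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Image-to-text direction: each row should peak on its own caption.
    loss_i2t = -sum(_log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: same computation on the transposed matrix.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(_log_softmax(columns[k])[k] for k in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Toy 2-D "embeddings": aligned pairs should score a lower loss than swapped ones.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In a real pipeline the lists would be replaced by batched tensors from the image and text encoders, but the structure of the loss stays the same.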
GeoScope addresses the challenge of image geolocalization, a problem with critical applications in photo tagging, search, and open-source intelligence, by mapping images to predefined geographic regions. By integrating transfer learning and vision-language models, GeoScope aims to overcome data scarcity and improve robustness to distribution shifts.
- Dual-Model Approach: Combines CNN-based classification with CLIP-inspired zero-shot learning.
- Transfer Learning: Fine-tunes pretrained CNNs on geotagged datasets for robust feature extraction.
- Zero-Shot Generalization: Leverages pretrained embeddings to predict geographic regions for unseen data.
- Scalable & Diverse: Designed to work across varied environments, from urban to rural scenes.
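The zero-shot feature listed above can be sketched as a nearest-caption lookup: embed the image, embed one caption per candidate region, and pick the region whose caption embedding is most similar. The snippet below is a simplified illustration, assuming the embeddings are already computed; `zero_shot_predict` and the toy vectors are hypothetical and not part of GeoScope's API.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length, non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_predict(image_emb, region_text_embs):
    """Return the best-matching region and the per-region similarity scores.

    region_text_embs maps a region name to the embedding of its caption.
    No region-specific training is needed: adding a new region only
    requires adding one more caption embedding to the mapping.
    """
    scores = {
        region: cosine(image_emb, text_emb)
        for region, text_emb in region_text_embs.items()
    }
    best_region = max(scores, key=scores.get)
    return best_region, scores


# Toy 3-D vectors standing in for real encoder outputs.
caption_embs = {
    "Europe": [0.9, 0.1, 0.0],
    "Asia": [0.1, 0.9, 0.0],
    "South America": [0.0, 0.1, 0.9],
}
region, scores = zero_shot_predict([0.8, 0.2, 0.1], caption_embs)
```

Because the candidate set is just a dictionary of caption embeddings, the same function covers few-shot or coarser and finer label sets by changing the captions rather than retraining a classifier head.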
- He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition.
- Radford, A., et al. (2021). Learning transferable visual models from natural language supervision.
- Haas, L., Alberti, S., & Skreta, M. (2023). Learning generalized zero-shot learners for open-domain image geolocalization.
- Weyand, T., Kostrikov, I., & Philbin, J. (2016). PlaNet - Photo Geolocation with Convolutional Neural Networks.