---
library_name: transformers
tags:
  - image-geolocation
  - geolocation
  - geography
  - geoguessr
  - multi-modal
license: cc-by-nc-4.0
language:
  - en
base_model: openai/clip-vit-large-patch14-336
pipeline_tag: zero-shot-image-classification
---

# Model Card for Thesis-CLIP-geoloc-continent

A CLIP-ViT model fine-tuned for image geolocation, optimized for continent-level queries.

## Model Details

### Model Description

### Model Sources

## How to Get Started with the Model

```python
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("jrheiner/thesis-clip-geoloc-continent")
processor = CLIPProcessor.from_pretrained("jrheiner/thesis-clip-geoloc-continent")

url = "https://huggingface.co/spaces/jrheiner/thesis-demo/resolve/main/kerger-test-images/Oceania_Australia_-32.947127313081_151.47903359833_kerger.jpg"
image = Image.open(requests.get(url, stream=True).raw)
choices = ["North America", "Africa", "Asia", "Oceania", "South America", "Europe"]
inputs = processor(text=choices, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)  # softmax over the candidate labels gives per-continent probabilities
```
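To turn the probabilities into a prediction, take the highest-scoring candidate label, for example:

```python
predicted_continent = choices[probs.argmax(dim=1).item()]
print(predicted_continent)  # the example image above was taken in Oceania
```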

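Because the card declares `pipeline_tag: zero-shot-image-classification`, the model should also work through the `transformers` zero-shot image classification pipeline. The snippet below is a minimal sketch of that route, reusing the example image and labels from above:

```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-image-classification",
    model="jrheiner/thesis-clip-geoloc-continent",
)
results = classifier(
    "https://huggingface.co/spaces/jrheiner/thesis-demo/resolve/main/kerger-test-images/Oceania_Australia_-32.947127313081_151.47903359833_kerger.jpg",
    candidate_labels=["North America", "Africa", "Asia", "Oceania", "South America", "Europe"],
)
print(results)  # list of {"label": ..., "score": ...} dicts sorted by score
```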
## Training Details

The model was fine-tuned on 177,270 images (29,545 per continent) sourced from Mapillary.
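The card does not document the exact training recipe. The sketch below only illustrates how such a fine-tune could look with the standard CLIP contrastive objective in `transformers`, assuming continent-name prompts as the text side; the prompt template, batch construction, optimizer, and learning rate are illustrative assumptions, and `images_by_continent` / `num_steps` are hypothetical names.

```python
# Hypothetical fine-tuning sketch, not the documented training recipe.
import random
import torch
from transformers import CLIPProcessor, CLIPModel

CONTINENTS = ["North America", "Africa", "Asia", "Oceania", "South America", "Europe"]

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14-336")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)  # assumed hyperparameters

model.train()
for step in range(num_steps):  # num_steps is hypothetical
    # Sample one image per continent so the text prompts within a batch are distinct,
    # which keeps the symmetric image-text contrastive loss well-posed.
    images = [random.choice(images_by_continent[c]) for c in CONTINENTS]  # images_by_continent is hypothetical
    texts = [f"A photo from {c}" for c in CONTINENTS]  # assumed prompt template
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
    outputs = model(**batch, return_loss=True)  # CLIP contrastive loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```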