---
library_name: transformers
tags:
- image-geolocation
- geolocation
- geography
- geoguessr
- multi-modal
license: cc-by-nc-4.0
language:
- en
base_model: openai/clip-vit-large-patch14-336
pipeline_tag: zero-shot-image-classification
---

# Model Card for Thesis-CLIP-geoloc-continent

CLIP-ViT model fine-tuned for image geolocation, optimized for queries at the continent level.

## Model Details

### Model Description

- **Developed by:** [jrheiner](https://huggingface.co/jrheiner)
- **Model type:** CLIP-ViT
- **Language(s) (NLP):** English
- **License:** Creative Commons Attribution Non Commercial 4.0
- **Fine-tuned from model:** [openai/clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336)

### Model Sources

- **Repository:** https://github.com/jrheiner/thesis-appendix
- **Demo:** [Image Geolocation Demo Space](https://huggingface.co/spaces/jrheiner/thesis-demo)

## How to Get Started with the Model

```python
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel

# Load the fine-tuned model and its matching processor from the Hub.
model = CLIPModel.from_pretrained("jrheiner/thesis-clip-geoloc-continent")
processor = CLIPProcessor.from_pretrained("jrheiner/thesis-clip-geoloc-continent")

url = "https://huggingface.co/spaces/jrheiner/thesis-demo/resolve/main/kerger-test-images/Oceania_Australia_-32.947127313081_151.47903359833_kerger.jpg"
image = Image.open(requests.get(url, stream=True).raw)

choices = ["North America", "Africa", "Asia", "Oceania", "South America", "Europe"]
inputs = processor(text=choices, images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)  # softmax over the labels yields per-continent probabilities
```

## Training Details

The model was fine-tuned on 177,270 images (29,545 per continent) sourced from Mapillary.
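As an alternative to the explicit model/processor calls in the quick-start above, the model can also be driven through the high-level `pipeline` API for zero-shot image classification (the task declared in this card's `pipeline_tag`). A minimal sketch; the candidate labels and demo image mirror the quick-start example:

```python
from transformers import pipeline

# Zero-shot image classification pipeline wrapping the fine-tuned CLIP model.
geolocator = pipeline(
    "zero-shot-image-classification",
    model="jrheiner/thesis-clip-geoloc-continent",
)

url = "https://huggingface.co/spaces/jrheiner/thesis-demo/resolve/main/kerger-test-images/Oceania_Australia_-32.947127313081_151.47903359833_kerger.jpg"
continents = ["North America", "Africa", "Asia", "Oceania", "South America", "Europe"]

# The pipeline accepts an image URL (or a PIL image) plus candidate labels and
# returns one {"score", "label"} dict per label, sorted by descending score.
results = geolocator(url, candidate_labels=continents)
print(results[0]["label"])  # most likely continent
```

This wraps the same preprocessing, forward pass, and softmax shown in the quick-start, so both snippets should produce the same ranking of continents.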