---
library_name: transformers
tags:
- image-geolocation
- geolocation
- geography
- geoguessr
- multi-modal
license: cc-by-nc-4.0
language:
- en
base_model: openai/clip-vit-large-patch14-336
pipeline_tag: zero-shot-image-classification
---
# Model Card for Thesis-CLIP-geoloc-continent
CLIP-ViT model fine-tuned for image geolocation, optimized for continent-level queries.
## Model Details
### Model Description
- **Developed by:** [jrheiner](https://huggingface.co/jrheiner)
- **Model type:** CLIP-ViT
- **Language(s) (NLP):** English
- **License:** Creative Commons Attribution Non Commercial 4.0
- **Finetuned from model:** [openai/clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336)
### Model Sources
- **Repository:** https://github.com/jrheiner/thesis-appendix
- **Demo:** [Image Geolocation Demo Space](https://huggingface.co/spaces/jrheiner/thesis-demo)
## How to Get Started with the Model
```python
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel
model = CLIPModel.from_pretrained("jrheiner/thesis-clip-geoloc-continent")
processor = CLIPProcessor.from_pretrained("jrheiner/thesis-clip-geoloc-continent")
url = "https://huggingface.co/spaces/jrheiner/thesis-demo/resolve/main/kerger-test-images/Oceania_Australia_-32.947127313081_151.47903359833_kerger.jpg"
image = Image.open(requests.get(url, stream=True).raw)
choices = ["North America", "Africa", "Asia", "Oceania", "South America", "Europe"]
inputs = processor(text=choices, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)  # label probabilities over the candidate continents
```
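Since the metadata declares `pipeline_tag: zero-shot-image-classification`, the high-level `pipeline` API should also work. A minimal sketch, reusing `image` and `choices` from the example above:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-image-classification", model="jrheiner/thesis-clip-geoloc-continent")
results = classifier(image, candidate_labels=choices)
# `results` is a list of {"label", "score"} dicts sorted by score,
# so the first entry is the predicted continent.
print(results[0]["label"])
```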
## Training Details
The model was fine-tuned on 177,270 images (29,545 per continent) sourced from Mapillary.
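The repository linked above documents the exact training setup. As a rough illustration only, fine-tuning CLIP for this task amounts to running its contrastive objective over (image, continent-name) pairs. The sketch below is not the thesis code: the optimizer, learning rate, and the use of raw continent names as text prompts are all assumptions.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14-336")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)  # assumed hyperparameters

def train_step(images, continents):
    # `continents` holds the ground-truth continent name for each image in the batch.
    # CLIP's built-in symmetric contrastive loss matches image i to text i, so this
    # sketch assumes distinct continent labels per batch; duplicate labels would
    # need de-duplication or a label-aware loss instead.
    inputs = processor(text=continents, images=images, return_tensors="pt", padding=True)
    loss = model(**inputs, return_loss=True).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```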