---
library_name: transformers
tags:
- image-geolocation
- geolocation
- geography
- geoguessr
- multi-modal
license: cc-by-nc-4.0
language:
- en
base_model: openai/clip-vit-large-patch14-336
pipeline_tag: zero-shot-image-classification
---

# Model Card for Thesis-CLIP-geoloc-continent

CLIP-ViT model fine-tuned for image geolocation, optimized for continent-level queries.

## Model Details

### Model Description

- **Developed by:** [jrheiner](https://huggingface.co/jrheiner)
- **Model type:** CLIP-ViT
- **Language(s) (NLP):** English
- **License:** Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)
- **Finetuned from model:** [openai/clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336)

### Model Sources

- **Repository:** https://github.com/jrheiner/thesis-appendix
- **Demo:** [Image Geolocation Demo Space](https://huggingface.co/spaces/jrheiner/thesis-demo)

## How to Get Started with the Model

```python
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel

# Load the fine-tuned model and its processor from the Hugging Face Hub
model = CLIPModel.from_pretrained("jrheiner/thesis-clip-geoloc-continent")
processor = CLIPProcessor.from_pretrained("jrheiner/thesis-clip-geoloc-continent")

# Download an example image and define the candidate continent labels
url = "https://huggingface.co/spaces/jrheiner/thesis-demo/resolve/main/kerger-test-images/Oceania_Australia_-32.947127313081_151.47903359833_kerger.jpg"
image = Image.open(requests.get(url, stream=True).raw)
choices = ["North America", "Africa", "Asia", "Oceania", "South America", "Europe"]

inputs = processor(text=choices, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)  # softmax over the candidate labels
```
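
To turn the probabilities into a single prediction, take the argmax over the candidate labels. A minimal continuation of the snippet above:

```python
# Index of the highest-probability label for the (single) image
predicted_idx = probs.argmax(dim=1).item()
print(f"Predicted continent: {choices[predicted_idx]}")
```

Since the card declares the `zero-shot-image-classification` pipeline tag, the high-level pipeline API should also work; a sketch, assuming the same candidate labels as above:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-image-classification", model="jrheiner/thesis-clip-geoloc-continent")
results = classifier(image, candidate_labels=choices)
print(results)  # list of {"score": ..., "label": ...} dicts, sorted by score
```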

## Training Details

The model was fine-tuned on 177,270 images (29,545 per continent) sourced from Mapillary.