---
library_name: transformers
tags:
- image-geolocation
- geolocation
- geography
- geoguessr
- multi-modal
license: cc-by-nc-4.0
language:
- en
base_model: openai/clip-vit-large-patch14-336
pipeline_tag: zero-shot-image-classification
---

# Model Card for Thesis-CLIP-geoloc-continent

A CLIP ViT model fine-tuned for image geolocation and optimized for continent-level queries. Given an image and a list of continent names, it scores each name by image-text similarity.


## Model Details

### Model Description

- Developed by: [jrheiner](https://huggingface.co/jrheiner)
- Model type: CLIP-ViT (ViT-L/14, 336 px input)
- Language(s) (NLP): English
- License: Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)
- Finetuned from model: [openai/clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336)

### Model Sources

- Repository: [jrheiner/thesis-appendix](https://github.com/jrheiner/thesis-appendix)
- Demo: [Image Geolocation Demo Space](https://huggingface.co/spaces/jrheiner/thesis-demo)


## How to Get Started with the Model

```python
from PIL import Image
import requests
import torch
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("jrheiner/thesis-clip-geoloc-continent")
processor = CLIPProcessor.from_pretrained("jrheiner/thesis-clip-geoloc-continent")

# Example image from the demo Space (the file name indicates Oceania, Australia).
url = "https://huggingface.co/spaces/jrheiner/thesis-demo/resolve/main/kerger-test-images/Oceania_Australia_-32.947127313081_151.47903359833_kerger.jpg"
response = requests.get(url, stream=True)
response.raise_for_status()
image = Image.open(response.raw).convert("RGB")

# Candidate labels: one text prompt per continent.
choices = ["North America", "Africa", "Asia", "Oceania", "South America", "Europe"]
inputs = processor(text=choices, images=image, return_tensors="pt", padding=True)

with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

logits_per_image = outputs.logits_per_image  # image-text similarity scores, shape (1, len(choices))
probs = logits_per_image.softmax(dim=1)  # softmax over the choices gives label probabilities

for label, p in sorted(zip(choices, probs[0].tolist()), key=lambda lp: lp[1], reverse=True):
    print(f"{label}: {p:.3f}")
```
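In CLIP, the `logits_per_image` values are cosine similarities between the L2-normalized image embedding and each L2-normalized text embedding. These similarities are multiplied by the exponentiated learned temperature `logit_scale`, and the softmax then normalizes them over the candidate continents. The dependency-free sketch below reproduces that arithmetic on made-up 3-dimensional embeddings. The vectors and the scale of 100 are illustrative assumptions, not values taken from this model. The helper names `normalize`, `clip_logits` and `softmax` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm, as CLIP does before comparing embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def clip_logits(image_embed, text_embeds, logit_scale=100.0):
    """Cosine similarity between one image and each text, times the logit scale.

    logit_scale plays the role of logit_scale.exp() in CLIP; 100.0 is an assumed
    illustrative value (original CLIP training clips the learned scale at 100).
    """
    img = normalize(image_embed)
    return [logit_scale * sum(a * b for a, b in zip(img, normalize(t))) for t in text_embeds]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (hypothetical values, for illustration only).
choices = ["North America", "Africa", "Asia", "Oceania", "South America", "Europe"]
text_embeds = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.8],
    [0.7, 0.7, 0.0],
    [0.0, 0.6, 0.8],
]
image_embed = [0.5, 0.1, 0.9]

probs = softmax(clip_logits(image_embed, text_embeds))
for label, p in sorted(zip(choices, probs), key=lambda lp: lp[1], reverse=True):
    print(f"{label}: {p:.3f}")
```

The similarities are multiplied by a large scale before the softmax, so small differences in cosine similarity become strongly peaked probabilities.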

## Training Details

The model was fine-tuned on 177,270 street-level images from Mapillary, balanced across the six continents at 29,545 images per continent.
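The card does not say how the balanced set was drawn. One simple way to build such a split is to downsample each continent to the same count with a seeded random sample. The sketch below shows this approach for illustration only; it is not the author's pipeline. The `(image_id, continent)` record format and the function name `balance_by_continent` are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_continent(records, per_class, seed=0):
    """Return a class-balanced subset with exactly `per_class` records per continent.

    `records` is an iterable of (image_id, continent) pairs (hypothetical format).
    Raises ValueError if any continent has fewer than `per_class` records.
    """
    by_continent = defaultdict(list)
    for image_id, continent in records:
        by_continent[continent].append((image_id, continent))

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for continent in sorted(by_continent):  # sorted for a deterministic order
        pool = by_continent[continent]
        if len(pool) < per_class:
            raise ValueError(f"{continent}: only {len(pool)} records, need {per_class}")
        subset.extend(rng.sample(pool, per_class))
    return subset


continents = ["North America", "Africa", "Asia", "Oceania", "South America", "Europe"]
# Six continents at 29,545 images each gives the card's total of 177,270.
print(len(continents) * 29_545)

# Toy usage: 100 records per continent, downsampled to 50 each.
toy = [(f"img{i}", continents[i % len(continents)]) for i in range(600)]
subset = balance_by_continent(toy, per_class=50)
print(len(subset))
```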