---
license: mit
datasets:
- laion/laion2B-en
- laion/laion-coco
- laion/laion2B-multi
- kakaobrain/coyo-700m
- conceptual_captions
- wanng/wukong100m
---

# Model card for InternViT-6B-224px

## Model Details

- **Model Type:** feature backbone
- **Model Stats:**
  - Params (M): 5903
  - Image size: 224 x 224
- **Papers:**
  - InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
- **GitHub:** https://github.com/OpenGVLab/InternVL
- **Pretrain Dataset:** LAION-en, LAION-COCO, COYO, CC12M, CC3M, SBU, Wukong, LAION-multi
## Model Usage

### Image Embeddings

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

# Load the model in bfloat16; trust_remote_code is required because the
# modeling code is shipped inside the model repository.
model = AutoModel.from_pretrained(
    'OpenGVLab/InternViT-6B-224px',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

image = Image.open('./examples/image1.jpg').convert('RGB')

# The matching preprocessor resizes and normalizes the image to 224 x 224.
image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternViT-6B-224px')

pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

# Forward pass: produces the image feature representation.
outputs = model(pixel_values)
```
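
Downstream, image embeddings extracted by a feature backbone like this are usually compared with cosine similarity (e.g. for retrieval or deduplication). The sketch below uses plain Python lists as stand-in embeddings; the vectors and the `cosine_similarity` helper are illustrative, not part of the model's API:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in embeddings; in practice these would come from pooled model outputs.
emb_cat = [0.8, 0.1, 0.1]
emb_kitten = [0.7, 0.2, 0.1]
emb_car = [0.1, 0.1, 0.9]

# Semantically similar images should score higher than dissimilar ones.
print(cosine_similarity(emb_cat, emb_kitten) > cosine_similarity(emb_cat, emb_car))
```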