royleibov/Llama-3.2-11B-Vision-Instruct-ZipNN-Compressed

Image-Text-to-Text

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community


## Inference with transformers

Please install the in-progress development wheel from https://huggingface.co/nltpt/transformers/tree/main.

This is an example inference snippet (API subject to change):

```python
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "nltpt/Llama-3.2-11B-Vision-Instruct"
# Load the weights in bfloat16 and place them automatically on the available device(s).
model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

# One user turn: an image placeholder followed by the text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe image in two sentences"},
        ],
    }
]
# Render the conversation into the model's prompt format and open the assistant turn.
text = processor.apply_chat_template(messages, add_generation_prompt=True)

# Download the example image.
url = "https://llava-vl.github.io/static/images/view.jpg"
raw_image = Image.open(requests.get(url, stream=True).raw)

# Tokenize the prompt, preprocess the image, and move the tensors to the model's device.
inputs = processor(text=text, images=raw_image, return_tensors="pt").to(model.device)
# Greedy decoding, capped at 25 new tokens.
output = model.generate(**inputs, do_sample=False, max_new_tokens=25)
print(processor.decode(output[0]))
```

Output (the decoded sequence includes the prompt, and the reply stops mid-sentence because generation is capped at `max_new_tokens=25`):
```text
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe image in two sentences<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The image depicts a serene lake scene, featuring a long wooden dock extending into the calm water, with a dense forest of trees
```
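The decoded output echoes the prompt that `apply_chat_template` built from `messages`. The sketch below reproduces that layout in plain Python, assuming only what the output above shows: a role header followed by a blank line, `<|image|>` in place of the image entry, `<|eot_id|>` closing each turn, and an open assistant header at the end. `render_prompt` is a hypothetical helper for illustration, not the processor's actual chat template, which may handle more cases (system prompts, multiple images, and so on).

```python
from typing import Any, Dict, List


# Hypothetical helper: mimics the prompt layout visible in the decoded output.
# It is NOT the processor's real chat template.
def render_prompt(messages: List[Dict[str, Any]], add_generation_prompt: bool = True) -> str:
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn opens with a role header followed by a blank line.
        parts.append(f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n")
        for item in message["content"]:
            if item["type"] == "image":
                # Image entries become a placeholder token; pixels are passed separately.
                parts.append("<|image|>")
            elif item["type"] == "text":
                parts.append(item["text"])
        # Each completed turn ends with an end-of-turn token.
        parts.append("<|eot_id|>")
    if add_generation_prompt:
        # Open an assistant header so the model continues as the assistant.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe image in two sentences"},
        ],
    }
]
print(render_prompt(messages))
```

Because the rendered prompt ends with an open assistant header, everything the model generates after it is the assistant's reply.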

## Running the original checkpoints
Installing the `llama-models` package provides three executables:

1. `example_chat_completion`
2. `example_text_completion`
3. `multimodal_example_chat_completion`

You can invoke them with `torchrun` as follows:
```bash
CHECKPOINT_DIR=~/.llama/checkpoints/Llama-3.2-11B-Vision-Instruct/

torchrun `which multimodal_example_chat_completion` "$CHECKPOINT_DIR"
```
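To check which of these executables the active environment actually provides, you can ask the standard library's `importlib.metadata` for the console-script entry points a distribution declares, then resolve each name the way `which` does in the command above. This is a sketch that assumes the three executables are registered as `console_scripts` entry points of the `llama-models` distribution; `console_scripts` is a hypothetical helper name.

```python
import shutil
from importlib.metadata import PackageNotFoundError, distribution
from typing import Dict, Optional


# Hypothetical helper: maps each console script a distribution declares to its
# resolved path on PATH (None if the script is not on PATH).
def console_scripts(dist_name: str) -> Dict[str, Optional[str]]:
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        # Not installed in this interpreter's environment.
        return {}
    return {
        ep.name: shutil.which(ep.name)
        for ep in dist.entry_points
        if ep.group == "console_scripts"
    }


# Assumption: the executables belong to the "llama-models" distribution.
for name, path in console_scripts("llama-models").items():
    print(f"{name}: {path}")
```

An empty dictionary means the distribution is not installed in this interpreter's environment or declares no console scripts; a `None` path means the script is declared but not on `PATH` (for example, when a different virtual environment is active).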
To inspect the scripts' source code, first find where the package is installed:
```bash
PACKAGE_DIR=$(pip show -f llama-models | grep Location | awk '{ print $2 }')

echo "Scripts are in the directory: $PACKAGE_DIR/llama_models/scripts/"
```
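The same lookup can be done from Python with the standard library's `importlib.metadata`, which avoids parsing `pip show` output and does not depend on knowing the exact directory name inside the install location. A minimal sketch: `install_location` and `script_files` are hypothetical helper names, and the filter assumes the scripts sit in a directory named `scripts`, as the `echo` line above suggests.

```python
from importlib.metadata import PackageNotFoundError, distribution
from pathlib import Path
from typing import List, Optional


# Hypothetical helper: the directory a distribution is installed into, or None.
def install_location(dist_name: str) -> Optional[Path]:
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return None
    # locate_file("") resolves against the install root (typically site-packages),
    # which is what `pip show` reports as "Location" for a regular install.
    return Path(dist.locate_file(""))


# Hypothetical helper: installed .py files that live under a "scripts" directory.
def script_files(dist_name: str) -> List[Path]:
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return []
    # dist.files comes from the RECORD metadata (the file list `pip show -f` prints);
    # it can be None if that metadata is absent.
    return sorted(
        Path(dist.locate_file(f))
        for f in (dist.files or [])
        if "scripts" in f.parts and f.suffix == ".py"
    )


print("Location:", install_location("llama-models"))
for path in script_files("llama-models"):
    print(path)
```

Listing files from the distribution's own metadata sidesteps guessing whether the package directory uses a hyphen or an underscore.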