## Inference with transformers

Install the in-progress development wheel from https://huggingface.co/nltpt/transformers/tree/main.

The following is an example inference snippet (the API is subject to change):

```python
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

# Load the model in bfloat16; device_map="auto" places it on the available devices.
model_id = "nltpt/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

# A single user turn: one image placeholder followed by the text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe image in two sentences"}
        ]
    }
]
text = processor.apply_chat_template(messages, add_generation_prompt=True)

# Download the example image.
url = "https://llava-vl.github.io/static/images/view.jpg"
raw_image = Image.open(requests.get(url, stream=True).raw)

# Greedy decoding, capped at 25 new tokens.
inputs = processor(text=text, images=raw_image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=False, max_new_tokens=25)
print(processor.decode(output[0]))
```

Output:

```text
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe image in two sentences<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The image depicts a serene lake scene, featuring a long wooden dock extending into the calm water, with a dense forest of trees
```
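
The decoded string above still contains the prompt and the special tokens. A minimal sketch for pulling out just the assistant's reply is shown below. It assumes the header layout seen in the output above, and `extract_assistant_reply` is a hypothetical helper written for illustration, not part of transformers.

```python
# Hypothetical helper (not a transformers API): extract the assistant reply
# from a decoded string, assuming the header format shown in the output above.
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"
END_OF_TURN = "<|eot_id|>"


def extract_assistant_reply(decoded: str) -> str:
    # Keep only the text after the last assistant header, if there is one.
    _, found, reply = decoded.rpartition(ASSISTANT_HEADER)
    if not found:
        return decoded.strip()
    # Drop the end-of-turn marker and anything after it.
    reply = reply.split(END_OF_TURN, 1)[0]
    return reply.strip()


sample = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "<|image|>Describe image in two sentences<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "The image depicts a serene lake scene"
)
print(extract_assistant_reply(sample))
```

An alternative that avoids string parsing is to decode only the newly generated tokens, for example by slicing `output[0]` from `inputs["input_ids"].shape[-1]` onward and passing `skip_special_tokens=True` to `processor.decode`. Whether that fits depends on the development wheel in use, since the API may still change.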

## Running the original checkpoints

Installing the `llama-models` package provides three binaries:

1. `example_chat_completion`
2. `example_text_completion`
3. `multimodal_example_chat_completion`

You can invoke them with `torchrun` as follows:

```bash
CHECKPOINT_DIR=~/.llama/checkpoints/Llama-3.2-11B-Vision-Instruct/

torchrun `which multimodal_example_chat_completion` "$CHECKPOINT_DIR"
```

To read the source code of these scripts, locate them inside the installed package:

```bash
PACKAGE_DIR=$(pip show -f llama-models | grep Location | awk '{ print $2 }')

echo "Scripts are in the directory: $PACKAGE_DIR/llama-models/scripts/"
```
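
The same lookup can be sketched in Python with the standard library's `importlib.metadata`, which reads the list of files recorded for an installed distribution. This is a sketch under assumptions: `find_script_files` is a hypothetical helper, and the filter (any `.py` file under a directory named `scripts`) is a guess at the package layout rather than something the package documents.

```python
from importlib import metadata


def find_script_files(dist_name: str = "llama-models") -> list[str]:
    """Return absolute paths of .py files under a 'scripts' directory.

    Hypothetical helper: the 'scripts' directory name is an assumption
    about how the distribution lays out its files.
    """
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []
    if files is None:
        # The installer did not record a file list for this distribution.
        return []
    return sorted(
        str(f.locate())
        for f in files
        if "scripts" in f.parts and f.suffix == ".py"
    )


for path in find_script_files():
    print(path)
```

If the package is not installed, the function returns an empty list and nothing is printed.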