Improve model card for LLaVA_MORE-gemma_2_9b-finetuning
#1
by nielsr HF Staff - opened
This PR significantly enhances the model card for aimagelab/LLaVA_MORE-gemma_2_9b-finetuning by providing comprehensive details and improving discoverability.
Key updates include:
- Metadata: Added `pipeline_tag: image-text-to-text`, a comprehensive set of `tags`, `base_model` (`google/gemma-2-9b-it`), and `datasets` (`liuhaotian/LLaVA-Pretrain`, `liuhaotian/LLaVA-Instruct-150K`) for better categorization and searchability. The `license` is explicitly set to Apache 2.0.
- Model Details & Description: Replaced placeholder content with a detailed overview of the LLaVA-MORE family, including its purpose, LLM and visual backbone variations explored, and specific details for this model variant (Gemma-2 9B + CLIP).
- Model Sources: Added direct links to the paper on Hugging Face Papers, the official GitHub repository, the project page, the Hugging Face collection, and a general Hugging Face Space demo.
- Usage Example: Included a ready-to-use Python code snippet for inference, specifically tailored for this model variant, with guidance on handling out-of-memory issues.
- Performance Benchmarks: Integrated the detailed performance table and plot from the original GitHub repository, showcasing the model's evaluation results.
- Training Details: Provided information on the two-stage training process, including datasets and procedure.
- Checkpoints: Included the full table of all LLaVA-MORE checkpoints with their respective Hugging Face links.
- Latest Updates: Included the "Latest Updates" section from the GitHub repository to keep users informed about project milestones.
- Citation & Acknowledgments: Ensured the BibTeX citation is present and retained the acknowledgments section.
This update aims to make the model card a much richer, more informative, and user-friendly resource on the Hugging Face Hub.
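For readers unfamiliar with model card metadata, the fields named in the Metadata item above live in a YAML front matter block at the top of the model's README. The following is a minimal sketch of how such a block could be assembled, not the exact content of this PR: the helper `to_front_matter` is hypothetical, the `tags` list is an illustrative subset, and `apache-2.0` is assumed to be the license identifier corresponding to Apache 2.0.

```python
# Hypothetical sketch: render model card metadata as a YAML front matter block.
# Field values are taken from the PR description; the tag list is an
# illustrative subset, and "apache-2.0" is an assumed license identifier.

def to_front_matter(metadata: dict) -> str:
    """Render a flat dict (string or list-of-string values) as YAML front matter."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # Lists become YAML block sequences, one item per line.
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


card_metadata = {
    "license": "apache-2.0",
    "pipeline_tag": "image-text-to-text",
    "base_model": "google/gemma-2-9b-it",
    "datasets": [
        "liuhaotian/LLaVA-Pretrain",
        "liuhaotian/LLaVA-Instruct-150K",
    ],
    "tags": ["multimodal", "llava", "gemma"],
}

front_matter = to_front_matter(card_metadata)
print(front_matter)
```

The helper only handles flat string and list values, which is enough for the fields listed here; a real card with nested metadata would be better served by a proper YAML library.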
fede97 changed pull request status to merged