Model Card for LLaVA-NDiNO_pt

Model description

LLaVA-NDiNO is a family of Large Vision Language Models (LVLMs) trained for the Italian language.

LLaVA-NDiNO_pt is the pre-trained model of the family. It was trained on three types of image-text data:

  • Wikipedia Image-Text Sections: a Wikipedia image paired with the text of the section in which it appears
  • Wikipedia Image-Text Captions: a Wikipedia image paired with its caption
  • OCR PDF Documents: text extracted with Tesseract from PDF documents in MultiEurlex
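
To make the three data types concrete, here is a minimal sketch of how one image-text pair could be represented. The class names, field names, and the sample values are hypothetical and chosen only for illustration; the actual data format used in training is defined in the repository, not here.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    # One member per data type in the list above; the identifiers are illustrative.
    WIKI_SECTION = "wikipedia_image_text_section"
    WIKI_CAPTION = "wikipedia_image_text_caption"
    OCR_PDF = "ocr_pdf_document"


@dataclass(frozen=True)
class ImageTextSample:
    image_path: str  # path to the image (assumption: images are stored as files)
    text: str        # section text, caption, or OCR output, depending on the source
    source: Source   # which of the three data types the pair belongs to


# Made-up example record, not taken from the training data.
sample = ImageTextSample("images/colosseo.jpg", "Il Colosseo, Roma.", Source.WIKI_CAPTION)
```

Tagging each pair with its source keeps the three data types distinguishable when they are mixed into a single training set.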

For more details on the training procedure, see the code we used:

  • Repository: https://github.com/swapUniba/LLaVA-NDiNO

  • Developed by: Elio Musacchio, Lucia Siciliani, Pierpaolo Basile, Giovanni Semeraro

  • Funded by: PNRR project FAIR - Future AI Research

  • Compute infrastructure: Leonardo supercomputer

  • Model type: LLaMA 3 + CLIP

  • Language(s) (NLP): Italian

  • License: Llama 3 Community License

Example usage

This model is not intended to be used without fine-tuning. We recommend training it further with the LLaVA-NeXT codebase.
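
If you only want to inspect the checkpoint before fine-tuning, the following is a minimal sketch, not an official usage recipe. It assumes the weights load with the `LlavaNextForConditionalGeneration` class from the `transformers` library, which the `llava_next` model type suggests but this card does not confirm. It also assumes the repository ships a processor configuration. `load_pretrained` and `count_parameters` are hypothetical helper names.

```python
MODEL_ID = "swap-uniba/LLaVA-NDiNO_pt"  # repository name of this model


def load_pretrained(model_id: str = MODEL_ID):
    """Load the checkpoint and its processor.

    Assumption: the repository follows the transformers LLaVA-NeXT layout.
    """
    # Imported lazily so that importing this module does not require torch.
    import torch
    from transformers import AutoProcessor, LlavaNextForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaNextForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # the checkpoint is stored in FP16
    )
    return model, processor


def count_parameters(model) -> int:
    """Total number of parameters, a quick sanity check after loading."""
    return sum(p.numel() for p in model.parameters())
```

Calling `load_pretrained()` downloads the full checkpoint, so it needs enough disk space and memory for a model of this size. Outputs generated by the pre-trained model should not be treated as representative, since it has not been fine-tuned.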

Citation

@inproceedings{musacchioLLaVANDiNO,
  title={LLaVA-NDiNO: Empowering LLMs with Multimodality for the Italian Language},
  author={Musacchio, Elio and Siciliani, Lucia and Basile, Pierpaolo and Semeraro, Giovanni},
  booktitle={Proceedings of the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI 2024) co-located with 23rd International Conference of the Italian Association for Artificial Intelligence (AI*IA 2024)},
  year={2024}
}

Model size: 8.36B params (tensor type FP16, Safetensors format)