---
pipeline_tag: image-text-to-text
---
This repository contains the Elva-Phi3-3.8B model presented in the paper "On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning".