Multimodal Models Collection Multimodal models with leading performance. • 17 items • Updated 1 day ago • 28
AIMv2 Collection A collection of AIMv2 vision encoders covering a number of fixed resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 70
Visual Document Retrieval Collection A collection of models, datasets, and spaces in the VDR series • 5 items • Updated 8 days ago • 8
jina-embeddings-v3 Collection Multilingual multi-task general text embedding model • 6 items • Updated Sep 19, 2024 • 20
LLaVa-NeXT-Video Collection LLaVa-NeXT-Video extends LLaVa-NeXT for video understanding. • 5 items • Updated Jun 10, 2024 • 9
LLM2CLIP Collection LLM2CLIP makes SOTA pretrained CLIP models even more SOTA. • 10 items • Updated 10 days ago • 51
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated 10 days ago • 547
Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction Paper • 2411.17835 • Published Nov 19, 2024 • 3
Jina Reranker v2 Collection A collection of state-of-the-art multilingual neural rerankers • 1 item • Updated Sep 17, 2024 • 8
Jina Reader-LM Collection Convert HTML content to LLM-friendly Markdown/JSON content • 3 items • Updated 2 days ago • 7
jina-embeddings-v3: Multilingual Embeddings With Task LoRA Paper • 2409.10173 • Published Sep 16, 2024 • 29
SmolVLM Collection State-of-the-art compact VLMs for on-device applications: Base, Synthetic, and Instruct • 5 items • Updated 27 days ago • 31