MFM - Multimodal Foundation Models - a LeafInTheTree Collection

updated 15 days ago

Running

99

Idefics3

📊

Generate text based on an image and prompt
Running on Zero

134

VideoLLaMA2

🎥

Media understanding
Running on Zero

45

GroundingDINO ⚔ OWL

🦖

Identify objects in images using text queries
Running

79

Paligemma HF

🤗

Generate text and segment images using PaliGemma
Running on T4

312

PaliGemma Demo

🤲

Annotate and describe images with text prompts
Running on Zero

461

Florence2 + SAM2

🔥

Segment objects in images and videos using text prompts
Running on Zero

7

Florence 2 Vision Model V1

💻

Analyze images to caption, detect objects, extract text, and ground phrases
Sleeping

2

Marketing Vision

👁
Runtime error

2

Idefics3

📊
Running on Zero

7

Theia

⚡

Decode images to teacher model outputs
Running on Zero

15

XGen MM

💻

Generate detailed descriptions from images and questions
Sleeping

LLaMA 3.1 Vision

🦙
Running on Zero

78

Chameleon 30b

🔥

Generate descriptions for images using text prompts
Running

417

InternVL

⚡

Chat with an AI that understands text and images
Running on Zero

724

Florence 2

📉

Analyze images to generate captions, detect objects, or perform OCR
Running on Zero

214

Phi 3.5 Vision

🔥

Generate text from an image and question
Runtime error

888

MiniGPT-4

🚀
Running on Zero

38

Mistral Pixtral Demo

👀

Chat with Pixtral 12B using Mistral Inference
Running on Zero

317

Ovis1.6 Gemma2 9B

🐑

Chat with an AI that understands images and text
meta-llama/Llama-Guard-3-11B-Vision

Image-Text-to-Text • Updated Nov 18, 2024 • 1.21k • 53
Running on Zero

106

Molmo 7B D 0924

👁
Running

72

Owlv2

👀

State-of-the-art Zero-shot Object Detection
Running on Zero

384

Llama-Vision-11B

🚀

Chat about images by uploading them and typing questions
Running on Zero

121

SmolVLM

📊

Generate text responses using images and text prompts
Running on Zero

5

GLM-Edge-V-5B Space

📷

Generate text responses based on images and chat history
Running on Zero

15

Paligemma2 Detection

😻

Paligemma2 Detection with Supervision
Running on Zero

39

Florence Llama

💬

Generate text responses based on images and input text
Runtime error

6

Paligemma2 10b Ft Docci 448

📉
Running on Zero

4

OLA-VLM

🔍

Generate images and insights from text and images
Running on Zero

1.69k

Chat With Janus-Pro-7B

🌍

A unified multimodal understanding and generation model.