
Umitcan Sahin

ucsahin


AI & ML interests

Visual Language Models, Large Language Models, Vision Transformers

Recent Activity

reacted to ezgikorkmaz's post with 🚀 about 15 hours ago

liked a dataset about 21 hours ago

microsoft/orca-agentinstruct-1M-v1

liked a dataset 1 day ago

mlabonne/orca-agentinstruct-1M-v1-cleaned

Organizations

None yet

ucsahin's activity

upvoted a collection 4 days ago

SigLIP

Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 10 items • Updated 4 days ago • 37

upvoted a collection 7 days ago

Nov 15 Releases 🍂

15 items • Updated 7 days ago • 6

upvoted a collection 2 months ago

Turkish Vision-Language Datasets

Collection of Turkish vision-language datasets. • 19 items • Updated 6 days ago • 4

upvoted 3 papers 3 months ago

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6 • 59

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5 • 60

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9 • 46

upvoted 2 papers 4 months ago

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 108

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31 • 73

upvoted a collection 4 months ago

Vision Language Leaderboards

This collection has all the vision language leaderboards. • 7 items • Updated Aug 24 • 11

upvoted 2 articles 4 months ago

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

Article • Jul 31 • 59

The Rise of Agentic Data Generation

Article • Jul 15 • 78

upvoted 2 papers 4 months ago

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19 • 42

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9 • 41

upvoted a collection 4 months ago

🪐 SmolLM

A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 198

upvoted 2 articles 4 months ago

TGI Multi-LoRA: Deploy Once, Serve 30 Models

Article • Jul 18 • 48

Docmatix - a huge dataset for Document Visual Question Answering

Article • Jul 18 • 67

upvoted 3 papers 4 months ago

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

Paper • 2407.08770 • Published Jul 11 • 19

AgentInstruct: Toward Generative Teaching with Agentic Flows

Paper • 2407.03502 • Published Jul 3 • 48

Multi-Object Hallucination in Vision-Language Models

Paper • 2407.06192 • Published Jul 8 • 9

upvoted a paper 5 months ago

ColPali: Efficient Document Retrieval with Vision Language Models

Paper • 2407.01449 • Published Jun 27 • 41