Merve Noyan

merve

https://github.com/merveenoyan/smol-vision

AI & ML interests

VLMs, vision & co

Recent Activity

updated a collection about 10 hours ago

January 31 Releases 🧤

liked a model about 10 hours ago

onnx-community/Janus-Pro-1B-ONNX

Articles

We now support VLMs in smolagents!

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Introducing smolagents: simple agents that write actions in code.

Welcome PaliGemma 2 – New vision language models by Google

SmolVLM - small yet mighty Vision Language Model

Llama can now see and run on your device - welcome Llama 3.2

Preference Optimization for Vision Language Models

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Vision Language Models Explained

Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳

Deploy MusicGen in no time with Inference Endpoints

Open-Source Text Generation & LLM Ecosystem at Hugging Face

Jupyter X Hugging Face

Using Machine Learning to Aid Survivors and Race through Time

Introducing Skops

Announcing the Hugging Face Fellowship Program

Hosting your Models and Datasets on Hugging Face Spaces using Streamlit

Showcase Your Projects in Spaces using Gradio

merve's activity

upvoted an article about 11 hours ago

Article

🚀 Build a Qwen 2.5 VL API endpoint with Hugging Face spaces and Docker!

about 16 hours ago • 13

upvoted an article 1 day ago

Article

Welcome to Inference Providers on the Hub 🔥

2 days ago • 147

upvoted a collection 5 days ago

AceMath

We are releasing math instruction models, math reward models, general instruction models, all training datasets, and a math reward benchmark. • 11 items • Updated 12 days ago • 9

upvoted a collection 6 days ago

SmolVLM 256M & 500M

Collection for models & demos for even smoller SmolVLM release • 12 items • Updated 6 days ago • 60

upvoted a paper 15 days ago

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published 19 days ago • 78

upvoted 2 collections 20 days ago

ViTPose

Collection for ViTPose models based on transformers implementation. • 10 items • Updated 17 days ago • 12

Sa2VA model zoo

4 items • Updated 15 days ago • 28

upvoted a collection 29 days ago

QVQ

QVQ: Qwen models for visual reasoning • 7 items • Updated 28 days ago • 41

upvoted 2 papers about 1 month ago

Maya: An Instruction Finetuned Multilingual Multimodal Model

Paper • 2412.07112 • Published Dec 10, 2024 • 27

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 139

upvoted a paper about 2 months ago

PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 124

upvoted 4 papers 4 months ago

FreeInit: Bridging Initialization Gap in Video Diffusion Models

Paper • 2312.07537 • Published Dec 12, 2023 • 26

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 49

Möbius Transform for Mitigating Perspective Distortions in Representation Learning

Paper • 2405.02296 • Published Mar 7, 2024 • 4

NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

Paper • 2404.01300 • Published Apr 1, 2024 • 4

upvoted an article 4 months ago

Article

Document Similarity Search with ColPali

Sep 21, 2024 • 49

upvoted 3 papers 5 months ago

DriveLM: Driving with Graph Visual Question Answering

Paper • 2312.14150 • Published Dec 21, 2023 • 4

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 58

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 52

upvoted a collection 6 months ago

InternVideo2

InternVideo2 • 17 items • Updated 7 days ago • 18