Merve Noyan

merve

AI & ML interests

VLMs, vision & co

Recent Activity

updated a collection 2 days ago
Nov 29 Releases 🌲🌲

Articles

Organizations

Hugging Face, Google, Deprem Yapay Zeka, Notebooks-explorers, SODA, Deprem Private, PyTorch Image Models, Turkish NLP Dataset Creators, Templates, Demo Crafters 🤗, Keras, tensorflow, Mukayese, HugGAN Community, EPFL VILAB, Hugging Face Fellows, Huggingface.js, scikit-learn, JAX ♥️ Diffusers 🧨, HuggingFaceM4, 2023 Jan Offsite hackathon, HF Canonical Model Maintainers, Huggingface Projects, fastai X Hugging Face Group 2022, boun-tabi-LMG, skops-tests, Kornia AI, Hugging Face H4, Keras Dreambooth Event, Turkish T5 - BERT - GPT-2, Blog-explorers, Hugging Face for Computer Vision, Hacktoberfest 2023, Hugging Face TB Research, adept-hf-collab, ZeroGPU Explorers, kotol, Magic Leap Community, Llava Hugging Face, MLX Community, Social Post Explorers, Top Contributors: Profile Followers, Dev Mode Explorers, Paris AI Running Club, yorg, CVPR2024, Les papiers de Merve, nltpt, s0409, Hugging Face FineVideo, mv, Cookbook Authors, open/ acc, Agents

merve's activity

posted an update 1 day ago
small but mighty 🔥
you can fine-tune SmolVLM on an L4 with a batch size of 4, and it will only take 16.4 GB of VRAM 🫰🏻 with gradient accumulation, the simulated batch size is 16 ✨
I made a notebook that includes all the goodies: QLoRA, gradient accumulation, and gradient checkpointing, with explanations of how they work https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
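The notebook has the real recipe; as a toy, framework-free sketch of why gradient accumulation makes 4 micro-batches of 4 behave like one batch of 16 (everything below is illustrative, not taken from the notebook):

```python
# Gradient of mean squared error for a toy 1-D linear model y ≈ w * x.
def grad(w, xs, ys):
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [float(i) for i in range(16)]
ys = [3.0 * x + 1.0 for x in xs]
w = 0.0

full = grad(w, xs, ys)  # one batch of 16

# Gradient accumulation: run 4 micro-batches of 4, average before stepping.
acc = 0.0
for i in range(0, 16, 4):
    acc += grad(w, xs[i:i + 4], ys[i:i + 4])
acc /= 4

assert abs(full - acc) < 1e-9  # same update direction, ~1/4 the activation memory
```

This averaging-before-the-optimizer-step is the same idea a `gradient_accumulation_steps=4` setting expresses in a Trainer-style loop.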
posted an update 2 days ago
Last week we were blessed with open-source models! A recap
merve/nov-29-releases-674ccc255a57baf97b1e2d31

🖼️ Multimodal
> At Hugging Face we released SmolVLM, a performant and efficient smol vision language model 💗
> Show Lab released ShowUI-2B, a new vision-language-action model for building GUI/web automation agents 🤖
> Rhymes AI released the base models of Aria, Aria-Base-64K and Aria-Base-8K, with their respective context lengths
> The ViDoRe team released ColSmolVLM, a new ColPali-like retrieval model based on SmolVLM
> Dataset: Llava-CoT-o1-Instruct, a new dataset labelled using the Llava-CoT multimodal reasoning model 📖
> Dataset: LLaVA-CoT-100k, the dataset used to train Llava-CoT, released by the creators of Llava-CoT 📕

💬 LLMs
> The Qwen team released QwQ-32B-Preview, a state-of-the-art open-source reasoning model that broke the internet 🔥
> Alibaba released Marco-o1, a new open-source reasoning model 💥
> NVIDIA released Hymba 1.5B Base and Instruct, new state-of-the-art SLMs with a hybrid architecture (Mamba + transformer)

⏯️ Image/Video Generation
> Qwen2VL-Flux: a new image generation model based on the Qwen2VL image encoder and T5, with Flux for generation
> Lightricks released LTX-Video, a new DiT-based video generation model that can generate 24 FPS videos at 768x512 resolution ⏯️
> Dataset: Image Preferences, a new image generation preference dataset made through the DIBT community effort with Argilla 🏷️

Audio
> OuteAI released OuteTTS-0.2-500M, a new multilingual text-to-speech model based on Qwen-2.5-0.5B, trained on 5B audio prompt tokens
posted an update 7 days ago
The authors of ColPali trained a retrieval model based on SmolVLM 🤠 vidore/colsmolvlm-alpha
TL;DR:

- ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks

- ColSmolVLM is more memory efficient than ColQwen2 💗
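For context on what "ColPali-like" means: these retrievers keep one embedding per query token and per document patch, and score a query-document pair by late interaction (MaxSim). A minimal sketch with made-up 2-D toy vectors:

```python
def maxsim_score(query_vecs, doc_vecs):
    # Late interaction: each query vector takes its best dot-product
    # match among the document's vectors; the matches are summed.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]    # aligned with the query
doc_b = [[-1.0, 0.0], [0.0, -1.0]]  # pointing away from it
assert maxsim_score(query, doc_a) > maxsim_score(query, doc_b)
```

Real models produce hundreds of higher-dimensional vectors per page, but the scoring rule is this simple.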
posted an update 8 days ago
Small yet mighty! 💫

We are releasing SmolVLM, a new 2B vision language model made for on-device use, fine-tunable on a consumer GPU, and immensely memory efficient 🤠

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
posted an update 12 days ago
What a week! A recap for everything you missed ❄️
merve/nov-22-releases-673fbbcfc1c97c4f411def07
Multimodal ✨
> Mistral AI released Pixtral 124B, a gigantic open vision language model
> Llava-CoT (formerly known as Llava-o1) was released: a multimodal reproduction of the o1 model by PKU
> OpenGVLab released MMPR, a new multimodal reasoning dataset
> Jina released Jina-CLIP-v2: 0.98B multilingual multimodal embeddings
> Apple released AIMv2, new SotA vision encoders

LLMs 🦙
> AllenAI dropped a huge release of models, datasets and scripts for Tülu, a family of models based on Llama 3.1, aligned with SFT, DPO and a new technique they developed called RLVR
> Jina released embeddings-v3: new multilingual embeddings with longer context
> Hugging Face released SmolTalk, the synthetic dataset used to align SmolLM2 with supervised fine-tuning
> Microsoft released orca-agentinstruct-1M-v1, a gigantic instruction dataset of 1M synthetic instruction pairs

Image Generation 🖼️
> Black Forest Labs released Flux.1 Tools: four new models for different image modifications and two LoRAs for image conditioning and better steering of generations

Lastly, Hugging Face released a new library, Observers: a lightweight SDK for monitoring interactions with AI APIs and easily storing and browsing them 📚
$ pip install observers
posted an update 12 days ago
Apple released AIMv2 🍏 a family of state-of-the-art open-set vision encoders
apple/aimv2-6720fe1558d94c7805f7688c
> like CLIP, but add a decoder and train on autoregression 🤯
> 19 open models come in 300M, 600M, 1.2B, 2.7B with resolutions of 224, 336, 448
> Load and use with 🤗 transformers
posted an update 12 days ago
your Hugging Face profile now has your recent activities 🤗
posted an update 16 days ago
reacted to sayakpaul's post with ❤️ 16 days ago
It's been a while since we shipped native quantization support in diffusers 🧨

We currently support bitsandbytes as the official backend, but using others like torchao is already very simple.

This post is just a reminder of what's possible:

1. Loading a model with a quantization config
2. Saving a model with quantization config
3. Loading a pre-quantized model
4. enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints

Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes
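Not the diffusers API, just a toy illustration of the core idea behind weight quantization (absmax scaling to int8, dequantizing on use):

```python
def quantize_8bit(weights):
    # Absmax quantization: one float scale per tensor, int8 values.
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.0, 0.25]
q, s = quantize_8bit(w)
restored = dequantize(q, s)
# Each weight is recovered to within one quantization step.
assert all(abs(a - b) <= s for a, b in zip(w, restored))
```

Real backends like bitsandbytes are far more sophisticated (block-wise scales, outlier handling), but the memory saving comes from the same trick: store 1 byte per weight instead of 2 or 4.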
reacted to BlinkDL's post with 🔥 16 days ago
view post
Post
3106
RWKV-6-world-v3 (+3.1T tokens) is our best multilingual 7B model as of now: BlinkDL/rwkv-6-world

It's 100% RNN and attention-free. MMLU 54.2% (previous world-v2.1 = 47.9%. note: without eval-boosting tricks such as annealing).

RWKV-7-world-v4 soon :)
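To illustrate what "100% RNN and attention-free" buys you (this is a toy leaky integrator, not the actual RWKV-6 recurrence): each token updates a fixed-size state, so memory stays constant in sequence length instead of growing like an attention KV cache.

```python
def rnn_scan(tokens, decay=0.9):
    # One fixed-size state, updated per token: O(1) memory in sequence
    # length, versus O(n) for an attention key/value cache.
    state = 0.0
    outputs = []
    for x in tokens:
        state = decay * state + (1 - decay) * x
        outputs.append(state)
    return outputs
```

The same constant-state property is what makes RNN-family models attractive for long contexts and cheap inference.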
reacted to davidberenstein1957's post with 🔥👀 16 days ago
For anyone who struggles with NER or information extraction with LLMs.

We showed an efficient workflow for token classification, including zero-shot suggestions and model fine-tuning with Argilla, GLiNER, the NuMind NuExtract LLM and SpanMarker. @argilla

Video: https://youtu.be/JvLpaYgNd84?feature=shared
Notebooks and slides included to try it yourself 🙂
reacted to erikkaum's post with πŸ‘€πŸ”₯ 16 days ago
A while ago I started experimenting with compiling the Python interpreter to WASM.

The goal: a secure, fast, and lightweight sandbox for code execution, ideal for running LLM-generated Python code.

- Send code simply as a POST request
- 1-2ms startup times

Hack away:
https://github.com/ErikKaum/runner
reacted to AdinaY's post with 👀 16 days ago
reacted to prithivMLmods's post with 🤗 16 days ago
Minimalistic Adapters 🎃

🚀 Demo Here:
prithivMLmods/FLUX-LoRA-DLC

🚀 Model:
{ Quote Tuner } : prithivMLmods/Flux.1-Dev-Quote-LoRA
{ Stamp Art } : prithivMLmods/Flux.1-Dev-Stamp-Art-LoRA
{ Hand Sticky } : prithivMLmods/Flux.1-Dev-Hand-Sticky-LoRA
{ Poster HQ } : prithivMLmods/Flux.1-Dev-Poster-HQ-LoRA
{ Ctoon Min } : prithivMLmods/Flux.1-Dev-Ctoon-LoRA

🚀 Collection:
{ Flux LoRA Collection } : prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
{ LoRA Space Collection } : prithivMLmods/lora-space-collections-6714b72e0d49e1c97fbd6a32

🚀 For More Visit
https://huggingface.co/strangerzonehf
🤗 @prithivMLmods
reacted to hexgrad's post with 🔥 16 days ago
posted an update 17 days ago
OmniVision-968M: a new local VLM for edge devices, fast & small but performant
💨 a new vision language model with 9x fewer image tokens, super efficient
📖 aligned with DPO to reduce hallucinations
⚡️ Apache 2.0 license 🔥

Demo hf.co/spaces/NexaAIDev/omnivlm-dpo-demo
Model NexaAIDev/omnivision-968M
reacted to AdinaY's post with 👀🔥 19 days ago
Let's dive into the exciting releases from the Chinese community last week 🔥🚀
More details 👉 https://huggingface.co/zh-ai-community

Code model:
✨Qwen 2.5 Coder by Alibaba Qwen
Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f
✨OpenCoder by InflyAI - fully open code model 🙌
infly/opencoder-672cec44bbb86c39910fb55e

Image model:
✨Hunyuan3D-1.0 by Tencent
tencent/Hunyuan3D-1

MLLM:
✨JanusFlow by DeepSeek
deepseek-ai/JanusFlow-1.3B
✨Mono-InternVL-2B by OpenGVLab
OpenGVLab/Mono-InternVL-2B

Video model:
✨CogVideoX 1.5 by ChatGLM
THUDM/CogVideoX1.5-5B-SAT

Audio model:
✨Fish Agent by FishAudio
fishaudio/fish-agent-v0.1-3b

Dataset:
✨OPI dataset by BAAIBeijing
BAAI/OPI