merve posted an update 21 days ago
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane that it can parse and re-render charts in HTML
> it concatenates CLIP and SAM features, which gives better grounding
> very efficient in terms of performance per vision token
> covers 100 languages
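The post doesn't say how the two feature streams are combined beyond "concatenated". As a minimal sketch of channel-wise concatenation, assuming (hypothetically) that both encoders emit the same number of vision tokens, each token's CLIP vector and SAM vector are joined end to end. The name `concat_features` and the toy shapes are illustrative only, not DeepSeek-OCR's actual code:

```python
def concat_features(clip_feats, sam_feats):
    """Concatenate two per-token feature sequences along the channel axis.

    Hypothetical illustration: clip_feats and sam_feats are lists with one
    entry per vision token, each entry a list of floats. The result has one
    entry per token whose width is the sum of the two input widths.
    """
    if len(clip_feats) != len(sam_feats):
        raise ValueError("both encoders must produce the same number of tokens")
    # List "+" joins the two vectors end to end for each token.
    return [c + s for c, s in zip(clip_feats, sam_feats)]


# Toy example: 2 tokens, 3-dim "CLIP" features and 2-dim "SAM" features.
clip = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
sam = [[1.0, 2.0], [3.0, 4.0]]
fused = concat_features(clip, sam)
print(len(fused), len(fused[0]))  # 2 5
```

Each fused token carries both kinds of information side by side, so a downstream projection can use either one.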
merve posted an update about 2 months ago
large AI labs open-sourced a ton of models last week 🔥
here are a few picks, find even more here merve/sep-16-releases-68d13ea4c547f02f95842f05
> IBM released a new Docling model with 258M params based on Granite (Apache 2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released a 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, an open Nano Banana 🍌 alternative (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer-use models (3B/7B/32B) along with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan LongCat released a thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking
merve posted an update about 2 months ago
IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter, it can also do document question answering and understands multiple languages 🤯
> best part: released under the Apache 2.0 license, so you can use it in your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗
merve posted an update about 2 months ago
a ton of image/video generation models and LLMs from big labs 🔥

> Meta released facebook/mobilellm-r1-68c4597b104fac45f28f448e, smol LLMs for on-device use 💬
> Tencent released tencent/SRPO, a high-res image generation model, and tencent/POINTS-Reader, a cutting-edge OCR model 📝
> ByteDance released bytedance-research/HuMo, video generation from any input ⏯️

find more models, datasets, demos here merve/sep-11-releases-68c7dbfa26bea8cd921fa0ac
merve posted an update about 2 months ago
fan-favorite vision LM Florence-2 is now officially supported in transformers 🤗

find all the models in the florence-community org 🫡
merve posted an update 2 months ago
large AI labs have dropped so many open models last week 🔥 don't miss out on them

→ Apple released on-device vision LMs apple/fastvlm-68ac97b9cd5cacefdd04872e & apple/mobileclip2-68ac947dcb035c54bcd20c47
→ OpenGVLab released InternVL3.5, 32 new vision LMs with one based on gpt-oss! (OS) OpenGVLab/internvl35-68ac87bd52ebe953485927fb
→ MSFT released a killer small TTS model (OS) microsoft/VibeVoice-1.5B

find more here: https://huggingface.co/collections/merve/august-29-releases-68b5a3754cfb8abf59e2b486
merve posted an update 3 months ago
first vision language model built on openai/gpt-oss-20b just dropped! 🔥

InternVL3.5 comes with 32 models 🤯 pre-trained, fine-tuned, and aligned, in various sizes OpenGVLab/internvl35-68ac87bd52ebe953485927fb
it comes with gpt-oss or Qwen3 for the LLM part ⤵️
Xenova posted an update 3 months ago
Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯
Demo (+ source code): webml-community/DINOv3-video-tracking

This will revolutionize AI-powered video editors, which can now run 100% locally in your browser with no server inference required (costs $0)!

How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold

... et voilà! 🥳

You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
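The four steps above can be sketched in plain Python. This is not the demo's source (which runs DINOv3 in Transformers.js); it assumes the per-frame patch embeddings were already computed and cached as lists of floats, and the names `cosine`, `track`, and the `threshold` value are hypothetical. Taking the maximum similarity over all selected embeddings is also an assumption, chosen so that selections made across several frames each count:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def track(frame_features, selected, threshold=0.8):
    """Return, for each frame, the indices of patches matching the selection.

    frame_features: one list of patch embeddings per frame (step 1, cached).
    selected: embeddings of the user-selected patch(es), possibly taken from
              several frames (step 2). Must be non-empty.
    threshold: hypothetical cut-off on cosine similarity (step 4).
    """
    if not selected:
        raise ValueError("select at least one patch")
    highlighted = []
    for patches in frame_features:
        hits = []
        for i, patch in enumerate(patches):
            # Step 3: score the patch against every selected embedding and
            # keep its best match.
            score = max(cosine(patch, s) for s in selected)
            if score > threshold:  # step 4: highlight if above the threshold
                hits.append(i)
        highlighted.append(hits)
    return highlighted


# Toy cached features: 2 frames, 3 patches each, 2-dim embeddings.
frames = [
    [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]],
    [[0.0, 1.0], [1.0, 0.05], [-1.0, 0.0]],
]
print(track(frames, selected=[[1.0, 0.0]], threshold=0.8))  # [[0, 2], [1]]
```

Averaging the selected embeddings into a single query vector is a reasonable alternative; the max keeps any patch that matches at least one selected appearance of the object.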

Excited to see what the community builds with it!
merve posted an update 3 months ago
GPT-4.1-mini-level model right on your iPhone 🤯

openbmb/MiniCPM-V-4 is only 4B yet surpasses GPT-4.1-mini on vision benchmarks 🔥

it allows commercial use as well!
Xenova posted an update 3 months ago
The next generation of AI-powered websites is going to be WILD! 🤯

In-browser tool calling & MCP is finally here, allowing LLMs to interact with websites programmatically.

To show what's possible, I built a demo using Liquid AI's new LFM2 model, powered by 🤗 Transformers.js: LiquidAI/LFM2-WebGPU

As always, the demo is open source (which you can find under the "Files" tab), so I'm excited to see how the community builds upon this! 🚀
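Tool calling generally follows a simple loop: the model emits a structured call, the host looks up and runs the named function, and the result is fed back to the model. A minimal Python sketch of the dispatch step only, assuming a JSON call format of `{"name": ..., "arguments": {...}}` (an assumption; LFM2 and MCP define their own formats) and using hypothetical tools `get_page_title` and `add`. In the browser demo the tools would be JavaScript functions acting on the page:

```python
import json

# Hypothetical tool registry: names and functions are illustrative only.
TOOLS = {}


def tool(fn):
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_page_title():
    return "Example page"


@tool
def add(a, b):
    return a + b


def dispatch(model_output):
    """Run one tool call emitted by the model and return a JSON result string.

    Assumes the call is JSON like {"name": ..., "arguments": {...}}.
    """
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    result = fn(**call.get("arguments", {}))
    return json.dumps({"result": result})


print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # {"result": 5}
```

Returning errors as data rather than raising lets the model see that a call failed and try again.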
merve posted an update 3 months ago
we're all sleeping on this OCR model rednote-hilab/dots.ocr 🔥

dots.ocr is a new 3B model with SOTA performance, support for 100 languages, and a license that allows commercial use! 🤯

a single end-to-end model that extracts images and converts tables, formulas, and more into Markdown 📝
try it: MohamedRashad/Dots-OCR
merve posted an update 3 months ago
massive releases and tons of FLUX.1 Krea LoRAs in the past week!
here are some of the picks, find more models in the collection 🫡 merve/releases-august-2-6890c14248203522b7d0267f

LLMs ๐Ÿ’ฌ
> Tencent dropped tencent/Hunyuan-7B-Instruct
> Qwen released Qwen/Qwen3-Coder-30B-A3B-Instruct, a 30B MoE with 3B active params for coding (OS)

vision/multimodal
> RedNote released rednote-hilab/dots.ocr - 3B OCR model (OS)
> Cohere released CohereLabs/command-a-vision-07-2025 - 112B (dense!) VLM for 6 languages
> StepFun-AI shipped stepfun-ai/step3 - 321B MoE VLM (OS)
> Skywork shipped Skywork/Skywork-UniPic-1.5B - new any-to-any model (image+text → image+text) (OS)
merve posted an update 3 months ago
past week in open AI was insane 🔥 here are some of the picks, find more here merve/releases-july-25-688768ca47fe3693407e02d1

💬 LLMs & VLMs
> Qwen/Qwen3-235B-A22B-Thinking-2507 had a new update (OS)
> Qwen/Qwen3-Coder-480B-A35B-Instruct is out with 480B total and 35B active params 🤯 (OS)
> AllenAI dropped an update to allenai/olmOCR-7B-0725 📝
> InternLM released internlm/Intern-S1 - 235B Qwen3 MoE + 6B InternViT encoder (OS)
> OmniSVG/OmniSVG is a new SVG generation VLM (OS)

🖼️ image/video/3D generation
> WanAI released Wan2.2 series - both T2V and I2V 14B models for high-quality video generation (OS) multimodalart/wan-22-688767e313337b434ed55112
> Tencent dropped tencent/HunyuanWorld-1 - image-to-3D scene generation model