MD MUHAIMIN RAHMAN

sezan92

AI & ML interests

AI, Reinforcement learning, Graph Neural Network, Computer Vision, Robotics

Recent Activity

reacted to prithivMLmods's post with 🚀 and 🤗 8 days ago

OpenGVLab's InternVL3_5-2B-MPO [Mixed Preference Optimization (MPO)] is a compact vision-language model in the InternVL3.5 series. You can now experience it in the Tiny VLMs Lab, an app featuring 15+ multimodal VLMs ranging from 250M to 4B parameters. These models support tasks such as OCR, reasoning, single-shot answering with small models, and captioning (including ablated variants), across a broad range of visual categories. They are also capable of handling images with complex, sensitive, or nuanced content, while adapting to varying aspect ratios and resolutions.

✨ Space/App: https://huggingface.co/spaces/prithivMLmods/Tiny-VLMs-Lab
🫙 Model: https://huggingface.co/OpenGVLab/InternVL3_5-2B-MPO
↗️ Collection: https://huggingface.co/collections/OpenGVLab/internvl35-68ac87bd52ebe953485927fb
🗞️ Paper: https://arxiv.org/pdf/2508.18265
↗️ Multimodal Space Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the relevant spaces, collections, and model cards.

updated a dataset 7 months ago

hf-vision/course-assets

upvoted a paper almost 2 years ago

Large-Scale Automatic Audiobook Creation

Paper • 2309.03926 • Published Sep 7, 2023 • 55