view post Post 3410 Want to learn how to align a Vision Language Model (VLM) for reasoning using GRPO and TRL? 🌋🧑🍳 We've got you covered!!NEW multimodal post training recipe to align a VLM using TRL in @HuggingFace 's Cookbook.Go to the recipe 👉https://huggingface.co/learn/cookbook/fine_tuning_vlm_grpo_trlPowered by the latest TRL v0.20 release, this recipe shows how to teach Qwen2.5-VL-3B-Instruct to reason over images 🌋 See translation 🔥 6 6 + Reply
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated 29 days ago • 335
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO Paper • 2506.07464 • Published Jun 9 • 14 • 3
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Paper • 2506.09985 • Published Jun 11 • 30
SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification Paper • 2506.15569 • Published Jun 18 • 13
Time Blindness: Why Video-Language Models Can't See What Humans Can? Paper • 2505.24867 • Published May 30 • 81
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought Paper • 2505.16192 • Published May 22 • 12
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation Paper • 2505.14640 • Published May 20 • 15
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Paper • 2505.15966 • Published May 21 • 53
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231 • Published May 20 • 53
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning Paper • 2505.11049 • Published May 16 • 61
Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA Paper • 2505.06356 • Published May 9 • 3
Aya Vision: Advancing the Frontier of Multilingual Multimodality Paper • 2505.08751 • Published May 13 • 12
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning Paper • 2505.07263 • Published May 12 • 30
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning Paper • 2505.10557 • Published May 15 • 47