Gullal Singh Cheema

gullalc

AI & ML interests

Multimodality, Vision and Language, Cross-modal relations, Video Understanding

Organizations

None yet

Recent Activity

liked a dataset about 2 hours ago

HuggingFaceM4/FineVision

Viewer • Updated about 5 hours ago • 24.2M • 782 • 80

reacted to sergiopaniego's post with 🔥 about 1 month ago

Post

3410

Want to learn how to align a Vision Language Model (VLM) for reasoning using GRPO and TRL? 🌋

🧑‍🍳 We've got you covered!!

NEW multimodal post training recipe to align a VLM using TRL in @HuggingFace's Cookbook.

Go to the recipe 👉https://huggingface.co/learn/cookbook/fine_tuning_vlm_grpo_trl

Powered by the latest TRL v0.20 release, this recipe shows how to teach Qwen2.5-VL-3B-Instruct to reason over images 🌋

upvoted a collection about 1 month ago

gpt-oss

Collection

Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated 29 days ago • 335

liked 2 models about 1 month ago

openai/gpt-oss-20b

Text Generation • 22B • Updated 9 days ago • 9.66M • 3.41k

openai/gpt-oss-120b

Text Generation • 120B • Updated 9 days ago • 2.79M • 3.74k

commented a paper about 2 months ago

DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Paper • 2506.07464 • Published Jun 9 • 14

upvoted 9 papers about 2 months ago

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Paper • 2506.09985 • Published Jun 11 • 30

SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification

Paper • 2506.15569 • Published Jun 18 • 13

CoMemo: LVLMs Need Image Context with Image Memory

Paper • 2506.06279 • Published Jun 6 • 9

Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30 • 81

VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Paper • 2505.16192 • Published May 22 • 12

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation

Paper • 2505.14640 • Published May 20 • 15

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21 • 53

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

Paper • 2505.14231 • Published May 20 • 53

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

Paper • 2505.11049 • Published May 16 • 61

upvoted 5 papers 4 months ago

Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA

Paper • 2505.06356 • Published May 9 • 3

Aya Vision: Advancing the Frontier of Multilingual Multimodality

Paper • 2505.08751 • Published May 13 • 12

Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning

Paper • 2505.07263 • Published May 12 • 30

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

Paper • 2505.10557 • Published May 15 • 47

Seed1.5-VL Technical Report

Paper • 2505.07062 • Published May 11 • 150
