Gullal Singh Cheema

gullalc

AI & ML interests

Multimodality, Vision and Language, Cross-modal relations, Video Understanding

Recent Activity

liked a dataset about 10 hours ago

HuggingFaceM4/FineVision

reacted to sergiopaniego's post with 🔥 about 1 month ago

Want to learn how to align a Vision Language Model (VLM) for reasoning using GRPO and TRL? 🌋 🧑‍🍳 We've got you covered!! NEW multimodal post training recipe to align a VLM using TRL in @HuggingFace's Cookbook. Go to the recipe 👉https://huggingface.co/learn/cookbook/fine_tuning_vlm_grpo_trl Powered by the latest TRL v0.20 release, this recipe shows how to teach Qwen2.5-VL-3B-Instruct to reason over images 🌋

upvoted a collection about 1 month ago

gpt-oss

Organizations

None yet

upvoted a collection about 1 month ago

gpt-oss

Collection

Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated 29 days ago • 335

upvoted 9 papers about 2 months ago

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Paper • 2506.09985 • Published Jun 11 • 30

SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification

Paper • 2506.15569 • Published Jun 18 • 13

CoMemo: LVLMs Need Image Context with Image Memory

Paper • 2506.06279 • Published Jun 6 • 9

Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30 • 81

VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Paper • 2505.16192 • Published May 22 • 12

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation

Paper • 2505.14640 • Published May 20 • 15

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21 • 53

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

Paper • 2505.14231 • Published May 20 • 53

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

Paper • 2505.11049 • Published May 16 • 61

upvoted 10 papers 4 months ago

Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA

Paper • 2505.06356 • Published May 9 • 3

Aya Vision: Advancing the Frontier of Multilingual Multimodality

Paper • 2505.08751 • Published May 13 • 12

Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning

Paper • 2505.07263 • Published May 12 • 30

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

Paper • 2505.10557 • Published May 15 • 47

LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

Paper • 2505.02625 • Published May 5 • 22

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Paper • 2505.04601 • Published May 7 • 27
