Dhruv Diddi

ddiddi

AI & ML interests

None yet

ddiddi's activity

upvoted 7 papers 2 days ago

LocAgent: Graph-Guided LLM Agents for Code Localization

Paper • 2503.09089 • Published 3 days ago • 5

Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol

Paper • 2503.05860 • Published 7 days ago • 7

AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models

Paper • 2503.08417 • Published 3 days ago • 6

"Principal Components" Enable A New Language of Images

Paper • 2503.08685 • Published 3 days ago • 10

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published 4 days ago • 89

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published 4 days ago • 32

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published 4 days ago • 73

upvoted 2 papers about 1 month ago

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Paper • 2501.18362 • Published Jan 30 • 21

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31 • 111

upvoted 4 papers about 2 months ago

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published Jan 22 • 85

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 346

GPS as a Control Signal for Image Generation

Paper • 2501.12390 • Published Jan 21 • 12

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Paper • 2501.12895 • Published Jan 22 • 57

upvoted 7 papers 3 months ago

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published Dec 12, 2024 • 94

MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views

Paper • 2412.06767 • Published Dec 9, 2024 • 7

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published Dec 5, 2024 • 107

SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

Paper • 2412.02687 • Published Dec 3, 2024 • 109