LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images Paper • 2403.11703 • Published Mar 18, 2024 • 18
Ovis2.5 Collection Our next-generation MLLMs for native-resolution vision and advanced reasoning • 5 items • Updated 17 days ago • 54
Article Welcome GPT OSS, the new open-source model family from OpenAI! By reach-vb and 11 others • Aug 5 • 487
Moshi: a speech-text foundation model for real-time dialogue Paper • 2410.00037 • Published Sep 17, 2024 • 6
Configurable Preference Tuning with Rubric-Guided Synthetic Data Paper • 2506.11702 • Published Jun 13 • 2
Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation Paper • 2506.03857 • Published Jun 4 • 3
Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning Paper • 2506.08477 • Published Jun 10 • 5
Reward Models Enable Scalable Code Verification by Trading Accuracy for Throughput Paper • 2506.10056 • Published Jun 11 • 3
Infinity Instruct: Scaling Instruction Selection and Synthesis to Enhance Language Models Paper • 2506.11116 • Published Jun 9 • 5
LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning Paper • 2506.10082 • Published Jun 11 • 9
Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills Paper • 2506.10387 • Published Jun 12 • 5
Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings Paper • 2506.08592 • Published Jun 10 • 23