33 | AMO-Bench: Large Language Models Still Struggle in High School Math Competitions | LongCat | 53 | 1 | Submitted by Shengnan An
7 | UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in OmniModels | LongCat | 75
- | LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models | LongCat
26 | R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? | LongCat | 18 | 2 | Submitted by Luyi
19 | VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications | LongCat | 16 | 2 | Submitted by Wei He