Multi-Agent Collaboration for Multilingual Code Instruction Tuning Paper • 2502.07487 • Published Feb 11
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models Paper • 2502.13059 • Published Feb 18
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models Paper • 2502.16614 • Published Feb 23
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation Paper • 2505.14552 • Published May 20
M3TQA: Massively Multilingual Multitask Table Question Answering Paper • 2508.16265 • Published Aug 22
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence Paper • 2511.18538 • Published 10 days ago
V-GameGym: Visual Game Generation for Code Large Language Models Paper • 2509.20136 • Published Sep 24