PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation Paper • 2409.06820 • Published Sep 10, 2024 • 66
Running 165 165 Low-bit Quantized Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation Paper • 2401.04092 • Published Jan 8, 2024 • 21 • 1
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks Paper • 2312.14238 • Published Dec 21, 2023 • 20
Self-Evaluation Improves Selective Generation in Large Language Models Paper • 2312.09300 • Published Dec 14, 2023 • 16
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor Paper • 2312.07661 • Published Dec 12, 2023 • 19