LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset Paper • 2309.11998 • Published Sep 21, 2023 • 25
LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench Paper • 2409.13373 • Published Sep 20, 2024 • 3
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise Paper • 2312.12436 • Published Dec 19, 2023 • 13
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation Paper • 2311.08877 • Published Nov 15, 2023 • 6
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure Paper • 2311.07590 • Published Nov 9, 2023 • 16