MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models Paper • 2501.00316 • Published 9 days ago • 22
xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval Paper • 2303.03004 • Published Mar 6, 2023
DelucionQA: Detecting Hallucinations in Domain-specific Question Answering Paper • 2312.05200 • Published Dec 8, 2023 • 1
Evidence to Generate (E2G): A Single-agent Two-step Prompting for Context Grounded and Retrieval Augmented Reasoning Paper • 2401.05787 • Published Jan 11, 2024
ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning Paper • 2403.09028 • Published Mar 14, 2024
MapCoder: Multi-Agent Code Generation for Competitive Problem Solving Paper • 2405.11403 • Published May 18, 2024 • 2
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations Paper • 2407.04069 • Published Jul 4, 2024
Learning to Filter Context for Retrieval-Augmented Generation Paper • 2311.08377 • Published Nov 14, 2023
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models Paper • 2410.01782 • Published Oct 2, 2024 • 10
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval Paper • 2412.01558 • Published Dec 2, 2024 • 4