MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering Paper • 2410.07095 • Published Oct 9, 2024
[Re] Badder Seeds: Reproducing the Evaluation of Lexical Methods for Bias Measurement Paper • 2206.01767 • Published Jun 3, 2022
Probing LLMs for Joint Encoding of Linguistic Categories Paper • 2310.18696 • Published Oct 28, 2023