StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code Paper • 2306.04556 • Published Jun 7, 2023
MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation Paper • 2208.08227 • Published Aug 17, 2022 • 1
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models Paper • 2502.01584 • Published Feb 3, 2025 • 9
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability Paper • 2408.01416 • Published Aug 2, 2024 • 1
SliderSpace: Decomposing the Visual Capabilities of Diffusion Models Paper • 2502.01639 • Published Feb 3, 2025 • 26
Art-Free Generative Models: Art Creation Without Graphic Art Knowledge Paper • 2412.00176 • Published Nov 29, 2024 • 9
Themis: Towards Flexible and Interpretable NLG Evaluation Paper • 2406.18365 • Published Jun 26, 2024
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18, 2024 • 35
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs Paper • 2406.20086 • Published Jun 28, 2024 • 6
Activation Steering for Robust Type Prediction in CodeLLMs Paper • 2404.01903 • Published Apr 2, 2024 • 2