Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published 5 days ago • 40
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published 5 days ago • 40
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation Paper • 2410.15748 • Published Oct 21, 2024 • 13
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels Paper • 2405.07526 • Published May 13, 2024 • 19
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation Paper • 2305.09515 • Published May 16, 2023 • 3
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing Paper • 2305.11738 • Published May 19, 2023 • 8