DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published Sep 29, 2025 • 137
PokerBench: Training Large Language Models to become Professional Poker Players Paper • 2501.08328 • Published Jan 14, 2025 • 19
EmbedLLM: Learning Compact Representations of Large Language Models Paper • 2410.02223 • Published Oct 3, 2024 • 3