SihengLi

Siheng99

SihengLi99

AI & ML interests

Artificial Intelligence

Recent Activity

upvoted a paper 7 days ago

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published 14 days ago • 42

authored a paper 23 days ago

Reinforcement Learning on Pre-Training Data

Paper • 2509.19249 • Published 24 days ago • 66

upvoted a paper 23 days ago

Reinforcement Learning on Pre-Training Data

Paper • 2509.19249 • Published 24 days ago • 66

upvoted a paper 3 months ago

MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization

Paper • 2507.14683 • Published Jul 19 • 131

updated a collection 4 months ago

🌸RePO

RePO: Replay-Enhanced Policy Optimization • 6 items • Updated Jun 6

updated a model 4 months ago

Siheng99/Qwen3-1.7B-DeepMath-1024samples-RePO

Text Generation • 2B • Updated Jun 6 • 1

published a model 4 months ago

Siheng99/Qwen3-1.7B-DeepMath-1024samples-RePO

Text Generation • 2B • Updated Jun 6 • 1

updated a model 4 months ago

Siheng99/Qwen3-1.7B-DeepMath-1024samples-GRPO

Text Generation • 2B • Updated Jun 6 • 4

published a model 4 months ago

Siheng99/Qwen3-1.7B-DeepMath-1024samples-GRPO

Text Generation • 2B • Updated Jun 6 • 4

updated a model 4 months ago

Siheng99/Qwen2.5-Math-7B-DeepMath-1024samples-RePO

Text Generation • 8B • Updated Jun 6

published a model 4 months ago

Siheng99/Qwen2.5-Math-7B-DeepMath-1024samples-RePO

Text Generation • 8B • Updated Jun 6

updated a model 4 months ago

Siheng99/Qwen2.5-Math-7B-DeepMath-1024samples-GRPO

Text Generation • 8B • Updated Jun 6 • 2

published a model 4 months ago

Siheng99/Qwen2.5-Math-7B-DeepMath-1024samples-GRPO

Text Generation • 8B • Updated Jun 6 • 2

updated a model 4 months ago

Siheng99/Qwen2.5-Math-1.5B-DeepMath-1024samples-RePO

Text Generation • 2B • Updated Jun 6

published a model 4 months ago

Siheng99/Qwen2.5-Math-1.5B-DeepMath-1024samples-RePO

Text Generation • 2B • Updated Jun 6