Igor Kilbas
kaleinaNyan
AI & ML interests
Computer Vision, NLP
Organizations
None yet
Good RL papers
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
  Paper • 2508.08221 • Published • 50
- Reinforcement Learning for Reasoning in Large Language Models with One Training Example
  Paper • 2504.20571 • Published • 98
- RLPR: Extrapolating RLVR to General Domains without Verifiers
  Paper • 2506.18254 • Published • 32
JinaJudge
A series of encoder transformer models for cheap evaluation of LLMs on the Russian Hard LLM Arena.
models
11
kaleinaNyan/kolibri-qwen2.5-7b-060225-rlhf-1 • 8B
kaleinaNyan/kolibri-qwen2.5-7b-060225-rlhf-1.gguf • 8B • 3
kaleinaNyan/eule-qwen2.5instruct-14b-111224 • 15B • 2 • 1
kaleinaNyan/eule-qwen2.5instruct-7b-111224 • 8B • 1
kaleinaNyan/jina-v3-rullmarena-judge-300924 • 0.6B • 2 • 2
kaleinaNyan/jina-v3-rullmarena-judge-041024 • 0.6B • 2 • 1
kaleinaNyan/jina-v3-rullmarena-judge • 0.6B • 9 • 3
kaleinaNyan/kolibri-mistral-0427-upd • Text Generation • 7B • 2 • 1
kaleinaNyan/kolibri-mistral-0427-upd.gguf • 7B • 3
kaleinaNyan/kolibri-mistral-0427.gguf • 7B • 11 • 1