@qq8933 on Hugging Face: "LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and…"


qq8933 posted an update 19 days ago



LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and Evaluation With PyTorch and HuggingFace
Large Reasoning Models powered by Monte Carlo Tree Search (MCTS), Self-Play Reinforcement Learning, PPO, AlphaGo Zero's dual policy paradigm and Large Language Models!
https://github.com/SimpleBerry/LLaMA-O1/
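The post names MCTS, self-play RL, PPO and AlphaGo Zero's policy/value paradigm without showing how the search part works. As rough orientation only, here is a minimal, generic sketch of the PUCT-guided tree search that AlphaGo Zero popularized, applied to chains of reasoning steps. It is not taken from the LLaMA-O1 repository: `propose_steps` (a stand-in for the LLM policy, returning candidate steps with prior probabilities), `value_fn` (a stand-in for a reward or value model) and the toy digit task are hypothetical placeholders.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A partial chain of reasoning steps plus its search statistics."""
    state: tuple
    prior: float = 1.0          # policy probability of the step that led here
    visit_count: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # step -> Node

    def q(self) -> float:
        # Mean value of all simulations that passed through this node.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float) -> float:
    # AlphaGo Zero-style PUCT: mean value plus a prior-weighted exploration bonus
    # that shrinks as the child is visited more often.
    bonus = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q() + bonus


def mcts(root_state, propose_steps, value_fn, is_terminal,
         num_simulations=100, c_puct=1.5):
    root = Node(root_state)
    for _ in range(num_simulations):
        node, path = root, [root]
        # 1. Selection: follow the highest PUCT score down to a leaf.
        while node.children:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: puct_score(parent, ch, c_puct))
            path.append(node)
        # 2. Expansion: the hypothetical policy proposes next steps with priors.
        if not is_terminal(node.state):
            for step, prior in propose_steps(node.state):
                node.children[step] = Node(node.state + (step,), prior)
        # 3. Evaluation: the hypothetical value model scores the leaf in [0, 1].
        value = value_fn(node.state)
        # 4. Backpropagation: single-agent search, so values are not sign-flipped.
        for visited in path:
            visited.visit_count += 1
            visited.value_sum += value
    # Pick the most-visited first step (greedy, like low-temperature AlphaGo Zero play).
    best_step = max(root.children, key=lambda s: root.children[s].visit_count)
    return best_step, root


# Toy stand-ins (hypothetical): steps are digits 1-3, a chain is finished after
# three steps, and the "value model" rewards chains made of large steps.
def toy_propose(state):
    return [(d, 1 / 3) for d in (1, 2, 3)]


def toy_value(state):
    return sum(state) / (3 * len(state)) if state else 0.0


def toy_terminal(state):
    return len(state) == 3


best, root = mcts((), toy_propose, toy_value, toy_terminal, num_simulations=200)
print("first step:", best, {s: c.visit_count for s, c in root.children.items()})
```

Because this is a single-agent search over reasoning steps, values are backed up unchanged rather than negated at each level as in two-player Go. In AlphaGo Zero, the root visit counts serve as an improved policy target for training; how LLaMA-O1 combines its search with self-play and PPO is not described in the post.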

What happens when you combine MCTS ❤ LLM ❤ Self-Play ❤ RLHF?
Just a little bite of strawberry!🍓

Related prior work:
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning (2410.02884)
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B (2406.07394)

17 days ago

Awesome work. Can we fine-tune this reasoning model further?

qq8933

16 days ago

main.py is the entry point for fine-tuning, but the code still needs improvement; see 'Call for contributors'.
