FunctionGemma Tuning Lab is a new no-code tool by @google that lets you fine-tune a model directly from the browser, no coding knowledge required, with TRL doing the work behind the scenes.
It includes GDPO, the latest variant of GRPO for multi-reward RL ✨ GDPO decouples reward normalization across reward signals to avoid reward collapse and improve per-reward convergence, developed by @sliuau, @SimonX et al.
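A rough way to picture the decoupling described above (a toy NumPy sketch, not the actual TRL implementation; the function names and example rewards are made up): GRPO-style normalization standardizes the summed reward over a group of completions, while the decoupled variant standardizes each reward signal on its own before combining, so one reward's scale can't wash out the others.

```python
# Toy sketch of the normalization difference, not the exact GDPO/TRL code.
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """rewards: (num_rewards, group_size). Sum rewards first, then normalize jointly."""
    total = rewards.sum(axis=0)                       # (group_size,)
    return (total - total.mean()) / (total.std() + 1e-8)

def decoupled_advantages(rewards: np.ndarray) -> np.ndarray:
    """Normalize each reward signal over the group separately, then combine."""
    mean = rewards.mean(axis=1, keepdims=True)        # per-reward mean
    std = rewards.std(axis=1, keepdims=True) + 1e-8   # per-reward scale
    return ((rewards - mean) / std).sum(axis=0)       # (group_size,)

# A group of 4 completions scored by two reward functions on very different
# scales: with decoupled normalization the small-scale reward still matters.
rewards = np.array([[10.0, 12.0, 11.0, 13.0],   # e.g. format/length reward
                    [0.1,  0.9,  0.2,  0.8]])   # e.g. correctness reward
print(grpo_advantages(rewards))
print(decoupled_advantages(rewards))
```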
Recursive Language Models (RLMs) are a new interface for LLMs with cool ideas by Alex Zhang!
• LLMs struggle with long prompts: attention overload & lost info
• RLMs inspect, split & call themselves on chunks, then aggregate results
• Handles millions of tokens, reduces noise, improves reasoning
• A system prompt guides the recursion
• RLM trajectories can be used for RL training or distillation (OpenEnv + TRL!!)
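The recursive pattern in the list above, as a toy sketch (assuming a generic `llm(prompt) -> str` callable; `recursive_answer` and the character budget are made-up names, and the real RLM interface lets the model itself decide how to inspect and split the context):

```python
# Toy sketch of recursive chunk-and-aggregate answering over a long context.
from typing import Callable

def recursive_answer(
    llm: Callable[[str], str],
    question: str,
    context: str,
    max_chars: int = 8_000,
) -> str:
    # Base case: the context fits the budget, so answer directly.
    if len(context) <= max_chars:
        return llm(f"Context:\n{context}\n\nQuestion: {question}")

    # Recursive case: split the context, answer the question on each chunk,
    # then aggregate the partial answers with one more call.
    chunks = [context[i:i + max_chars] for i in range(0, len(context), max_chars)]
    partial = [recursive_answer(llm, question, chunk, max_chars) for chunk in chunks]
    summary = "\n".join(f"- {p}" for p in partial)
    return llm(
        f"Partial answers from different parts of a long document:\n{summary}\n\n"
        f"Combine them into one final answer to: {question}"
    )
```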
The list of hands-on notebooks (some beginner-friendly!) to get started with fine-tuning using TRL keeps growing!!
• SFT
• GRPO
• Tool calling & agents
• RL environments with OpenEnv
• LLMs and VLMs
✨ Many run on FREE Colab, making it super easy to get started fast!
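For a taste of what those notebooks cover, here is a minimal SFT sketch in the spirit of TRL's quickstart (assuming a recent TRL release; the model and dataset names are just placeholder examples):

```python
# Minimal supervised fine-tuning sketch with TRL's SFTTrainer.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder conversational dataset and model; swap in your own.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",                    # any causal LM on the Hub
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen2.5-0.5b-sft"),
)
trainer.train()
```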
The Christmas holidays are here! Thinking about learning something new in AI?
@huggingface offers 12 FREE courses covering all the relevant topics, for every level of experience. A great challenge for the holidays (and worth saving for later!)