In a Training Loop 🔄


Stefano Fiorucci PRO

anakin87

AI & ML interests

Language Models: orchestration, post-training, GRPO, synthetic data... Contributing to Haystack LLM framework 🏗️

Recent Activity

liked a dataset about 3 hours ago

OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B

liked a model 2 days ago

NousResearch/nomos-1

posted an update 1 day ago

Post

105

💭 Do thinking traces make Language Models learn better? Curious what others think

𝗦𝗰𝗲𝗻𝗮𝗿𝗶𝗼
You take an instruction-following LM.
You want to train it with a GRPO-style RL algorithm on a task like Tic Tac Toe.
Rewards are outcome-based, applied only at the end of each episode: win/loss/draw, format adherence...

During training, the model could just output answers, but a common choice is to make it also output thinking traces.

𝗧𝗵𝗲 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻
Does forcing the model to produce thinking traces during training actually improve learning❓

💬 I'd like to hear your thoughts. Share ideas and links to relevant papers and resources.

From what I've understood so far, the answer seems to be 𝘆𝗲𝘀.

1️⃣ If you force the model to think during training, it becomes a model that thinks at inference time. It naturally allocates more budget (tokens) to a problem, which tends to improve performance.

2️⃣ While the model's "reasoning" already exists in its activation space, using explicit thinking traces as a scratchpad allows training to steer and shape that reasoning.

3️⃣ As the model produces more traces during training, the RL algorithm can progressively give higher rewards to the reasoning patterns that lead to better outcomes.
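To make the scenario concrete, here is a minimal sketch of an outcome-based reward that also checks format adherence. The `<think>`/`<answer>` tag format, the reward values, the `format_weight`, and the function names are all assumptions for illustration; they do not come from a specific library or paper.

```python
import re

# Assumed completion format: <think>...</think> followed by <answer>...</answer>.
COMPLETION_PATTERN = re.compile(
    r"^\s*<think>(?P<think>.+?)</think>\s*<answer>(?P<answer>.+?)</answer>\s*$",
    re.DOTALL,
)

# Hypothetical outcome rewards for a finished Tic Tac Toe episode.
OUTCOME_REWARD = {"win": 1.0, "draw": 0.5, "loss": 0.0}


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag format, else 0.0."""
    return 1.0 if COMPLETION_PATTERN.match(completion) else 0.0


def episode_reward(completions: list[str], outcome: str, format_weight: float = 0.2) -> float:
    """Combine the end-of-episode outcome with the average format adherence.

    Nothing here scores the content of the thinking trace itself:
    the trace is only rewarded indirectly, through the outcome.
    """
    if not completions:
        return 0.0
    avg_format = sum(format_reward(c) for c in completions) / len(completions)
    return OUTCOME_REWARD[outcome] + format_weight * avg_format
```

Note that the thinking trace only has to be present and well-formed. Whether it is useful is decided entirely by the outcome reward, which is the setting the question above is about.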

posted an update 15 days ago

Post

439

I made a visualization based on the Prime Intellect INTELLECT-3 technical report.

Wild to see how far they pushed GLM-4.5-Air-Base with SFT + RL.
SOTA for its size and competitive with models 3x larger.

All open.

Congrats on the release!

Model: PrimeIntellect/INTELLECT-3
Technical report: https://storage.googleapis.com/intellect-3-paper/INTELLECT_3_Technical_Report.pdf
Chat: https://chat.primeintellect.ai/

posted an update about 1 month ago

Post

2868

LLMs can leak their post-training data (RL included) 💧

New interesting paper on this topic from Google DeepMind: Extracting alignment data in open models (2510.18554)

It's known that Language Models memorize data that can be extracted via prompting.

In this paper, the authors investigate this aspect:
- using open models, where prompting can be fully customized by the user, including special tokens.
- focusing on open-source models like Olmo, where full training data is available.

📤 How do they extract data?

During post-training (like SFT), new tokens such as <|user|> are introduced.

The authors hypothesize that prompting the model with these tokens can make it output its alignment data (remember Magpie?).

For example, for SFT, their extraction prompt is <|endoftext|><|user|>.

📏 Evaluating memorization

The authors compare each sampled example with the original data using vector search with embedding similarity.

They find that many outputs are semantically very similar to the original data, even if the exact words differ.

Traditional string-matching algorithms underestimate memorization by 10x.
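A minimal sketch of the two ways of measuring memorization, under loud assumptions: `difflib` stands in for string matching, and the vectors are hand-made toy values rather than outputs of a real embedding model. It is not the paper's pipeline, just the general idea of brute-force vector search with cosine similarity.

```python
import math
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (a stand-in for string matching)."""
    return SequenceMatcher(None, a, b).ratio()


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either one is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def nearest_training_example(query_vec: list[float], training_vecs: list[list[float]]):
    """Brute-force vector search: (index, score) of the most similar training vector."""
    scores = [cosine_similarity(query_vec, t) for t in training_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

In practice, each sampled output and each training example would be embedded with the same model. A paraphrase can then score high on cosine similarity while scoring low on `string_similarity`, which is why string matching alone undercounts.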

🔁 What about RL?

Surprisingly, the same technique works to extract data from Reinforcement Learning (PPO/GRPO) phases.

This is counter-intuitive because the RL objective is not designed to increase sequence likelihoods (unlike SFT).

Practical limitation: in this case, extraction relies on using the initial part of the training prompt, which is not generally public.

📈 Are the extracted data effective for post-training?

In both SFT and RL, the extracted data can be used to fine-tune models that reach performance similar to the originals.

The authors suggest that model distillation, where a stronger model is used to drive the training of a weaker one, may be a form of indirect training on the original dataset.

posted an update 3 months ago

Post

491

Your Language Model needs better (open) environments to learn 🌀

📝 https://huggingface.co/blog/anakin87/environments-hub

RL environments help LLMs practice, reason, and improve.
I explored the Environments Hub and wrote a walkthrough showing how to train and evaluate models using these open environments.

1️⃣ 𝗪𝗵𝘆 𝗥𝗟 𝗺𝗮𝘁𝘁𝗲𝗿𝘀 𝗳𝗼𝗿 𝗟𝗟𝗠𝘀

DeepSeek-R1 made clear that Reinforcement Learning can be used to incentivize reasoning in LLMs.
In GRPO, the model generates multiple answers and learns to prefer the better ones from rewards.
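As a rough sketch of "learns to prefer the better ones": the rewards of several answers to the same prompt are normalized against their own group. This covers only the advantage computation, not the full policy update (clipping, KL term); the function name and the `eps` value are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Answers above the group average get a positive advantage and are reinforced; answers below it get a negative one. If all answers receive the same reward, every advantage is zero and the group carries no learning signal.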

2️⃣ 𝗪𝗵𝗮𝘁 𝗲𝗻𝘃𝗶𝗿𝗼𝗻𝗺𝗲𝗻𝘁𝘀 𝗮𝗿𝗲
In classic RL, the environment is the world where the Agent lives, interacts, and gets rewards to learn.

We can also think of them as software packages containing data, a harness, and scoring rules, which the model uses to learn and be evaluated.

Nowadays, the Agent is not just the LLM. It can use tools, from a weather API to a terminal.

This makes environments for training and evaluation more complex and critical.
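The "software package" view can be sketched in plain Python. This is a hypothetical container built around a toy alphabetical sort task; it is not the Verifiers API, and all class and function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyEnvironment:
    """Hypothetical container: data + harness (prompt builder) + scoring rule."""

    dataset: list[list[str]]
    build_prompt: Callable[[list[str]], str]
    score: Callable[[list[str], str], float]

    def evaluate(self, model: Callable[[str], str]) -> float:
        """Run the model on every example and return the mean score."""
        scores = [
            self.score(words, model(self.build_prompt(words)))
            for words in self.dataset
        ]
        return sum(scores) / len(scores)


def sort_prompt(words: list[str]) -> str:
    """Harness: turn a raw example into the prompt the model sees."""
    return "Sort these words alphabetically, comma-separated: " + ", ".join(words)


def sort_score(words: list[str], completion: str) -> float:
    """Scoring rule: 1.0 if the completion is exactly the sorted word list, else 0.0."""
    predicted = [w.strip() for w in completion.split(",")]
    return 1.0 if predicted == sorted(words) else 0.0
```

The same `score` function can serve as an evaluation metric or as the reward in RL training, which is what makes a shared environment format useful.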

3️⃣ 𝐓𝐡𝐞 𝐨𝐩𝐞𝐧 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞

Big labs are advancing, but open models and the community still face a fragmented ecosystem.
We risk becoming users of systems built with tools we can't access or fully understand.

4️⃣ 𝐄𝐧𝐯𝐢𝐫𝐨𝐧𝐦𝐞𝐧𝐭𝐬 𝐇𝐮𝐛
That's why I was excited when Prime Intellect released the Environments Hub.

It's a place where people share RL environments: tasks you can use to train LLMs with RL (GRPO-style) or evaluate Agents.
Plus, the Verifiers library ( @willcb ) standardizes the creation of RL environments and evaluations.
They can help to keep science and experimentation open. 🔬

I explored the Hub and wrote a hands-on walkthrough 📝
- RL + LLMs basics
- Environments Hub navigation
- Evaluating models/Agents
- Training a tiny model with GRPO on an alphabetical sort task

Take a look!

📝 https://huggingface.co/blog/anakin87/environments-hub

reacted to sergiopaniego's post with 🔥 3 months ago

Post

3960

You can now supercharge your TRL training pipelines with kernels

👷 kernels is a new library to load optimized compute kernels directly from the Hub

Combined with TRL, it makes your developer experience smoother & faster.

Check out the new guide to learn more! 🕺

Learn ➡️ https://huggingface.co/docs/trl/main/en/kernels_hub

posted an update 4 months ago

Post

4735

Want to quickly try Gemma 3 270m? 💎💬

I made a simple Space to do that: anakin87/gemma-3-270m-it

⚡ Fast: Flash Attention, Zero GPU
⚙️ Configurable

posted an update 4 months ago

Post

394

🕵️🌐 Building Browser Agents - notebook

No API? No problem.
Browser Agents can use websites like you do: click, type, wait, read.

📓 Step-by-step notebook: https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/browser_agents.ipynb

🎥 In the video, the Agent:
- Goes to Hugging Face Spaces
- Finds black-forest-labs/FLUX.1-schnell
- Expands a short prompt ("my holiday on Lake Como") into a detailed image generation prompt
- Waits for the image
- Returns the image URL

## What else can it do?
Great for information gathering and summarization

🗞️🗞️ Compare news websites and create a table of shared stories with links
▶️ Find content creator social profiles from YouTube videos
🛍️ Find a product's price range on Amazon
🚂 🚌 Gather public transportation travel options

## How is it built?
🏗️ Haystack → Agent execution logic
🧠 Google Gemini 2.5 Flash → Good and fast LLM with a generous free tier
🛠️ Playwright MCP server → Browser automation tools: navigate, click, type, wait...

Even without vision capabilities, this setup can get quite far.

## Next steps
- Try a local open model
- Move from notebook to real deployment
- Incorporate vision

And you? Have you built something similar? What's in your stack?

reacted to mlabonne's post with 🔥 4 months ago

Post

6848

Liquid just released two 450M and 1.6B param VLMs!

They're super fast and leverage SigLIP2 NaFlex encoders to handle native resolutions without distortion. They're ideal for on-device deployment in constrained environments like phones.

They're available today on Hugging Face, with inference and fine-tuning Colab notebooks.

LiquidAI/LFM2-VL-450M
LiquidAI/LFM2-VL-1.6B

posted an update 4 months ago

Post

1088

Haystack can now see 👀

The latest release of the Haystack OSS LLM framework adds a long-requested feature: image support!

📓 Notebooks below

This isn't just about passing images to an LLM. We built several features to enable practical multimodal use cases.

What's new?
🧠 Support for multiple LLM providers: OpenAI, Amazon Bedrock, Google Gemini, Mistral, NVIDIA, OpenRouter, Ollama and more (support for Hugging Face API coming 🔜)
🎛️ Prompt template language to handle structured inputs, including images
📄 PDF and image converters
🔍 Image embedders using CLIP-like models
🧾 LLM-based extractor to pull text from images
🧩 Components to build multimodal RAG pipelines and Agents

I had the chance to lead this effort with @sjrhuschlee (great collab).

📓 Below you can find two notebooks to explore the new features:
- Introduction to Multimodal Text Generation https://haystack.deepset.ai/cookbook/multimodal_intro
- Creating Vision+Text RAG Pipelines https://haystack.deepset.ai/tutorials/46_multimodal_rag

(🖼️ image by @bilgeyucel )

posted an update 5 months ago

Post

446

🛡️ AI Guardrails with Open Language Models - Tutorial

📓 https://haystack.deepset.ai/cookbook/safety_moderation_open_lms

How do you ensure your AI application is safe from harmful or inappropriate user inputs?

This is a core requirement for real-world AI deployments. Luckily, several open Language Models are built specifically for safety moderation.

I've been exploring them and put together a hands-on tutorial using the Haystack framework to build your own AI guardrails.

In the notebook, you'll learn how to use and customize:
🔹 Meta Llama Guard (via Hugging Face API)
🔹 IBM Granite Guardian (via Ollama), which can also evaluate RAG-specific risk dimensions
🔹 Google ShieldGemma (via Ollama)
🔹 NVIDIA NemoGuard model family, including a model for topic control

You'll also see how to integrate content moderation into a 🔎 RAG pipeline.

reacted to andito's post with 👀 5 months ago

Post

4056

🧠👁️ Can AI visualize solutions?

Humans often solve visual problems by sketching ideas in our minds. What if Vision-Language Models (VLMs) could do something similar, not by generating full images, but by using internal “mental sketches”?

That’s the idea behind Mirage, a new framework that empowers VLMs to reason using latent visual tokens. Instead of just thinking in words, Mirage mixes in abstract visual representations that help the model solve complex tasks.

These aren't photorealistic images. They're compact, internal representations optimized purely to support reasoning.

🔧 Mirage is trained in two phases:

1) Grounding: It learns to produce latent tokens anchored in real images.
2) Refinement: The model drops the images and learns to generate visual tokens on its own.

📈 And yes, it works!
On challenging benchmarks like Visual Spatial Planning, Jigsaw puzzles, and Spatial Attention Tasks, Mirage clearly outperforms GPT-4o and other strong baselines.
Smart sketches > empty words.

By mimicking the way humans visualize solutions, Mirage gives AI a new kind of imagination, one that’s faster, more efficient, and more human-like.
Kudos to the teams at UMass Amherst and MIT behind this exciting work.
Check the paper: Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (2506.17218)

4 replies

posted an update 6 months ago

Post

1194

🧰 Free up space on the Hub with super_squash_history 🧹

As you may know, Hugging Face Hub has storage limits on private repos (100 GB for free users, 1 TB for PROs).

This weekend I did some cleanup on my private repos.
I went from 1.58 TB down to 1 GB. 😅

Besides deleting old, unused models, the main tool I used was a lesser-known command:
super_squash_history.

When you train a model, you often push multiple checkpoints to the Hub.
Each checkpoint = a commit.
A 2.6B model in BF16 is ~5 GB.
So 10 checkpoints = 50 GB. That adds up fast.
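The arithmetic behind these numbers, as a quick sketch. It assumes 1 GB = 1e9 bytes and counts only the weights; the helper name is made up.

```python
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each parameter in 2 bytes


def checkpoint_size_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), ignoring file metadata."""
    return num_params * bytes_per_param / 1e9


single = checkpoint_size_gb(2.6e9)  # about 5.2 GB for a 2.6B model
ten_checkpoints = 10 * single       # about 52 GB if every checkpoint stays in the history
```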

While full commit history can be useful for rollbacks, it's often unnecessary for older experiments where only the final model matters.

In these cases, you can use super_squash_history: it reduces your entire repo history to a single commit.

https://huggingface.co/docs/huggingface_hub/main/en/package_reference/hf_api#huggingface_hub.HfApi.super_squash_history

⚠️ super_squash_history is a non-revertible operation. Once squashed, the commit history cannot be retrieved.
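A minimal sketch of how the call could be wrapped, given that it cannot be undone. The `confirm` guard and the wrapper name are my own additions, and `your-username/your-model` is a placeholder repo id; only `HfApi.super_squash_history` itself comes from `huggingface_hub`.

```python
def squash_repo_history(repo_id: str, confirm: bool = False) -> bool:
    """Squash a Hub repo's history into one commit. Irreversible, so it is opt-in."""
    if not confirm:
        # Do nothing unless the caller explicitly opts in.
        return False

    # Imported lazily so the guard above works even without the library installed.
    from huggingface_hub import HfApi

    api = HfApi()  # uses the locally saved token, if any
    api.super_squash_history(repo_id=repo_id)
    return True


# Example (placeholder repo id); nothing is squashed without confirm=True.
squash_repo_history("your-username/your-model")
```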

Hope this is useful to others.

2 replies

reacted to as-cle-bert's post with ❤️ 7 months ago

Post

1975

One of the biggest challenges I've been facing since I started developing [𝐏𝐝𝐟𝐈𝐭𝐃𝐨𝐰𝐧](https://github.com/AstraBert/PdfItDown) was correctly handling the conversion of files like Excel sheets and CSVs: table conversion was bad and messy, almost unusable for downstream tasks🫣

That's why today I'm excited to introduce 𝐫𝐞𝐚𝐝𝐞𝐫𝐬, the new feature of PdfItDown v1.4.0!🎉

With 𝘳𝘦𝘢𝘥𝘦𝘳𝘴, you can choose among three (for now👀) flavors of text extraction and conversion to PDF:

- 𝗗𝗼𝗰𝗹𝗶𝗻𝗴, which does fantastic work with presentations, spreadsheets and Word documents🦆

- 𝗟𝗹𝗮𝗺𝗮𝗣𝗮𝗿𝘀𝗲 by LlamaIndex, suitable for more complex and articulated documents, with a mixture of text, images and tables🦙

- 𝗠𝗮𝗿𝗸𝗜𝘁𝗗𝗼𝘄𝗻 by Microsoft, not the best at handling highly structured documents, but extremely flexible in terms of input file format (it can even convert XML, JSON and ZIP files!)✒️

You can use this new feature in your Python scripts (check the attached code snippet!😉) and in the command line interface as well!🐍

Have fun and don't forget to star the repo on GitHub ➡️ https://github.com/AstraBert/PdfItDown

posted an update 8 months ago

Post

3516

𝗜 𝘁𝗿𝗮𝗶𝗻𝗲𝗱 𝗮 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗠𝗼𝗱𝗲𝗹 𝘁𝗼 𝘀𝗰𝗵𝗲𝗱𝘂𝗹𝗲 𝗲𝘃𝗲𝗻𝘁𝘀 𝘄𝗶𝘁𝗵 𝗚𝗥𝗣𝗢! 👑 🗓️

✍️ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo

I experimented with GRPO lately.

I am fascinated by models learning from prompts and rewards - no example answers needed like in Supervised Fine-Tuning.

After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game...

I wanted a different challenge, like 𝘁𝗲𝗮𝗰𝗵𝗶𝗻𝗴 𝗮 𝗺𝗼𝗱𝗲𝗹 𝘁𝗼 𝗰𝗿𝗲𝗮𝘁𝗲 𝗮 𝘀𝗰𝗵𝗲𝗱𝘂𝗹𝗲 𝗳𝗿𝗼𝗺 𝗮 𝗹𝗶𝘀𝘁 𝗼𝗳 𝗲𝘃𝗲𝗻𝘁𝘀 𝗮𝗻𝗱 𝗽𝗿𝗶𝗼𝗿𝗶𝘁𝗶𝗲𝘀.

Choosing an original problem forced me to:
🤔 Think about the problem setting
🧬 Generate data
🤏 Choose the right base model
🏆 Design reward functions (and experience reward hacking)
🔄 Run multiple rounds of training, hoping that my model would learn something.

A fun and rewarding 😄 experience.

I learned a lot of things that I want to share with you. 👇
✍️ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo
💻 Code: https://github.com/anakin87/qwen-scheduler-grpo
🤗 Hugging Face collection (dataset and model): anakin87/qwen-scheduler-grpo-680bcc583e817390525a8837

2 replies

reacted to as-cle-bert's post with ❤️ 8 months ago

Post

2958

Ever dreamt of ingesting into a vector DB that pile of CSVs, Word documents and presentations lying in some remote folders on your PC?🗂️
What if I told you that you can do it within three to six lines of code?🤯
Well, with my latest open-source project, 𝐢𝐧𝐠𝐞𝐬𝐭-𝐚𝐧𝐲𝐭𝐡𝐢𝐧𝐠 (https://github.com/AstraBert/ingest-anything), you can take all your non-PDF files, convert them to PDF, extract their text, chunk, embed and load them into a vector database, all in one go!🚀
How? It's pretty simple!
📁 The input files are converted into PDF by PdfItDown (https://github.com/AstraBert/PdfItDown)
📑 The PDF text is extracted using LlamaIndex readers
🦛 The text is chunked using Chonkie
🧮 The chunks are embedded thanks to Sentence Transformers models
🗄️ The embeddings are loaded into a Qdrant vector database

And you're done!✅
Curious to try it? Install it by running:

pip install ingest-anything

And you can start using it in your Python scripts!🐍
Don't forget to star it on GitHub and let me know if you have any feedback! ➡️ https://github.com/AstraBert/ingest-anything

5 replies

reacted to giux78's post with ❤️ 9 months ago

Post

3241

This is truly an inspirational story. Please help us spread the word, @clem , @thomwolf and everyone who supports open source AI.

A few weeks ago, @mmuffo94 and @cittiberto from indigo_ai launched the Chatbot Arena for the Italian language: https://indigo.ai/it/chatbot-arena-italia/.

To our surprise, among the top-ranked models is mii-llm/maestrale-chat-v0.4-beta, a carefully fine-tuned version of mistralai/Mistral-7B-v0.1, developed by @efederici and @mferraretto from mii-llm , and released nearly a year ago.

At this very moment, as shown in the screenshot, mii-llm/maestrale-chat-v0.4-beta is ranked 8th, right between ChatGPT-4.5 and ChatGPT-4o.

It's likely that for several months, the best Italian-speaking LLM has been an open source 7B model created by open source contributors, and hardly anyone knew it.

2 replies

replied to their post 11 months ago

Ok, I understand...

In the past, I've also fine-tuned models with different licenses.
You may be interested in https://huggingface.co/anakin87/Phi-3.5-mini-ITA (MIT license).

posted an update 11 months ago

Post

1753

𝐍𝐞𝐰 𝐈𝐭𝐚𝐥𝐢𝐚𝐧 𝐒𝐦𝐚𝐥𝐥 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬: 𝐆𝐞𝐦𝐦𝐚 𝐍𝐞𝐨𝐠𝐞𝐧𝐞𝐬𝐢𝐬 𝐜𝐨𝐥𝐥𝐞𝐜𝐭𝐢𝐨𝐧 💎🌍🇮🇹

I am happy to release two new language models for the Italian Language!

💪 Gemma 2 9B Neogenesis ITA
anakin87/gemma-2-9b-neogenesis-ita
Building on the impressive work by VAGO Solutions, I applied Direct Preference Optimization with a mix of Italian and English data.
Using Spectrum, I trained 20% of model layers.

📊 Evaluated on the Open ITA LLM leaderboard ( mii-llm/open_ita_llm_leaderboard), this model achieves strong performance.
To beat it on this benchmark, you'd need a 27B model 😎

🤏 Gemma 2 2B Neogenesis ITA
anakin87/gemma-2-2b-neogenesis-ita
This smaller variant is fine-tuned from the original Gemma 2 2B it by Google.
Through a combination of Supervised Fine-Tuning and Direct Preference Optimization, I trained 25% of the layers using Spectrum.

📈 Compared to the original model, it shows improved Italian proficiency, good for its small size.

Both models were developed during the recent #gemma competition on Kaggle.
📓 Training code: https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond

🙏 Thanks @FinancialSupport and mii-llm for the help during evaluation.

3 replies

reacted to tomaarsen's post with ❤️ 11 months ago

Post

4862

🏎️ Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! Including 2 fully open models: training scripts, datasets, metrics.

We apply our recipe to train 2 Static Embedding models that we release today! We release:
2️⃣ an English Retrieval model and a general-purpose Multilingual similarity model (e.g. classification, clustering, etc.), both Apache 2.0
🧠 my modern training strategy: ideation -> dataset choice -> implementation -> evaluation
📜 my training scripts, using the Sentence Transformers library
📊 my Weights & Biases reports with losses & metrics
📕 my list of 30 training and 13 evaluation datasets

The 2 Static Embedding models have the following properties:
🏎️ Extremely fast, e.g. 107500 sentences per second on a consumer CPU, compared to 270 for 'all-mpnet-base-v2' and 56 for 'gte-large-en-v1.5'
0️⃣ Zero active parameters: No Transformer blocks, no attention, not even a matrix multiplication. Super speed!
📏 No maximum sequence length! Embed texts at any length (note: longer texts may embed worse)
📐 Linear instead of quadratic complexity: 2x longer text takes 2x longer, instead of 2.5x or more.
🪆 Matryoshka support: allows you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% perf. decrease for English Similarity tasks)
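To illustrate why these properties hold, here is a toy sketch of a static embedding model: a lookup table plus averaging, with Matryoshka-style truncation. The vocabulary, the vector values, and the whitespace tokenization are made up for illustration; real models use a proper tokenizer and learned vectors trained so that the leading dimensions carry most of the information.

```python
import math

# Toy lookup table (hypothetical values): one fixed vector per token.
TOKEN_VECTORS = {
    "fast": [0.9, 0.1, 0.3, 0.0],
    "cpu": [0.2, 0.8, 0.1, 0.1],
    "model": [0.4, 0.4, 0.2, 0.3],
}
DIM = 4


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def embed(text: str, truncate_dim: int = DIM) -> list[float]:
    """Average the vectors of known tokens, keep the first `truncate_dim` values, normalize."""
    vectors = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vectors:
        return [0.0] * truncate_dim
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    return normalize(mean[:truncate_dim])
```

The cost is one lookup and one addition per token, so it grows linearly with text length and has no sequence-length limit. Passing a smaller `truncate_dim` gives the shorter embedding.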

Check out the full blogpost if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: https://huggingface.co/blog/static-embeddings

The blogpost contains a lengthy list of possible advancements; I'm very confident that our 2 models are only the tip of the iceberg, and we may be able to get even better performance.

Alternatively, check out the models:
* sentence-transformers/static-retrieval-mrl-en-v1
* sentence-transformers/static-similarity-mrl-multilingual-v1

1 reply

posted an update 11 months ago

Post

574

Hey, it has been a while... I was busy participating in the 💎 𝐆𝐞𝐦𝐦𝐚 𝐜𝐨𝐦𝐩𝐞𝐭𝐢𝐭𝐢𝐨𝐧!

Here's the idea: Gemma open models have a large vocabulary size (256K), so improving them for a specific language or cultural context should be pretty affordable - no need for continued pre-training.

My submission: 💎🌍🇮🇹 𝐍𝐞𝐨𝐠𝐞𝐧𝐞𝐬𝐢𝐬 - 𝐏𝐨𝐬𝐭-𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐆𝐞𝐦𝐦𝐚 𝐟𝐨𝐫 𝐈𝐭𝐚𝐥𝐢𝐚𝐧 𝐚𝐧𝐝 𝐛𝐞𝐲𝐨𝐧𝐝
📓 Kaggle notebook: https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond

In this notebook, I show how I improved the performance of Gemma 2 2B on Italian via Post-Training.
I believe this method is adaptable to other languages and model sizes.

𝘒𝘦𝘺 𝘚𝘵𝘦𝘱𝘴
📊 Choose reference metrics
🧑‍🔬 Data curation for Instruction Fine Tuning: identify existing datasets + generate synthetic data
🏋️‍♂️ Efficient Instruction Fine Tuning with Spectrum
🧑‍🔬 Data curation for Preference Tuning: identify existing datasets + generate synthetic data
👍👎 Efficient Direct Preference Optimization with Spectrum
📈 Evaluation

🤗 Hugging Face collection (with models and datasets): anakin87/gemma-neogenesis-67824b7bf13ac9cfe091fe2e

I'm also planning a 🎁 Gemma Giveaway (on LinkedIn - https://www.linkedin.com/in/stefano-fiorucci) in the next few days - sharing techniques, datasets, and models I used for my project... so stay tuned! 📻
