Gemma 2 9B Neogenesis ITA

Fine-tuned version of VAGOsolutions/SauerkrautLM-gemma-2-9b-it optimized for better performance in Italian.

  • 9.24 billion parameters
  • Supports 8k context length

Need a smaller model? Try gemma-2-2b-neogenesis-ita.

🎮 Usage

💬🇮🇹 Try the model on Hugging Face Spaces

Text generation with Transformers

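import torch
from transformers import pipeline

model_id = "anakin87/gemma-2-9b-neogenesis-ita"

# Load the model in bfloat16 on a CUDA GPU
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# Chat-style input; the pipeline applies the model's chat template
# ("What is compound interest? Explain it simply and clearly.")
messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]
outputs = pipe(messages, max_new_tokens=500)

# generated_text holds the whole conversation; the last message is the model's reply
print(outputs[0]["generated_text"][-1]["content"])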

🏆 Evaluation Results

The model was submitted to and evaluated on the Open Ita LLM Leaderboard, a popular leaderboard for Italian language models.

Model                                     MMLU_IT   ARC_IT   HELLASWAG_IT   Average
google/gemma-2-9b-it                      65.67     55.6     68.95          63.41
VAGOsolutions/SauerkrautLM-gemma-2-9b-it  65.76     61.25    72.10          66.37
anakin87/gemma-2-9b-neogenesis-ita        65.82     61.25    73.29          66.79
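The Average column matches an unweighted mean of the three task scores. As a quick check, the snippet below recomputes it from the table values. Treating the leaderboard average as a simple mean is an inference from these numbers, not something the card states; `leaderboard_average` is a hypothetical helper.

```python
def leaderboard_average(mmlu_it: float, arc_it: float, hellaswag_it: float) -> float:
    """Unweighted mean of the three Italian benchmarks (assumed aggregation)."""
    return (mmlu_it + arc_it + hellaswag_it) / 3


# Scores copied from the results table
results = {
    "google/gemma-2-9b-it": (65.67, 55.6, 68.95),
    "VAGOsolutions/SauerkrautLM-gemma-2-9b-it": (65.76, 61.25, 72.10),
    "anakin87/gemma-2-9b-neogenesis-ita": (65.82, 61.25, 73.29),
}

for model, scores in results.items():
    print(f"{model}: {leaderboard_average(*scores):.2f}")
```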

These results make this a strong 9B model for Italian: on the leaderboard, it outperforms 13-14B models and even surpasses some models in the 30-70B range.

🔧 Training details

The model was fine-tuned with Hugging Face TRL using Direct Preference Optimization (DPO).
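For reference, DPO trains the model directly on preference pairs (a chosen and a rejected response to the same prompt), comparing its log-probabilities against a frozen reference model. The sketch below computes the standard sigmoid DPO loss for a single pair in plain Python. It is an illustration of the objective, not TRL's implementation; `dpo_loss` is a hypothetical helper, and `beta=0.1` is an assumed value because the card does not report the beta used.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: not stated in the model card
) -> float:
    """Sigmoid DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    the model being trained (policy) or the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) rewritten as softplus(-margin) for stability
    return softplus(-margin)


# Hypothetical log-probabilities, for illustration only
print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

When the policy and the reference agree, the margin is zero and the loss is log 2; the loss drops below that as the policy favors the chosen response more than the reference does.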

I adopted a relatively new technique for parameter-efficient learning: Spectrum. The idea is to train only the layers of the model with a high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest. Specifically, training targeted the 20% of layers with the highest SNR.
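The selection step can be sketched as follows. This is a simplified illustration, not the Spectrum tool itself: it assumes per-module SNR scores have already been computed (Spectrum derives them from the singular values of each weight matrix), and it ranks modules separately within each module type, which is our reading of how Spectrum picks layers. The helper `select_top_snr_modules` and all scores are hypothetical.

```python
import re
from collections import defaultdict


def select_top_snr_modules(snr_scores: dict[str, float], fraction: float = 0.2) -> set[str]:
    """Return the module names to train: the top `fraction` by SNR within each module type."""
    groups: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for name, snr in snr_scores.items():
        # Drop the layer index so "model.layers.3.mlp.down_proj" is ranked
        # only against the other "model.layers.N.mlp.down_proj" modules.
        module_type = re.sub(r"\.\d+\.", ".", name)
        groups[module_type].append((name, snr))

    selected: set[str] = set()
    for members in groups.values():
        k = max(1, round(len(members) * fraction))  # keep at least one module per type
        members.sort(key=lambda item: item[1], reverse=True)  # highest SNR first
        selected.update(name for name, _ in members[:k])
    return selected


# Hypothetical SNR scores for a 10-layer model with two module types (not real measurements)
scores: dict[str, float] = {}
for i in range(10):
    scores[f"model.layers.{i}.self_attn.q_proj"] = float(i)  # SNR grows with depth
    scores[f"model.layers.{i}.mlp.down_proj"] = float(10 - i)  # SNR shrinks with depth

trainable = select_top_snr_modules(scores, fraction=0.2)
frozen = set(scores) - trainable  # everything else stays frozen
print(sorted(trainable))
```

In a real PyTorch training script, the selected names would then be used to set `requires_grad` on the matching parameters, leaving all other parameters frozen.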

Batch size: 16; learning rate: 1e-6; epochs: 1.

The training process took approximately 12 hours on a single NVIDIA A100 GPU (80GB VRAM).

For the training code, see the DPO section of this 📓 Kaggle notebook; for this model it was adapted to use a different base model, different hyperparameters, and no on-policy data.

🗃️ Training data

The model was trained primarily on Italian data, with a small portion of English data included.

For Direct Preference Optimization

🙏 Thanks to the authors for providing these datasets.

🛡️ Safety

While this model was not specifically fine-tuned for safety, the selective training with Spectrum, which leaves most layers frozen, should help preserve some of the original model's safety behavior.
