Gemma 2 2B Neogenesis ITA

Fine-tuned version of google/gemma-2-2b-it optimized for better performance in Italian.

  • Small yet powerful model with 2.6 billion parameters
  • Supports 8k context length

GGUF quants: static - weighted/imatrix
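
If you prefer running the GGUF quants locally, here is a minimal sketch using llama-cpp-python; the repo id and file pattern are placeholders (not taken from this card), so point them at the static or imatrix quant you actually downloaded.

from llama_cpp import Llama

# Placeholder repo/file: substitute the GGUF quant repo and quantization level you chose.
llm = Llama.from_pretrained(
    repo_id="<gguf-quant-repo>",
    filename="*Q4_K_M.gguf",   # glob matched against the files in the repo
    n_ctx=8192,                # the model supports an 8k context window
)

messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]
out = llm.create_chat_completion(messages=messages, max_tokens=500)
print(out["choices"][0]["message"]["content"])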

Need a stronger model? Try gemma-2-9b-neogenesis-ita.

🎮 Usage

💬🇮🇹 Try the model on Hugging Face Spaces

Text generation with Transformers

import torch
from transformers import pipeline

model_id="anakin87/gemma-2-2b-neogenesis-ita"

pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [{"role": "user", "content": "Cos'รจ l'interesse composto? Spiega in maniera semplice e chiara."}]
outputs = pipe(messages, max_new_tokens=500)

print(outputs[0]["generated_text"][1]["content"])

>>> Immagina di avere 100 euro e di depositarli in un conto che ti dร  un interesse del 5% all'anno....

For more usage examples and applications, refer to the 📓 Kaggle notebook.

๐Ÿ† Evaluation Results

The model was submitted to and evaluated on the Open Ita LLM Leaderboard, the most popular leaderboard for Italian language models.

| Model | MMLU_IT | ARC_IT | HELLASWAG_IT | Average |
|---|---|---|---|---|
| google/gemma-2-2b-it | 47.65 | 40.03 | 54.69 | 47.46 |
| anakin87/gemma-2-2b-ita-sft (SFT checkpoint) | 47.77 | 41.15 | 55.66 | 48.19 |
| anakin87/gemma-2-2b-neogenesis-ita (DPO) | 48.03 | 40.46 | 56.97 | 48.49 |

Qualitative evaluation across various domains is available here.

🔧 Training details

The model was fine-tuned using Hugging Face TRL.

The training involved Instruction Fine Tuning and Direct Preference Optimization.

I adopted a relatively new technique for parameter-efficient learning: Spectrum. The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest. Specifically, training focused on the top 25% most informative layers.
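
Conceptually, Spectrum-style selective training boils down to freezing everything and re-enabling gradients only for the selected modules. A minimal sketch (the layer names below are hypothetical; the real list comes from Spectrum's SNR analysis):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")

# Hypothetical selection: in practice, Spectrum's SNR analysis produces the
# list of the top 25% most informative layers.
unfrozen_patterns = ["model.layers.20.mlp", "model.layers.21.self_attn"]

for name, param in model.named_parameters():
    # Freeze the whole model, then unfreeze only the selected layers.
    param.requires_grad = any(p in name for p in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")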

Batch size: 16; learning rate: 5e-6; epochs: 1 for SFT and 1 for DPO.
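
For reference, these hyperparameters map onto TRL configuration objects roughly as in the sketch below (output directories are placeholders; all other settings are left at TRL defaults and may differ from the original notebook).

from trl import SFTConfig, DPOConfig

# Hyperparameters reported above; a batch size of 16 may also be reached via
# gradient accumulation over smaller per-device batches.
sft_config = SFTConfig(
    output_dir="gemma-2-2b-ita-sft",          # placeholder
    per_device_train_batch_size=16,
    learning_rate=5e-6,
    num_train_epochs=1,
    bf16=True,
)

dpo_config = DPOConfig(
    output_dir="gemma-2-2b-neogenesis-ita",   # placeholder
    per_device_train_batch_size=16,
    learning_rate=5e-6,
    num_train_epochs=1,
    bf16=True,
)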

Training required about 15 hours on a single NVIDIA A6000 GPU (48GB VRAM).

For comprehensive training code and details, check out the 📓 Kaggle notebook.

๐Ÿ—ƒ๏ธ Training data

The model was trained primarily on Italian data, with a small portion of English data included.

For Instruction Fine Tuning:

For Direct Preference Optimization:

๐Ÿ™ Thanks to the authors for providing these datasets.

Usage limitations

Although the model demonstrates solid Italian fluency and good reasoning capabilities for its small size, it is expected to have limited world knowledge due to its restricted number of parameters. This limitation can be mitigated by pairing it with techniques like Retrieval-Augmented Generation. Check out the 📓 Kaggle notebook for an example.
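
As an illustration, here is a minimal RAG sketch on top of the pipeline from the Usage section (`retrieve` is a hypothetical stand-in for whatever retriever you use, e.g. BM25 or embedding search):

def retrieve(query: str) -> list[str]:
    # Hypothetical retriever: return the passages most relevant to the query.
    ...

def answer_with_rag(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    # Italian instruction: "Answer the question using only the following context."
    prompt = (
        "Rispondi alla domanda usando solo il contesto seguente.\n\n"
        f"Contesto:\n{context}\n\nDomanda: {query}"
    )
    messages = [{"role": "user", "content": prompt}]
    outputs = pipe(messages, max_new_tokens=500)
    return outputs[0]["generated_text"][-1]["content"]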

๐Ÿ›ก๏ธ Safety

While this model was not specifically fine-tuned for safety, its selective training with the Spectrum technique helps preserve certain safety features of the original model, as shown in the qualitative evaluation.
