---
library_name: peft
license: other
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- generated_from_trainer
model-index:
- name: pancho-v1-qw25-3B-UNAMGS
  results: []
datasets:
- Magpie-Align/Magpie-Pro-MT-300K-v0.1
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
language:
- en
---

# pancho-v1-qw25-3B-UNAMGS

This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct).
It achieves the following results on the evaluation set:
- Loss: 0.6555

![pancho-v1-qw25-3B-UNAMGS](https://huggingface.co/fblgit/pancho-v1-qw25-3B-UNAMGS/resolve/main/pancho-v1-qw25-3B.png)

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)

## Model description
Trained on the following Magpie datasets:
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
- Magpie-Align/Magpie-Pro-MT-300K-v0.1

UNA applied to the MLPs of layers `4, 10, 16, 22, 28`.
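
The listed layer indices are evenly spaced, six layers apart. As a quick way to locate the corresponding submodules, the sketch below maps them to module names using the Hugging Face Transformers Qwen2 layout (`model.layers.{i}.mlp` with `gate_proj`, `up_proj` and `down_proj`). It does not implement UNA; the helper `una_mlp_module_names` is hypothetical, and the 36-layer count for Qwen2.5-3B is an assumption to verify against the base model's `config.json`.

```python
# Map the UNA layer indices from this card to Qwen2-style MLP module names.
# Illustration only: this does NOT implement UNA, it only shows which
# submodules the listed indices refer to under Hugging Face Qwen2 naming.

UNA_MLP_LAYERS = [4, 10, 16, 22, 28]  # from this model card
NUM_LAYERS = 36  # assumption: decoder layers in Qwen2.5-3B; verify in config.json
MLP_PROJECTIONS = ("gate_proj", "up_proj", "down_proj")  # linear layers inside Qwen2MLP


def una_mlp_module_names(layers=UNA_MLP_LAYERS, num_layers=NUM_LAYERS):
    """Hypothetical helper: fully qualified MLP projection names for the given layers."""
    names = []
    for i in layers:
        if not 0 <= i < num_layers:
            raise ValueError(f"layer index {i} out of range for {num_layers} layers")
        names.extend(f"model.layers.{i}.mlp.{proj}" for proj in MLP_PROJECTIONS)
    return names


# The listed layers are evenly spaced: the set of consecutive gaps is {6}.
gaps = {b - a for a, b in zip(UNA_MLP_LAYERS, UNA_MLP_LAYERS[1:])}
print("spacing:", gaps)
for name in una_mlp_module_names():
    print(name)
```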

MGS applied on 3 scales.

Following the findings of https://arxiv.org/abs/2410.21228.

## License & Derivatives
Any derivative (SFT, merges, etc.) that uses **ANY** layer from this model **MUST** include either `UNA`, `MGS`, or `PANCHO` in its model name in order to obtain a LICENSE for derivatives of this model.
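
A rough self-check of a candidate repo name against the naming rule above, as a minimal sketch. The helper `name_mentions_required_token` is hypothetical, and the case-insensitive substring matching is an assumption the license text does not specify (it does let this model's own name, which contains `UNAMGS` and a lowercase `pancho`, pass). A plain substring test can also match unrelated words such as "unaligned", so a `True` result is only a hint, not a licensing determination.

```python
# Tokens named in the license clause of this model card.
REQUIRED_TOKENS = ("UNA", "MGS", "PANCHO")


def name_mentions_required_token(model_name: str) -> bool:
    """Hypothetical rough check: does the repo name contain UNA, MGS or PANCHO?

    Assumptions (not stated in the license): matching is case-insensitive and
    substring-based, so "UNAMGS" counts. False positives are possible.
    """
    # Drop an optional "owner/" prefix of a Hub repo id, then compare uppercased.
    base = model_name.rsplit("/", 1)[-1].upper()
    return any(token in base for token in REQUIRED_TOKENS)


for candidate in ("fblgit/pancho-v1-qw25-3B-UNAMGS", "my-org/qwen2.5-3b-sft"):
    print(candidate, name_mentions_required_token(candidate))
```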

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 256
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1
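
The card reports only global batch sizes. Assuming the usual Hugging Face Trainer convention (`total_train_batch_size` = per-device batch × gradient-accumulation steps × devices, and `total_eval_batch_size` = per-device eval batch × devices), the per-device figures follow directly. The per-device micro-batch and accumulation split are not stated, so the sketch lists the factorizations consistent with the card rather than picking one; `per_device_product` is a hypothetical helper.

```python
# Relate the global batch sizes in this card to per-device quantities.
# Assumption: Trainer-style accounting, i.e.
#   total_train = per_device_batch * grad_accum_steps * num_devices
#   total_eval  = per_device_eval_batch * num_devices
NUM_DEVICES = 8
TOTAL_TRAIN_BATCH_SIZE = 256
TOTAL_EVAL_BATCH_SIZE = 16


def per_device_product(total: int, num_devices: int) -> int:
    """Samples each device handles per optimizer step (micro-batch x accumulation)."""
    if total % num_devices:
        raise ValueError("total batch size must be divisible by the number of devices")
    return total // num_devices


train_per_device = per_device_product(TOTAL_TRAIN_BATCH_SIZE, NUM_DEVICES)  # 32
# Evaluation uses no gradient accumulation, so this is the per-device eval batch.
eval_per_device = per_device_product(TOTAL_EVAL_BATCH_SIZE, NUM_DEVICES)  # 2

# Every (micro_batch, accumulation_steps) pair whose product is 32 fits the card.
candidates = [(mb, train_per_device // mb) for mb in (1, 2, 4, 8, 16, 32)]
print(train_per_device, eval_per_device, candidates)
```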

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2127        | 0.0015 | 1    | 0.8711          |
| 0.9905        | 0.0509 | 35   | 0.7338          |
| 0.9685        | 0.1019 | 70   | 0.7114          |
| 0.9554        | 0.1528 | 105  | 0.6994          |
| 0.9077        | 0.2037 | 140  | 0.6915          |
| 0.9149        | 0.2547 | 175  | 0.6859          |
| 0.9363        | 0.3056 | 210  | 0.6795          |
| 0.8975        | 0.3566 | 245  | 0.6745          |
| 0.9095        | 0.4075 | 280  | 0.6709          |
| 0.9216        | 0.4584 | 315  | 0.6681          |
| 0.9143        | 0.5094 | 350  | 0.6666          |
| 0.8879        | 0.5603 | 385  | 0.6645          |
| 0.9194        | 0.6112 | 420  | 0.6625          |
| 0.9123        | 0.6622 | 455  | 0.6615          |
| 0.9056        | 0.7131 | 490  | 0.6591          |
| 0.9172        | 0.7641 | 525  | 0.6578          |
| 0.886         | 0.8150 | 560  | 0.6566          |
| 0.9155        | 0.8659 | 595  | 0.6568          |
| 0.9029        | 0.9169 | 630  | 0.6560          |
| 0.8942        | 0.9678 | 665  | 0.6555          |
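
Two quantities can be read off the table with simple arithmetic: the approximate number of optimizer steps per epoch (step ÷ epoch) and the relative drop in validation loss between the first evaluation (step 1) and the last one shown (step 665). The sketch below computes both from a few rows copied from the table. Because the epoch column is rounded, the results are estimates; and if Axolotl sample packing was enabled (the card does not say), each batch slot may hold more than one conversation, so "samples" here means batch slots.

```python
# Estimates derived from the training-results table (not figures reported by the author).
# (step, epoch, validation_loss) rows copied from the table
ROWS = [
    (1, 0.0015, 0.8711),
    (35, 0.0509, 0.7338),
    (350, 0.5094, 0.6666),
    (665, 0.9678, 0.6555),
]
TOTAL_TRAIN_BATCH_SIZE = 256  # from the hyperparameter list

# Use the last row: large step counts are least affected by rounding of the epoch column.
last_step, last_epoch, last_val = ROWS[-1]
steps_per_epoch = last_step / last_epoch  # ~687 optimizer steps
approx_samples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE  # ~176k batch slots per epoch

first_val = ROWS[0][2]
relative_drop = (first_val - last_val) / first_val  # ~0.2475

print(f"steps/epoch ~ {steps_per_epoch:.0f}")
print(f"batch slots/epoch ~ {approx_samples:,.0f}")
print(f"validation loss drop: {relative_drop:.1%}")
```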


### Framework versions

- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
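
To reproduce the training environment, it can help to compare installed package versions against the list above. A minimal sketch using the standard-library `importlib.metadata`; the helper `compare_versions` is hypothetical, the distribution name `torch` is assumed for PyTorch, and the comparison is an exact string match, so a mismatch does not necessarily mean an incompatibility.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" (distribution name -> version).
EXPECTED = {
    "peft": "0.13.2",
    "transformers": "4.45.2",
    "torch": "2.3.0+cu121",
    "datasets": "3.0.1",
    "tokenizers": "0.20.1",
}


def compare_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None, exact_match)}."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None  # package not installed
        report[pkg] = (want, have, have == want)
    return report


for pkg, (want, have, ok) in compare_versions().items():
    print(f"{pkg:<12} card={want:<12} installed={have or 'missing':<12} {'OK' if ok else 'DIFF'}")
```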