File size: 2,879 Bytes
6390403 0114350 6390403 3807db7 6390403 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 |
---
library_name: peft
license: other
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- generated_from_trainer
model-index:
- name: pancho-v1-qw25-3B-UNAMGS
results: []
datasets:
- Magpie-Align/Magpie-Pro-MT-300K-v0.1
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
language:
- en
---
# pancho-v1-qw25-3B-UNAMGS
This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct):
It achieves the following results on the evaluation set:
- Loss: 0.6555
![pancho-v1-qw25-3B-UNAMGS](https://huggingface.co/fblgit/pancho-v1-qw25-3B-UNAMGS/resolve/main/pancho-v1-qw25-3B.png)
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
## Model description
Trained with MagPie:
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
- Magpie-Align/Magpie-Pro-MT-300K-v0.1
UNA on MLPs `4, 10, 16, 22, 28`
MGS on 3 Scales.
Following https://arxiv.org/abs//2410.21228 facts.
## License & Derivatives
Any derivative (sft, merges, etc) using **ANY** layer from this model **MUST** include either `UNA` or `MGS` or `PANCHO` in their model name in order to obtain a LICENSE for derivatives of this model.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 256
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2127 | 0.0015 | 1 | 0.8711 |
| 0.9905 | 0.0509 | 35 | 0.7338 |
| 0.9685 | 0.1019 | 70 | 0.7114 |
| 0.9554 | 0.1528 | 105 | 0.6994 |
| 0.9077 | 0.2037 | 140 | 0.6915 |
| 0.9149 | 0.2547 | 175 | 0.6859 |
| 0.9363 | 0.3056 | 210 | 0.6795 |
| 0.8975 | 0.3566 | 245 | 0.6745 |
| 0.9095 | 0.4075 | 280 | 0.6709 |
| 0.9216 | 0.4584 | 315 | 0.6681 |
| 0.9143 | 0.5094 | 350 | 0.6666 |
| 0.8879 | 0.5603 | 385 | 0.6645 |
| 0.9194 | 0.6112 | 420 | 0.6625 |
| 0.9123 | 0.6622 | 455 | 0.6615 |
| 0.9056 | 0.7131 | 490 | 0.6591 |
| 0.9172 | 0.7641 | 525 | 0.6578 |
| 0.886 | 0.8150 | 560 | 0.6566 |
| 0.9155 | 0.8659 | 595 | 0.6568 |
| 0.9029 | 0.9169 | 630 | 0.6560 |
| 0.8942 | 0.9678 | 665 | 0.6555 |
### Framework versions
- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1# |