pancho-v1-qw25-3B-UNAMGS

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct: It achieves the following results on the evaluation set:

Loss: 0.6555

Model description

Trained with MagPie:

Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
Magpie-Align/Magpie-Pro-MT-300K-v0.1

UNA on MLPs 4, 10, 16, 22, 28

MGS on 3 Scales.

Following https://arxiv.org/abs//2410.21228 facts.

License & Derivatives

Any derivative (sft, merges, etc) using ANY layer from this model MUST include either UNA or MGS or PANCHO in their model name in order to obtain a LICENSE for derivatives of this model.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 256
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
num_epochs: 1

Training results

Training Loss	Epoch	Step	Validation Loss
1.2127	0.0015	1	0.8711
0.9905	0.0509	35	0.7338
0.9685	0.1019	70	0.7114
0.9554	0.1528	105	0.6994
0.9077	0.2037	140	0.6915
0.9149	0.2547	175	0.6859
0.9363	0.3056	210	0.6795
0.8975	0.3566	245	0.6745
0.9095	0.4075	280	0.6709
0.9216	0.4584	315	0.6681
0.9143	0.5094	350	0.6666
0.8879	0.5603	385	0.6645
0.9194	0.6112	420	0.6625
0.9123	0.6622	455	0.6615
0.9056	0.7131	490	0.6591
0.9172	0.7641	525	0.6578
0.886	0.8150	560	0.6566
0.9155	0.8659	595	0.6568
0.9029	0.9169	630	0.6560
0.8942	0.9678	665	0.6555

Framework versions

PEFT 0.13.2
Transformers 4.45.2
Pytorch 2.3.0+cu121
Datasets 3.0.1
Tokenizers 0.20.1#

Downloads last month: -

Safetensors

Model size

3B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for fblgit/pancho-v1-qw25-3B-UNAMGS

Base model

Qwen/Qwen2.5-3B

Finetuned

Qwen/Qwen2.5-3B-Instruct

Adapter

(561)

this model

Adapters

3 models

Datasets used to train fblgit/pancho-v1-qw25-3B-UNAMGS

Evaluation results

Metadata error: specify a dataset to view leaderboard