metadata

license: llama3.2
base_model: meta-llama/Llama-3.2-1B
tags:
  - generated_from_trainer
model-index:
  - name: quality-lr5e-06-rr0.1-epochs2-bs16-wd0.01-warmup0.05-Llama3.21B
    results: []

quality-lr5e-06-rr0.1-epochs2-bs16-wd0.01-warmup0.05-Llama3.21B

This model is a fine-tuned version of meta-llama/Llama-3.2-1B; the fine-tuning dataset is not recorded in this card. It achieves the following result on the evaluation set:

Loss: 2.4583
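
For readers who prefer perplexity, the reported loss can be converted with a one-line calculation. This is a minimal sketch that assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models but is not stated in this card.

```python
import math

# Evaluation loss copied from this card.
eval_loss = 2.4583

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the perplexity comes out at roughly 11.7.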

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 1
eval_batch_size: 3
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 16
total_eval_batch_size: 24
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.05
num_epochs: 2.0
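
The two "total" batch sizes follow from the per-device values listed above. The sketch below reproduces that arithmetic; it assumes the usual convention that gradient accumulation applies to training only, not to evaluation.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 1             # per device
eval_batch_size = 3              # per device
num_devices = 8
gradient_accumulation_steps = 2

# Assumption: effective training batch = per-device batch x devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Assumption: evaluation does not accumulate gradients, so only devices multiply.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Each optimizer update therefore sees 16 sequences, even though every GPU processes only one sequence per forward pass.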

Training results

Training Loss	Epoch	Step	Validation Loss
1.3438	0.1000	1372	2.4412
1.3308	0.2001	2744	2.4745
1.1389	0.3001	4116	2.4700
1.0742	0.4001	5488	2.4735
1.2025	0.5002	6860	2.4791
0.9616	0.6002	8232	2.4880
1.0427	0.7002	9604	2.4838
1.0210	0.8003	10976	2.4824
0.9657	0.9003	12348	2.4816
0.9601	1.0003	13720	2.4775
0.9308	1.1004	15092	2.4743
0.9075	1.2004	16464	2.4721
0.9257	1.3004	17836	2.4684
0.9466	1.4005	19208	2.4655
1.9584	1.5005	20580	2.4628
0.8827	1.6005	21952	2.4609
0.9602	1.7006	23324	2.4596
0.9366	1.8006	24696	2.4587
0.8700	1.9006	26068	2.4583
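
The step counts in the table above, combined with lr_scheduler_type: cosine and lr_scheduler_warmup_ratio: 0.05, allow a rough reconstruction of the learning-rate curve. The following is a sketch, not the exact training code: TOTAL_STEPS is an estimate inferred from the table (step 13720 falls at about epoch 1.0003, so two epochs are roughly 27,432 steps), and lr_at is a hypothetical helper that follows the common linear-warmup, half-cosine-decay shape.

```python
import math

PEAK_LR = 5e-06        # learning_rate from this card
WARMUP_RATIO = 0.05    # lr_scheduler_warmup_ratio from this card

# Assumption: estimated from the results table, not reported in the card.
TOTAL_STEPS = 27_432
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup followed by half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the evaluation steps listed in the table (every 1372 steps).
for step in range(1372, 26069, 1372):
    print(step, f"{lr_at(step):.3e}")
```

With these estimates, warmup ends at about step 1372, which coincides with the first evaluation row; every later row is logged while the learning rate is decaying.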

Framework versions

Transformers 4.43.3
Pytorch 2.3.1+cu118
Datasets 2.20.0
Tokenizers 0.19.1