Tags: PEFT · Safetensors · llama · alignment-handbook · trl · sft · Generated from Trainer

Meta-Llama-3-8B-Instruct-mirage-all-teacher-instruct-llama-3-sft

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the nthakur/mirage-gpt-4o-sft-instruct-llama-3 and the nthakur/mirage-meta-llama-3-mistral-sft-instruct-meta-llama-tokenizer datasets. It achieves the following results on the evaluation set:

  • Loss: 0.2593
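Because this is a PEFT (LoRA) adapter rather than a full checkpoint, it loads on top of the base meta-llama/Meta-Llama-3-8B-Instruct model. A minimal inference sketch using the standard transformers and peft APIs; the dtype, device placement, and prompt text are illustrative assumptions, not taken from the card:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "nthakur/Meta-Llama-3-8B-Instruct-mirage-all-teacher-instruct-llama-3-sft"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",
)
# Attach the PEFT adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, adapter_id)

# Prompt content is illustrative only.
messages = [{"role": "user", "content": "Summarize the retrieved passages in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

If a standalone checkpoint is preferred, `model.merge_and_unload()` folds the adapter weights into the base model after loading.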

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
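A sketch of how these settings map onto transformers `TrainingArguments`; `output_dir` and `bf16` are assumptions not stated in the card, while the evaluation cadence matches the 200-step intervals in the results table below:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3-8b-instruct-mirage-sft",  # assumption: illustrative path
    learning_rate=2e-4,
    per_device_train_batch_size=2,   # train_batch_size: 2
    per_device_eval_batch_size=2,    # eval_batch_size: 2
    gradient_accumulation_steps=2,   # 2 per device x 4 GPUs x 2 steps = 16 total
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption: typical for Llama-3 SFT
    eval_strategy="steps",
    eval_steps=200,                  # the card logs validation loss every 200 steps
)
```

The total train batch size of 16 follows from 2 samples per device, 4 GPUs, and 2 gradient-accumulation steps; the total eval batch size of 8 is simply 2 per device across 4 GPUs, with no accumulation at evaluation time.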

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3535        | 0.0412 | 200  | 0.3586          |
| 0.4117        | 0.0824 | 400  | 0.3371          |
| 0.3577        | 0.1236 | 600  | 0.3277          |
| 0.3594        | 0.1649 | 800  | 0.3194          |
| 0.3603        | 0.2061 | 1000 | 0.3096          |
| 0.3633        | 0.2473 | 1200 | 0.3063          |
| 0.3078        | 0.2885 | 1400 | 0.3000          |
| 0.3274        | 0.3297 | 1600 | 0.2948          |
| 0.3474        | 0.3709 | 1800 | 0.2925          |
| 0.3401        | 0.4122 | 2000 | 0.2875          |
| 0.3124        | 0.4534 | 2200 | 0.2839          |
| 0.3095        | 0.4946 | 2400 | 0.2802          |
| 0.3532        | 0.5358 | 2600 | 0.2775          |
| 0.301         | 0.5770 | 2800 | 0.2757          |
| 0.3204        | 0.6182 | 3000 | 0.2712          |
| 0.3158        | 0.6595 | 3200 | 0.2687          |
| 0.3032        | 0.7007 | 3400 | 0.2667          |
| 0.2851        | 0.7419 | 3600 | 0.2645          |
| 0.2903        | 0.7831 | 3800 | 0.2629          |
| 0.2943        | 0.8243 | 4000 | 0.2613          |
| 0.2787        | 0.8655 | 4200 | 0.2603          |
| 0.2558        | 0.9067 | 4400 | 0.2596          |
| 0.3107        | 0.9480 | 4600 | 0.2593          |
| 0.2894        | 0.9892 | 4800 | 0.2593          |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1