
lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2

This model is a fine-tuned version of Qwen/Qwen1.5-4B, trained on the tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 dataset and distributed as a PEFT adapter. At the final training step (step 9350), it achieves the following results on the evaluation set:

  • Loss: 3.4886
  • Accuracy: 0.5837
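
Because the card lists PEFT among its framework versions and the Hub classifies this repository as an adapter of Qwen/Qwen1.5-4B, the weights are presumably loaded on top of the base model rather than on their own. The following is a minimal sketch under that assumption. The helper names `load_adapter_model` and `generate` are hypothetical, and the card does not document a prompt format, so any prompt passed in is the caller's guess.

```python
BASE_MODEL_ID = "Qwen/Qwen1.5-4B"
ADAPTER_ID = "tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model, then attach the adapter weights on top of it."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Assumption: the adapter repository holds PEFT weights for this base model.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Hypothetical helper: run one generation with the adapter applied."""
    tokenizer, model = load_adapter_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("...")` downloads both the base model and the adapter, so it needs network access and enough memory for a 4B-parameter model.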

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 50.0
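
The two "total" batch sizes in the list above follow from the per-device values. A short sketch of that arithmetic is shown here, assuming the usual convention that the effective training batch is per-device batch size × number of devices × gradient accumulation steps, and that evaluation does not accumulate. The `BatchConfig` class is a hypothetical container for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchConfig:
    """Per-device settings copied from the hyperparameter list."""
    train_batch_size: int = 1
    eval_batch_size: int = 2
    num_devices: int = 4
    gradient_accumulation_steps: int = 8

    @property
    def total_train_batch_size(self) -> int:
        # Sequences contributing to one optimizer update.
        return (self.train_batch_size
                * self.num_devices
                * self.gradient_accumulation_steps)

    @property
    def total_eval_batch_size(self) -> int:
        # Evaluation has no gradient accumulation.
        return self.eval_batch_size * self.num_devices


cfg = BatchConfig()
print(cfg.total_train_batch_size)  # 32
print(cfg.total_eval_batch_size)   # 8
```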

Training results

| Training Loss | Epoch   | Step | Validation Loss | Accuracy |
|---------------|---------|------|-----------------|----------|
| 1.7886        | 0.9973  | 187  | 1.6901          | 0.6061   |
| 1.6544        | 2.0     | 375  | 1.6766          | 0.6077   |
| 1.5273        | 2.9973  | 562  | 1.6929          | 0.6080   |
| 1.3871        | 4.0     | 750  | 1.7257          | 0.6069   |
| 1.23          | 4.9973  | 937  | 1.7813          | 0.6061   |
| 1.0749        | 6.0     | 1125 | 1.8776          | 0.6018   |
| 0.8957        | 6.9973  | 1312 | 1.9782          | 0.5998   |
| 0.729         | 8.0     | 1500 | 2.0974          | 0.5966   |
| 0.5643        | 8.9973  | 1687 | 2.2553          | 0.5931   |
| 0.4538        | 10.0    | 1875 | 2.4089          | 0.5901   |
| 0.3563        | 10.9973 | 2062 | 2.5298          | 0.5889   |
| 0.2787        | 12.0    | 2250 | 2.6848          | 0.5871   |
| 0.2314        | 12.9973 | 2437 | 2.7943          | 0.5863   |
| 0.1923        | 14.0    | 2625 | 2.8624          | 0.5857   |
| 0.1687        | 14.9973 | 2812 | 2.9783          | 0.5848   |
| 0.1514        | 16.0    | 3000 | 3.0238          | 0.5850   |
| 0.1282        | 16.9973 | 3187 | 3.0914          | 0.5842   |
| 0.121         | 18.0    | 3375 | 3.1432          | 0.5848   |
| 0.1164        | 18.9973 | 3562 | 3.2314          | 0.5848   |
| 0.1103        | 20.0    | 3750 | 3.2781          | 0.5844   |
| 0.1077        | 20.9973 | 3937 | 3.2768          | 0.5842   |
| 0.1053        | 22.0    | 4125 | 3.3154          | 0.5845   |
| 0.1025        | 22.9973 | 4312 | 3.3168          | 0.5846   |
| 0.1019        | 24.0    | 4500 | 3.3672          | 0.5839   |
| 0.0957        | 24.9973 | 4687 | 3.3245          | 0.5843   |
| 0.0973        | 26.0    | 4875 | 3.3455          | 0.5846   |
| 0.0976        | 26.9973 | 5062 | 3.3746          | 0.5831   |
| 0.0956        | 28.0    | 5250 | 3.3458          | 0.5836   |
| 0.0963        | 28.9973 | 5437 | 3.3881          | 0.5845   |
| 0.0951        | 30.0    | 5625 | 3.4071          | 0.5842   |
| 0.0932        | 30.9973 | 5812 | 3.4574          | 0.5837   |
| 0.0932        | 32.0    | 6000 | 3.4498          | 0.5841   |
| 0.0876        | 32.9973 | 6187 | 3.4677          | 0.5830   |
| 0.0888        | 34.0    | 6375 | 3.4690          | 0.5835   |
| 0.0887        | 34.9973 | 6562 | 3.4481          | 0.5831   |
| 0.0883        | 36.0    | 6750 | 3.4745          | 0.5839   |
| 0.0893        | 36.9973 | 6937 | 3.4574          | 0.5831   |
| 0.0903        | 38.0    | 7125 | 3.4798          | 0.5838   |
| 0.0902        | 38.9973 | 7312 | 3.4863          | 0.5838   |
| 0.0896        | 40.0    | 7500 | 3.4676          | 0.5839   |
| 0.0841        | 40.9973 | 7687 | 3.5157          | 0.5837   |
| 0.0844        | 42.0    | 7875 | 3.5171          | 0.5833   |
| 0.0838        | 42.9973 | 8062 | 3.5576          | 0.5831   |
| 0.0854        | 44.0    | 8250 | 3.5440          | 0.5838   |
| 0.085         | 44.9973 | 8437 | 3.4777          | 0.5842   |
| 0.0863        | 46.0    | 8625 | 3.4933          | 0.5832   |
| 0.0875        | 46.9973 | 8812 | 3.5282          | 0.5841   |
| 0.087         | 48.0    | 9000 | 3.5321          | 0.5830   |
| 0.0832        | 48.9973 | 9187 | 3.5294          | 0.5836   |
| 0.0826        | 49.8667 | 9350 | 3.4886          | 0.5837   |
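
In the log above, validation loss is lowest at step 375 and accuracy peaks at step 562. Training loss keeps falling after that while validation loss rises, so the final checkpoint reported at the top of the card is not the best one by either metric. The sketch below reproduces that reading from a handful of rows copied from the log. It also checks the fractional epoch values against an assumed 187.5 optimizer steps per epoch, a figure inferred here from step 375 landing on epoch 2.0 and not stated anywhere in the card.

```python
# (epoch, step, validation_loss, accuracy), copied from selected rows of the log.
ROWS = [
    (0.9973, 187, 1.6901, 0.6061),
    (2.0, 375, 1.6766, 0.6077),
    (2.9973, 562, 1.6929, 0.6080),
    (4.0, 750, 1.7257, 0.6069),
    (10.0, 1875, 2.4089, 0.5901),
    (49.8667, 9350, 3.4886, 0.5837),
]

# Checkpoint with the lowest validation loss and the one with the highest accuracy.
best_loss_row = min(ROWS, key=lambda row: row[2])
best_acc_row = max(ROWS, key=lambda row: row[3])

# Assumption: inferred from the log (step 375 is reported as epoch 2.0).
STEPS_PER_EPOCH = 375 / 2.0  # 187.5


def epoch_at(step: int) -> float:
    """Fractional epoch for an optimizer step, rounded as in the log."""
    return round(step / STEPS_PER_EPOCH, 4)


print(best_loss_row[1])  # 375
print(best_acc_row[1])   # 562
print(epoch_at(9350))    # 49.8667
```

With a total train batch size of 32, 187.5 steps per epoch corresponds to 6000 training sequences, which agrees with the `train6000` part of the dataset name. That agreement is an observation, not something the card confirms.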

Framework versions

  • PEFT 0.5.0
  • Transformers 4.41.1
  • Pytorch 2.1.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
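
To compare a local environment against the versions listed above, the standard-library `importlib.metadata` module can be used. This is a sketch only: the `compare_versions` helper is hypothetical, and the package names in `EXPECTED` are the PyPI distribution names assumed to correspond to each entry (for example `torch` for Pytorch).

```python
from importlib import metadata

# Versions copied from the list above; keys are assumed PyPI distribution names.
EXPECTED = {
    "peft": "0.5.0",
    "transformers": "4.41.1",
    "torch": "2.1.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (wanted, installed or None, exact match?)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in compare_versions(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the adapter will fail to load; the check only reports where the environment differs from the one used for training.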
Model tree for tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2

  • Base model: Qwen/Qwen1.5-4B
  • This model: adapter of the base model

Dataset used to train tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2

  • tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3

Evaluation results

  • Accuracy on tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 (self-reported): 0.584