lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2
This model is a fine-tuned version of Qwen/Qwen1.5-4B on the tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 dataset. At the final evaluation (step 9350, epoch 49.8667) it achieves the following results on the evaluation set:
- Loss: 3.4886
- Accuracy: 0.5837
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 50.0
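The batch sizes above, and the step and epoch counts in the results table, follow from a few multiplications. The sketch below reproduces them in plain Python. It assumes 6000 training examples, which is inferred from `train6000` in the dataset name and not stated in this card. It also assumes Hugging Face Transformers' `Trainer` conventions: optimizer updates per epoch are floored when computing the total step count, and warmup steps are `ceil(warmup_ratio * total_steps)`. The logged Epoch values are consistent with gradient accumulation carrying over epoch boundaries (187.5 updates per epoch) while the step budget is floored to 187 × 50 = 9350, which would explain why training stops at epoch 49.8667 rather than 50. Finally, with `lr_scheduler_type: constant` Transformers applies no warmup, so `lr_scheduler_warmup_ratio: 0.05` has no effect; warmup only applies to schedulers such as `constant_with_warmup`. All function names here are illustrative.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 1
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 8
NUM_EPOCHS = 50
LEARNING_RATE = 1e-4
WARMUP_RATIO = 0.05

# Assumption: 6000 training examples, inferred from "train6000" in the dataset name.
NUM_TRAIN_EXAMPLES = 6000


def total_train_batch_size() -> int:
    # Examples per optimizer update: per-device batch x devices x accumulation steps.
    return PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS


def total_eval_batch_size() -> int:
    # Evaluation does not accumulate gradients, so only the device count multiplies.
    return PER_DEVICE_EVAL_BATCH * NUM_DEVICES


def batches_per_epoch() -> int:
    # Batches each device processes in one pass over the training data.
    return math.ceil(NUM_TRAIN_EXAMPLES / (PER_DEVICE_TRAIN_BATCH * NUM_DEVICES))


def max_optimizer_steps() -> int:
    # Updates per epoch are floored before multiplying by the number of epochs.
    return (batches_per_epoch() // GRAD_ACCUM_STEPS) * NUM_EPOCHS


def epoch_at_step(step: int) -> float:
    # Fractional epoch reached after `step` updates, assuming accumulation
    # carries over epoch boundaries (1500 batches / 8 = 187.5 updates per epoch).
    return step * GRAD_ACCUM_STEPS / batches_per_epoch()


def lr_at_step(step: int, scheduler: str = "constant") -> float:
    if scheduler == "constant":
        # No warmup: the rate is fixed from the first step, whatever WARMUP_RATIO is.
        return LEARNING_RATE
    if scheduler == "constant_with_warmup":
        # Linear ramp from 0 over ceil(ratio * total steps), then constant.
        warmup_steps = math.ceil(WARMUP_RATIO * max_optimizer_steps())
        if step < warmup_steps:
            return LEARNING_RATE * step / max(1, warmup_steps)
        return LEARNING_RATE
    raise ValueError(f"unsupported scheduler: {scheduler}")
```

With these values, `total_train_batch_size()` and `total_eval_batch_size()` return 32 and 8, `max_optimizer_steps()` returns 9350, and `round(epoch_at_step(9350), 4)` is 49.8667, matching the last row of the results table.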
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
1.7886 | 0.9973 | 187 | 1.6901 | 0.6061 |
1.6544 | 2.0 | 375 | 1.6766 | 0.6077 |
1.5273 | 2.9973 | 562 | 1.6929 | 0.6080 |
1.3871 | 4.0 | 750 | 1.7257 | 0.6069 |
1.23 | 4.9973 | 937 | 1.7813 | 0.6061 |
1.0749 | 6.0 | 1125 | 1.8776 | 0.6018 |
0.8957 | 6.9973 | 1312 | 1.9782 | 0.5998 |
0.729 | 8.0 | 1500 | 2.0974 | 0.5966 |
0.5643 | 8.9973 | 1687 | 2.2553 | 0.5931 |
0.4538 | 10.0 | 1875 | 2.4089 | 0.5901 |
0.3563 | 10.9973 | 2062 | 2.5298 | 0.5889 |
0.2787 | 12.0 | 2250 | 2.6848 | 0.5871 |
0.2314 | 12.9973 | 2437 | 2.7943 | 0.5863 |
0.1923 | 14.0 | 2625 | 2.8624 | 0.5857 |
0.1687 | 14.9973 | 2812 | 2.9783 | 0.5848 |
0.1514 | 16.0 | 3000 | 3.0238 | 0.5850 |
0.1282 | 16.9973 | 3187 | 3.0914 | 0.5842 |
0.121 | 18.0 | 3375 | 3.1432 | 0.5848 |
0.1164 | 18.9973 | 3562 | 3.2314 | 0.5848 |
0.1103 | 20.0 | 3750 | 3.2781 | 0.5844 |
0.1077 | 20.9973 | 3937 | 3.2768 | 0.5842 |
0.1053 | 22.0 | 4125 | 3.3154 | 0.5845 |
0.1025 | 22.9973 | 4312 | 3.3168 | 0.5846 |
0.1019 | 24.0 | 4500 | 3.3672 | 0.5839 |
0.0957 | 24.9973 | 4687 | 3.3245 | 0.5843 |
0.0973 | 26.0 | 4875 | 3.3455 | 0.5846 |
0.0976 | 26.9973 | 5062 | 3.3746 | 0.5831 |
0.0956 | 28.0 | 5250 | 3.3458 | 0.5836 |
0.0963 | 28.9973 | 5437 | 3.3881 | 0.5845 |
0.0951 | 30.0 | 5625 | 3.4071 | 0.5842 |
0.0932 | 30.9973 | 5812 | 3.4574 | 0.5837 |
0.0932 | 32.0 | 6000 | 3.4498 | 0.5841 |
0.0876 | 32.9973 | 6187 | 3.4677 | 0.5830 |
0.0888 | 34.0 | 6375 | 3.4690 | 0.5835 |
0.0887 | 34.9973 | 6562 | 3.4481 | 0.5831 |
0.0883 | 36.0 | 6750 | 3.4745 | 0.5839 |
0.0893 | 36.9973 | 6937 | 3.4574 | 0.5831 |
0.0903 | 38.0 | 7125 | 3.4798 | 0.5838 |
0.0902 | 38.9973 | 7312 | 3.4863 | 0.5838 |
0.0896 | 40.0 | 7500 | 3.4676 | 0.5839 |
0.0841 | 40.9973 | 7687 | 3.5157 | 0.5837 |
0.0844 | 42.0 | 7875 | 3.5171 | 0.5833 |
0.0838 | 42.9973 | 8062 | 3.5576 | 0.5831 |
0.0854 | 44.0 | 8250 | 3.5440 | 0.5838 |
0.085 | 44.9973 | 8437 | 3.4777 | 0.5842 |
0.0863 | 46.0 | 8625 | 3.4933 | 0.5832 |
0.0875 | 46.9973 | 8812 | 3.5282 | 0.5841 |
0.087 | 48.0 | 9000 | 3.5321 | 0.5830 |
0.0832 | 48.9973 | 9187 | 3.5294 | 0.5836 |
0.0826 | 49.8667 | 9350 | 3.4886 | 0.5837 |
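In the table, training loss falls steadily to 0.0826, while validation loss is lowest at step 375 (epoch 2.0, 1.6766) and rises to roughly 3.5 by the end; accuracy peaks at 0.6080 at step 562. The results reported at the top of this card (loss 3.4886, accuracy 0.5837) match the final evaluation at step 9350, not the best one. The sketch below picks the best evaluation by each metric from logged rows. `EvalRow`, `best_by_loss` and `best_by_accuracy` are hypothetical helpers, and only a few rows are copied in, which is sufficient here because every omitted row has a higher validation loss and a lower accuracy than the best rows kept.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    epoch: float
    validation_loss: float
    accuracy: float


# Subset of rows copied from the results table above: the first four
# evaluations and the final one.
ROWS = [
    EvalRow(187, 0.9973, 1.6901, 0.6061),
    EvalRow(375, 2.0, 1.6766, 0.6077),
    EvalRow(562, 2.9973, 1.6929, 0.6080),
    EvalRow(750, 4.0, 1.7257, 0.6069),
    EvalRow(9350, 49.8667, 3.4886, 0.5837),
]


def best_by_loss(rows: list[EvalRow]) -> EvalRow:
    # Lower validation loss is better.
    return min(rows, key=lambda r: r.validation_loss)


def best_by_accuracy(rows: list[EvalRow]) -> EvalRow:
    # Higher accuracy is better.
    return max(rows, key=lambda r: r.accuracy)


best_loss = best_by_loss(ROWS)
best_acc = best_by_accuracy(ROWS)
print(f"lowest validation loss: step {best_loss.step}, epoch {best_loss.epoch}")
print(f"highest accuracy: step {best_acc.step}, epoch {best_acc.epoch}")
```

Transformers' `TrainingArguments` can do this selection during training via `load_best_model_at_end=True` together with `metric_for_best_model`; this card does not say whether that option was set.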
Framework versions
- PEFT 0.5.0
- Transformers 4.41.1
- Pytorch 2.1.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1