lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_3e-5_lora2
This model is a fine-tuned version of Qwen/Qwen1.5-4B on the tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 dataset. It achieves the following results on the evaluation set at the final checkpoint (step 9350, epoch 49.87):
- Loss: 3.5443
- Accuracy: 0.5730
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 50.0
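
The total batch sizes listed above follow from the per-device batch sizes, the device count, and the gradient accumulation steps. A minimal sketch of that arithmetic; the variable names are illustrative and not taken from the training script:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2             # per-device training batch size
eval_batch_size = 2              # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 4

# Examples consumed per optimizer step across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Gradient accumulation does not apply at evaluation time.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 32
print(total_eval_batch_size)   # 8
```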
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
1.8545 | 0.9973 | 187 | 1.7123 | 0.6037 |
1.7063 | 2.0 | 375 | 1.6945 | 0.6059 |
1.6702 | 2.9973 | 562 | 1.6851 | 0.6066 |
1.6356 | 4.0 | 750 | 1.6826 | 0.6075 |
1.5775 | 4.9973 | 937 | 1.6911 | 0.6071 |
1.529 | 6.0 | 1125 | 1.7067 | 0.6069 |
1.457 | 6.9973 | 1312 | 1.7279 | 0.6056 |
1.3907 | 8.0 | 1500 | 1.7512 | 0.6046 |
1.3309 | 8.9973 | 1687 | 1.7774 | 0.6025 |
1.2841 | 10.0 | 1875 | 1.8043 | 0.6013 |
1.2308 | 10.9973 | 2062 | 1.8528 | 0.6001 |
1.1722 | 12.0 | 2250 | 1.8851 | 0.5988 |
1.1354 | 12.9973 | 2437 | 1.9114 | 0.5980 |
1.0793 | 14.0 | 2625 | 1.9585 | 0.5961 |
1.037 | 14.9973 | 2812 | 1.9967 | 0.5948 |
0.9901 | 16.0 | 3000 | 2.0336 | 0.5934 |
0.9316 | 16.9973 | 3187 | 2.0880 | 0.5914 |
0.8802 | 18.0 | 3375 | 2.1440 | 0.5901 |
0.8382 | 18.9973 | 3562 | 2.1715 | 0.5893 |
0.7962 | 20.0 | 3750 | 2.2237 | 0.5879 |
0.7553 | 20.9973 | 3937 | 2.2957 | 0.5861 |
0.7238 | 22.0 | 4125 | 2.3312 | 0.5851 |
0.676 | 22.9973 | 4312 | 2.4043 | 0.5832 |
0.644 | 24.0 | 4500 | 2.4440 | 0.5824 |
0.5939 | 24.9973 | 4687 | 2.5127 | 0.5818 |
0.5551 | 26.0 | 4875 | 2.5390 | 0.5810 |
0.5163 | 26.9973 | 5062 | 2.5809 | 0.5798 |
0.4892 | 28.0 | 5250 | 2.6670 | 0.5789 |
0.4669 | 28.9973 | 5437 | 2.6695 | 0.5786 |
0.4353 | 30.0 | 5625 | 2.7646 | 0.5787 |
0.4104 | 30.9973 | 5812 | 2.8291 | 0.5775 |
0.3885 | 32.0 | 6000 | 2.8933 | 0.5764 |
0.342 | 32.9973 | 6187 | 2.9434 | 0.5756 |
0.3213 | 34.0 | 6375 | 2.9346 | 0.5756 |
0.3065 | 34.9973 | 6562 | 3.0082 | 0.5758 |
0.2842 | 36.0 | 6750 | 3.0947 | 0.5739 |
0.2695 | 36.9973 | 6937 | 3.0905 | 0.5752 |
0.2541 | 38.0 | 7125 | 3.1831 | 0.5738 |
0.2411 | 38.9973 | 7312 | 3.2135 | 0.5740 |
0.228 | 40.0 | 7500 | 3.2505 | 0.5739 |
0.2067 | 40.9973 | 7687 | 3.2867 | 0.5743 |
0.1952 | 42.0 | 7875 | 3.3047 | 0.5751 |
0.1886 | 42.9973 | 8062 | 3.3528 | 0.5742 |
0.1828 | 44.0 | 8250 | 3.4431 | 0.5730 |
0.1743 | 44.9973 | 8437 | 3.4166 | 0.5727 |
0.1691 | 46.0 | 8625 | 3.4326 | 0.5739 |
0.1633 | 46.9973 | 8812 | 3.4555 | 0.5728 |
0.156 | 48.0 | 9000 | 3.4876 | 0.5729 |
0.1441 | 48.9973 | 9187 | 3.5368 | 0.5727 |
0.1401 | 49.8667 | 9350 | 3.5443 | 0.5730 |
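
In the table above, validation loss is lowest at step 750 (1.6826) and trends upward afterwards while training loss keeps falling, so the final checkpoint is not the one with the best validation metrics. The sketch below shows one way to pick the row with the lowest validation loss. It uses only a subset of rows copied from the table, and the tuple layout is an assumption made for illustration:

```python
# A subset of rows copied from the table above, in the order:
# (training_loss, epoch, step, validation_loss, accuracy)
rows = [
    (1.8545, 0.9973, 187, 1.7123, 0.6037),
    (1.7063, 2.0, 375, 1.6945, 0.6059),
    (1.6702, 2.9973, 562, 1.6851, 0.6066),
    (1.6356, 4.0, 750, 1.6826, 0.6075),
    (1.5775, 4.9973, 937, 1.6911, 0.6071),
    (1.529, 6.0, 1125, 1.7067, 0.6069),
    (0.1401, 49.8667, 9350, 3.5443, 0.5730),  # final checkpoint
]

# Row with the lowest validation loss (index 3 of each tuple).
best = min(rows, key=lambda r: r[3])
final = rows[-1]

print(f"lowest validation loss: {best[3]} at step {best[2]} (epoch {best[1]})")
print(f"final validation loss:  {final[3]} at step {final[2]} (epoch {final[1]})")
```

To rank all 50 checkpoints, the same `min` call can be applied to the full table.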
Framework versions
- PEFT 0.5.0
- Transformers 4.41.1
- Pytorch 2.1.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
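
Since PEFT is listed among the framework versions and the model name ends in `lora2`, the repository presumably hosts a LoRA adapter rather than full model weights. Under that assumption, loading could look roughly like the sketch below. The helper name `load_model_with_adapter` is hypothetical, and the adapter ID assumes the repository is published under the `tyzhu` namespace with the model name from this card:

```python
BASE_MODEL_ID = "Qwen/Qwen1.5-4B"
# Assumption: the adapter lives in a repository with the same name as this card.
ADAPTER_ID = "tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_3e-5_lora2"


def load_model_with_adapter(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter (assumed to be a PEFT adapter)."""
    # Imported inside the function so the heavy dependencies are only
    # needed when the model is actually loaded.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_model_with_adapter()` downloads the base model and the adapter, so it needs network access and enough memory for a 4B-parameter model.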