# Mistral_Sparse_refined_web_50p_cut_pre_mlp_cut_pre_attn_2024-03-23
This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on an unspecified dataset (the repo name suggests a RefinedWeb subset). It achieves the following results on the evaluation set:
- Loss: 2.1772 (equivalent to a perplexity of exp(2.1772) ≈ 8.8, assuming the loss is mean cross-entropy in nats)
## Model description
More information needed
## Intended uses & limitations
More information needed
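The hosting page notes that this repo contains custom code, so the serverless Inference API cannot run it; loading it locally should work. Below is a minimal loading sketch, assuming the standard Transformers API and that `trust_remote_code=True` is required for the custom sparsity modules; the dtype, `device_map`, and prompt are illustrative choices, not documented in this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from this card. The repo contains custom sparsity code,
# so trust_remote_code=True is assumed to be required. bfloat16 and
# device_map="auto" (requires `accelerate`) are illustrative choices.
model_id = "thrunlab/Mistral_Sparse_refined_web_50p_cut_pre_mlp_cut_pre_attn_2024-03-23"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Simple generation smoke test.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```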
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 0
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- training_steps: 2000
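As a rough guide, these settings map onto the Hugging Face `TrainingArguments` API as in the sketch below (field names from Transformers 4.36). This is a reconstruction from the list above, not the authors' actual script; the model/dataset wiring, the 4-GPU launch (e.g. via `torchrun` or `accelerate launch`), and the custom sparsity code are omitted:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above (Transformers 4.36).
# Batch sizes are per device; with 4 GPUs and
# gradient_accumulation_steps=4, the effective train batch size is
# 1 * 4 * 4 = 16, matching total_train_batch_size in this card.
training_args = TrainingArguments(
    output_dir="Mistral_Sparse_refined_web_50p_cut_pre_mlp_cut_pre_attn",  # hypothetical
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=0,
    gradient_accumulation_steps=4,
    lr_scheduler_type="linear",
    max_steps=2000,
    # The card reports Adam with these betas/epsilon; the Trainer's
    # default optimizer takes them via the adam_* arguments.
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="steps",
    eval_steps=25,   # matches the 25-step eval cadence in the table below
    logging_steps=25,
)
```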
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.5343 | 0.0 | 25 | 2.6759 |
| 2.3148 | 0.01 | 50 | 2.6092 |
| 2.356 | 0.01 | 75 | 2.5789 |
| 2.32 | 0.02 | 100 | 2.5624 |
| 2.3107 | 0.02 | 125 | 2.5371 |
| 2.3921 | 0.02 | 150 | 2.5225 |
| 2.3138 | 0.03 | 175 | 2.5241 |
| 2.3012 | 0.03 | 200 | 2.5020 |
| 2.3375 | 0.04 | 225 | 2.5107 |
| 2.3714 | 0.04 | 250 | 2.5091 |
| 2.2891 | 0.04 | 275 | 2.4997 |
| 2.2997 | 0.05 | 300 | 2.4911 |
| 2.3221 | 0.05 | 325 | 2.4886 |
| 2.1478 | 0.06 | 350 | 2.4888 |
| 2.3131 | 0.06 | 375 | 2.4837 |
| 2.3461 | 0.06 | 400 | 2.4811 |
| 2.3443 | 0.07 | 425 | 2.4747 |
| 2.3041 | 0.07 | 450 | 2.4760 |
| 2.2192 | 0.08 | 475 | 2.4765 |
| 2.2829 | 0.08 | 500 | 2.4698 |
| 2.347 | 0.08 | 525 | 2.4709 |
| 2.2503 | 0.09 | 550 | 2.4663 |
| 2.3858 | 0.09 | 575 | 2.4688 |
| 2.1889 | 0.1 | 600 | 2.4642 |
| 2.2762 | 0.1 | 625 | 2.4636 |
| 2.355 | 0.1 | 650 | 2.4612 |
| 2.3422 | 0.11 | 675 | 2.4697 |
| 2.304 | 0.11 | 700 | 2.4545 |
| 2.2965 | 0.12 | 725 | 2.4606 |
| 2.2014 | 0.12 | 750 | 2.4610 |
| 2.2404 | 0.12 | 775 | 2.4558 |
| 2.3355 | 0.13 | 800 | 2.4527 |
| 2.3421 | 0.13 | 825 | 2.4503 |
| 2.3193 | 0.14 | 850 | 2.4534 |
| 2.1828 | 0.14 | 875 | 2.4556 |
| 2.2652 | 0.14 | 900 | 2.4490 |
| 2.203 | 0.15 | 925 | 2.4531 |
| 2.3358 | 0.15 | 950 | 2.4626 |
| 2.2625 | 0.16 | 975 | 2.4448 |
| 2.3168 | 0.16 | 1000 | 2.4459 |
| 2.3163 | 0.16 | 1025 | 2.4438 |
| 2.3319 | 0.17 | 1050 | 2.4427 |
| 2.36 | 0.17 | 1075 | 2.4439 |
| 2.2884 | 0.18 | 1100 | 2.4501 |
| 2.3153 | 0.18 | 1125 | 2.4493 |
| 2.2807 | 0.18 | 1150 | 2.4386 |
| 2.3341 | 0.19 | 1175 | 2.4484 |
| 2.1909 | 0.19 | 1200 | 2.4458 |
| 2.2831 | 0.2 | 1225 | 2.4417 |
| 2.2759 | 0.2 | 1250 | 2.4472 |
| 2.3158 | 0.2 | 1275 | 2.4422 |
| 2.3413 | 0.21 | 1300 | 2.4450 |
| 2.3078 | 0.21 | 1325 | 2.4494 |
| 2.2061 | 0.22 | 1350 | 2.4451 |
| 2.2846 | 0.22 | 1375 | 2.4359 |
| 2.2929 | 0.22 | 1400 | 2.4358 |
| 2.2341 | 0.23 | 1425 | 2.4389 |
| 2.2222 | 0.23 | 1450 | 2.4452 |
| 2.1849 | 0.24 | 1475 | 2.4427 |
| 2.2468 | 0.24 | 1500 | 2.4396 |
| 2.1769 | 0.24 | 1525 | 2.4431 |
| 2.2323 | 0.25 | 1550 | 2.4403 |
| 2.3575 | 0.25 | 1575 | 2.4421 |
| 2.3032 | 0.26 | 1600 | 2.4437 |
| 2.2787 | 0.26 | 1625 | 2.4390 |
| 2.3523 | 0.26 | 1650 | 2.4374 |
| 2.2613 | 0.27 | 1675 | 2.4397 |
| 2.3048 | 0.27 | 1700 | 2.4300 |
| 2.3016 | 0.28 | 1725 | 2.4377 |
| 2.2821 | 0.28 | 1750 | 2.4394 |
| 2.2642 | 0.28 | 1775 | 2.4356 |
| 2.2181 | 0.29 | 1800 | 2.4369 |
| 2.2917 | 0.29 | 1825 | 2.4429 |
| 2.2922 | 0.3 | 1850 | 2.4364 |
| 2.2718 | 0.3 | 1875 | 2.4446 |
| 2.2961 | 0.3 | 1900 | 2.4378 |
| 2.3482 | 0.31 | 1925 | 2.4374 |
| 2.2985 | 0.31 | 1950 | 2.4369 |
| 2.3086 | 0.32 | 1975 | 2.4352 |
| 2.2412 | 0.32 | 2000 | 2.4395 |
### Framework versions
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0
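For reproducibility, an environment pinned to these versions might look like the requirements sketch below (standard PyPI package names are assumed; the `+cu121` PyTorch build comes from the PyTorch CUDA wheel index rather than plain PyPI):

```text
transformers==4.36.2
torch==2.1.2
datasets==2.15.0
tokenizers==0.15.0
```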