martimfasantos committed
Commit 22a4cee (1 parent: 0cd229f)

Model save

README.md ADDED
@@ -0,0 +1,113 @@
---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6500
- Rewards/chosen: -1.0595
- Rewards/rejected: -1.2334
- Rewards/accuracies: 0.6046
- Rewards/margins: 0.1739
- Logps/rejected: -186.0905
- Logps/chosen: -164.9614
- Logits/rejected: -2.3429
- Logits/chosen: -2.3549

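For reference, these reward columns follow the usual TRL/DPO convention: the implicit reward of a completion is the β-scaled policy-to-reference log-ratio, so `Rewards/margins` is `Rewards/chosen` minus `Rewards/rejected` (here -1.0595 - (-1.2334) = 0.1739, matching the table), and `Rewards/accuracies` is the fraction of evaluation pairs whose chosen summary out-scores the rejected one. A sketch of the objective, assuming the standard DPO loss (Rafailov et al., 2023) with the SFT checkpoint as the frozen reference:

$$
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r(x, y_w) - r(x, y_l)\big)
$$

where \\(y_w\\) and \\(y_l\\) are the chosen and rejected summaries for prompt \\(x\\).
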
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

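A minimal sketch of how these hyperparameters map onto a TRL `DPOConfig`/`DPOTrainer` run. This is not the author's actual training script (which is not part of this commit); the dataset, `beta`, and precision settings below are illustrative assumptions, and older TRL versions pass `beta` to `DPOTrainer` instead of `DPOConfig`.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Toy preference pairs in the prompt/chosen/rejected format DPOTrainer expects;
# the real training dataset is not named in this card.
pairs = Dataset.from_dict({
    "prompt": ["Summarize: The cat sat on the mat all afternoon.\nTL;DR:"],
    "chosen": [" A cat lounged on a mat for hours."],
    "rejected": [" Dogs make wonderful pets."],
})

config = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs",
    learning_rate=1e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # 8 x 2 = total_train_batch_size of 16
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,                        # assumption: the DPO beta is not stated in the card
)

# ref_model=None makes TRL snapshot the initial (SFT) weights as the frozen reference.
trainer = DPOTrainer(model=model, ref_model=None, args=config,
                     train_dataset=pairs, tokenizer=tokenizer)
trainer.train()
```
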
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.693 | 0.0689 | 400 | 0.6931 | 0.0003 | 0.0002 | 0.5112 | 0.0001 | -62.7270 | -58.9858 | -2.9691 | -2.9727 |
| 0.6923 | 0.1378 | 800 | 0.6926 | 0.0024 | 0.0012 | 0.5493 | 0.0011 | -62.6258 | -58.7797 | -2.9667 | -2.9701 |
| 0.6901 | 0.2068 | 1200 | 0.6907 | -0.0080 | -0.0133 | 0.5697 | 0.0053 | -64.0827 | -59.8146 | -2.9579 | -2.9613 |
| 0.6835 | 0.2757 | 1600 | 0.6880 | -0.0321 | -0.0436 | 0.5764 | 0.0114 | -67.1050 | -62.2266 | -2.9410 | -2.9442 |
| 0.6865 | 0.3446 | 2000 | 0.6852 | -0.0690 | -0.0874 | 0.5713 | 0.0184 | -71.4878 | -65.9158 | -2.9158 | -2.9192 |
| 0.6767 | 0.4135 | 2400 | 0.6817 | -0.1086 | -0.1352 | 0.5816 | 0.0265 | -76.2651 | -69.8803 | -2.8906 | -2.8938 |
| 0.6726 | 0.4824 | 2800 | 0.6792 | -0.1614 | -0.1943 | 0.5767 | 0.0328 | -82.1753 | -75.1597 | -2.8617 | -2.8651 |
| 0.6643 | 0.5513 | 3200 | 0.6729 | -0.2581 | -0.3074 | 0.5948 | 0.0493 | -93.4915 | -84.8225 | -2.8387 | -2.8420 |
| 0.6614 | 0.6203 | 3600 | 0.6740 | -0.2589 | -0.3059 | 0.5904 | 0.0470 | -93.3416 | -84.9094 | -2.8113 | -2.8144 |
| 0.6609 | 0.6892 | 4000 | 0.6696 | -0.3009 | -0.3603 | 0.6053 | 0.0594 | -98.7785 | -89.1073 | -2.7879 | -2.7912 |
| 0.6562 | 0.7581 | 4400 | 0.6667 | -0.4072 | -0.4790 | 0.5983 | 0.0718 | -110.6499 | -99.7330 | -2.7515 | -2.7548 |
| 0.6569 | 0.8270 | 4800 | 0.6637 | -0.4951 | -0.5782 | 0.6059 | 0.0831 | -120.5742 | -108.5273 | -2.7283 | -2.7316 |
| 0.6383 | 0.8959 | 5200 | 0.6621 | -0.5180 | -0.6112 | 0.6055 | 0.0932 | -123.8654 | -110.8119 | -2.7112 | -2.7149 |
| 0.6411 | 0.9649 | 5600 | 0.6623 | -0.5228 | -0.6134 | 0.6055 | 0.0906 | -124.0929 | -111.2965 | -2.6869 | -2.6910 |
| 0.6293 | 1.0338 | 6000 | 0.6618 | -0.6210 | -0.7260 | 0.6064 | 0.1049 | -135.3463 | -121.1192 | -2.6526 | -2.6573 |
| 0.6247 | 1.1027 | 6400 | 0.6587 | -0.7088 | -0.8268 | 0.5990 | 0.1180 | -145.4310 | -129.8984 | -2.6201 | -2.6254 |
| 0.6194 | 1.1716 | 6800 | 0.6580 | -0.7955 | -0.9191 | 0.5980 | 0.1236 | -154.6599 | -138.5692 | -2.5858 | -2.5912 |
| 0.6127 | 1.2405 | 7200 | 0.6558 | -0.6612 | -0.7815 | 0.6039 | 0.1203 | -140.8955 | -125.1357 | -2.5822 | -2.5877 |
| 0.6531 | 1.3094 | 7600 | 0.6534 | -0.7460 | -0.8804 | 0.6041 | 0.1344 | -150.7862 | -133.6133 | -2.5502 | -2.5564 |
| 0.5995 | 1.3784 | 8000 | 0.6528 | -0.8128 | -0.9555 | 0.6006 | 0.1427 | -158.2948 | -140.2942 | -2.5195 | -2.5267 |
| 0.61 | 1.4473 | 8400 | 0.6540 | -0.7310 | -0.8603 | 0.5980 | 0.1293 | -148.7821 | -132.1185 | -2.5198 | -2.5268 |
| 0.6575 | 1.5162 | 8800 | 0.6527 | -0.8369 | -0.9764 | 0.5997 | 0.1395 | -160.3900 | -142.7025 | -2.4947 | -2.5022 |
| 0.5969 | 1.5851 | 9200 | 0.6516 | -0.8922 | -1.0366 | 0.6101 | 0.1444 | -166.4089 | -148.2315 | -2.4661 | -2.4746 |
| 0.6211 | 1.6540 | 9600 | 0.6526 | -0.7875 | -0.9248 | 0.6094 | 0.1373 | -155.2340 | -137.7698 | -2.4725 | -2.4804 |
| 0.6011 | 1.7229 | 10000 | 0.6517 | -0.8912 | -1.0379 | 0.6099 | 0.1467 | -166.5410 | -148.1359 | -2.4396 | -2.4489 |
| 0.571 | 1.7919 | 10400 | 0.6514 | -0.8234 | -0.9653 | 0.6122 | 0.1419 | -159.2782 | -141.3557 | -2.4401 | -2.4489 |
| 0.5889 | 1.8608 | 10800 | 0.6506 | -1.0172 | -1.1751 | 0.6055 | 0.1579 | -180.2568 | -160.7332 | -2.3932 | -2.4039 |
| 0.5685 | 1.9297 | 11200 | 0.6486 | -1.0256 | -1.1907 | 0.5992 | 0.1651 | -181.8200 | -161.5783 | -2.3887 | -2.3992 |
| 0.63 | 1.9986 | 11600 | 0.6502 | -0.8869 | -1.0380 | 0.6004 | 0.1511 | -166.5461 | -147.7054 | -2.4012 | -2.4108 |
| 0.5891 | 2.0675 | 12000 | 0.6490 | -1.0453 | -1.2122 | 0.6046 | 0.1670 | -183.9714 | -163.5418 | -2.3713 | -2.3825 |
| 0.5808 | 2.1365 | 12400 | 0.6490 | -1.1906 | -1.3718 | 0.6039 | 0.1811 | -199.9255 | -178.0778 | -2.3382 | -2.3508 |
| 0.6051 | 2.2054 | 12800 | 0.6496 | -1.0959 | -1.2648 | 0.6053 | 0.1689 | -189.2301 | -168.6040 | -2.3542 | -2.3658 |
| 0.6223 | 2.2743 | 13200 | 0.6502 | -1.0865 | -1.2588 | 0.6069 | 0.1723 | -188.6267 | -167.6660 | -2.3460 | -2.3579 |
| 0.6245 | 2.3432 | 13600 | 0.6506 | -1.0806 | -1.2530 | 0.5983 | 0.1724 | -188.0497 | -167.0715 | -2.3462 | -2.3583 |
| 0.5716 | 2.4121 | 14000 | 0.6511 | -1.0306 | -1.1979 | 0.5941 | 0.1672 | -182.5368 | -162.0786 | -2.3533 | -2.3651 |
| 0.6078 | 2.4810 | 14400 | 0.6506 | -1.0889 | -1.2642 | 0.6004 | 0.1753 | -189.1684 | -167.9059 | -2.3417 | -2.3540 |
| 0.6112 | 2.5500 | 14800 | 0.6500 | -1.1067 | -1.2865 | 0.5971 | 0.1798 | -191.4036 | -169.6898 | -2.3390 | -2.3514 |
| 0.5773 | 2.6189 | 15200 | 0.6508 | -1.0435 | -1.2146 | 0.6025 | 0.1712 | -184.2123 | -163.3605 | -2.3468 | -2.3588 |
| 0.5983 | 2.6878 | 15600 | 0.6505 | -1.0660 | -1.2397 | 0.6018 | 0.1737 | -186.7185 | -165.6157 | -2.3419 | -2.3540 |
| 0.5983 | 2.7567 | 16000 | 0.6501 | -1.0707 | -1.2465 | 0.6029 | 0.1758 | -187.3989 | -166.0839 | -2.3408 | -2.3530 |
| 0.5956 | 2.8256 | 16400 | 0.6500 | -1.0594 | -1.2333 | 0.6008 | 0.1739 | -186.0803 | -164.9520 | -2.3429 | -2.3550 |
| 0.6221 | 2.8946 | 16800 | 0.6499 | -1.0592 | -1.2333 | 0.6041 | 0.1742 | -186.0846 | -164.9336 | -2.3430 | -2.3551 |
| 0.6096 | 2.9635 | 17200 | 0.6500 | -1.0595 | -1.2334 | 0.6046 | 0.1739 | -186.0905 | -164.9614 | -2.3429 | -2.3549 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1
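Beyond the auto-generated card above, a minimal sketch of loading this checkpoint for summarization with `transformers`. The exact prompt template from the SFT stage is not documented here, so the TL;DR-style prompt below is an assumption; adjust it to match the SFT formatting.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# Assumed Reddit TL;DR-style prompt; not documented in this card.
prompt = "Summarize the following post.\n\nPOST: <your text here>\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```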
all_results.json ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 3.0,
    "total_flos": 0.0,
    "train_loss": 0.6300815686244623,
    "train_runtime": 86182.1351,
    "train_samples": 92858,
    "train_samples_per_second": 3.232,
    "train_steps_per_second": 0.202
}
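As a consistency check: 92,858 train samples × 3 epochs ≈ 278,574 examples processed in 86,182 s gives ≈ 3.232 samples/s, and dividing by the effective batch size of 16 gives ≈ 0.202 steps/s, matching the figures above (`total_flos` of 0.0 presumably just means FLOP accounting was not populated for this run).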
generation_config.json ADDED
@@ -0,0 +1,7 @@
{
    "bos_token_id": 1,
    "eos_token_id": 2,
    "max_length": 2048,
    "pad_token_id": 0,
    "transformers_version": "4.41.2"
}
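These are the stock TinyLlama/Llama special-token ids (BOS=1, EOS=2, pad=0), and `model.generate` picks this file up automatically as its defaults. To inspect it directly, a small sketch (assuming the hub repo id matches this model's name):

```python
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained(
    "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs"
)
print(gen_cfg.max_length, gen_cfg.eos_token_id)  # 2048 2
```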
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aa57c7c8384a0fda26a32ffa6ab2dc372735c0403430e5e2f4247f6e669b957d
+oid sha256:5edd20500f6c430b66eb8abf51dc51b2cc3d69b8d736b42cb1c78ac3a8ec7b67
 size 4400216536
runs/Jun09_13-29-09_poseidon/events.out.tfevents.1717940102.poseidon.4028099.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:26bb1860391b96ac2b9a29f7108ecfd875c08e954ee14d0569f57ca1f7ffe41e
-size 1221875
+oid sha256:ce3c5d845c221dbb9bb21b0633704235c6ca092851f291a78f49a78a21492e96
+size 1236935
train_results.json ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 3.0,
    "total_flos": 0.0,
    "train_loss": 0.6300815686244623,
    "train_runtime": 86182.1351,
    "train_samples": 92858,
    "train_samples_per_second": 3.232,
    "train_steps_per_second": 0.202
}
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff