---
license: gemma
base_model: google/gemma-2-27b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9280
- Num Input Tokens Seen: 13412700
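
As a quick usage sketch, the checkpoint loads like any other causal LM in `transformers`. The repo id below is a placeholder assumption; substitute the actual Hub path or a local checkpoint directory:

```python
# Minimal loading sketch; the repo id is a placeholder assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1"  # hypothetical path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 27B parameters; bf16 keeps memory manageable
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```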

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
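
For reference, these settings correspond roughly to the following `TrainingArguments` (a minimal sketch assuming a standard Transformers/TRL SFT setup; the output path is a placeholder and the exact training script is not known):

```python
# Sketch of the hyperparameters listed above; the output path is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    gradient_accumulation_steps=32,  # 4 x 32 = 128 total train batch size
    seed=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```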

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 2.1585        | 0.0187 | 5    | 1.0516          | 253472            |
| 2.0366        | 0.0374 | 10   | 0.9878          | 506396            |
| 2.2853        | 0.0562 | 15   | 0.9800          | 760944            |
| 1.9353        | 0.0749 | 20   | 0.9748          | 1012816           |
| 1.7788        | 0.0936 | 25   | 0.9765          | 1258660           |
| 1.5677        | 0.1123 | 30   | 0.9865          | 1505980           |
| 1.6266        | 0.1310 | 35   | 0.9797          | 1748944           |
| 1.3893        | 0.1498 | 40   | 0.9770          | 1996076           |
| 1.3214        | 0.1685 | 45   | 0.9758          | 2249964           |
| 1.2104        | 0.1872 | 50   | 0.9732          | 2502428           |
| 1.1943        | 0.2059 | 55   | 0.9673          | 2758156           |
| 0.9618        | 0.2246 | 60   | 0.9648          | 3002952           |
| 0.9917        | 0.2434 | 65   | 0.9608          | 3250420           |
| 0.9458        | 0.2621 | 70   | 0.9592          | 3498588           |
| 0.8799        | 0.2808 | 75   | 0.9541          | 3753220           |
| 0.9288        | 0.2995 | 80   | 0.9547          | 4005744           |
| 0.9042        | 0.3182 | 85   | 0.9524          | 4251648           |
| 0.7466        | 0.3370 | 90   | 0.9507          | 4504748           |
| 0.8020        | 0.3557 | 95   | 0.9492          | 4759604           |
| 0.7860        | 0.3744 | 100  | 0.9468          | 5010224           |
| 0.8059        | 0.3931 | 105  | 0.9463          | 5261388           |
| 0.7014        | 0.4118 | 110  | 0.9448          | 5508984           |
| 0.7977        | 0.4306 | 115  | 0.9438          | 5767344           |
| 0.9226        | 0.4493 | 120  | 0.9425          | 6015220           |
| 0.9092        | 0.4680 | 125  | 0.9414          | 6270096           |
| 0.6920        | 0.4867 | 130  | 0.9401          | 6522928           |
| 0.7488        | 0.5054 | 135  | 0.9394          | 6774308           |
| 0.6813        | 0.5242 | 140  | 0.9378          | 7026956           |
| 0.9565        | 0.5429 | 145  | 0.9353          | 7281764           |
| 0.7867        | 0.5616 | 150  | 0.9364          | 7535708           |
| 0.6354        | 0.5803 | 155  | 0.9373          | 7783224           |
| 0.8341        | 0.5990 | 160  | 0.9340          | 8026812           |
| 0.8340        | 0.6178 | 165  | 0.9358          | 8276260           |
| 0.7364        | 0.6365 | 170  | 0.9338          | 8529636           |
| 0.7822        | 0.6552 | 175  | 0.9329          | 8787372           |
| 0.8144        | 0.6739 | 180  | 0.9337          | 9033612           |
| 0.7588        | 0.6926 | 185  | 0.9321          | 9283952           |
| 0.6757        | 0.7114 | 190  | 0.9320          | 9528272           |
| 0.5925        | 0.7301 | 195  | 0.9327          | 9775216           |
| 0.6711        | 0.7488 | 200  | 0.9321          | 10031428          |
| 0.7888        | 0.7675 | 205  | 0.9301          | 10287112          |
| 0.7551        | 0.7862 | 210  | 0.9322          | 10539552          |
| 0.7367        | 0.8050 | 215  | 0.9328          | 10786728          |
| 0.6682        | 0.8237 | 220  | 0.9318          | 11033040          |
| 0.7802        | 0.8424 | 225  | 0.9310          | 11281864          |
| 0.7423        | 0.8611 | 230  | 0.9317          | 11537232          |
| 0.8502        | 0.8798 | 235  | 0.9309          | 11791856          |
| 0.7691        | 0.8986 | 240  | 0.9283          | 12041012          |
| 0.7173        | 0.9173 | 245  | 0.9318          | 12291188          |
| 0.7158        | 0.9360 | 250  | 0.9296          | 12542864          |
| 0.7733        | 0.9547 | 255  | 0.9307          | 12794508          |
| 0.6864        | 0.9734 | 260  | 0.9298          | 13055348          |
| 0.6458        | 0.9922 | 265  | 0.9288          | 13306708          |
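
The validation loss falls steeply over the first ~0.5M input tokens (1.1282 to 0.9878) and then improves only gradually, ending around 0.928. A quick way to visualize the trend (a sketch using matplotlib, with a representative subset of rows copied from the table above):

```python
# Plot validation loss vs. input tokens seen, using a hand-copied subset
# of the table above; matplotlib is assumed to be installed.
import matplotlib.pyplot as plt

tokens_seen = [0, 506396, 1012816, 2502428, 5010224,
               7535708, 10031428, 12542864, 13306708]
eval_loss = [1.1282, 0.9878, 0.9748, 0.9732, 0.9468,
             0.9364, 0.9321, 0.9296, 0.9288]

plt.plot(tokens_seen, eval_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1")
plt.tight_layout()
plt.show()
```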


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1