---
tags:
- generated_from_trainer
datasets:
- roneneldan/TinyStories
metrics:
- accuracy
model-index:
- name: output_main
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: roneneldan/TinyStories
      type: roneneldan/TinyStories
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5791389432485323
---


# output_main

This model is a fine-tuned version of [roneneldan/TinyStories-1Layer-21M](https://huggingface.co/roneneldan/TinyStories-1Layer-21M) on the roneneldan/TinyStories dataset.
It achieves the following results on the evaluation set:
- Loss: 1.6604
- Accuracy: 0.5791
- Multicode K: 1
- Dead Code Fraction/layer0: 0.1982
- Mse/layer0: 6073.8637
- Input Norm/layer0: 0.7182
- Output Norm/layer0: 76.7891
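
The evaluation loss above can be read as a perplexity, which is often easier to compare across language models. The snippet below is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats, as is usual for causal language modeling with the Trainer; the variable names are illustrative.

```python
import math

# Evaluation loss reported above.
# Assumption: mean per-token cross-entropy measured in nats.
eval_loss = 1.6604

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the loss corresponds to a perplexity of roughly 5.26.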

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 96
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- training_steps: 100000
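
To make the schedule concrete, the following sketch computes the learning rate at a given step from the values listed above. It assumes the common shape of a linear scheduler with warmup: a linear ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0 at the final step. The function name `linear_lr` and the rounding of the warmup length are assumptions made for illustration, not code taken from the training run.

```python
import math

PEAK_LR = 0.0005        # learning_rate
TOTAL_STEPS = 100_000   # training_steps
WARMUP_RATIO = 0.05     # lr_scheduler_warmup_ratio

# Assumption: warmup length is the ratio times the total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 5000


def linear_lr(step: int) -> float:
    """Hypothetical helper: learning rate at `step` under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Ramp up linearly from 0 to the peak rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay linearly from the peak rate to 0 at the last step.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 2_500, 5_000, 52_500, 100_000):
    print(step, linear_lr(step))
```

With these settings the rate peaks at step 5000 and is half the peak at step 52500, halfway through the decay phase.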

### Training results

| Training Loss | Epoch | Step   | Validation Loss | Accuracy | Multicode K | Dead Code Fraction/layer0 | Mse/layer0 | Input Norm/layer0 | Output Norm/layer0 |
|:-------------:|:-----:|:------:|:---------------:|:--------:|:-----------:|:-------------------------:|:----------:|:-----------------:|:------------------:|
| 2.2319        | 0.1   | 1000   | 1.9134          | 0.5317   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.8521        | 0.21  | 2000   | 1.7990          | 0.5495   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7879        | 0.31  | 3000   | 1.7739          | 0.5557   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7728        | 0.42  | 4000   | 1.7666          | 0.5564   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7686        | 0.52  | 5000   | 1.7609          | 0.5595   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7635        | 0.63  | 6000   | 1.7555          | 0.5598   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7523        | 0.73  | 7000   | 1.7383          | 0.5632   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7471        | 0.83  | 8000   | 1.7368          | 0.5643   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7404        | 0.94  | 9000   | 1.7277          | 0.5659   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.728         | 1.04  | 10000  | 1.7290          | 0.5647   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7195        | 1.15  | 11000  | 1.7244          | 0.5667   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7198        | 1.25  | 12000  | 1.7230          | 0.5671   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7171        | 1.36  | 13000  | 1.7177          | 0.5689   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7185        | 1.46  | 14000  | 1.7150          | 0.5688   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7149        | 1.56  | 15000  | 1.7125          | 0.5695   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7105        | 1.67  | 16000  | 1.7097          | 0.5695   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7107        | 1.77  | 17000  | 1.7073          | 0.5689   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7113        | 1.88  | 18000  | 1.7025          | 0.5712   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.7078        | 1.98  | 19000  | 1.7048          | 0.5702   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.693         | 2.09  | 20000  | 1.7045          | 0.5696   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6935        | 2.19  | 21000  | 1.7068          | 0.5695   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6962        | 2.29  | 22000  | 1.7046          | 0.5687   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6954        | 2.4   | 23000  | 1.7019          | 0.5706   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6933        | 2.5   | 24000  | 1.7002          | 0.5725   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6942        | 2.61  | 25000  | 1.6983          | 0.5717   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6935        | 2.71  | 26000  | 1.6938          | 0.5730   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6928        | 2.82  | 27000  | 1.6978          | 0.5719   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6927        | 2.92  | 28000  | 1.6935          | 0.5715   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6855        | 3.02  | 29000  | 1.6978          | 0.5726   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6773        | 3.13  | 30000  | 1.6951          | 0.5732   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6788        | 3.23  | 31000  | 1.6926          | 0.5728   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6813        | 3.34  | 32000  | 1.6920          | 0.5726   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6782        | 3.44  | 33000  | 1.6926          | 0.5733   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6801        | 3.55  | 34000  | 1.6894          | 0.5719   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6796        | 3.65  | 35000  | 1.6890          | 0.5728   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6768        | 3.75  | 36000  | 1.6882          | 0.5722   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6802        | 3.86  | 37000  | 1.6872          | 0.5732   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6809        | 3.96  | 38000  | 1.6855          | 0.5750   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6701        | 4.07  | 39000  | 1.6886          | 0.5742   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6646        | 4.17  | 40000  | 1.6890          | 0.5734   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.669         | 4.28  | 41000  | 1.6859          | 0.5747   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6713        | 4.38  | 42000  | 1.6867          | 0.5740   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6693        | 4.48  | 43000  | 1.6821          | 0.5750   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6693        | 4.59  | 44000  | 1.6822          | 0.5747   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6692        | 4.69  | 45000  | 1.6801          | 0.5745   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6703        | 4.8   | 46000  | 1.6834          | 0.5761   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6677        | 4.9   | 47000  | 1.6819          | 0.5756   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6682        | 5.01  | 48000  | 1.6778          | 0.5752   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6547        | 5.11  | 49000  | 1.6825          | 0.5751   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6566        | 5.21  | 50000  | 1.6825          | 0.5758   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6605        | 5.32  | 51000  | 1.6814          | 0.5746   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6603        | 5.42  | 52000  | 1.6768          | 0.5755   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6595        | 5.53  | 53000  | 1.6757          | 0.5753   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6603        | 5.63  | 54000  | 1.6769          | 0.5738   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.662         | 5.74  | 55000  | 1.6758          | 0.5759   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6602        | 5.84  | 56000  | 1.6771          | 0.5757   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6624        | 5.94  | 57000  | 1.6749          | 0.5770   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6527        | 6.05  | 58000  | 1.6791          | 0.5758   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6474        | 6.15  | 59000  | 1.6763          | 0.5773   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6494        | 6.26  | 60000  | 1.6765          | 0.5761   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6539        | 6.36  | 61000  | 1.6741          | 0.5764   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6539        | 6.47  | 62000  | 1.6752          | 0.5768   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6529        | 6.57  | 63000  | 1.6737          | 0.5775   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6533        | 6.67  | 64000  | 1.6725          | 0.5758   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.653         | 6.78  | 65000  | 1.6722          | 0.5774   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6522        | 6.88  | 66000  | 1.6726          | 0.5762   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6528        | 6.99  | 67000  | 1.6726          | 0.5768   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6439        | 7.09  | 68000  | 1.6728          | 0.5771   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6403        | 7.19  | 69000  | 1.6703          | 0.5758   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6447        | 7.3   | 70000  | 1.6697          | 0.5772   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6458        | 7.4   | 71000  | 1.6694          | 0.5777   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6447        | 7.51  | 72000  | 1.6716          | 0.5771   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6449        | 7.61  | 73000  | 1.6680          | 0.5779   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6458        | 7.72  | 74000  | 1.6683          | 0.5779   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6447        | 7.82  | 75000  | 1.6681          | 0.5778   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6451        | 7.92  | 76000  | 1.6677          | 0.5781   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6418        | 8.03  | 77000  | 1.6665          | 0.5789   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6361        | 8.13  | 78000  | 1.6684          | 0.5779   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.636         | 8.24  | 79000  | 1.6687          | 0.5786   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6357        | 8.34  | 80000  | 1.6670          | 0.5790   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6379        | 8.45  | 81000  | 1.6658          | 0.5788   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6405        | 8.55  | 82000  | 1.6661          | 0.5788   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6378        | 8.65  | 83000  | 1.6650          | 0.5789   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6386        | 8.76  | 84000  | 1.6650          | 0.5784   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.638         | 8.86  | 85000  | 1.6644          | 0.5785   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6374        | 8.97  | 86000  | 1.6635          | 0.5777   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6298        | 9.07  | 87000  | 1.6647          | 0.5785   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6302        | 9.18  | 88000  | 1.6649          | 0.5787   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6315        | 9.28  | 89000  | 1.6651          | 0.5782   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.631         | 9.38  | 90000  | 1.6636          | 0.5788   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6316        | 9.49  | 91000  | 1.6627          | 0.5782   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6286        | 9.59  | 92000  | 1.6646          | 0.5783   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6304        | 9.7   | 93000  | 1.6632          | 0.5801   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6298        | 9.8   | 94000  | 1.6623          | 0.5800   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6309        | 9.91  | 95000  | 1.6620          | 0.5800   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6302        | 10.01 | 96000  | 1.6602          | 0.5801   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6242        | 10.11 | 97000  | 1.6610          | 0.5786   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6258        | 10.22 | 98000  | 1.6605          | 0.5795   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6234        | 10.32 | 99000  | 1.6605          | 0.5791   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
| 1.6245        | 10.43 | 100000 | 1.6604          | 0.5791   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |


### Framework versions

- Transformers 4.29.2
- Pytorch 2.0.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3