SADATO committed on
Commit
1d0a8d6
1 Parent(s): f4bd5e8

Upload 11 files

README.md ADDED
@@ -0,0 +1,204 @@
+ ---
+ library_name: peft
+ base_model: scb10x/typhoon-7b
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+
+ ### Framework versions
+
+ - PEFT 0.8.1
adapter_config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "scb10x/typhoon-7b",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 16,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 32,
+   "rank_pattern": {},
+   "revision": "unsloth",
+   "target_modules": [
+     "q_proj",
+     "o_proj",
+     "down_proj",
+     "v_proj",
+     "up_proj",
+     "gate_proj",
+     "k_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_rslora": false
+ }
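The adapter config above fixes the LoRA rank and alpha, which together determine how strongly the low-rank update is weighted. The sketch below is not part of the commit; it assumes PEFT's usual scaling rule (`lora_alpha / r`, or `lora_alpha / sqrt(r)` when `use_rslora` is true) and copies only the relevant fields from `adapter_config.json`. The helper name `lora_scaling` is hypothetical.

```python
import json
import math

# Subset of adapter_config.json from this commit (other fields omitted).
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "r": 32,
  "use_rslora": false
}
""")


def lora_scaling(config: dict) -> float:
    """Return the factor applied to the low-rank update B @ A.

    Assumption: standard LoRA scales by alpha / r, and rank-stabilized
    LoRA (use_rslora) scales by alpha / sqrt(r) instead.
    """
    rank = config["r"]
    alpha = config["lora_alpha"]
    if config.get("use_rslora", False):
        return alpha / math.sqrt(rank)
    return alpha / rank


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"LoRA scaling factor: {scaling}")
```

With `lora_alpha` at half the rank, the update is scaled down by a factor of two before it is added to the frozen base weights.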
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8c5eda2700ed5db819c2e8ca2fca8548439234379258be436a4349335c69c6c1
+ size 335604696
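The 335,604,696-byte adapter file can be sanity-checked against the LoRA settings in `adapter_config.json`. The estimate below is a rough sketch, not something stated in the commit: it assumes the base model uses Mistral-7B-style dimensions (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 layers) and that every target module carries one rank-32 pair of matrices.

```python
# Assumed base-model dimensions (Mistral-7B-style); not stated in the commit.
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024
NUM_LAYERS = 32

RANK = 32  # "r" from adapter_config.json
LFS_SIZE_BYTES = 335604696  # "size" from the LFS pointer above

# (in_features, out_features) for each module listed in target_modules.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


params_per_layer = sum(
    lora_param_count(i, o, RANK) for i, o in MODULE_SHAPES.values()
)
total_params = params_per_layer * NUM_LAYERS
fp32_bytes = total_params * 4  # 4 bytes per float32 value
remaining_bytes = LFS_SIZE_BYTES - fp32_bytes

print(f"LoRA parameters: {total_params:,}")
print(f"Size at float32: {fp32_bytes:,} bytes")
print(f"Bytes left over: {remaining_bytes:,}")
```

Under these assumptions the weights account for all but about 60 KB of the file, which is consistent with float32 tensors plus a metadata header. Different base dimensions or a different dtype would change the estimate.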
all_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "epoch": 10.0,
+   "train_loss": 0.43939053083042673,
+   "train_runtime": 38556.6993,
+   "train_samples_per_second": 11.673,
+   "train_steps_per_second": 0.365
+ }
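A few quantities follow directly from the summary numbers in `all_results.json`. The sketch below derives them, assuming `train_runtime` is in seconds and that the two throughput figures are averages over the whole run; because the logged rates are rounded, the derived values are approximate. The variable names are illustrative only.

```python
import math

# Values copied from all_results.json in this commit.
results = {
    "epoch": 10.0,
    "train_loss": 0.43939053083042673,
    "train_runtime": 38556.6993,
    "train_samples_per_second": 11.673,
    "train_steps_per_second": 0.365,
}

# Samples consumed per optimizer step (effective batch size).
samples_per_step = (
    results["train_samples_per_second"] / results["train_steps_per_second"]
)

# Wall-clock training time in hours.
runtime_hours = results["train_runtime"] / 3600

# Approximate number of training samples seen in one epoch.
samples_per_epoch = (
    results["train_samples_per_second"]
    * results["train_runtime"]
    / results["epoch"]
)

# Perplexity implied by the mean training loss (natural-log cross-entropy).
train_perplexity = math.exp(results["train_loss"])

print(f"Effective batch size: about {round(samples_per_step)}")
print(f"Runtime: {runtime_hours:.2f} h")
print(f"Samples per epoch: about {round(samples_per_epoch)}")
print(f"Training perplexity: {train_perplexity:.3f}")
```

The mean training loss is averaged over all ten epochs, so the implied perplexity describes the run as a whole, not the final checkpoint.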
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "</s>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba0260fe22b9efe79df479f2619890767ab9c44912142f21648f1980c32297ed
+ size 562945
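The three lines above are a Git LFS pointer, not the tokenizer model itself: the repository stores the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer follows; the helper name `parse_lfs_pointer` is hypothetical, and the parser assumes the simple `key value` line layout shown in this diff.

```python
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:ba0260fe22b9efe79df479f2619890767ab9c44912142f21648f1980c32297ed
size 562945
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["size_bytes"])
```

Once the real file has been downloaded, comparing its size and SHA-256 against these fields is a cheap integrity check.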
tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": true,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [],
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": true,
+   "model_max_length": 32768,
+   "pad_token": "</s>",
+   "padding_side": "right",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": true
+ }
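This tokenizer config adds both `<s>` and `</s>` to every sequence, pads on the right, and reuses `</s>` (id 2) as the padding token. The sketch below imitates that behaviour on made-up token ids to show why the attention mask, not the token id, has to separate a real end-of-sequence token from padding. It is a simplified stand-in written for illustration, not the `LlamaTokenizer` implementation, and `encode_batch` is a hypothetical helper.

```python
# Special-token ids from added_tokens_decoder in tokenizer_config.json.
BOS_ID = 1  # "<s>"
EOS_ID = 2  # "</s>"
PAD_ID = EOS_ID  # pad_token is "</s>", so padding shares the EOS id


def encode_batch(sequences):
    """Wrap each sequence in BOS/EOS and right-pad to a common length.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens,
    including the genuine EOS, and 0 for padding.
    """
    wrapped = [[BOS_ID] + list(seq) + [EOS_ID] for seq in sequences]
    max_len = max(len(seq) for seq in wrapped)
    input_ids, attention_mask = [], []
    for seq in wrapped:
        pad_count = max_len - len(seq)
        input_ids.append(seq + [PAD_ID] * pad_count)
        attention_mask.append([1] * len(seq) + [0] * pad_count)
    return input_ids, attention_mask


# Hypothetical token ids standing in for two tokenized texts.
ids, mask = encode_batch([[100, 101, 102], [200]])
print(ids)
print(mask)
```

In the second row the first `2` is a real end-of-sequence token and the later ones are padding. Masking labels by comparing against the pad id alone would also hide the real `</s>`, which is why the mask is built from sequence lengths here.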
train_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "epoch": 10.0,
+   "train_loss": 0.43939053083042673,
+   "train_runtime": 38556.6993,
+   "train_steps_per_second": 0.365
+ }
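The `learning_rate` values logged in `trainer_state.json` later in this commit are consistent with a linear warmup followed by cosine decay. The sketch below reconstructs such a schedule. The peak rate (5e-5) and warmup length (36 steps) are inferred from the logged values and are not stated anywhere in the commit; the total of 14060 steps is the logged `global_step`. The formula mirrors the usual cosine-with-warmup shape (as in `get_cosine_schedule_with_warmup` from `transformers`), but this is an independent re-derivation.

```python
import math

PEAK_LR = 5e-5        # inferred from the logs, not stated in the commit
WARMUP_STEPS = 36     # inferred: the rate at step 20 equals PEAK_LR * 20 / 36
TOTAL_STEPS = 14060   # "global_step" in trainer_state.json


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay towards zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare against a few values logged in trainer_state.json.
logged = {
    20: 2.777777777777778e-05,
    40: 4.999998996341642e-05,
    1400: 4.884198799038652e-05,
}
for step, expected in logged.items():
    print(step, learning_rate(step), expected)
```

Agreement at a handful of steps supports this reading of the schedule but does not prove it, since the actual training arguments are not part of the commit.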
trainer_state.json ADDED
@@ -0,0 +1,4328 @@
+ {
+   "best_metric": 0.43305763602256775,
+   "best_model_checkpoint": "model/E5/typhoon_E5_typhoon_shuffle_augment_gpt4/checkpoint-8439",
+   "epoch": 9.996445076430856,
+   "eval_steps": 500,
+   "global_step": 14060,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.01,
+       "learning_rate": 2.777777777777778e-05,
+       "loss": 3.0727,
+       "step": 20
+     },
+     {
+       "epoch": 0.03,
+       "learning_rate": 4.999998996341642e-05,
+       "loss": 2.1876,
+       "step": 40
+     },
+     {
+       "epoch": 0.04,
+       "learning_rate": 4.999963868383706e-05,
+       "loss": 1.9128,
+       "step": 60
+     },
+     {
+       "epoch": 0.06,
+       "learning_rate": 4.9998785583137e-05,
+       "loss": 1.8314,
+       "step": 80
+     },
+     {
+       "epoch": 0.07,
+       "learning_rate": 4.999743067844064e-05,
+       "loss": 1.773,
+       "step": 100
+     },
+     {
+       "epoch": 0.09,
+       "learning_rate": 4.999557399694518e-05,
+       "loss": 1.7195,
+       "step": 120
+     },
+     {
+       "epoch": 0.1,
+       "learning_rate": 4.9993215575920024e-05,
+       "loss": 1.6424,
+       "step": 140
+     },
+     {
+       "epoch": 0.11,
+       "learning_rate": 4.999035546270608e-05,
+       "loss": 1.6174,
+       "step": 160
+     },
+     {
+       "epoch": 0.13,
+       "learning_rate": 4.998699371471479e-05,
+       "loss": 1.5565,
+       "step": 180
+     },
+     {
+       "epoch": 0.14,
+       "learning_rate": 4.9983130399426966e-05,
+       "loss": 1.5504,
+       "step": 200
+     },
+     {
+       "epoch": 0.16,
+       "learning_rate": 4.9978765594391474e-05,
+       "loss": 1.5254,
+       "step": 220
+     },
+     {
+       "epoch": 0.17,
+       "learning_rate": 4.9973899387223616e-05,
+       "loss": 1.4849,
+       "step": 240
+     },
+     {
+       "epoch": 0.18,
+       "learning_rate": 4.996853187560343e-05,
+       "loss": 1.472,
+       "step": 260
+     },
+     {
+       "epoch": 0.2,
+       "learning_rate": 4.996266316727371e-05,
+       "loss": 1.4301,
+       "step": 280
+     },
+     {
+       "epoch": 0.21,
+       "learning_rate": 4.995629338003782e-05,
+       "loss": 1.4209,
+       "step": 300
+     },
+     {
+       "epoch": 0.23,
+       "learning_rate": 4.994942264175737e-05,
+       "loss": 1.4046,
+       "step": 320
+     },
+     {
+       "epoch": 0.24,
+       "learning_rate": 4.9942051090349606e-05,
+       "loss": 1.4021,
+       "step": 340
+     },
+     {
+       "epoch": 0.26,
+       "learning_rate": 4.9934178873784674e-05,
+       "loss": 1.3303,
+       "step": 360
+     },
+     {
+       "epoch": 0.27,
+       "learning_rate": 4.992580615008264e-05,
+       "loss": 1.3696,
+       "step": 380
+     },
+     {
+       "epoch": 0.28,
+       "learning_rate": 4.991693308731033e-05,
+       "loss": 1.3401,
+       "step": 400
+     },
+     {
+       "epoch": 0.3,
+       "learning_rate": 4.990755986357791e-05,
+       "loss": 1.332,
+       "step": 420
+     },
+     {
+       "epoch": 0.31,
+       "learning_rate": 4.989768666703538e-05,
+       "loss": 1.2977,
+       "step": 440
+     },
+     {
+       "epoch": 0.33,
+       "learning_rate": 4.988731369586874e-05,
+       "loss": 1.3081,
+       "step": 460
+     },
+     {
+       "epoch": 0.34,
+       "learning_rate": 4.987644115829604e-05,
+       "loss": 1.2828,
+       "step": 480
+     },
+     {
+       "epoch": 0.36,
+       "learning_rate": 4.9865069272563195e-05,
+       "loss": 1.2835,
+       "step": 500
+     },
+     {
+       "epoch": 0.37,
+       "learning_rate": 4.98531982669396e-05,
+       "loss": 1.2671,
+       "step": 520
+     },
+     {
+       "epoch": 0.38,
+       "learning_rate": 4.9840828379713556e-05,
+       "loss": 1.2453,
+       "step": 540
+     },
+     {
+       "epoch": 0.4,
+       "learning_rate": 4.9827959859187476e-05,
+       "loss": 1.2327,
+       "step": 560
+     },
+     {
+       "epoch": 0.41,
+       "learning_rate": 4.9814592963672915e-05,
+       "loss": 1.2441,
+       "step": 580
+     },
+     {
+       "epoch": 0.43,
+       "learning_rate": 4.980072796148535e-05,
+       "loss": 1.2119,
+       "step": 600
+     },
+     {
+       "epoch": 0.44,
+       "learning_rate": 4.978636513093887e-05,
+       "loss": 1.217,
+       "step": 620
+     },
+     {
+       "epoch": 0.46,
+       "learning_rate": 4.9771504760340494e-05,
+       "loss": 1.1747,
+       "step": 640
+     },
+     {
+       "epoch": 0.47,
+       "learning_rate": 4.975614714798445e-05,
+       "loss": 1.1939,
+       "step": 660
+     },
+     {
+       "epoch": 0.48,
+       "learning_rate": 4.9740292602146154e-05,
+       "loss": 1.1603,
+       "step": 680
+     },
+     {
+       "epoch": 0.5,
+       "learning_rate": 4.972394144107606e-05,
+       "loss": 1.1204,
+       "step": 700
+     },
+     {
+       "epoch": 0.51,
+       "learning_rate": 4.970709399299322e-05,
+       "loss": 1.1431,
+       "step": 720
+     },
+     {
+       "epoch": 0.53,
+       "learning_rate": 4.968975059607874e-05,
+       "loss": 1.1524,
+       "step": 740
+     },
+     {
+       "epoch": 0.54,
+       "learning_rate": 4.967191159846896e-05,
+       "loss": 1.1309,
+       "step": 760
+     },
+     {
+       "epoch": 0.55,
+       "learning_rate": 4.9653577358248484e-05,
+       "loss": 1.1314,
+       "step": 780
+     },
+     {
+       "epoch": 0.57,
+       "learning_rate": 4.9634748243442994e-05,
+       "loss": 1.0861,
+       "step": 800
+     },
+     {
+       "epoch": 0.58,
+       "learning_rate": 4.9615424632011857e-05,
+       "loss": 1.1026,
+       "step": 820
+     },
+     {
+       "epoch": 0.6,
+       "learning_rate": 4.959560691184052e-05,
+       "loss": 1.108,
+       "step": 840
+     },
+     {
+       "epoch": 0.61,
+       "learning_rate": 4.957529548073276e-05,
+       "loss": 1.0604,
+       "step": 860
+     },
+     {
+       "epoch": 0.63,
+       "learning_rate": 4.9554490746402696e-05,
+       "loss": 1.1033,
+       "step": 880
+     },
+     {
+       "epoch": 0.64,
+       "learning_rate": 4.953319312646653e-05,
+       "loss": 1.0724,
+       "step": 900
+     },
+     {
+       "epoch": 0.65,
+       "learning_rate": 4.951140304843428e-05,
+       "loss": 1.0607,
+       "step": 920
+     },
+     {
+       "epoch": 0.67,
+       "learning_rate": 4.948912094970113e-05,
+       "loss": 1.0467,
+       "step": 940
+     },
+     {
+       "epoch": 0.68,
+       "learning_rate": 4.946634727753864e-05,
+       "loss": 1.0388,
+       "step": 960
+     },
+     {
+       "epoch": 0.7,
+       "learning_rate": 4.9443082489085814e-05,
+       "loss": 0.9867,
+       "step": 980
+     },
+     {
+       "epoch": 0.71,
+       "learning_rate": 4.9419327051339883e-05,
+       "loss": 1.0129,
+       "step": 1000
+     },
+     {
+       "epoch": 0.73,
+       "learning_rate": 4.939508144114696e-05,
+       "loss": 1.0029,
+       "step": 1020
+     },
+     {
+       "epoch": 0.74,
+       "learning_rate": 4.937034614519245e-05,
+       "loss": 1.0039,
+       "step": 1040
+     },
+     {
+       "epoch": 0.75,
+       "learning_rate": 4.934512165999128e-05,
+       "loss": 0.9735,
+       "step": 1060
+     },
+     {
+       "epoch": 0.77,
+       "learning_rate": 4.931940849187795e-05,
+       "loss": 0.9852,
+       "step": 1080
+     },
+     {
+       "epoch": 0.78,
+       "learning_rate": 4.9293207156996354e-05,
+       "loss": 0.9656,
+       "step": 1100
+     },
+     {
+       "epoch": 0.8,
+       "learning_rate": 4.9266518181289414e-05,
+       "loss": 1.0178,
+       "step": 1120
+     },
+     {
+       "epoch": 0.81,
+       "learning_rate": 4.923934210048856e-05,
+       "loss": 0.966,
+       "step": 1140
+     },
+     {
+       "epoch": 0.82,
+       "learning_rate": 4.921167946010291e-05,
+       "loss": 0.9721,
+       "step": 1160
+     },
+     {
+       "epoch": 0.84,
+       "learning_rate": 4.9183530815408386e-05,
+       "loss": 0.9451,
+       "step": 1180
+     },
+     {
+       "epoch": 0.85,
+       "learning_rate": 4.9154896731436526e-05,
+       "loss": 1.0005,
+       "step": 1200
+     },
+     {
+       "epoch": 0.87,
+       "learning_rate": 4.9125777782963165e-05,
+       "loss": 0.9435,
+       "step": 1220
+     },
+     {
+       "epoch": 0.88,
+       "learning_rate": 4.909617455449689e-05,
+       "loss": 0.9306,
+       "step": 1240
+     },
+     {
+       "epoch": 0.9,
+       "learning_rate": 4.906608764026729e-05,
+       "loss": 0.9644,
+       "step": 1260
+     },
+     {
+       "epoch": 0.91,
+       "learning_rate": 4.903551764421307e-05,
+       "loss": 0.9203,
+       "step": 1280
+     },
+     {
+       "epoch": 0.92,
+       "learning_rate": 4.900446517996987e-05,
+       "loss": 0.9102,
+       "step": 1300
+     },
+     {
+       "epoch": 0.94,
+       "learning_rate": 4.8972930870857994e-05,
+       "loss": 0.9134,
+       "step": 1320
+     },
+     {
+       "epoch": 0.95,
+       "learning_rate": 4.89409153498699e-05,
+       "loss": 0.8876,
+       "step": 1340
+     },
+     {
+       "epoch": 0.97,
+       "learning_rate": 4.890841925965744e-05,
+       "loss": 0.9273,
+       "step": 1360
+     },
+     {
+       "epoch": 0.98,
+       "learning_rate": 4.8875443252519035e-05,
+       "loss": 0.9117,
+       "step": 1380
+     },
+     {
+       "epoch": 1.0,
+       "learning_rate": 4.884198799038652e-05,
+       "loss": 0.8953,
+       "step": 1400
+     },
+     {
+       "epoch": 1.0,
+       "eval_loss": 0.8869273066520691,
+       "eval_runtime": 171.3717,
+       "eval_samples_per_second": 32.462,
+       "eval_steps_per_second": 8.117,
+       "step": 1406
+     },
439
+ {
440
+ "epoch": 1.01,
441
+ "learning_rate": 4.880805414481189e-05,
442
+ "loss": 0.8234,
443
+ "step": 1420
444
+ },
445
+ {
446
+ "epoch": 1.02,
447
+ "learning_rate": 4.8773642396953796e-05,
448
+ "loss": 0.7706,
449
+ "step": 1440
450
+ },
451
+ {
452
+ "epoch": 1.04,
453
+ "learning_rate": 4.87387534375639e-05,
454
+ "loss": 0.8079,
455
+ "step": 1460
456
+ },
457
+ {
458
+ "epoch": 1.05,
459
+ "learning_rate": 4.8703387966973e-05,
460
+ "loss": 0.8082,
461
+ "step": 1480
462
+ },
463
+ {
464
+ "epoch": 1.07,
465
+ "learning_rate": 4.866754669507696e-05,
466
+ "loss": 0.7857,
467
+ "step": 1500
468
+ },
469
+ {
470
+ "epoch": 1.08,
471
+ "learning_rate": 4.8631230341322455e-05,
472
+ "loss": 0.7698,
473
+ "step": 1520
474
+ },
475
+ {
476
+ "epoch": 1.09,
477
+ "learning_rate": 4.859443963469256e-05,
478
+ "loss": 0.7786,
479
+ "step": 1540
480
+ },
481
+ {
482
+ "epoch": 1.11,
483
+ "learning_rate": 4.855717531369208e-05,
484
+ "loss": 0.7453,
485
+ "step": 1560
486
+ },
487
+ {
488
+ "epoch": 1.12,
489
+ "learning_rate": 4.851943812633279e-05,
490
+ "loss": 0.7654,
491
+ "step": 1580
492
+ },
493
+ {
494
+ "epoch": 1.14,
495
+ "learning_rate": 4.848122883011832e-05,
496
+ "loss": 0.7666,
497
+ "step": 1600
498
+ },
499
+ {
500
+ "epoch": 1.15,
501
+ "learning_rate": 4.844254819202904e-05,
502
+ "loss": 0.7849,
503
+ "step": 1620
504
+ },
505
+ {
506
+ "epoch": 1.17,
507
+ "learning_rate": 4.840339698850661e-05,
508
+ "loss": 0.7768,
509
+ "step": 1640
510
+ },
511
+ {
512
+ "epoch": 1.18,
513
+ "learning_rate": 4.836377600543842e-05,
514
+ "loss": 0.7652,
515
+ "step": 1660
516
+ },
517
+ {
518
+ "epoch": 1.19,
519
+ "learning_rate": 4.832368603814182e-05,
520
+ "loss": 0.7788,
521
+ "step": 1680
522
+ },
523
+ {
524
+ "epoch": 1.21,
525
+ "learning_rate": 4.8283127891348124e-05,
526
+ "loss": 0.7398,
527
+ "step": 1700
528
+ },
529
+ {
530
+ "epoch": 1.22,
531
+ "learning_rate": 4.824210237918649e-05,
532
+ "loss": 0.7389,
533
+ "step": 1720
534
+ },
535
+ {
536
+ "epoch": 1.24,
537
+ "learning_rate": 4.820061032516756e-05,
538
+ "loss": 0.74,
539
+ "step": 1740
540
+ },
541
+ {
542
+ "epoch": 1.25,
543
+ "learning_rate": 4.815865256216693e-05,
544
+ "loss": 0.7435,
545
+ "step": 1760
546
+ },
547
+ {
548
+ "epoch": 1.27,
549
+ "learning_rate": 4.811622993240844e-05,
550
+ "loss": 0.7257,
551
+ "step": 1780
552
+ },
553
+ {
554
+ "epoch": 1.28,
555
+ "learning_rate": 4.807334328744726e-05,
556
+ "loss": 0.7655,
557
+ "step": 1800
558
+ },
559
+ {
560
+ "epoch": 1.29,
561
+ "learning_rate": 4.8029993488152806e-05,
562
+ "loss": 0.7437,
563
+ "step": 1820
564
+ },
565
+ {
566
+ "epoch": 1.31,
567
+ "learning_rate": 4.798618140469143e-05,
568
+ "loss": 0.7188,
569
+ "step": 1840
570
+ },
571
+ {
572
+ "epoch": 1.32,
573
+ "learning_rate": 4.794190791650903e-05,
574
+ "loss": 0.7187,
575
+ "step": 1860
576
+ },
577
+ {
578
+ "epoch": 1.34,
579
+ "learning_rate": 4.789717391231328e-05,
580
+ "loss": 0.7358,
581
+ "step": 1880
582
+ },
583
+ {
584
+ "epoch": 1.35,
585
+ "learning_rate": 4.7851980290055896e-05,
586
+ "loss": 0.6868,
587
+ "step": 1900
588
+ },
589
+ {
590
+ "epoch": 1.37,
591
+ "learning_rate": 4.7806327956914544e-05,
592
+ "loss": 0.6775,
593
+ "step": 1920
594
+ },
595
+ {
596
+ "epoch": 1.38,
597
+ "learning_rate": 4.7760217829274675e-05,
598
+ "loss": 0.6768,
599
+ "step": 1940
600
+ },
601
+ {
602
+ "epoch": 1.39,
603
+ "learning_rate": 4.771365083271112e-05,
604
+ "loss": 0.7081,
605
+ "step": 1960
606
+ },
607
+ {
608
+ "epoch": 1.41,
609
+ "learning_rate": 4.7666627901969454e-05,
610
+ "loss": 0.6925,
611
+ "step": 1980
612
+ },
613
+ {
614
+ "epoch": 1.42,
615
+ "learning_rate": 4.761914998094732e-05,
616
+ "loss": 0.6841,
617
+ "step": 2000
618
+ },
619
+ {
620
+ "epoch": 1.44,
621
+ "learning_rate": 4.7571218022675443e-05,
622
+ "loss": 0.6839,
623
+ "step": 2020
624
+ },
625
+ {
626
+ "epoch": 1.45,
627
+ "learning_rate": 4.7522832989298486e-05,
628
+ "loss": 0.6912,
629
+ "step": 2040
630
+ },
631
+ {
632
+ "epoch": 1.46,
633
+ "learning_rate": 4.747399585205575e-05,
634
+ "loss": 0.6909,
635
+ "step": 2060
636
+ },
637
+ {
638
+ "epoch": 1.48,
639
+ "learning_rate": 4.7424707591261685e-05,
640
+ "loss": 0.6693,
641
+ "step": 2080
642
+ },
643
+ {
644
+ "epoch": 1.49,
645
+ "learning_rate": 4.737496919628619e-05,
646
+ "loss": 0.6909,
647
+ "step": 2100
648
+ },
649
+ {
650
+ "epoch": 1.51,
651
+ "learning_rate": 4.732478166553479e-05,
652
+ "loss": 0.6276,
653
+ "step": 2120
654
+ },
655
+ {
656
+ "epoch": 1.52,
657
+ "learning_rate": 4.727414600642857e-05,
658
+ "loss": 0.6702,
659
+ "step": 2140
660
+ },
661
+ {
662
+ "epoch": 1.54,
663
+ "learning_rate": 4.722306323538392e-05,
664
+ "loss": 0.6458,
665
+ "step": 2160
666
+ },
667
+ {
668
+ "epoch": 1.55,
669
+ "learning_rate": 4.717153437779221e-05,
670
+ "loss": 0.6527,
671
+ "step": 2180
672
+ },
673
+ {
674
+ "epoch": 1.56,
675
+ "learning_rate": 4.711956046799917e-05,
676
+ "loss": 0.6522,
677
+ "step": 2200
678
+ },
679
+ {
680
+ "epoch": 1.58,
681
+ "learning_rate": 4.7067142549284085e-05,
682
+ "loss": 0.6553,
683
+ "step": 2220
684
+ },
685
+ {
686
+ "epoch": 1.59,
687
+ "learning_rate": 4.7014281673838904e-05,
688
+ "loss": 0.6554,
689
+ "step": 2240
690
+ },
691
+ {
692
+ "epoch": 1.61,
693
+ "learning_rate": 4.6960978902747135e-05,
694
+ "loss": 0.6392,
695
+ "step": 2260
696
+ },
697
+ {
698
+ "epoch": 1.62,
699
+ "learning_rate": 4.6907235305962476e-05,
700
+ "loss": 0.6207,
701
+ "step": 2280
702
+ },
703
+ {
704
+ "epoch": 1.64,
705
+ "learning_rate": 4.6853051962287405e-05,
706
+ "loss": 0.6147,
707
+ "step": 2300
708
+ },
709
+ {
710
+ "epoch": 1.65,
711
+ "learning_rate": 4.679842995935149e-05,
712
+ "loss": 0.6345,
713
+ "step": 2320
714
+ },
715
+ {
716
+ "epoch": 1.66,
717
+ "learning_rate": 4.674337039358957e-05,
718
+ "loss": 0.6585,
719
+ "step": 2340
720
+ },
721
+ {
722
+ "epoch": 1.68,
723
+ "learning_rate": 4.668787437021973e-05,
724
+ "loss": 0.6293,
725
+ "step": 2360
726
+ },
727
+ {
728
+ "epoch": 1.69,
729
+ "learning_rate": 4.6631943003221145e-05,
730
+ "loss": 0.6337,
731
+ "step": 2380
732
+ },
733
+ {
734
+ "epoch": 1.71,
735
+ "learning_rate": 4.6575577415311684e-05,
736
+ "loss": 0.6347,
737
+ "step": 2400
738
+ },
739
+ {
740
+ "epoch": 1.72,
741
+ "learning_rate": 4.6518778737925406e-05,
742
+ "loss": 0.6265,
743
+ "step": 2420
744
+ },
745
+ {
746
+ "epoch": 1.73,
747
+ "learning_rate": 4.646154811118982e-05,
748
+ "loss": 0.6139,
749
+ "step": 2440
750
+ },
751
+ {
752
+ "epoch": 1.75,
753
+ "learning_rate": 4.640388668390302e-05,
754
+ "loss": 0.6191,
755
+ "step": 2460
756
+ },
757
+ {
758
+ "epoch": 1.76,
759
+ "learning_rate": 4.6345795613510625e-05,
760
+ "loss": 0.6035,
761
+ "step": 2480
762
+ },
763
+ {
764
+ "epoch": 1.78,
765
+ "learning_rate": 4.6287276066082516e-05,
766
+ "loss": 0.6031,
767
+ "step": 2500
768
+ },
769
+ {
770
+ "epoch": 1.79,
771
+ "learning_rate": 4.6228329216289475e-05,
772
+ "loss": 0.6121,
773
+ "step": 2520
774
+ },
775
+ {
776
+ "epoch": 1.81,
777
+ "learning_rate": 4.616895624737957e-05,
778
+ "loss": 0.5995,
779
+ "step": 2540
780
+ },
781
+ {
782
+ "epoch": 1.82,
783
+ "learning_rate": 4.6109158351154416e-05,
784
+ "loss": 0.5869,
785
+ "step": 2560
786
+ },
787
+ {
788
+ "epoch": 1.83,
789
+ "learning_rate": 4.6048936727945255e-05,
790
+ "loss": 0.5725,
791
+ "step": 2580
792
+ },
793
+ {
794
+ "epoch": 1.85,
795
+ "learning_rate": 4.598829258658885e-05,
796
+ "loss": 0.5962,
797
+ "step": 2600
798
+ },
799
+ {
800
+ "epoch": 1.86,
801
+ "learning_rate": 4.592722714440324e-05,
802
+ "loss": 0.599,
803
+ "step": 2620
804
+ },
805
+ {
806
+ "epoch": 1.88,
807
+ "learning_rate": 4.586574162716328e-05,
808
+ "loss": 0.604,
809
+ "step": 2640
810
+ },
811
+ {
812
+ "epoch": 1.89,
813
+ "learning_rate": 4.5803837269076073e-05,
814
+ "loss": 0.5809,
815
+ "step": 2660
816
+ },
817
+ {
818
+ "epoch": 1.91,
819
+ "learning_rate": 4.5741515312756125e-05,
820
+ "loss": 0.5661,
821
+ "step": 2680
822
+ },
823
+ {
824
+ "epoch": 1.92,
825
+ "learning_rate": 4.567877700920049e-05,
826
+ "loss": 0.5885,
827
+ "step": 2700
828
+ },
829
+ {
830
+ "epoch": 1.93,
831
+ "learning_rate": 4.5615623617763606e-05,
832
+ "loss": 0.5612,
833
+ "step": 2720
834
+ },
835
+ {
836
+ "epoch": 1.95,
837
+ "learning_rate": 4.5552056406132003e-05,
838
+ "loss": 0.5735,
839
+ "step": 2740
840
+ },
841
+ {
842
+ "epoch": 1.96,
843
+ "learning_rate": 4.548807665029892e-05,
844
+ "loss": 0.5808,
845
+ "step": 2760
846
+ },
847
+ {
848
+ "epoch": 1.98,
849
+ "learning_rate": 4.542368563453861e-05,
850
+ "loss": 0.5607,
851
+ "step": 2780
852
+ },
853
+ {
854
+ "epoch": 1.99,
855
+ "learning_rate": 4.535888465138063e-05,
856
+ "loss": 0.5605,
857
+ "step": 2800
858
+ },
859
+ {
860
+ "epoch": 2.0,
861
+ "eval_loss": 0.6370431780815125,
862
+ "eval_runtime": 170.9185,
863
+ "eval_samples_per_second": 32.548,
864
+ "eval_steps_per_second": 8.138,
865
+ "step": 2813
866
+ },
867
+ {
868
+ "epoch": 2.0,
869
+ "learning_rate": 4.529367500158386e-05,
870
+ "loss": 0.5473,
871
+ "step": 2820
872
+ },
873
+ {
874
+ "epoch": 2.02,
875
+ "learning_rate": 4.522805799411039e-05,
876
+ "loss": 0.503,
877
+ "step": 2840
878
+ },
879
+ {
880
+ "epoch": 2.03,
881
+ "learning_rate": 4.5162034946099277e-05,
882
+ "loss": 0.5132,
883
+ "step": 2860
884
+ },
885
+ {
886
+ "epoch": 2.05,
887
+ "learning_rate": 4.509560718284007e-05,
888
+ "loss": 0.5009,
889
+ "step": 2880
890
+ },
891
+ {
892
+ "epoch": 2.06,
893
+ "learning_rate": 4.502877603774622e-05,
894
+ "loss": 0.5176,
895
+ "step": 2900
896
+ },
897
+ {
898
+ "epoch": 2.08,
899
+ "learning_rate": 4.496154285232833e-05,
900
+ "loss": 0.5032,
901
+ "step": 2920
902
+ },
903
+ {
904
+ "epoch": 2.09,
905
+ "learning_rate": 4.489390897616719e-05,
906
+ "loss": 0.5074,
907
+ "step": 2940
908
+ },
909
+ {
910
+ "epoch": 2.1,
911
+ "learning_rate": 4.482587576688673e-05,
912
+ "loss": 0.4971,
913
+ "step": 2960
914
+ },
915
+ {
916
+ "epoch": 2.12,
917
+ "learning_rate": 4.4757444590126736e-05,
918
+ "loss": 0.4858,
919
+ "step": 2980
920
+ },
921
+ {
922
+ "epoch": 2.13,
923
+ "learning_rate": 4.4688616819515464e-05,
924
+ "loss": 0.5065,
925
+ "step": 3000
926
+ },
927
+ {
928
+ "epoch": 2.15,
929
+ "learning_rate": 4.461939383664202e-05,
930
+ "loss": 0.5034,
931
+ "step": 3020
932
+ },
933
+ {
934
+ "epoch": 2.16,
935
+ "learning_rate": 4.45497770310287e-05,
936
+ "loss": 0.5047,
937
+ "step": 3040
938
+ },
939
+ {
940
+ "epoch": 2.18,
941
+ "learning_rate": 4.4479767800103036e-05,
942
+ "loss": 0.5026,
943
+ "step": 3060
944
+ },
945
+ {
946
+ "epoch": 2.19,
947
+ "learning_rate": 4.4409367549169764e-05,
948
+ "loss": 0.5083,
949
+ "step": 3080
950
+ },
951
+ {
952
+ "epoch": 2.2,
953
+ "learning_rate": 4.433857769138261e-05,
954
+ "loss": 0.4818,
955
+ "step": 3100
956
+ },
957
+ {
958
+ "epoch": 2.22,
959
+ "learning_rate": 4.426739964771595e-05,
960
+ "loss": 0.4851,
961
+ "step": 3120
962
+ },
963
+ {
964
+ "epoch": 2.23,
965
+ "learning_rate": 4.4195834846936264e-05,
966
+ "loss": 0.5099,
967
+ "step": 3140
968
+ },
969
+ {
970
+ "epoch": 2.25,
971
+ "learning_rate": 4.4123884725573446e-05,
972
+ "loss": 0.4986,
973
+ "step": 3160
974
+ },
975
+ {
976
+ "epoch": 2.26,
977
+ "learning_rate": 4.4051550727892e-05,
978
+ "loss": 0.4804,
979
+ "step": 3180
980
+ },
981
+ {
982
+ "epoch": 2.28,
983
+ "learning_rate": 4.3978834305862004e-05,
984
+ "loss": 0.4846,
985
+ "step": 3200
986
+ },
987
+ {
988
+ "epoch": 2.29,
989
+ "learning_rate": 4.3905736919130034e-05,
990
+ "loss": 0.4855,
991
+ "step": 3220
992
+ },
993
+ {
994
+ "epoch": 2.3,
995
+ "learning_rate": 4.383226003498978e-05,
996
+ "loss": 0.4869,
997
+ "step": 3240
998
+ },
999
+ {
1000
+ "epoch": 2.32,
1001
+ "learning_rate": 4.375840512835266e-05,
1002
+ "loss": 0.4896,
1003
+ "step": 3260
1004
+ },
1005
+ {
1006
+ "epoch": 2.33,
1007
+ "learning_rate": 4.368417368171819e-05,
1008
+ "loss": 0.4845,
1009
+ "step": 3280
1010
+ },
1011
+ {
1012
+ "epoch": 2.35,
1013
+ "learning_rate": 4.3609567185144184e-05,
1014
+ "loss": 0.4822,
1015
+ "step": 3300
1016
+ },
1017
+ {
1018
+ "epoch": 2.36,
1019
+ "learning_rate": 4.3534587136216944e-05,
1020
+ "loss": 0.491,
1021
+ "step": 3320
1022
+ },
1023
+ {
1024
+ "epoch": 2.37,
1025
+ "learning_rate": 4.345923504002111e-05,
1026
+ "loss": 0.4819,
1027
+ "step": 3340
1028
+ },
1029
+ {
1030
+ "epoch": 2.39,
1031
+ "learning_rate": 4.338351240910945e-05,
1032
+ "loss": 0.4907,
1033
+ "step": 3360
1034
+ },
1035
+ {
1036
+ "epoch": 2.4,
1037
+ "learning_rate": 4.330742076347258e-05,
1038
+ "loss": 0.4777,
1039
+ "step": 3380
1040
+ },
1041
+ {
1042
+ "epoch": 2.42,
1043
+ "learning_rate": 4.3230961630508354e-05,
1044
+ "loss": 0.4864,
1045
+ "step": 3400
1046
+ },
1047
+ {
1048
+ "epoch": 2.43,
1049
+ "learning_rate": 4.315413654499128e-05,
1050
+ "loss": 0.4826,
1051
+ "step": 3420
1052
+ },
1053
+ {
1054
+ "epoch": 2.45,
1055
+ "learning_rate": 4.307694704904165e-05,
1056
+ "loss": 0.4728,
1057
+ "step": 3440
1058
+ },
1059
+ {
1060
+ "epoch": 2.46,
1061
+ "learning_rate": 4.299939469209463e-05,
1062
+ "loss": 0.4664,
1063
+ "step": 3460
1064
+ },
1065
+ {
1066
+ "epoch": 2.47,
1067
+ "learning_rate": 4.292148103086917e-05,
1068
+ "loss": 0.4747,
1069
+ "step": 3480
1070
+ },
1071
+ {
1072
+ "epoch": 2.49,
1073
+ "learning_rate": 4.2843207629336694e-05,
1074
+ "loss": 0.474,
1075
+ "step": 3500
1076
+ },
1077
+ {
1078
+ "epoch": 2.5,
1079
+ "learning_rate": 4.2764576058689735e-05,
1080
+ "loss": 0.4791,
1081
+ "step": 3520
1082
+ },
1083
+ {
1084
+ "epoch": 2.52,
1085
+ "learning_rate": 4.268558789731044e-05,
1086
+ "loss": 0.4768,
1087
+ "step": 3540
1088
+ },
1089
+ {
1090
+ "epoch": 2.53,
1091
+ "learning_rate": 4.260624473073883e-05,
1092
+ "loss": 0.4741,
1093
+ "step": 3560
1094
+ },
1095
+ {
1096
+ "epoch": 2.55,
1097
+ "learning_rate": 4.2526548151640986e-05,
1098
+ "loss": 0.4573,
1099
+ "step": 3580
1100
+ },
1101
+ {
1102
+ "epoch": 2.56,
1103
+ "learning_rate": 4.24464997597771e-05,
1104
+ "loss": 0.4798,
1105
+ "step": 3600
1106
+ },
1107
+ {
1108
+ "epoch": 2.57,
1109
+ "learning_rate": 4.236610116196934e-05,
1110
+ "loss": 0.4589,
1111
+ "step": 3620
1112
+ },
1113
+ {
1114
+ "epoch": 2.59,
1115
+ "learning_rate": 4.228535397206962e-05,
1116
+ "loss": 0.4737,
1117
+ "step": 3640
1118
+ },
1119
+ {
1120
+ "epoch": 2.6,
1121
+ "learning_rate": 4.220425981092716e-05,
1122
+ "loss": 0.4638,
1123
+ "step": 3660
1124
+ },
1125
+ {
1126
+ "epoch": 2.62,
1127
+ "learning_rate": 4.212282030635601e-05,
1128
+ "loss": 0.4621,
1129
+ "step": 3680
1130
+ },
1131
+ {
1132
+ "epoch": 2.63,
1133
+ "learning_rate": 4.204103709310234e-05,
1134
+ "loss": 0.4658,
1135
+ "step": 3700
1136
+ },
1137
+ {
1138
+ "epoch": 2.64,
1139
+ "learning_rate": 4.195891181281161e-05,
1140
+ "loss": 0.4637,
1141
+ "step": 3720
1142
+ },
1143
+ {
1144
+ "epoch": 2.66,
1145
+ "learning_rate": 4.187644611399566e-05,
1146
+ "loss": 0.4731,
1147
+ "step": 3740
1148
+ },
1149
+ {
1150
+ "epoch": 2.67,
1151
+ "learning_rate": 4.17936416519996e-05,
1152
+ "loss": 0.4719,
1153
+ "step": 3760
1154
+ },
1155
+ {
1156
+ "epoch": 2.69,
1157
+ "learning_rate": 4.171050008896855e-05,
1158
+ "loss": 0.4825,
1159
+ "step": 3780
1160
+ },
1161
+ {
1162
+ "epoch": 2.7,
1163
+ "learning_rate": 4.162702309381434e-05,
1164
+ "loss": 0.4491,
1165
+ "step": 3800
1166
+ },
1167
+ {
1168
+ "epoch": 2.72,
1169
+ "learning_rate": 4.1543212342181956e-05,
1170
+ "loss": 0.467,
1171
+ "step": 3820
1172
+ },
1173
+ {
1174
+ "epoch": 2.73,
1175
+ "learning_rate": 4.1459069516415916e-05,
1176
+ "loss": 0.4716,
1177
+ "step": 3840
1178
+ },
1179
+ {
1180
+ "epoch": 2.74,
1181
+ "learning_rate": 4.137459630552652e-05,
1182
+ "loss": 0.478,
1183
+ "step": 3860
1184
+ },
1185
+ {
1186
+ "epoch": 2.76,
1187
+ "learning_rate": 4.128979440515594e-05,
1188
+ "loss": 0.4498,
1189
+ "step": 3880
1190
+ },
1191
+ {
1192
+ "epoch": 2.77,
1193
+ "learning_rate": 4.1204665517544144e-05,
1194
+ "loss": 0.4706,
1195
+ "step": 3900
1196
+ },
1197
+ {
1198
+ "epoch": 2.79,
1199
+ "learning_rate": 4.1119211351494795e-05,
1200
+ "loss": 0.4628,
1201
+ "step": 3920
1202
+ },
1203
+ {
1204
+ "epoch": 2.8,
1205
+ "learning_rate": 4.103343362234089e-05,
1206
+ "loss": 0.4568,
1207
+ "step": 3940
1208
+ },
1209
+ {
1210
+ "epoch": 2.82,
1211
+ "learning_rate": 4.0947334051910367e-05,
1212
+ "loss": 0.4552,
1213
+ "step": 3960
1214
+ },
1215
+ {
1216
+ "epoch": 2.83,
1217
+ "learning_rate": 4.086091436849153e-05,
1218
+ "loss": 0.4421,
1219
+ "step": 3980
1220
+ },
1221
+ {
1222
+ "epoch": 2.84,
1223
+ "learning_rate": 4.077417630679833e-05,
1224
+ "loss": 0.4589,
1225
+ "step": 4000
1226
+ },
1227
+ {
1228
+ "epoch": 2.86,
1229
+ "learning_rate": 4.068712160793559e-05,
1230
+ "loss": 0.4503,
1231
+ "step": 4020
1232
+ },
1233
+ {
1234
+ "epoch": 2.87,
1235
+ "learning_rate": 4.0599752019364026e-05,
1236
+ "loss": 0.4597,
1237
+ "step": 4040
1238
+ },
1239
+ {
1240
+ "epoch": 2.89,
1241
+ "learning_rate": 4.0512069294865176e-05,
1242
+ "loss": 0.4387,
1243
+ "step": 4060
1244
+ },
1245
+ {
1246
+ "epoch": 2.9,
1247
+ "learning_rate": 4.042407519450619e-05,
1248
+ "loss": 0.4456,
1249
+ "step": 4080
1250
+ },
1251
+ {
1252
+ "epoch": 2.92,
1253
+ "learning_rate": 4.033577148460456e-05,
1254
+ "loss": 0.4391,
1255
+ "step": 4100
1256
+ },
1257
+ {
1258
+ "epoch": 2.93,
1259
+ "learning_rate": 4.024715993769253e-05,
1260
+ "loss": 0.4618,
1261
+ "step": 4120
1262
+ },
1263
+ {
1264
+ "epoch": 2.94,
1265
+ "learning_rate": 4.0158242332481654e-05,
1266
+ "loss": 0.4621,
1267
+ "step": 4140
1268
+ },
1269
+ {
1270
+ "epoch": 2.96,
1271
+ "learning_rate": 4.006902045382701e-05,
1272
+ "loss": 0.4364,
1273
+ "step": 4160
1274
+ },
1275
+ {
1276
+ "epoch": 2.97,
1277
+ "learning_rate": 3.997949609269143e-05,
1278
+ "loss": 0.4468,
1279
+ "step": 4180
1280
+ },
1281
+ {
1282
+ "epoch": 2.99,
1283
+ "learning_rate": 3.9889671046109464e-05,
1284
+ "loss": 0.4554,
1285
+ "step": 4200
1286
+ },
1287
+ {
1288
+ "epoch": 3.0,
1289
+ "eval_loss": 0.5355364680290222,
1290
+ "eval_runtime": 170.7962,
1291
+ "eval_samples_per_second": 32.571,
1292
+ "eval_steps_per_second": 8.144,
1293
+ "step": 4219
1294
+ },
1295
+ {
1296
+ "epoch": 3.0,
1297
+ "learning_rate": 3.979954711715141e-05,
1298
+ "loss": 0.4513,
1299
+ "step": 4220
1300
+ },
1301
+ {
1302
+ "epoch": 3.01,
1303
+ "learning_rate": 3.9709126114887056e-05,
1304
+ "loss": 0.401,
1305
+ "step": 4240
1306
+ },
1307
+ {
1308
+ "epoch": 3.03,
1309
+ "learning_rate": 3.961840985434938e-05,
1310
+ "loss": 0.3981,
1311
+ "step": 4260
1312
+ },
1313
+ {
1314
+ "epoch": 3.04,
1315
+ "learning_rate": 3.952740015649812e-05,
1316
+ "loss": 0.4054,
1317
+ "step": 4280
1318
+ },
1319
+ {
1320
+ "epoch": 3.06,
1321
+ "learning_rate": 3.9436098848183226e-05,
1322
+ "loss": 0.3924,
1323
+ "step": 4300
1324
+ },
1325
+ {
1326
+ "epoch": 3.07,
1327
+ "learning_rate": 3.9344507762108165e-05,
1328
+ "loss": 0.4086,
1329
+ "step": 4320
1330
+ },
1331
+ {
1332
+ "epoch": 3.09,
1333
+ "learning_rate": 3.925262873679319e-05,
1334
+ "loss": 0.4095,
1335
+ "step": 4340
1336
+ },
1337
+ {
1338
+ "epoch": 3.1,
1339
+ "learning_rate": 3.916046361653836e-05,
1340
+ "loss": 0.3991,
1341
+ "step": 4360
1342
+ },
1343
+ {
1344
+ "epoch": 3.11,
1345
+ "learning_rate": 3.906801425138656e-05,
1346
+ "loss": 0.4067,
1347
+ "step": 4380
1348
+ },
1349
+ {
1350
+ "epoch": 3.13,
1351
+ "learning_rate": 3.89752824970864e-05,
1352
+ "loss": 0.402,
1353
+ "step": 4400
1354
+ },
1355
+ {
1356
+ "epoch": 3.14,
1357
+ "learning_rate": 3.888227021505486e-05,
1358
+ "loss": 0.4126,
1359
+ "step": 4420
1360
+ },
1361
+ {
1362
+ "epoch": 3.16,
1363
+ "learning_rate": 3.8788979272340066e-05,
1364
+ "loss": 0.4206,
1365
+ "step": 4440
1366
+ },
1367
+ {
1368
+ "epoch": 3.17,
1369
+ "learning_rate": 3.869541154158368e-05,
1370
+ "loss": 0.4187,
1371
+ "step": 4460
1372
+ },
1373
+ {
1374
+ "epoch": 3.19,
1375
+ "learning_rate": 3.860156890098339e-05,
1376
+ "loss": 0.4107,
1377
+ "step": 4480
1378
+ },
1379
+ {
1380
+ "epoch": 3.2,
1381
+ "learning_rate": 3.8507453234255176e-05,
1382
+ "loss": 0.4126,
1383
+ "step": 4500
1384
+ },
1385
+ {
1386
+ "epoch": 3.21,
1387
+ "learning_rate": 3.841306643059552e-05,
1388
+ "loss": 0.4081,
1389
+ "step": 4520
1390
+ },
1391
+ {
1392
+ "epoch": 3.23,
1393
+ "learning_rate": 3.8318410384643485e-05,
1394
+ "loss": 0.4109,
1395
+ "step": 4540
1396
+ },
1397
+ {
1398
+ "epoch": 3.24,
1399
+ "learning_rate": 3.822348699644264e-05,
1400
+ "loss": 0.4095,
1401
+ "step": 4560
1402
+ },
1403
+ {
1404
+ "epoch": 3.26,
1405
+ "learning_rate": 3.812829817140295e-05,
1406
+ "loss": 0.4068,
1407
+ "step": 4580
1408
+ },
1409
+ {
1410
+ "epoch": 3.27,
1411
+ "learning_rate": 3.8032845820262575e-05,
1412
+ "loss": 0.4002,
1413
+ "step": 4600
1414
+ },
1415
+ {
1416
+ "epoch": 3.28,
1417
+ "learning_rate": 3.793713185904942e-05,
1418
+ "loss": 0.4075,
1419
+ "step": 4620
1420
+ },
1421
+ {
1422
+ "epoch": 3.3,
1423
+ "learning_rate": 3.7841158209042756e-05,
1424
+ "loss": 0.4088,
1425
+ "step": 4640
1426
+ },
1427
+ {
1428
+ "epoch": 3.31,
1429
+ "learning_rate": 3.7744926796734596e-05,
1430
+ "loss": 0.3946,
1431
+ "step": 4660
1432
+ },
1433
+ {
1434
+ "epoch": 3.33,
1435
+ "learning_rate": 3.764843955379107e-05,
1436
+ "loss": 0.4004,
1437
+ "step": 4680
1438
+ },
1439
+ {
1440
+ "epoch": 3.34,
1441
+ "learning_rate": 3.7551698417013635e-05,
1442
+ "loss": 0.4077,
1443
+ "step": 4700
1444
+ },
1445
+ {
1446
+ "epoch": 3.36,
1447
+ "learning_rate": 3.7454705328300164e-05,
1448
+ "loss": 0.4015,
1449
+ "step": 4720
1450
+ },
1451
+ {
1452
+ "epoch": 3.37,
1453
+ "learning_rate": 3.735746223460604e-05,
1454
+ "loss": 0.4129,
1455
+ "step": 4740
1456
+ },
1457
+ {
1458
+ "epoch": 3.38,
1459
+ "learning_rate": 3.7259971087904984e-05,
1460
+ "loss": 0.4039,
1461
+ "step": 4760
1462
+ },
1463
+ {
1464
+ "epoch": 3.4,
1465
+ "learning_rate": 3.7162233845149944e-05,
1466
+ "loss": 0.3983,
1467
+ "step": 4780
1468
+ },
1469
+ {
1470
+ "epoch": 3.41,
1471
+ "learning_rate": 3.706425246823378e-05,
1472
+ "loss": 0.4059,
1473
+ "step": 4800
1474
+ },
1475
+ {
1476
+ "epoch": 3.43,
1477
+ "learning_rate": 3.69660289239499e-05,
1478
+ "loss": 0.4091,
1479
+ "step": 4820
1480
+ },
1481
+ {
1482
+ "epoch": 3.44,
1483
+ "learning_rate": 3.6867565183952764e-05,
1484
+ "loss": 0.401,
1485
+ "step": 4840
1486
+ },
1487
+ {
1488
+ "epoch": 3.46,
1489
+ "learning_rate": 3.67688632247183e-05,
1490
+ "loss": 0.3997,
1491
+ "step": 4860
1492
+ },
1493
+ {
1494
+ "epoch": 3.47,
1495
+ "learning_rate": 3.666992502750426e-05,
1496
+ "loss": 0.4017,
1497
+ "step": 4880
1498
+ },
1499
+ {
1500
+ "epoch": 3.48,
1501
+ "learning_rate": 3.657075257831043e-05,
1502
+ "loss": 0.3891,
1503
+ "step": 4900
1504
+ },
1505
+ {
1506
+ "epoch": 3.5,
1507
+ "learning_rate": 3.6471347867838766e-05,
1508
+ "loss": 0.4148,
1509
+ "step": 4920
1510
+ },
1511
+ {
1512
+ "epoch": 3.51,
1513
+ "learning_rate": 3.6371712891453424e-05,
1514
+ "loss": 0.4028,
1515
+ "step": 4940
1516
+ },
1517
+ {
1518
+ "epoch": 3.53,
1519
+ "learning_rate": 3.627184964914074e-05,
1520
+ "loss": 0.4149,
1521
+ "step": 4960
1522
+ },
1523
+ {
1524
+ "epoch": 3.54,
1525
+ "learning_rate": 3.617176014546906e-05,
1526
+ "loss": 0.3956,
1527
+ "step": 4980
1528
+ },
1529
+ {
1530
+ "epoch": 3.55,
1531
+ "learning_rate": 3.607144638954847e-05,
1532
+ "loss": 0.403,
1533
+ "step": 5000
1534
+ },
1535
+ {
1536
+ "epoch": 3.57,
1537
+ "learning_rate": 3.597091039499055e-05,
1538
+ "loss": 0.4101,
1539
+ "step": 5020
1540
+ },
1541
+ {
1542
+ "epoch": 3.58,
1543
+ "learning_rate": 3.587015417986788e-05,
1544
+ "loss": 0.4055,
1545
+ "step": 5040
1546
+ },
1547
+ {
1548
+ "epoch": 3.6,
1549
+ "learning_rate": 3.576917976667357e-05,
1550
+ "loss": 0.3912,
1551
+ "step": 5060
1552
+ },
1553
+ {
1554
+ "epoch": 3.61,
1555
+ "learning_rate": 3.566798918228062e-05,
1556
+ "loss": 0.3942,
1557
+ "step": 5080
1558
+ },
1559
+ {
1560
+ "epoch": 3.63,
1561
+ "learning_rate": 3.5566584457901304e-05,
1562
+ "loss": 0.3802,
1563
+ "step": 5100
1564
+ },
1565
+ {
1566
+ "epoch": 3.64,
1567
+ "learning_rate": 3.546496762904633e-05,
1568
+ "loss": 0.4022,
1569
+ "step": 5120
1570
+ },
1571
+ {
1572
+ "epoch": 3.65,
1573
+ "learning_rate": 3.536314073548402e-05,
1574
+ "loss": 0.4009,
1575
+ "step": 5140
1576
+ },
1577
+ {
1578
+ "epoch": 3.67,
1579
+ "learning_rate": 3.5261105821199344e-05,
1580
+ "loss": 0.4041,
1581
+ "step": 5160
1582
+ },
1583
+ {
1584
+ "epoch": 3.68,
1585
+ "learning_rate": 3.515886493435291e-05,
1586
+ "loss": 0.4123,
1587
+ "step": 5180
1588
+ },
1589
+ {
1590
+ "epoch": 3.7,
1591
+ "learning_rate": 3.505642012723983e-05,
1592
+ "loss": 0.3953,
1593
+ "step": 5200
1594
+ },
1595
+ {
1596
+ "epoch": 3.71,
1597
+ "learning_rate": 3.495377345624854e-05,
1598
+ "loss": 0.3994,
1599
+ "step": 5220
1600
+ },
1601
+ {
1602
+ "epoch": 3.73,
1603
+ "learning_rate": 3.4850926981819525e-05,
1604
+ "loss": 0.3978,
1605
+ "step": 5240
1606
+ },
1607
+ {
1608
+ "epoch": 3.74,
1609
+ "learning_rate": 3.4747882768403947e-05,
1610
+ "loss": 0.3955,
1611
+ "step": 5260
1612
+ },
1613
+ {
1614
+ "epoch": 3.75,
1615
+ "learning_rate": 3.464464288442219e-05,
1616
+ "loss": 0.4025,
1617
+ "step": 5280
1618
+ },
1619
+ {
1620
+ "epoch": 3.77,
1621
+ "learning_rate": 3.4541209402222396e-05,
1622
+ "loss": 0.3929,
1623
+ "step": 5300
1624
+ },
1625
+ {
1626
+ "epoch": 3.78,
1627
+ "learning_rate": 3.443758439803879e-05,
1628
+ "loss": 0.3876,
1629
+ "step": 5320
1630
+ },
1631
+ {
1632
+ "epoch": 3.8,
1633
+ "learning_rate": 3.433376995195008e-05,
1634
+ "loss": 0.3939,
1635
+ "step": 5340
1636
+ },
1637
+ {
1638
+ "epoch": 3.81,
1639
+ "learning_rate": 3.422976814783765e-05,
1640
+ "loss": 0.3927,
1641
+ "step": 5360
1642
+ },
1643
+ {
1644
+ "epoch": 3.83,
1645
+ "learning_rate": 3.4125581073343735e-05,
1646
+ "loss": 0.3945,
1647
+ "step": 5380
1648
+ },
1649
+ {
1650
+ "epoch": 3.84,
1651
+ "learning_rate": 3.4021210819829555e-05,
1652
+ "loss": 0.3889,
1653
+ "step": 5400
1654
+ },
1655
+ {
1656
+ "epoch": 3.85,
1657
+ "learning_rate": 3.391665948233328e-05,
1658
+ "loss": 0.4027,
1659
+ "step": 5420
1660
+ },
1661
+ {
1662
+ "epoch": 3.87,
1663
+ "learning_rate": 3.3811929159528024e-05,
1664
+ "loss": 0.3915,
1665
+ "step": 5440
1666
+ },
1667
+ {
1668
+ "epoch": 3.88,
1669
+ "learning_rate": 3.370702195367967e-05,
1670
+ "loss": 0.398,
1671
+ "step": 5460
1672
+ },
1673
+ {
1674
+ "epoch": 3.9,
1675
+ "learning_rate": 3.360193997060475e-05,
1676
+ "loss": 0.3965,
1677
+ "step": 5480
1678
+ },
1679
+ {
1680
+ "epoch": 3.91,
1681
+ "learning_rate": 3.349668531962807e-05,
1682
+ "loss": 0.3929,
1683
+ "step": 5500
1684
+ },
1685
+ {
1686
+ "epoch": 3.92,
1687
+ "learning_rate": 3.339126011354044e-05,
1688
+ "loss": 0.391,
1689
+ "step": 5520
1690
+ },
1691
+ {
1692
+ "epoch": 3.94,
1693
+ "learning_rate": 3.328566646855625e-05,
1694
+ "loss": 0.3796,
1695
+ "step": 5540
1696
+ },
1697
+ {
1698
+ "epoch": 3.95,
1699
+ "learning_rate": 3.3179906504270996e-05,
1700
+ "loss": 0.394,
1701
+ "step": 5560
1702
+ },
1703
+ {
1704
+ "epoch": 3.97,
1705
+ "learning_rate": 3.30739823436187e-05,
1706
+ "loss": 0.3899,
1707
+ "step": 5580
1708
+ },
1709
+ {
1710
+ "epoch": 3.98,
1711
+ "learning_rate": 3.2967896112829324e-05,
1712
+ "loss": 0.3927,
1713
+ "step": 5600
1714
+ },
1715
+ {
1716
+ "epoch": 4.0,
1717
+ "learning_rate": 3.286164994138612e-05,
1718
+ "loss": 0.3893,
1719
+ "step": 5620
1720
+ },
1721
+ {
1722
+ "epoch": 4.0,
1723
+ "eval_loss": 0.4602062702178955,
1724
+ "eval_runtime": 171.0396,
1725
+ "eval_samples_per_second": 32.525,
1726
+ "eval_steps_per_second": 8.133,
1727
+ "step": 5626
1728
+ },
1729
+ {
1730
+ "epoch": 4.01,
1731
+ "learning_rate": 3.27552459619828e-05,
1732
+ "loss": 0.3656,
1733
+ "step": 5640
1734
+ },
1735
+ {
1736
+ "epoch": 4.02,
1737
+ "learning_rate": 3.26486863104808e-05,
1738
+ "loss": 0.3517,
1739
+ "step": 5660
1740
+ },
1741
+ {
1742
+ "epoch": 4.04,
1743
+ "learning_rate": 3.25419731258664e-05,
1744
+ "loss": 0.3473,
1745
+ "step": 5680
1746
+ },
1747
+ {
1748
+ "epoch": 4.05,
1749
+ "learning_rate": 3.2435108550207746e-05,
1750
+ "loss": 0.3588,
1751
+ "step": 5700
1752
+ },
1753
+ {
1754
+ "epoch": 4.07,
1755
+ "learning_rate": 3.232809472861189e-05,
1756
+ "loss": 0.3478,
1757
+ "step": 5720
1758
+ },
1759
+ {
1760
+ "epoch": 4.08,
1761
+ "learning_rate": 3.22209338091817e-05,
1762
+ "loss": 0.3504,
1763
+ "step": 5740
1764
+ },
1765
+ {
1766
+ "epoch": 4.1,
1767
+ "learning_rate": 3.211362794297278e-05,
1768
+ "loss": 0.357,
1769
+ "step": 5760
1770
+ },
1771
+ {
1772
+ "epoch": 4.11,
1773
+ "learning_rate": 3.200617928395028e-05,
1774
+ "loss": 0.3609,
1775
+ "step": 5780
1776
+ },
1777
+ {
1778
+ "epoch": 4.12,
1779
+ "learning_rate": 3.1898589988945596e-05,
1780
+ "loss": 0.3501,
1781
+ "step": 5800
1782
+ },
1783
+ {
1784
+ "epoch": 4.14,
1785
+ "learning_rate": 3.179086221761319e-05,
1786
+ "loss": 0.3547,
1787
+ "step": 5820
1788
+ },
1789
+ {
1790
+ "epoch": 4.15,
1791
+ "learning_rate": 3.1682998132387146e-05,
1792
+ "loss": 0.3587,
1793
+ "step": 5840
1794
+ },
1795
+ {
1796
+ "epoch": 4.17,
1797
+ "learning_rate": 3.15749998984378e-05,
1798
+ "loss": 0.3496,
1799
+ "step": 5860
1800
+ },
1801
+ {
1802
+ "epoch": 4.18,
1803
+ "learning_rate": 3.146686968362827e-05,
1804
+ "loss": 0.3547,
1805
+ "step": 5880
1806
+ },
1807
+ {
1808
+ "epoch": 4.19,
1809
+ "learning_rate": 3.135860965847096e-05,
1810
+ "loss": 0.356,
1811
+ "step": 5900
1812
+ },
1813
+ {
1814
+ "epoch": 4.21,
1815
+ "learning_rate": 3.125022199608396e-05,
1816
+ "loss": 0.3488,
1817
+ "step": 5920
1818
+ },
1819
+ {
1820
+ "epoch": 4.22,
1821
+ "learning_rate": 3.114170887214744e-05,
1822
+ "loss": 0.357,
1823
+ "step": 5940
1824
+ },
1825
+ {
1826
+ "epoch": 4.24,
1827
+ "learning_rate": 3.103307246485997e-05,
1828
+ "loss": 0.3486,
1829
+ "step": 5960
1830
+ },
1831
+ {
1832
+ "epoch": 4.25,
1833
+ "learning_rate": 3.092431495489484e-05,
1834
+ "loss": 0.3592,
1835
+ "step": 5980
1836
+ },
1837
+ {
1838
+ "epoch": 4.27,
1839
+ "learning_rate": 3.0815438525356194e-05,
1840
+ "loss": 0.3515,
1841
+ "step": 6000
1842
+ },
1843
+ {
1844
+ "epoch": 4.28,
1845
+ "learning_rate": 3.070644536173531e-05,
1846
+ "loss": 0.3495,
1847
+ "step": 6020
1848
+ },
1849
+ {
1850
+ "epoch": 4.29,
1851
+ "learning_rate": 3.059733765186666e-05,
1852
+ "loss": 0.369,
1853
+ "step": 6040
1854
+ },
1855
+ {
1856
+ "epoch": 4.31,
1857
+ "learning_rate": 3.0488117585884037e-05,
1858
+ "loss": 0.3476,
1859
+ "step": 6060
1860
+ },
1861
+ {
1862
+ "epoch": 4.32,
1863
+ "learning_rate": 3.0378787356176557e-05,
1864
+ "loss": 0.3559,
1865
+ "step": 6080
1866
+ },
1867
+ {
1868
+ "epoch": 4.34,
1869
+ "learning_rate": 3.0269349157344667e-05,
1870
+ "loss": 0.3502,
1871
+ "step": 6100
1872
+ },
1873
+ {
1874
+ "epoch": 4.35,
1875
+ "learning_rate": 3.015980518615611e-05,
1876
+ "loss": 0.3507,
1877
+ "step": 6120
1878
+ },
1879
+ {
1880
+ "epoch": 4.37,
1881
+ "learning_rate": 3.0050157641501803e-05,
1882
+ "loss": 0.3452,
1883
+ "step": 6140
1884
+ },
1885
+ {
1886
+ "epoch": 4.38,
1887
+ "learning_rate": 2.9940408724351694e-05,
1888
+ "loss": 0.3487,
1889
+ "step": 6160
1890
+ },
1891
+ {
1892
+ "epoch": 4.39,
1893
+ "learning_rate": 2.9830560637710614e-05,
1894
+ "loss": 0.3491,
1895
+ "step": 6180
1896
+ },
1897
+ {
1898
+ "epoch": 4.41,
1899
+ "learning_rate": 2.972061558657403e-05,
1900
+ "loss": 0.3516,
1901
+ "step": 6200
1902
+ },
1903
+ {
1904
+ "epoch": 4.42,
1905
+ "learning_rate": 2.9610575777883785e-05,
1906
+ "loss": 0.3527,
1907
+ "step": 6220
1908
+ },
1909
+ {
1910
+ "epoch": 4.44,
1911
+ "learning_rate": 2.9500443420483815e-05,
1912
+ "loss": 0.3599,
1913
+ "step": 6240
1914
+ },
1915
+ {
1916
+ "epoch": 4.45,
1917
+ "learning_rate": 2.9390220725075778e-05,
1918
+ "loss": 0.3485,
1919
+ "step": 6260
1920
+ },
1921
+ {
1922
+ "epoch": 4.46,
1923
+ "learning_rate": 2.9279909904174717e-05,
1924
+ "loss": 0.3483,
1925
+ "step": 6280
1926
+ },
1927
+ {
1928
+ "epoch": 4.48,
1929
+ "learning_rate": 2.9169513172064634e-05,
1930
+ "loss": 0.3462,
1931
+ "step": 6300
1932
+ },
1933
+ {
1934
+ "epoch": 4.49,
1935
+ "learning_rate": 2.9059032744754022e-05,
1936
+ "loss": 0.3492,
1937
+ "step": 6320
1938
+ },
1939
+ {
1940
+ "epoch": 4.51,
1941
+ "learning_rate": 2.8948470839931403e-05,
1942
+ "loss": 0.3512,
1943
+ "step": 6340
1944
+ },
1945
+ {
1946
+ "epoch": 4.52,
1947
+ "learning_rate": 2.883782967692082e-05,
1948
+ "loss": 0.3501,
1949
+ "step": 6360
1950
+ },
1951
+ {
1952
+ "epoch": 4.54,
1953
+ "learning_rate": 2.872711147663726e-05,
1954
+ "loss": 0.352,
1955
+ "step": 6380
1956
+ },
1957
+ {
1958
+ "epoch": 4.55,
1959
+ "learning_rate": 2.8616318461542102e-05,
1960
+ "loss": 0.3523,
1961
+ "step": 6400
1962
+ },
1963
+ {
1964
+ "epoch": 4.56,
1965
+ "learning_rate": 2.8505452855598492e-05,
1966
+ "loss": 0.3498,
1967
+ "step": 6420
1968
+ },
1969
+ {
1970
+ "epoch": 4.58,
1971
+ "learning_rate": 2.8394516884226683e-05,
1972
+ "loss": 0.349,
1973
+ "step": 6440
1974
+ },
1975
+ {
1976
+ "epoch": 4.59,
1977
+ "learning_rate": 2.8283512774259414e-05,
1978
+ "loss": 0.3625,
1979
+ "step": 6460
1980
+ },
1981
+ {
1982
+ "epoch": 4.61,
1983
+ "learning_rate": 2.817244275389716e-05,
1984
+ "loss": 0.3449,
1985
+ "step": 6480
1986
+ },
1987
+ {
1988
+ "epoch": 4.62,
1989
+ "learning_rate": 2.806130905266342e-05,
1990
+ "loss": 0.3548,
1991
+ "step": 6500
1992
+ },
1993
+ {
1994
+ "epoch": 4.64,
1995
+ "learning_rate": 2.7950113901359974e-05,
1996
+ "loss": 0.3465,
1997
+ "step": 6520
1998
+ },
1999
+ {
2000
+ "epoch": 4.65,
2001
+ "learning_rate": 2.7838859532022116e-05,
2002
+ "loss": 0.3465,
2003
+ "step": 6540
2004
+ },
2005
+ {
2006
+ "epoch": 4.66,
2007
+ "learning_rate": 2.7727548177873798e-05,
2008
+ "loss": 0.3583,
2009
+ "step": 6560
2010
+ },
2011
+ {
2012
+ "epoch": 4.68,
2013
+ "learning_rate": 2.7616182073282854e-05,
2014
+ "loss": 0.3512,
2015
+ "step": 6580
2016
+ },
2017
+ {
2018
+ "epoch": 4.69,
2019
+ "learning_rate": 2.7504763453716132e-05,
2020
+ "loss": 0.3607,
2021
+ "step": 6600
2022
+ },
2023
+ {
2024
+ "epoch": 4.71,
2025
+ "learning_rate": 2.7393294555694614e-05,
2026
+ "loss": 0.3508,
2027
+ "step": 6620
2028
+ },
2029
+ {
2030
+ "epoch": 4.72,
2031
+ "learning_rate": 2.728177761674854e-05,
2032
+ "loss": 0.3382,
2033
+ "step": 6640
2034
+ },
2035
+ {
2036
+ "epoch": 4.74,
2037
+ "learning_rate": 2.717021487537246e-05,
2038
+ "loss": 0.3477,
2039
+ "step": 6660
2040
+ },
2041
+ {
2042
+ "epoch": 4.75,
2043
+ "learning_rate": 2.7058608570980343e-05,
2044
+ "loss": 0.3514,
2045
+ "step": 6680
2046
+ },
2047
+ {
2048
+ "epoch": 4.76,
2049
+ "learning_rate": 2.6946960943860596e-05,
2050
+ "loss": 0.3496,
2051
+ "step": 6700
2052
+ },
2053
+ {
2054
+ "epoch": 4.78,
2055
+ "learning_rate": 2.6835274235131107e-05,
2056
+ "loss": 0.3557,
2057
+ "step": 6720
2058
+ },
2059
+ {
2060
+ "epoch": 4.79,
2061
+ "learning_rate": 2.6723550686694245e-05,
2062
+ "loss": 0.3527,
2063
+ "step": 6740
2064
+ },
2065
+ {
2066
+ "epoch": 4.81,
2067
+ "learning_rate": 2.661179254119187e-05,
2068
+ "loss": 0.3508,
2069
+ "step": 6760
2070
+ },
2071
+ {
2072
+ "epoch": 4.82,
2073
+ "learning_rate": 2.6500002041960338e-05,
2074
+ "loss": 0.3534,
2075
+ "step": 6780
2076
+ },
2077
+ {
2078
+ "epoch": 4.83,
2079
+ "learning_rate": 2.6388181432985405e-05,
2080
+ "loss": 0.3437,
2081
+ "step": 6800
2082
+ },
2083
+ {
2084
+ "epoch": 4.85,
2085
+ "learning_rate": 2.6276332958857246e-05,
2086
+ "loss": 0.3453,
2087
+ "step": 6820
2088
+ },
2089
+ {
2090
+ "epoch": 4.86,
2091
+ "learning_rate": 2.6164458864725384e-05,
2092
+ "loss": 0.3489,
2093
+ "step": 6840
2094
+ },
2095
+ {
2096
+ "epoch": 4.88,
2097
+ "learning_rate": 2.6052561396253595e-05,
2098
+ "loss": 0.3483,
2099
+ "step": 6860
2100
+ },
2101
+ {
2102
+ "epoch": 4.89,
2103
+ "learning_rate": 2.5940642799574876e-05,
2104
+ "loss": 0.3455,
2105
+ "step": 6880
2106
+ },
2107
+ {
2108
+ "epoch": 4.91,
2109
+ "learning_rate": 2.5828705321246304e-05,
2110
+ "loss": 0.3603,
2111
+ "step": 6900
2112
+ },
2113
+ {
2114
+ "epoch": 4.92,
2115
+ "learning_rate": 2.5716751208204e-05,
2116
+ "loss": 0.3491,
2117
+ "step": 6920
2118
+ },
2119
+ {
2120
+ "epoch": 4.93,
2121
+ "learning_rate": 2.560478270771798e-05,
2122
+ "loss": 0.3383,
2123
+ "step": 6940
2124
+ },
2125
+ {
2126
+ "epoch": 4.95,
2127
+ "learning_rate": 2.549280206734705e-05,
2128
+ "loss": 0.3501,
2129
+ "step": 6960
2130
+ },
2131
+ {
2132
+ "epoch": 4.96,
2133
+ "learning_rate": 2.538081153489373e-05,
2134
+ "loss": 0.3462,
2135
+ "step": 6980
2136
+ },
2137
+ {
2138
+ "epoch": 4.98,
2139
+ "learning_rate": 2.5268813358359084e-05,
2140
+ "loss": 0.3493,
2141
+ "step": 7000
2142
+ },
2143
+ {
2144
+ "epoch": 4.99,
2145
+ "learning_rate": 2.5156809785897623e-05,
2146
+ "loss": 0.3544,
2147
+ "step": 7020
2148
+ },
2149
+ {
2150
+ "epoch": 5.0,
2151
+ "eval_loss": 0.48198580741882324,
2152
+ "eval_runtime": 171.3048,
2153
+ "eval_samples_per_second": 32.474,
2154
+ "eval_steps_per_second": 8.12,
2155
+ "step": 7032
2156
+ },
2157
+ {
2158
+ "epoch": 5.01,
2159
+ "learning_rate": 2.5044803065772165e-05,
2160
+ "loss": 0.3358,
2161
+ "step": 7040
2162
+ },
2163
+ {
2164
+ "epoch": 5.02,
2165
+ "learning_rate": 2.4932795446308734e-05,
2166
+ "loss": 0.3109,
2167
+ "step": 7060
2168
+ },
2169
+ {
2170
+ "epoch": 5.03,
2171
+ "learning_rate": 2.482078917585136e-05,
2172
+ "loss": 0.3119,
2173
+ "step": 7080
2174
+ },
2175
+ {
2176
+ "epoch": 5.05,
2177
+ "learning_rate": 2.4708786502717054e-05,
2178
+ "loss": 0.314,
2179
+ "step": 7100
2180
+ },
2181
+ {
2182
+ "epoch": 5.06,
2183
+ "learning_rate": 2.4596789675150577e-05,
2184
+ "loss": 0.3164,
2185
+ "step": 7120
2186
+ },
2187
+ {
2188
+ "epoch": 5.08,
2189
+ "learning_rate": 2.4484800941279355e-05,
2190
+ "loss": 0.3078,
2191
+ "step": 7140
2192
+ },
2193
+ {
2194
+ "epoch": 5.09,
2195
+ "learning_rate": 2.4372822549068354e-05,
2196
+ "loss": 0.3058,
2197
+ "step": 7160
2198
+ },
2199
+ {
2200
+ "epoch": 5.1,
2201
+ "learning_rate": 2.4260856746274963e-05,
2202
+ "loss": 0.3026,
2203
+ "step": 7180
2204
+ },
2205
+ {
2206
+ "epoch": 5.12,
2207
+ "learning_rate": 2.4148905780403844e-05,
2208
+ "loss": 0.3045,
2209
+ "step": 7200
2210
+ },
2211
+ {
2212
+ "epoch": 5.13,
2213
+ "learning_rate": 2.4036971898661832e-05,
2214
+ "loss": 0.3097,
2215
+ "step": 7220
2216
+ },
2217
+ {
2218
+ "epoch": 5.15,
2219
+ "learning_rate": 2.392505734791285e-05,
2220
+ "loss": 0.3157,
2221
+ "step": 7240
2222
+ },
2223
+ {
2224
+ "epoch": 5.16,
2225
+ "learning_rate": 2.3813164374632775e-05,
2226
+ "loss": 0.3022,
2227
+ "step": 7260
2228
+ },
2229
+ {
2230
+ "epoch": 5.18,
2231
+ "learning_rate": 2.3701295224864356e-05,
2232
+ "loss": 0.307,
2233
+ "step": 7280
2234
+ },
2235
+ {
2236
+ "epoch": 5.19,
2237
+ "learning_rate": 2.3589452144172137e-05,
2238
+ "loss": 0.3149,
2239
+ "step": 7300
2240
+ },
2241
+ {
2242
+ "epoch": 5.2,
2243
+ "learning_rate": 2.347763737759736e-05,
2244
+ "loss": 0.3075,
2245
+ "step": 7320
2246
+ },
2247
+ {
2248
+ "epoch": 5.22,
2249
+ "learning_rate": 2.336585316961292e-05,
2250
+ "loss": 0.312,
2251
+ "step": 7340
2252
+ },
2253
+ {
2254
+ "epoch": 5.23,
2255
+ "learning_rate": 2.325410176407833e-05,
2256
+ "loss": 0.3114,
2257
+ "step": 7360
2258
+ },
2259
+ {
2260
+ "epoch": 5.25,
2261
+ "learning_rate": 2.314238540419461e-05,
2262
+ "loss": 0.3183,
2263
+ "step": 7380
2264
+ },
2265
+ {
2266
+ "epoch": 5.26,
2267
+ "learning_rate": 2.303070633245933e-05,
2268
+ "loss": 0.3089,
2269
+ "step": 7400
2270
+ },
2271
+ {
2272
+ "epoch": 5.28,
2273
+ "learning_rate": 2.2919066790621575e-05,
2274
+ "loss": 0.312,
2275
+ "step": 7420
2276
+ },
2277
+ {
2278
+ "epoch": 5.29,
2279
+ "learning_rate": 2.280746901963693e-05,
2280
+ "loss": 0.3094,
2281
+ "step": 7440
2282
+ },
2283
+ {
2284
+ "epoch": 5.3,
2285
+ "learning_rate": 2.26959152596225e-05,
2286
+ "loss": 0.3099,
2287
+ "step": 7460
2288
+ },
2289
+ {
2290
+ "epoch": 5.32,
2291
+ "learning_rate": 2.2584407749811985e-05,
2292
+ "loss": 0.3111,
2293
+ "step": 7480
2294
+ },
2295
+ {
2296
+ "epoch": 5.33,
2297
+ "learning_rate": 2.2472948728510664e-05,
2298
+ "loss": 0.3133,
2299
+ "step": 7500
2300
+ },
2301
+ {
2302
+ "epoch": 5.35,
2303
+ "learning_rate": 2.2361540433050492e-05,
2304
+ "loss": 0.3049,
2305
+ "step": 7520
2306
+ },
2307
+ {
2308
+ "epoch": 5.36,
2309
+ "learning_rate": 2.2250185099745253e-05,
2310
+ "loss": 0.3056,
2311
+ "step": 7540
2312
+ },
2313
+ {
2314
+ "epoch": 5.38,
2315
+ "learning_rate": 2.213888496384556e-05,
2316
+ "loss": 0.3085,
2317
+ "step": 7560
2318
+ },
2319
+ {
2320
+ "epoch": 5.39,
2321
+ "learning_rate": 2.2027642259494046e-05,
2322
+ "loss": 0.311,
2323
+ "step": 7580
2324
+ },
2325
+ {
2326
+ "epoch": 5.4,
2327
+ "learning_rate": 2.1916459219680557e-05,
2328
+ "loss": 0.3064,
2329
+ "step": 7600
2330
+ },
2331
+ {
2332
+ "epoch": 5.42,
2333
+ "learning_rate": 2.1805338076197234e-05,
2334
+ "loss": 0.3091,
2335
+ "step": 7620
2336
+ },
2337
+ {
2338
+ "epoch": 5.43,
2339
+ "learning_rate": 2.169428105959378e-05,
2340
+ "loss": 0.3031,
2341
+ "step": 7640
2342
+ },
2343
+ {
2344
+ "epoch": 5.45,
2345
+ "learning_rate": 2.1583290399132695e-05,
2346
+ "loss": 0.3115,
2347
+ "step": 7660
2348
+ },
2349
+ {
2350
+ "epoch": 5.46,
2351
+ "learning_rate": 2.147236832274447e-05,
2352
+ "loss": 0.3091,
2353
+ "step": 7680
2354
+ },
2355
+ {
2356
+ "epoch": 5.47,
2357
+ "learning_rate": 2.1361517056982903e-05,
2358
+ "loss": 0.3062,
2359
+ "step": 7700
2360
+ },
2361
+ {
2362
+ "epoch": 5.49,
2363
+ "learning_rate": 2.1250738826980432e-05,
2364
+ "loss": 0.2961,
2365
+ "step": 7720
2366
+ },
2367
+ {
2368
+ "epoch": 5.5,
2369
+ "learning_rate": 2.1140035856403405e-05,
2370
+ "loss": 0.306,
2371
+ "step": 7740
2372
+ },
2373
+ {
2374
+ "epoch": 5.52,
2375
+ "learning_rate": 2.1029410367407476e-05,
2376
+ "loss": 0.3104,
2377
+ "step": 7760
2378
+ },
2379
+ {
2380
+ "epoch": 5.53,
2381
+ "learning_rate": 2.0918864580593034e-05,
2382
+ "loss": 0.3127,
2383
+ "step": 7780
2384
+ },
2385
+ {
2386
+ "epoch": 5.55,
2387
+ "learning_rate": 2.0808400714960567e-05,
2388
+ "loss": 0.3205,
2389
+ "step": 7800
2390
+ },
2391
+ {
2392
+ "epoch": 5.56,
2393
+ "learning_rate": 2.0698020987866153e-05,
2394
+ "loss": 0.3037,
2395
+ "step": 7820
2396
+ },
2397
+ {
2398
+ "epoch": 5.57,
2399
+ "learning_rate": 2.058772761497694e-05,
2400
+ "loss": 0.3058,
2401
+ "step": 7840
2402
+ },
2403
+ {
2404
+ "epoch": 5.59,
2405
+ "learning_rate": 2.047752281022671e-05,
2406
+ "loss": 0.3088,
2407
+ "step": 7860
2408
+ },
2409
+ {
2410
+ "epoch": 5.6,
2411
+ "learning_rate": 2.0367408785771353e-05,
2412
+ "loss": 0.3092,
2413
+ "step": 7880
2414
+ },
2415
+ {
2416
+ "epoch": 5.62,
2417
+ "learning_rate": 2.0257387751944556e-05,
2418
+ "loss": 0.3084,
2419
+ "step": 7900
2420
+ },
2421
+ {
2422
+ "epoch": 5.63,
2423
+ "learning_rate": 2.014746191721337e-05,
2424
+ "loss": 0.3144,
2425
+ "step": 7920
2426
+ },
2427
+ {
2428
+ "epoch": 5.65,
2429
+ "learning_rate": 2.003763348813391e-05,
2430
+ "loss": 0.2987,
2431
+ "step": 7940
2432
+ },
2433
+ {
2434
+ "epoch": 5.66,
2435
+ "learning_rate": 1.992790466930706e-05,
2436
+ "loss": 0.311,
2437
+ "step": 7960
2438
+ },
2439
+ {
2440
+ "epoch": 5.67,
2441
+ "learning_rate": 1.98182776633342e-05,
2442
+ "loss": 0.3113,
2443
+ "step": 7980
2444
+ },
2445
+ {
2446
+ "epoch": 5.69,
2447
+ "learning_rate": 1.9708754670773005e-05,
2448
+ "loss": 0.3048,
2449
+ "step": 8000
2450
+ },
2451
+ {
2452
+ "epoch": 5.7,
2453
+ "learning_rate": 1.9599337890093302e-05,
2454
+ "loss": 0.3103,
2455
+ "step": 8020
2456
+ },
2457
+ {
2458
+ "epoch": 5.72,
2459
+ "learning_rate": 1.9490029517632884e-05,
2460
+ "loss": 0.3112,
2461
+ "step": 8040
2462
+ },
2463
+ {
2464
+ "epoch": 5.73,
2465
+ "learning_rate": 1.9380831747553458e-05,
2466
+ "loss": 0.31,
2467
+ "step": 8060
2468
+ },
2469
+ {
2470
+ "epoch": 5.74,
2471
+ "learning_rate": 1.9271746771796607e-05,
2472
+ "loss": 0.3103,
2473
+ "step": 8080
2474
+ },
2475
+ {
2476
+ "epoch": 5.76,
2477
+ "learning_rate": 1.9162776780039766e-05,
2478
+ "loss": 0.3113,
2479
+ "step": 8100
2480
+ },
2481
+ {
2482
+ "epoch": 5.77,
2483
+ "learning_rate": 1.905392395965227e-05,
2484
+ "loss": 0.3167,
2485
+ "step": 8120
2486
+ },
2487
+ {
2488
+ "epoch": 5.79,
2489
+ "learning_rate": 1.8945190495651492e-05,
2490
+ "loss": 0.3082,
2491
+ "step": 8140
2492
+ },
2493
+ {
2494
+ "epoch": 5.8,
2495
+ "learning_rate": 1.8836578570658926e-05,
2496
+ "loss": 0.3016,
2497
+ "step": 8160
2498
+ },
2499
+ {
2500
+ "epoch": 5.82,
2501
+ "learning_rate": 1.872809036485637e-05,
2502
+ "loss": 0.3061,
2503
+ "step": 8180
2504
+ },
2505
+ {
2506
+ "epoch": 5.83,
2507
+ "learning_rate": 1.8619728055942254e-05,
2508
+ "loss": 0.3113,
2509
+ "step": 8200
2510
+ },
2511
+ {
2512
+ "epoch": 5.84,
2513
+ "learning_rate": 1.851149381908781e-05,
2514
+ "loss": 0.3114,
2515
+ "step": 8220
2516
+ },
2517
+ {
2518
+ "epoch": 5.86,
2519
+ "learning_rate": 1.8403389826893476e-05,
2520
+ "loss": 0.3059,
2521
+ "step": 8240
2522
+ },
2523
+ {
2524
+ "epoch": 5.87,
2525
+ "learning_rate": 1.8295418249345283e-05,
2526
+ "loss": 0.311,
2527
+ "step": 8260
2528
+ },
2529
+ {
2530
+ "epoch": 5.89,
2531
+ "learning_rate": 1.8187581253771274e-05,
2532
+ "loss": 0.3145,
2533
+ "step": 8280
2534
+ },
2535
+ {
2536
+ "epoch": 5.9,
2537
+ "learning_rate": 1.8079881004798005e-05,
2538
+ "loss": 0.3126,
2539
+ "step": 8300
2540
+ },
2541
+ {
2542
+ "epoch": 5.92,
2543
+ "learning_rate": 1.797231966430712e-05,
2544
+ "loss": 0.3077,
2545
+ "step": 8320
2546
+ },
2547
+ {
2548
+ "epoch": 5.93,
2549
+ "learning_rate": 1.7864899391391915e-05,
2550
+ "loss": 0.3087,
2551
+ "step": 8340
2552
+ },
2553
+ {
2554
+ "epoch": 5.94,
2555
+ "learning_rate": 1.775762234231401e-05,
2556
+ "loss": 0.3021,
2557
+ "step": 8360
2558
+ },
2559
+ {
2560
+ "epoch": 5.96,
2561
+ "learning_rate": 1.7650490670460113e-05,
2562
+ "loss": 0.3076,
2563
+ "step": 8380
2564
+ },
2565
+ {
2566
+ "epoch": 5.97,
2567
+ "learning_rate": 1.7543506526298713e-05,
2568
+ "loss": 0.3115,
2569
+ "step": 8400
2570
+ },
2571
+ {
2572
+ "epoch": 5.99,
2573
+ "learning_rate": 1.7436672057336967e-05,
2574
+ "loss": 0.3075,
2575
+ "step": 8420
2576
+ },
2577
+ {
2578
+ "epoch": 6.0,
2579
+ "eval_loss": 0.43305763602256775,
2580
+ "eval_runtime": 171.036,
2581
+ "eval_samples_per_second": 32.525,
2582
+ "eval_steps_per_second": 8.133,
2583
+ "step": 8439
2584
+ },
2585
+ {
2586
+ "epoch": 6.0,
2587
+ "learning_rate": 1.7329989408077596e-05,
2588
+ "loss": 0.3047,
2589
+ "step": 8440
2590
+ },
2591
+ {
2592
+ "epoch": 6.01,
2593
+ "learning_rate": 1.722346071997582e-05,
2594
+ "loss": 0.2703,
2595
+ "step": 8460
2596
+ },
2597
+ {
2598
+ "epoch": 6.03,
2599
+ "learning_rate": 1.7117088131396355e-05,
2600
+ "loss": 0.2708,
2601
+ "step": 8480
2602
+ },
2603
+ {
2604
+ "epoch": 6.04,
2605
+ "learning_rate": 1.701087377757053e-05,
2606
+ "loss": 0.2652,
2607
+ "step": 8500
2608
+ },
2609
+ {
2610
+ "epoch": 6.06,
2611
+ "learning_rate": 1.6904819790553407e-05,
2612
+ "loss": 0.2755,
2613
+ "step": 8520
2614
+ },
2615
+ {
2616
+ "epoch": 6.07,
2617
+ "learning_rate": 1.6798928299180978e-05,
2618
+ "loss": 0.2772,
2619
+ "step": 8540
2620
+ },
2621
+ {
2622
+ "epoch": 6.09,
2623
+ "learning_rate": 1.6693201429027427e-05,
2624
+ "loss": 0.2724,
2625
+ "step": 8560
2626
+ },
2627
+ {
2628
+ "epoch": 6.1,
2629
+ "learning_rate": 1.65876413023625e-05,
2630
+ "loss": 0.2712,
2631
+ "step": 8580
2632
+ },
2633
+ {
2634
+ "epoch": 6.11,
2635
+ "learning_rate": 1.6482250038108852e-05,
2636
+ "loss": 0.2676,
2637
+ "step": 8600
2638
+ },
2639
+ {
2640
+ "epoch": 6.13,
2641
+ "learning_rate": 1.6377029751799554e-05,
2642
+ "loss": 0.2785,
2643
+ "step": 8620
2644
+ },
2645
+ {
2646
+ "epoch": 6.14,
2647
+ "learning_rate": 1.627198255553562e-05,
2648
+ "loss": 0.2726,
2649
+ "step": 8640
2650
+ },
2651
+ {
2652
+ "epoch": 6.16,
2653
+ "learning_rate": 1.6167110557943588e-05,
2654
+ "loss": 0.2753,
2655
+ "step": 8660
2656
+ },
2657
+ {
2658
+ "epoch": 6.17,
2659
+ "learning_rate": 1.6062415864133213e-05,
2660
+ "loss": 0.2707,
2661
+ "step": 8680
2662
+ },
2663
+ {
2664
+ "epoch": 6.19,
2665
+ "learning_rate": 1.595790057565522e-05,
2666
+ "loss": 0.2648,
2667
+ "step": 8700
2668
+ },
2669
+ {
2670
+ "epoch": 6.2,
2671
+ "learning_rate": 1.5853566790459102e-05,
2672
+ "loss": 0.2739,
2673
+ "step": 8720
2674
+ },
2675
+ {
2676
+ "epoch": 6.21,
2677
+ "learning_rate": 1.574941660285098e-05,
2678
+ "loss": 0.2706,
2679
+ "step": 8740
2680
+ },
2681
+ {
2682
+ "epoch": 6.23,
2683
+ "learning_rate": 1.5645452103451657e-05,
2684
+ "loss": 0.2715,
2685
+ "step": 8760
2686
+ },
2687
+ {
2688
+ "epoch": 6.24,
2689
+ "learning_rate": 1.5541675379154548e-05,
2690
+ "loss": 0.2714,
2691
+ "step": 8780
2692
+ },
2693
+ {
2694
+ "epoch": 6.26,
2695
+ "learning_rate": 1.5438088513083826e-05,
2696
+ "loss": 0.2785,
2697
+ "step": 8800
2698
+ },
2699
+ {
2700
+ "epoch": 6.27,
2701
+ "learning_rate": 1.5334693584552655e-05,
2702
+ "loss": 0.2719,
2703
+ "step": 8820
2704
+ },
2705
+ {
2706
+ "epoch": 6.29,
2707
+ "learning_rate": 1.523149266902138e-05,
2708
+ "loss": 0.2741,
2709
+ "step": 8840
2710
+ },
2711
+ {
2712
+ "epoch": 6.3,
2713
+ "learning_rate": 1.5128487838055887e-05,
2714
+ "loss": 0.2622,
2715
+ "step": 8860
2716
+ },
2717
+ {
2718
+ "epoch": 6.31,
2719
+ "learning_rate": 1.5025681159286076e-05,
2720
+ "loss": 0.2703,
2721
+ "step": 8880
2722
+ },
2723
+ {
2724
+ "epoch": 6.33,
2725
+ "learning_rate": 1.4923074696364265e-05,
2726
+ "loss": 0.2669,
2727
+ "step": 8900
2728
+ },
2729
+ {
2730
+ "epoch": 6.34,
2731
+ "learning_rate": 1.4820670508923825e-05,
2732
+ "loss": 0.2743,
2733
+ "step": 8920
2734
+ },
2735
+ {
2736
+ "epoch": 6.36,
2737
+ "learning_rate": 1.4718470652537846e-05,
2738
+ "loss": 0.2729,
2739
+ "step": 8940
2740
+ },
2741
+ {
2742
+ "epoch": 6.37,
2743
+ "learning_rate": 1.461647717867783e-05,
2744
+ "loss": 0.2697,
2745
+ "step": 8960
2746
+ },
2747
+ {
2748
+ "epoch": 6.38,
2749
+ "learning_rate": 1.4514692134672523e-05,
2750
+ "loss": 0.275,
2751
+ "step": 8980
2752
+ },
2753
+ {
2754
+ "epoch": 6.4,
2755
+ "learning_rate": 1.4413117563666873e-05,
2756
+ "loss": 0.275,
2757
+ "step": 9000
2758
+ },
2759
+ {
2760
+ "epoch": 6.41,
2761
+ "learning_rate": 1.431175550458094e-05,
2762
+ "loss": 0.2734,
2763
+ "step": 9020
2764
+ },
2765
+ {
2766
+ "epoch": 6.43,
2767
+ "learning_rate": 1.4210607992069003e-05,
2768
+ "loss": 0.2654,
2769
+ "step": 9040
2770
+ },
2771
+ {
2772
+ "epoch": 6.44,
2773
+ "learning_rate": 1.4109677056478748e-05,
2774
+ "loss": 0.2687,
2775
+ "step": 9060
2776
+ },
2777
+ {
2778
+ "epoch": 6.46,
2779
+ "learning_rate": 1.4008964723810459e-05,
2780
+ "loss": 0.2777,
2781
+ "step": 9080
2782
+ },
2783
+ {
2784
+ "epoch": 6.47,
2785
+ "learning_rate": 1.3908473015676359e-05,
2786
+ "loss": 0.2708,
2787
+ "step": 9100
2788
+ },
2789
+ {
2790
+ "epoch": 6.48,
2791
+ "learning_rate": 1.3808203949260098e-05,
2792
+ "loss": 0.2746,
2793
+ "step": 9120
2794
+ },
2795
+ {
2796
+ "epoch": 6.5,
2797
+ "learning_rate": 1.3708159537276161e-05,
2798
+ "loss": 0.2745,
2799
+ "step": 9140
2800
+ },
2801
+ {
2802
+ "epoch": 6.51,
2803
+ "learning_rate": 1.3608341787929518e-05,
2804
+ "loss": 0.2779,
2805
+ "step": 9160
2806
+ },
2807
+ {
2808
+ "epoch": 6.53,
2809
+ "learning_rate": 1.3508752704875344e-05,
2810
+ "loss": 0.2713,
2811
+ "step": 9180
2812
+ },
2813
+ {
2814
+ "epoch": 6.54,
2815
+ "learning_rate": 1.3409394287178727e-05,
2816
+ "loss": 0.2731,
2817
+ "step": 9200
2818
+ },
2819
+ {
2820
+ "epoch": 6.56,
2821
+ "learning_rate": 1.331026852927459e-05,
2822
+ "loss": 0.2743,
2823
+ "step": 9220
2824
+ },
2825
+ {
2826
+ "epoch": 6.57,
2827
+ "learning_rate": 1.3211377420927657e-05,
2828
+ "loss": 0.2706,
2829
+ "step": 9240
2830
+ },
2831
+ {
2832
+ "epoch": 6.58,
2833
+ "learning_rate": 1.311272294719249e-05,
2834
+ "loss": 0.2724,
2835
+ "step": 9260
2836
+ },
2837
+ {
2838
+ "epoch": 6.6,
2839
+ "learning_rate": 1.3014307088373637e-05,
2840
+ "loss": 0.2792,
2841
+ "step": 9280
2842
+ },
2843
+ {
2844
+ "epoch": 6.61,
2845
+ "learning_rate": 1.2916131819985933e-05,
2846
+ "loss": 0.2705,
2847
+ "step": 9300
2848
+ },
2849
+ {
2850
+ "epoch": 6.63,
2851
+ "learning_rate": 1.2818199112714779e-05,
2852
+ "loss": 0.2742,
2853
+ "step": 9320
2854
+ },
2855
+ {
2856
+ "epoch": 6.64,
2857
+ "learning_rate": 1.2720510932376611e-05,
2858
+ "loss": 0.2717,
2859
+ "step": 9340
2860
+ },
2861
+ {
2862
+ "epoch": 6.65,
2863
+ "learning_rate": 1.2623069239879476e-05,
2864
+ "loss": 0.2759,
2865
+ "step": 9360
2866
+ },
2867
+ {
2868
+ "epoch": 6.67,
2869
+ "learning_rate": 1.2525875991183606e-05,
2870
+ "loss": 0.2797,
2871
+ "step": 9380
2872
+ },
2873
+ {
2874
+ "epoch": 6.68,
2875
+ "learning_rate": 1.2428933137262196e-05,
2876
+ "loss": 0.277,
2877
+ "step": 9400
2878
+ },
2879
+ {
2880
+ "epoch": 6.7,
2881
+ "learning_rate": 1.2332242624062225e-05,
2882
+ "loss": 0.2741,
2883
+ "step": 9420
2884
+ },
2885
+ {
2886
+ "epoch": 6.71,
2887
+ "learning_rate": 1.2235806392465435e-05,
2888
+ "loss": 0.2632,
2889
+ "step": 9440
2890
+ },
2891
+ {
2892
+ "epoch": 6.73,
2893
+ "learning_rate": 1.2139626378249299e-05,
2894
+ "loss": 0.2686,
2895
+ "step": 9460
2896
+ },
2897
+ {
2898
+ "epoch": 6.74,
2899
+ "learning_rate": 1.2043704512048217e-05,
2900
+ "loss": 0.274,
2901
+ "step": 9480
2902
+ },
2903
+ {
2904
+ "epoch": 6.75,
2905
+ "learning_rate": 1.194804271931477e-05,
2906
+ "loss": 0.2744,
2907
+ "step": 9500
2908
+ },
2909
+ {
2910
+ "epoch": 6.77,
2911
+ "learning_rate": 1.1852642920281021e-05,
2912
+ "loss": 0.2718,
2913
+ "step": 9520
2914
+ },
2915
+ {
2916
+ "epoch": 6.78,
2917
+ "learning_rate": 1.1757507029920009e-05,
2918
+ "loss": 0.2697,
2919
+ "step": 9540
2920
+ },
2921
+ {
2922
+ "epoch": 6.8,
2923
+ "learning_rate": 1.1662636957907291e-05,
2924
+ "loss": 0.2765,
2925
+ "step": 9560
2926
+ },
2927
+ {
2928
+ "epoch": 6.81,
2929
+ "learning_rate": 1.1568034608582642e-05,
2930
+ "loss": 0.2693,
2931
+ "step": 9580
2932
+ },
2933
+ {
2934
+ "epoch": 6.83,
2935
+ "learning_rate": 1.1473701880911774e-05,
2936
+ "loss": 0.2701,
2937
+ "step": 9600
2938
+ },
2939
+ {
2940
+ "epoch": 6.84,
2941
+ "learning_rate": 1.1379640668448263e-05,
2942
+ "loss": 0.2749,
2943
+ "step": 9620
2944
+ },
2945
+ {
2946
+ "epoch": 6.85,
2947
+ "learning_rate": 1.1285852859295506e-05,
2948
+ "loss": 0.2749,
2949
+ "step": 9640
2950
+ },
2951
+ {
2952
+ "epoch": 6.87,
2953
+ "learning_rate": 1.1192340336068874e-05,
2954
+ "loss": 0.2673,
2955
+ "step": 9660
2956
+ },
2957
+ {
2958
+ "epoch": 6.88,
2959
+ "learning_rate": 1.1099104975857852e-05,
2960
+ "loss": 0.2716,
2961
+ "step": 9680
2962
+ },
2963
+ {
2964
+ "epoch": 6.9,
2965
+ "learning_rate": 1.1006148650188409e-05,
2966
+ "loss": 0.2657,
2967
+ "step": 9700
2968
+ },
2969
+ {
2970
+ "epoch": 6.91,
2971
+ "learning_rate": 1.09134732249854e-05,
2972
+ "loss": 0.2716,
2973
+ "step": 9720
2974
+ },
2975
+ {
2976
+ "epoch": 6.92,
2977
+ "learning_rate": 1.082108056053516e-05,
2978
+ "loss": 0.2706,
2979
+ "step": 9740
2980
+ },
2981
+ {
2982
+ "epoch": 6.94,
2983
+ "learning_rate": 1.0728972511448104e-05,
2984
+ "loss": 0.2756,
2985
+ "step": 9760
2986
+ },
2987
+ {
2988
+ "epoch": 6.95,
2989
+ "learning_rate": 1.063715092662152e-05,
2990
+ "loss": 0.2693,
2991
+ "step": 9780
2992
+ },
2993
+ {
2994
+ "epoch": 6.97,
2995
+ "learning_rate": 1.0545617649202486e-05,
2996
+ "loss": 0.2725,
2997
+ "step": 9800
2998
+ },
2999
+ {
3000
+ "epoch": 6.98,
3001
+ "learning_rate": 1.0454374516550825e-05,
3002
+ "loss": 0.2697,
3003
+ "step": 9820
3004
+ },
3005
+ {
3006
+ "epoch": 7.0,
3007
+ "learning_rate": 1.036342336020224e-05,
3008
+ "loss": 0.2685,
3009
+ "step": 9840
3010
+ },
3011
+ {
3012
+ "epoch": 7.0,
3013
+ "eval_loss": 0.441493421792984,
3014
+ "eval_runtime": 171.0135,
3015
+ "eval_samples_per_second": 32.53,
3016
+ "eval_steps_per_second": 8.134,
3017
+ "step": 9845
3018
+ },
3019
+ {
3020
+ "epoch": 7.01,
3021
+ "learning_rate": 1.0272766005831583e-05,
3022
+ "loss": 0.2493,
3023
+ "step": 9860
3024
+ },
3025
+ {
3026
+ "epoch": 7.02,
3027
+ "learning_rate": 1.0182404273216154e-05,
3028
+ "loss": 0.2425,
3029
+ "step": 9880
3030
+ },
3031
+ {
3032
+ "epoch": 7.04,
3033
+ "learning_rate": 1.0092339976199192e-05,
3034
+ "loss": 0.2442,
3035
+ "step": 9900
3036
+ },
3037
+ {
3038
+ "epoch": 7.05,
3039
+ "learning_rate": 1.0002574922653506e-05,
3040
+ "loss": 0.2374,
3041
+ "step": 9920
3042
+ },
3043
+ {
3044
+ "epoch": 7.07,
3045
+ "learning_rate": 9.91311091444512e-06,
3046
+ "loss": 0.2363,
3047
+ "step": 9940
3048
+ },
3049
+ {
3050
+ "epoch": 7.08,
3051
+ "learning_rate": 9.823949747397134e-06,
3052
+ "loss": 0.2427,
3053
+ "step": 9960
3054
+ },
3055
+ {
3056
+ "epoch": 7.1,
3057
+ "learning_rate": 9.735093211253698e-06,
3058
+ "loss": 0.2417,
3059
+ "step": 9980
3060
+ },
3061
+ {
3062
+ "epoch": 7.11,
3063
+ "learning_rate": 9.64654308964405e-06,
3064
+ "loss": 0.236,
3065
+ "step": 10000
3066
+ },
3067
+ {
3068
+ "epoch": 7.12,
3069
+ "learning_rate": 9.558301160046717e-06,
3070
+ "loss": 0.2411,
3071
+ "step": 10020
3072
+ },
3073
+ {
3074
+ "epoch": 7.14,
3075
+ "learning_rate": 9.470369193753877e-06,
3076
+ "loss": 0.245,
3077
+ "step": 10040
3078
+ },
3079
+ {
3080
+ "epoch": 7.15,
3081
+ "learning_rate": 9.38274895583575e-06,
3082
+ "loss": 0.2474,
3083
+ "step": 10060
3084
+ },
3085
+ {
3086
+ "epoch": 7.17,
3087
+ "learning_rate": 9.295442205105178e-06,
3088
+ "loss": 0.2398,
3089
+ "step": 10080
3090
+ },
3091
+ {
3092
+ "epoch": 7.18,
3093
+ "learning_rate": 9.208450694082373e-06,
3094
+ "loss": 0.2383,
3095
+ "step": 10100
3096
+ },
3097
+ {
3098
+ "epoch": 7.2,
3099
+ "learning_rate": 9.121776168959667e-06,
3100
+ "loss": 0.2429,
3101
+ "step": 10120
3102
+ },
3103
+ {
3104
+ "epoch": 7.21,
3105
+ "learning_rate": 9.035420369566485e-06,
3106
+ "loss": 0.2413,
3107
+ "step": 10140
3108
+ },
3109
+ {
3110
+ "epoch": 7.22,
3111
+ "learning_rate": 8.949385029334459e-06,
3112
+ "loss": 0.2395,
3113
+ "step": 10160
3114
+ },
3115
+ {
3116
+ "epoch": 7.24,
3117
+ "learning_rate": 8.863671875262577e-06,
3118
+ "loss": 0.2384,
3119
+ "step": 10180
3120
+ },
3121
+ {
3122
+ "epoch": 7.25,
3123
+ "learning_rate": 8.778282627882536e-06,
3124
+ "loss": 0.2399,
3125
+ "step": 10200
3126
+ },
3127
+ {
3128
+ "epoch": 7.27,
3129
+ "learning_rate": 8.693219001224239e-06,
3130
+ "loss": 0.2408,
3131
+ "step": 10220
3132
+ },
3133
+ {
3134
+ "epoch": 7.28,
3135
+ "learning_rate": 8.608482702781332e-06,
3136
+ "loss": 0.2382,
3137
+ "step": 10240
3138
+ },
3139
+ {
3140
+ "epoch": 7.29,
3141
+ "learning_rate": 8.524075433476963e-06,
3142
+ "loss": 0.24,
3143
+ "step": 10260
3144
+ },
3145
+ {
3146
+ "epoch": 7.31,
3147
+ "learning_rate": 8.439998887629649e-06,
3148
+ "loss": 0.2434,
3149
+ "step": 10280
3150
+ },
3151
+ {
3152
+ "epoch": 7.32,
3153
+ "learning_rate": 8.356254752919241e-06,
3154
+ "loss": 0.2411,
3155
+ "step": 10300
3156
+ },
3157
+ {
3158
+ "epoch": 7.34,
3159
+ "learning_rate": 8.272844710353036e-06,
3160
+ "loss": 0.2417,
3161
+ "step": 10320
3162
+ },
3163
+ {
3164
+ "epoch": 7.35,
3165
+ "learning_rate": 8.189770434232096e-06,
3166
+ "loss": 0.2389,
3167
+ "step": 10340
3168
+ },
3169
+ {
3170
+ "epoch": 7.37,
3171
+ "learning_rate": 8.10703359211757e-06,
3172
+ "loss": 0.2352,
3173
+ "step": 10360
3174
+ },
3175
+ {
3176
+ "epoch": 7.38,
3177
+ "learning_rate": 8.02463584479724e-06,
3178
+ "loss": 0.234,
3179
+ "step": 10380
3180
+ },
3181
+ {
3182
+ "epoch": 7.39,
3183
+ "learning_rate": 7.942578846252227e-06,
3184
+ "loss": 0.2399,
3185
+ "step": 10400
3186
+ },
3187
+ {
3188
+ "epoch": 7.41,
3189
+ "learning_rate": 7.860864243623726e-06,
3190
+ "loss": 0.2465,
3191
+ "step": 10420
3192
+ },
3193
+ {
3194
+ "epoch": 7.42,
3195
+ "learning_rate": 7.779493677179971e-06,
3196
+ "loss": 0.2455,
3197
+ "step": 10440
3198
+ },
3199
+ {
3200
+ "epoch": 7.44,
3201
+ "learning_rate": 7.698468780283344e-06,
3202
+ "loss": 0.2416,
3203
+ "step": 10460
3204
+ },
3205
+ {
3206
+ "epoch": 7.45,
3207
+ "learning_rate": 7.617791179357522e-06,
3208
+ "loss": 0.2419,
3209
+ "step": 10480
3210
+ },
3211
+ {
3212
+ "epoch": 7.47,
3213
+ "learning_rate": 7.537462493854866e-06,
3214
+ "loss": 0.2501,
3215
+ "step": 10500
3216
+ },
3217
+ {
3218
+ "epoch": 7.48,
3219
+ "learning_rate": 7.457484336223939e-06,
3220
+ "loss": 0.2349,
3221
+ "step": 10520
3222
+ },
3223
+ {
3224
+ "epoch": 7.49,
3225
+ "learning_rate": 7.377858311877081e-06,
3226
+ "loss": 0.2488,
3227
+ "step": 10540
3228
+ },
3229
+ {
3230
+ "epoch": 7.51,
3231
+ "learning_rate": 7.298586019158216e-06,
3232
+ "loss": 0.2435,
3233
+ "step": 10560
3234
+ },
3235
+ {
3236
+ "epoch": 7.52,
3237
+ "learning_rate": 7.219669049310784e-06,
3238
+ "loss": 0.2453,
3239
+ "step": 10580
3240
+ },
3241
+ {
3242
+ "epoch": 7.54,
3243
+ "learning_rate": 7.141108986445768e-06,
3244
+ "loss": 0.2441,
3245
+ "step": 10600
3246
+ },
3247
+ {
3248
+ "epoch": 7.55,
3249
+ "learning_rate": 7.062907407509903e-06,
3250
+ "loss": 0.2388,
3251
+ "step": 10620
3252
+ },
3253
+ {
3254
+ "epoch": 7.56,
3255
+ "learning_rate": 6.985065882254046e-06,
3256
+ "loss": 0.2396,
3257
+ "step": 10640
3258
+ },
3259
+ {
3260
+ "epoch": 7.58,
3261
+ "learning_rate": 6.907585973201633e-06,
3262
+ "loss": 0.2357,
3263
+ "step": 10660
3264
+ },
3265
+ {
3266
+ "epoch": 7.59,
3267
+ "learning_rate": 6.830469235617323e-06,
3268
+ "loss": 0.2455,
3269
+ "step": 10680
3270
+ },
3271
+ {
3272
+ "epoch": 7.61,
3273
+ "learning_rate": 6.7537172174758135e-06,
3274
+ "loss": 0.2471,
3275
+ "step": 10700
3276
+ },
3277
+ {
3278
+ "epoch": 7.62,
3279
+ "learning_rate": 6.677331459430713e-06,
3280
+ "loss": 0.2445,
3281
+ "step": 10720
3282
+ },
3283
+ {
3284
+ "epoch": 7.64,
3285
+ "learning_rate": 6.601313494783648e-06,
3286
+ "loss": 0.2427,
3287
+ "step": 10740
3288
+ },
3289
+ {
3290
+ "epoch": 7.65,
3291
+ "learning_rate": 6.525664849453478e-06,
3292
+ "loss": 0.239,
3293
+ "step": 10760
3294
+ },
3295
+ {
3296
+ "epoch": 7.66,
3297
+ "learning_rate": 6.450387041945677e-06,
3298
+ "loss": 0.2453,
3299
+ "step": 10780
3300
+ },
3301
+ {
3302
+ "epoch": 7.68,
3303
+ "learning_rate": 6.375481583321829e-06,
3304
+ "loss": 0.2401,
3305
+ "step": 10800
3306
+ },
3307
+ {
3308
+ "epoch": 7.69,
3309
+ "learning_rate": 6.3009499771693156e-06,
3310
+ "loss": 0.2417,
3311
+ "step": 10820
3312
+ },
3313
+ {
3314
+ "epoch": 7.71,
3315
+ "learning_rate": 6.226793719571111e-06,
3316
+ "loss": 0.2463,
3317
+ "step": 10840
3318
+ },
3319
+ {
3320
+ "epoch": 7.72,
3321
+ "learning_rate": 6.153014299075799e-06,
3322
+ "loss": 0.2366,
3323
+ "step": 10860
3324
+ },
3325
+ {
3326
+ "epoch": 7.74,
3327
+ "learning_rate": 6.0796131966676324e-06,
3328
+ "loss": 0.237,
3329
+ "step": 10880
3330
+ },
3331
+ {
3332
+ "epoch": 7.75,
3333
+ "learning_rate": 6.006591885736851e-06,
3334
+ "loss": 0.2364,
3335
+ "step": 10900
3336
+ },
3337
+ {
3338
+ "epoch": 7.76,
3339
+ "learning_rate": 5.9339518320500665e-06,
3340
+ "loss": 0.235,
3341
+ "step": 10920
3342
+ },
3343
+ {
3344
+ "epoch": 7.78,
3345
+ "learning_rate": 5.861694493720898e-06,
3346
+ "loss": 0.2448,
3347
+ "step": 10940
3348
+ },
3349
+ {
3350
+ "epoch": 7.79,
3351
+ "learning_rate": 5.789821321180639e-06,
3352
+ "loss": 0.2471,
3353
+ "step": 10960
3354
+ },
3355
+ {
3356
+ "epoch": 7.81,
3357
+ "learning_rate": 5.718333757149183e-06,
3358
+ "loss": 0.2391,
3359
+ "step": 10980
3360
+ },
3361
+ {
3362
+ "epoch": 7.82,
3363
+ "learning_rate": 5.647233236606037e-06,
3364
+ "loss": 0.241,
3365
+ "step": 11000
3366
+ },
3367
+ {
3368
+ "epoch": 7.84,
3369
+ "learning_rate": 5.576521186761563e-06,
3370
+ "loss": 0.2383,
3371
+ "step": 11020
3372
+ },
3373
+ {
3374
+ "epoch": 7.85,
3375
+ "learning_rate": 5.506199027028272e-06,
3376
+ "loss": 0.2455,
3377
+ "step": 11040
3378
+ },
3379
+ {
3380
+ "epoch": 7.86,
3381
+ "learning_rate": 5.436268168992356e-06,
3382
+ "loss": 0.2358,
3383
+ "step": 11060
3384
+ },
3385
+ {
3386
+ "epoch": 7.88,
3387
+ "learning_rate": 5.36673001638538e-06,
3388
+ "loss": 0.2439,
3389
+ "step": 11080
3390
+ },
3391
+ {
3392
+ "epoch": 7.89,
3393
+ "learning_rate": 5.297585965056056e-06,
3394
+ "loss": 0.245,
3395
+ "step": 11100
3396
+ },
3397
+ {
3398
+ "epoch": 7.91,
3399
+ "learning_rate": 5.228837402942252e-06,
3400
+ "loss": 0.2461,
3401
+ "step": 11120
3402
+ },
3403
+ {
3404
+ "epoch": 7.92,
3405
+ "learning_rate": 5.1604857100431445e-06,
3406
+ "loss": 0.2315,
3407
+ "step": 11140
3408
+ },
3409
+ {
3410
+ "epoch": 7.93,
3411
+ "learning_rate": 5.092532258391483e-06,
3412
+ "loss": 0.2451,
3413
+ "step": 11160
3414
+ },
3415
+ {
3416
+ "epoch": 7.95,
3417
+ "learning_rate": 5.0249784120260626e-06,
3418
+ "loss": 0.2364,
3419
+ "step": 11180
3420
+ },
3421
+ {
3422
+ "epoch": 7.96,
3423
+ "learning_rate": 4.957825526964371e-06,
3424
+ "loss": 0.2425,
3425
+ "step": 11200
3426
+ },
3427
+ {
3428
+ "epoch": 7.98,
3429
+ "learning_rate": 4.891074951175328e-06,
3430
+ "loss": 0.2481,
3431
+ "step": 11220
3432
+ },
3433
+ {
3434
+ "epoch": 7.99,
3435
+ "learning_rate": 4.824728024552239e-06,
3436
+ "loss": 0.2408,
3437
+ "step": 11240
3438
+ },
3439
+ {
3440
+ "epoch": 8.0,
3441
+ "eval_loss": 0.46929147839546204,
3442
+ "eval_runtime": 171.1743,
3443
+ "eval_samples_per_second": 32.499,
3444
+ "eval_steps_per_second": 8.126,
3445
+ "step": 11252
3446
+ },
3447
+ {
3448
+ "epoch": 8.01,
3449
+ "learning_rate": 4.758786078885927e-06,
3450
+ "loss": 0.2229,
3451
+ "step": 11260
3452
+ },
3453
+ {
3454
+ "epoch": 8.02,
3455
+ "learning_rate": 4.69325043783796e-06,
3456
+ "loss": 0.2182,
3457
+ "step": 11280
3458
+ },
3459
+ {
3460
+ "epoch": 8.03,
3461
+ "learning_rate": 4.628122416914099e-06,
3462
+ "loss": 0.2152,
3463
+ "step": 11300
3464
+ },
3465
+ {
3466
+ "epoch": 8.05,
3467
+ "learning_rate": 4.563403323437909e-06,
3468
+ "loss": 0.2172,
3469
+ "step": 11320
3470
+ },
3471
+ {
3472
+ "epoch": 8.06,
3473
+ "learning_rate": 4.499094456524478e-06,
3474
+ "loss": 0.2185,
3475
+ "step": 11340
3476
+ },
3477
+ {
3478
+ "epoch": 8.08,
3479
+ "learning_rate": 4.435197107054364e-06,
3480
+ "loss": 0.2163,
3481
+ "step": 11360
3482
+ },
3483
+ {
3484
+ "epoch": 8.09,
3485
+ "learning_rate": 4.371712557647698e-06,
3486
+ "loss": 0.2195,
3487
+ "step": 11380
3488
+ },
3489
+ {
3490
+ "epoch": 8.11,
3491
+ "learning_rate": 4.308642082638401e-06,
3492
+ "loss": 0.2256,
3493
+ "step": 11400
3494
+ },
3495
+ {
3496
+ "epoch": 8.12,
3497
+ "learning_rate": 4.245986948048619e-06,
3498
+ "loss": 0.2176,
3499
+ "step": 11420
3500
+ },
3501
+ {
3502
+ "epoch": 8.13,
3503
+ "learning_rate": 4.18374841156334e-06,
3504
+ "loss": 0.2234,
3505
+ "step": 11440
3506
+ },
3507
+ {
3508
+ "epoch": 8.15,
3509
+ "learning_rate": 4.121927722505095e-06,
3510
+ "loss": 0.2227,
3511
+ "step": 11460
3512
+ },
3513
+ {
3514
+ "epoch": 8.16,
3515
+ "learning_rate": 4.060526121808916e-06,
3516
+ "loss": 0.2255,
3517
+ "step": 11480
3518
+ },
3519
+ {
3520
+ "epoch": 8.18,
3521
+ "learning_rate": 3.999544841997427e-06,
3522
+ "loss": 0.2238,
3523
+ "step": 11500
3524
+ },
3525
+ {
3526
+ "epoch": 8.19,
3527
+ "learning_rate": 3.938985107156082e-06,
3528
+ "loss": 0.2179,
3529
+ "step": 11520
3530
+ },
3531
+ {
+ "epoch": 8.2,
+ "learning_rate": 3.878848132908605e-06,
+ "loss": 0.2217,
+ "step": 11540
+ },
+ {
+ "epoch": 8.22,
+ "learning_rate": 3.819135126392606e-06,
+ "loss": 0.2218,
+ "step": 11560
+ },
+ {
+ "epoch": 8.23,
+ "learning_rate": 3.7598472862353157e-06,
+ "loss": 0.2205,
+ "step": 11580
+ },
+ {
+ "epoch": 8.25,
+ "learning_rate": 3.700985802529544e-06,
+ "loss": 0.2109,
+ "step": 11600
+ },
+ {
+ "epoch": 8.26,
+ "learning_rate": 3.6425518568098087e-06,
+ "loss": 0.2193,
+ "step": 11620
+ },
+ {
+ "epoch": 8.28,
+ "learning_rate": 3.584546622028581e-06,
+ "loss": 0.218,
+ "step": 11640
+ },
+ {
+ "epoch": 8.29,
+ "learning_rate": 3.526971262532758e-06,
+ "loss": 0.2161,
+ "step": 11660
+ },
+ {
+ "epoch": 8.3,
+ "learning_rate": 3.4698269340403157e-06,
+ "loss": 0.2149,
+ "step": 11680
+ },
+ {
+ "epoch": 8.32,
+ "learning_rate": 3.4131147836170634e-06,
+ "loss": 0.2141,
+ "step": 11700
+ },
+ {
+ "epoch": 8.33,
+ "learning_rate": 3.356835949653642e-06,
+ "loss": 0.2174,
+ "step": 11720
+ },
+ {
+ "epoch": 8.35,
+ "learning_rate": 3.3009915618426894e-06,
+ "loss": 0.2254,
+ "step": 11740
+ },
+ {
+ "epoch": 8.36,
+ "learning_rate": 3.2455827411561364e-06,
+ "loss": 0.22,
+ "step": 11760
+ },
+ {
+ "epoch": 8.38,
+ "learning_rate": 3.1906105998227104e-06,
+ "loss": 0.2205,
+ "step": 11780
+ },
+ {
+ "epoch": 8.39,
+ "learning_rate": 3.136076241305633e-06,
+ "loss": 0.2206,
+ "step": 11800
+ },
+ {
+ "epoch": 8.4,
+ "learning_rate": 3.081980760280437e-06,
+ "loss": 0.2198,
+ "step": 11820
+ },
+ {
+ "epoch": 8.42,
+ "learning_rate": 3.0283252426130034e-06,
+ "loss": 0.2262,
+ "step": 11840
+ },
+ {
+ "epoch": 8.43,
+ "learning_rate": 2.9751107653377934e-06,
+ "loss": 0.2234,
+ "step": 11860
+ },
+ {
+ "epoch": 8.45,
+ "learning_rate": 2.9223383966361818e-06,
+ "loss": 0.2215,
+ "step": 11880
+ },
+ {
+ "epoch": 8.46,
+ "learning_rate": 2.870009195815046e-06,
+ "loss": 0.2214,
+ "step": 11900
+ },
+ {
+ "epoch": 8.47,
+ "learning_rate": 2.8181242132854973e-06,
+ "loss": 0.2202,
+ "step": 11920
+ },
+ {
+ "epoch": 8.49,
+ "learning_rate": 2.766684490541796e-06,
+ "loss": 0.2136,
+ "step": 11940
+ },
+ {
+ "epoch": 8.5,
+ "learning_rate": 2.715691060140424e-06,
+ "loss": 0.2136,
+ "step": 11960
+ },
+ {
+ "epoch": 8.52,
+ "learning_rate": 2.665144945679407e-06,
+ "loss": 0.2146,
+ "step": 11980
+ },
+ {
+ "epoch": 8.53,
+ "learning_rate": 2.6150471617777116e-06,
+ "loss": 0.2231,
+ "step": 12000
+ },
+ {
+ "epoch": 8.55,
+ "learning_rate": 2.565398714054917e-06,
+ "loss": 0.2234,
+ "step": 12020
+ },
+ {
+ "epoch": 8.56,
+ "learning_rate": 2.51620059911101e-06,
+ "loss": 0.2208,
+ "step": 12040
+ },
+ {
+ "epoch": 8.57,
+ "learning_rate": 2.4674538045063976e-06,
+ "loss": 0.2227,
+ "step": 12060
+ },
+ {
+ "epoch": 8.59,
+ "learning_rate": 2.4191593087420613e-06,
+ "loss": 0.2274,
+ "step": 12080
+ },
+ {
+ "epoch": 8.6,
+ "learning_rate": 2.3713180812399317e-06,
+ "loss": 0.2202,
+ "step": 12100
+ },
+ {
+ "epoch": 8.62,
+ "learning_rate": 2.3239310823234215e-06,
+ "loss": 0.2156,
+ "step": 12120
+ },
+ {
+ "epoch": 8.63,
+ "learning_rate": 2.2769992631981595e-06,
+ "loss": 0.2208,
+ "step": 12140
+ },
+ {
+ "epoch": 8.65,
+ "learning_rate": 2.230523565932882e-06,
+ "loss": 0.2178,
+ "step": 12160
+ },
+ {
+ "epoch": 8.66,
+ "learning_rate": 2.1845049234405306e-06,
+ "loss": 0.2228,
+ "step": 12180
+ },
+ {
+ "epoch": 8.67,
+ "learning_rate": 2.1389442594595214e-06,
+ "loss": 0.2213,
+ "step": 12200
+ },
+ {
+ "epoch": 8.69,
+ "learning_rate": 2.093842488535219e-06,
+ "loss": 0.2229,
+ "step": 12220
+ },
+ {
+ "epoch": 8.7,
+ "learning_rate": 2.049200516001554e-06,
+ "loss": 0.225,
+ "step": 12240
+ },
+ {
+ "epoch": 8.72,
+ "learning_rate": 2.0050192379628656e-06,
+ "loss": 0.2194,
+ "step": 12260
+ },
+ {
+ "epoch": 8.73,
+ "learning_rate": 1.9612995412759016e-06,
+ "loss": 0.2232,
+ "step": 12280
+ },
+ {
+ "epoch": 8.75,
+ "learning_rate": 1.9180423035320416e-06,
+ "loss": 0.2112,
+ "step": 12300
+ },
+ {
+ "epoch": 8.76,
+ "learning_rate": 1.875248393039658e-06,
+ "loss": 0.2132,
+ "step": 12320
+ },
+ {
+ "epoch": 8.77,
+ "learning_rate": 1.8329186688066797e-06,
+ "loss": 0.2229,
+ "step": 12340
+ },
+ {
+ "epoch": 8.79,
+ "learning_rate": 1.7910539805233827e-06,
+ "loss": 0.2132,
+ "step": 12360
+ },
+ {
+ "epoch": 8.8,
+ "learning_rate": 1.7496551685453028e-06,
+ "loss": 0.2222,
+ "step": 12380
+ },
+ {
+ "epoch": 8.82,
+ "learning_rate": 1.7087230638763745e-06,
+ "loss": 0.2242,
+ "step": 12400
+ },
+ {
+ "epoch": 8.83,
+ "learning_rate": 1.6682584881522634e-06,
+ "loss": 0.2254,
+ "step": 12420
+ },
+ {
+ "epoch": 8.84,
+ "learning_rate": 1.6282622536238551e-06,
+ "loss": 0.2208,
+ "step": 12440
+ },
+ {
+ "epoch": 8.86,
+ "learning_rate": 1.5887351631409614e-06,
+ "loss": 0.2238,
+ "step": 12460
+ },
+ {
+ "epoch": 8.87,
+ "learning_rate": 1.5496780101362074e-06,
+ "loss": 0.2186,
+ "step": 12480
+ },
+ {
+ "epoch": 8.89,
+ "learning_rate": 1.5110915786090918e-06,
+ "loss": 0.2192,
+ "step": 12500
+ },
+ {
+ "epoch": 8.9,
+ "learning_rate": 1.4729766431102604e-06,
+ "loss": 0.2164,
+ "step": 12520
+ },
+ {
+ "epoch": 8.92,
+ "learning_rate": 1.4353339687259632e-06,
+ "loss": 0.2198,
+ "step": 12540
+ },
+ {
+ "epoch": 8.93,
+ "learning_rate": 1.3981643110626775e-06,
+ "loss": 0.2188,
+ "step": 12560
+ },
+ {
+ "epoch": 8.94,
+ "learning_rate": 1.3614684162319564e-06,
+ "loss": 0.2228,
+ "step": 12580
+ },
+ {
+ "epoch": 8.96,
+ "learning_rate": 1.3252470208354518e-06,
+ "loss": 0.2183,
+ "step": 12600
+ },
+ {
+ "epoch": 8.97,
+ "learning_rate": 1.2895008519501206e-06,
+ "loss": 0.2183,
+ "step": 12620
+ },
+ {
+ "epoch": 8.99,
+ "learning_rate": 1.2542306271136284e-06,
+ "loss": 0.218,
+ "step": 12640
+ },
+ {
+ "epoch": 9.0,
+ "eval_loss": 0.47556841373443604,
+ "eval_runtime": 170.6905,
+ "eval_samples_per_second": 32.591,
+ "eval_steps_per_second": 8.149,
+ "step": 12658
+ },
+ {
+ "epoch": 9.0,
+ "learning_rate": 1.2194370543099659e-06,
+ "loss": 0.2195,
+ "step": 12660
+ },
+ {
+ "epoch": 9.02,
+ "learning_rate": 1.1851208319552109e-06,
+ "loss": 0.2087,
+ "step": 12680
+ },
+ {
+ "epoch": 9.03,
+ "learning_rate": 1.1512826488835227e-06,
+ "loss": 0.2115,
+ "step": 12700
+ },
+ {
+ "epoch": 9.04,
+ "learning_rate": 1.1179231843333248e-06,
+ "loss": 0.207,
+ "step": 12720
+ },
+ {
+ "epoch": 9.06,
+ "learning_rate": 1.085043107933642e-06,
+ "loss": 0.21,
+ "step": 12740
+ },
+ {
+ "epoch": 9.07,
+ "learning_rate": 1.0526430796906878e-06,
+ "loss": 0.2057,
+ "step": 12760
+ },
+ {
+ "epoch": 9.09,
+ "learning_rate": 1.0207237499746002e-06,
+ "loss": 0.2121,
+ "step": 12780
+ },
+ {
+ "epoch": 9.1,
+ "learning_rate": 9.892857595063947e-07,
+ "loss": 0.2129,
+ "step": 12800
+ },
+ {
+ "epoch": 9.11,
+ "learning_rate": 9.583297393450929e-07,
+ "loss": 0.2131,
+ "step": 12820
+ },
+ {
+ "epoch": 9.13,
+ "learning_rate": 9.278563108750665e-07,
+ "loss": 0.2127,
+ "step": 12840
+ },
+ {
+ "epoch": 9.14,
+ "learning_rate": 8.978660857935555e-07,
+ "loss": 0.2134,
+ "step": 12860
+ },
+ {
+ "epoch": 9.16,
+ "learning_rate": 8.68359666098395e-07,
+ "loss": 0.2099,
+ "step": 12880
+ },
+ {
+ "epoch": 9.17,
+ "learning_rate": 8.393376440759326e-07,
+ "loss": 0.209,
+ "step": 12900
+ },
+ {
+ "epoch": 9.19,
+ "learning_rate": 8.108006022891274e-07,
+ "loss": 0.2084,
+ "step": 12920
+ },
+ {
+ "epoch": 9.2,
+ "learning_rate": 7.827491135658726e-07,
+ "loss": 0.2091,
+ "step": 12940
+ },
+ {
+ "epoch": 9.21,
+ "learning_rate": 7.551837409874862e-07,
+ "loss": 0.2116,
+ "step": 12960
+ },
+ {
+ "epoch": 9.23,
+ "learning_rate": 7.281050378774135e-07,
+ "loss": 0.2031,
+ "step": 12980
+ },
+ {
+ "epoch": 9.24,
+ "learning_rate": 7.015135477901086e-07,
+ "loss": 0.2038,
+ "step": 13000
+ },
+ {
+ "epoch": 9.26,
+ "learning_rate": 6.754098045001517e-07,
+ "loss": 0.202,
+ "step": 13020
+ },
+ {
+ "epoch": 9.27,
+ "learning_rate": 6.497943319914962e-07,
+ "loss": 0.2145,
+ "step": 13040
+ },
+ {
+ "epoch": 9.29,
+ "learning_rate": 6.246676444469774e-07,
+ "loss": 0.2117,
+ "step": 13060
+ },
+ {
+ "epoch": 9.3,
+ "learning_rate": 6.000302462379898e-07,
+ "loss": 0.2067,
+ "step": 13080
+ },
+ {
+ "epoch": 9.31,
+ "learning_rate": 5.758826319143512e-07,
+ "loss": 0.2014,
+ "step": 13100
+ },
+ {
+ "epoch": 9.33,
+ "learning_rate": 5.5222528619438e-07,
+ "loss": 0.209,
+ "step": 13120
+ },
+ {
+ "epoch": 9.34,
+ "learning_rate": 5.29058683955172e-07,
+ "loss": 0.2062,
+ "step": 13140
+ },
+ {
+ "epoch": 9.36,
+ "learning_rate": 5.063832902230586e-07,
+ "loss": 0.2086,
+ "step": 13160
+ },
+ {
+ "epoch": 9.37,
+ "learning_rate": 4.841995601642751e-07,
+ "loss": 0.2134,
+ "step": 13180
+ },
+ {
+ "epoch": 9.38,
+ "learning_rate": 4.625079390758319e-07,
+ "loss": 0.2027,
+ "step": 13200
+ },
+ {
+ "epoch": 9.4,
+ "learning_rate": 4.41308862376566e-07,
+ "loss": 0.2098,
+ "step": 13220
+ },
+ {
+ "epoch": 9.41,
+ "learning_rate": 4.2060275559840377e-07,
+ "loss": 0.2086,
+ "step": 13240
+ },
+ {
+ "epoch": 9.43,
+ "learning_rate": 4.0039003437782055e-07,
+ "loss": 0.2082,
+ "step": 13260
+ },
+ {
+ "epoch": 9.44,
+ "learning_rate": 3.80671104447497e-07,
+ "loss": 0.2104,
+ "step": 13280
+ },
+ {
+ "epoch": 9.46,
+ "learning_rate": 3.61446361628176e-07,
+ "loss": 0.2075,
+ "step": 13300
+ },
+ {
+ "epoch": 9.47,
+ "learning_rate": 3.427161918207106e-07,
+ "loss": 0.2133,
+ "step": 13320
+ },
+ {
+ "epoch": 9.48,
+ "learning_rate": 3.2448097099833095e-07,
+ "loss": 0.2087,
+ "step": 13340
+ },
+ {
+ "epoch": 9.5,
+ "learning_rate": 3.0674106519908155e-07,
+ "loss": 0.2049,
+ "step": 13360
+ },
+ {
+ "epoch": 9.51,
+ "learning_rate": 2.8949683051848754e-07,
+ "loss": 0.2087,
+ "step": 13380
+ },
+ {
+ "epoch": 9.53,
+ "learning_rate": 2.727486131023971e-07,
+ "loss": 0.2017,
+ "step": 13400
+ },
+ {
+ "epoch": 9.54,
+ "learning_rate": 2.564967491400394e-07,
+ "loss": 0.2105,
+ "step": 13420
+ },
+ {
+ "epoch": 9.56,
+ "learning_rate": 2.4074156485727197e-07,
+ "loss": 0.2055,
+ "step": 13440
+ },
+ {
+ "epoch": 9.57,
+ "learning_rate": 2.2548337651003837e-07,
+ "loss": 0.213,
+ "step": 13460
+ },
+ {
+ "epoch": 9.58,
+ "learning_rate": 2.1072249037800418e-07,
+ "loss": 0.2072,
+ "step": 13480
+ },
+ {
+ "epoch": 9.6,
+ "learning_rate": 1.9645920275843943e-07,
+ "loss": 0.2115,
+ "step": 13500
+ },
+ {
+ "epoch": 9.61,
+ "learning_rate": 1.8269379996023183e-07,
+ "loss": 0.2093,
+ "step": 13520
+ },
+ {
+ "epoch": 9.63,
+ "learning_rate": 1.6942655829817189e-07,
+ "loss": 0.2134,
+ "step": 13540
+ },
+ {
+ "epoch": 9.64,
+ "learning_rate": 1.566577440873962e-07,
+ "loss": 0.2098,
+ "step": 13560
+ },
+ {
+ "epoch": 9.66,
+ "learning_rate": 1.4438761363803067e-07,
+ "loss": 0.209,
+ "step": 13580
+ },
+ {
+ "epoch": 9.67,
+ "learning_rate": 1.3261641325006124e-07,
+ "loss": 0.2119,
+ "step": 13600
+ },
+ {
+ "epoch": 9.68,
+ "learning_rate": 1.213443792083796e-07,
+ "loss": 0.2116,
+ "step": 13620
+ },
+ {
+ "epoch": 9.7,
+ "learning_rate": 1.1057173777804797e-07,
+ "loss": 0.2091,
+ "step": 13640
+ },
+ {
+ "epoch": 9.71,
+ "learning_rate": 1.0029870519975004e-07,
+ "loss": 0.2074,
+ "step": 13660
+ },
+ {
+ "epoch": 9.73,
+ "learning_rate": 9.052548768545832e-08,
+ "loss": 0.2092,
+ "step": 13680
+ },
+ {
+ "epoch": 9.74,
+ "learning_rate": 8.125228141428465e-08,
+ "loss": 0.2043,
+ "step": 13700
+ },
+ {
+ "epoch": 9.75,
+ "learning_rate": 7.247927252854725e-08,
+ "loss": 0.215,
+ "step": 13720
+ },
+ {
+ "epoch": 9.77,
+ "learning_rate": 6.420663713004038e-08,
+ "loss": 0.2143,
+ "step": 13740
+ },
+ {
+ "epoch": 9.78,
+ "learning_rate": 5.643454127648995e-08,
+ "loss": 0.2043,
+ "step": 13760
+ },
+ {
+ "epoch": 9.8,
+ "learning_rate": 4.9163140978225605e-08,
+ "loss": 0.2103,
+ "step": 13780
+ },
+ {
+ "epoch": 9.81,
+ "learning_rate": 4.239258219504716e-08,
+ "loss": 0.2062,
+ "step": 13800
+ },
+ {
+ "epoch": 9.83,
+ "learning_rate": 3.612300083329079e-08,
+ "loss": 0.2076,
+ "step": 13820
+ },
+ {
+ "epoch": 9.84,
+ "learning_rate": 3.035452274311457e-08,
+ "loss": 0.2104,
+ "step": 13840
+ },
+ {
+ "epoch": 9.85,
+ "learning_rate": 2.5087263715953268e-08,
+ "loss": 0.2117,
+ "step": 13860
+ },
+ {
+ "epoch": 9.87,
+ "learning_rate": 2.0321329482209107e-08,
+ "loss": 0.206,
+ "step": 13880
+ },
+ {
+ "epoch": 9.88,
+ "learning_rate": 1.605681570912565e-08,
+ "loss": 0.2169,
+ "step": 13900
+ },
+ {
+ "epoch": 9.9,
+ "learning_rate": 1.2293807998858819e-08,
+ "loss": 0.2124,
+ "step": 13920
+ },
+ {
+ "epoch": 9.91,
+ "learning_rate": 9.03238188677269e-09,
+ "loss": 0.2097,
+ "step": 13940
+ },
+ {
+ "epoch": 9.93,
+ "learning_rate": 6.272602839915709e-09,
+ "loss": 0.2076,
+ "step": 13960
+ },
+ {
+ "epoch": 9.94,
+ "learning_rate": 4.014526255702311e-09,
+ "loss": 0.2033,
+ "step": 13980
+ },
+ {
+ "epoch": 9.95,
+ "learning_rate": 2.2581974608082425e-09,
+ "loss": 0.2207,
+ "step": 14000
+ },
+ {
+ "epoch": 9.97,
+ "learning_rate": 1.0036517102601784e-09,
+ "loss": 0.2111,
+ "step": 14020
+ },
+ {
+ "epoch": 9.98,
+ "learning_rate": 2.509141867224063e-10,
+ "loss": 0.2116,
+ "step": 14040
+ },
+ {
+ "epoch": 10.0,
+ "learning_rate": 0.0,
+ "loss": 0.2079,
+ "step": 14060
+ },
+ {
+ "epoch": 10.0,
+ "eval_loss": 0.48169249296188354,
+ "eval_runtime": 171.0447,
+ "eval_samples_per_second": 32.524,
+ "eval_steps_per_second": 8.132,
+ "step": 14060
+ },
+ {
+ "epoch": 10.0,
+ "step": 14060,
+ "total_flos": 4.511702957167411e+18,
+ "train_loss": 0.43939053083042673,
+ "train_runtime": 38556.6993,
+ "train_samples_per_second": 11.673,
+ "train_steps_per_second": 0.365
+ }
+ ],
+ "logging_steps": 20,
+ "max_steps": 14060,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 10,
+ "save_steps": 500,
+ "total_flos": 4.511702957167411e+18,
+ "train_batch_size": 8,
+ "trial_name": null,
+ "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01fd5cc7e55c8b246d28ba91d73ca5dce0ad7a11f88b117f2f120103c53aeb74
+ size 4792