vodkaslime committed on
Commit d5ea435 · 1 Parent(s): 4ca8960

Upload folder using huggingface_hub

LICENSE.md ADDED
@@ -0,0 +1,36 @@
+ STABLECODE RESEARCH LICENSE AGREEMENT
+ Dated: August 8, 2023
+
+ "Agreement" means the terms and conditions for use, reproduction, distribution and modification of the Software Products set forth herein.
+
+ "Documentation" means any specifications, manuals, documentation, and other written information provided by Stability AI related to the Software.
+
+ "Licensee" or "you" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
+
+ "Stability AI" or "we" means Stability AI Ltd.
+
+ "Software" means, collectively, Stability AI’s proprietary StableCode made available under this Agreement.
+
+ "Software Products" means Software and Documentation.
+
+ By using or distributing any portion or element of the Software Products, you agree to be bound by this Agreement.
+
+ 1. License Rights and Redistribution.
+
+ a. Subject to your compliance with this Agreement and the Documentation, Stability AI grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Stability AI’s intellectual property or other rights owned by Stability AI embodied in the Software Products to reproduce, distribute, and create derivative works of the Software Products solely for your non-commercial research purposes.
+
+ b. If you distribute or make the Software Products, or any derivative works thereof, available to a third party, you shall (i) provide a copy of this Agreement to such third party, and (ii) retain the following attribution notice within a "Notice" text file distributed as a part of such copies: "StableCode is licensed under the StableCode Research License, Copyright (c) Stability AI Ltd. All Rights Reserved.”
+
+ 2. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE SOFTWARE PRODUCTS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE SOFTWARE PRODUCTS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE SOFTWARE PRODUCTS AND ANY OUTPUT AND RESULTS.
+
+ 3. Limitation of Liability. IN NO EVENT WILL STABILITY AI OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF STABILITY AI OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+
+ 4. Intellectual Property.
+
+ a. No trademark licenses are granted under this Agreement, and in connection with the Software Products, neither Stability AI nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Software Products.
+
+ b. Subject to Stability AI’s ownership of the Software Products and derivatives made by or for Stability AI, with respect to any derivative works and modifications of the Software Products that are made by you, as between you and Stability AI, you are and will be the owner of such derivative works and modifications.
+
+ c. If you institute litigation or other proceedings against Stability AI (including a cross-claim or counterclaim in a lawsuit) alleging that the Software Products or associated outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Stability AI from and against any claim by any third party arising out of or related to your use or distribution of the Software Products in violation of this Agreement.
+
+ 5. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Software Products and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Stability AI may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Software Products. Sections 2-4 shall survive the termination of this Agreement.
NOTICE.md ADDED
@@ -0,0 +1 @@
+ StableCode is licensed under the StableCode Research License, Copyright (c) Stability AI Ltd. All Rights Reserved
README.md ADDED
@@ -0,0 +1,139 @@
+ ---
+ language:
+ - code
+ tags:
+ - causal-lm
+ model-index:
+ - name: stabilityai/stablecode-instruct-alpha-3b
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       type: openai_humaneval
+       name: HumanEval
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value: 0.2689
+       verified: false
+     - name: pass@10
+       type: pass@10
+       value: 0.3618
+       verified: false
+
+ license: other
+ extra_gated_prompt: >
+   STABLECODE RESEARCH LICENSE AGREEMENT
+   Dated: August 8, 2023
+
+   "Agreement" means the terms and conditions for use, reproduction, distribution and modification of the Software Products set forth herein.
+
+   "Documentation" means any specifications, manuals, documentation, and other written information provided by Stability AI related to the Software.
+
+   "Licensee" or "you" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
+
+   "Stability AI" or "we" means Stability AI Ltd.
+
+   "Software" means, collectively, Stability AI’s proprietary StableCode made available under this Agreement.
+
+   "Software Products" means Software and Documentation.
+
+   By using or distributing any portion or element of the Software Products, you agree to be bound by this Agreement.
+
+   1. License Rights and Redistribution.
+
+   a. Subject to your compliance with this Agreement and the Documentation, Stability AI grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Stability AI’s intellectual property or other rights owned by Stability AI embodied in the Software Products to reproduce, distribute, and create derivative works of the Software Products solely for your non-commercial research purposes.
+
+   b. If you distribute or make the Software Products, or any derivative works thereof, available to a third party, you shall (i) provide a copy of this Agreement to such third party, and (ii) retain the following attribution notice within a "Notice" text file distributed as a part of such copies: "StableCode is licensed under the StableCode Research License, Copyright (c) Stability AI Ltd. All Rights Reserved.”
+
+   2. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE SOFTWARE PRODUCTS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE SOFTWARE PRODUCTS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE SOFTWARE PRODUCTS AND ANY OUTPUT AND RESULTS.
+
+   3. Limitation of Liability. IN NO EVENT WILL STABILITY AI OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF STABILITY AI OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+
+   4. Intellectual Property.
+
+   a. No trademark licenses are granted under this Agreement, and in connection with the Software Products, neither Stability AI nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Software Products.
+
+   b. Subject to Stability AI’s ownership of the Software Products and derivatives made by or for Stability AI, with respect to any derivative works and modifications of the Software Products that are made by you, as between you and Stability AI, you are and will be the owner of such derivative works and modifications.
+
+   c. If you institute litigation or other proceedings against Stability AI (including a cross-claim or counterclaim in a lawsuit) alleging that the Software Products or associated outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Stability AI from and against any claim by any third party arising out of or related to your use or distribution of the Software Products in violation of this Agreement.
+
+   5. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Software Products and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Stability AI may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Software Products. Sections 2-4 shall survive the termination of this Agreement.
+ extra_gated_fields:
+   # Company: text
+   # Country: text
+   I agree to use this model for research use ONLY: checkbox
+
+ ---
+ # `StableCode-Instruct-Alpha-3B`
+
+ ## Model Description
+
+ `StableCode-Instruct-Alpha-3B` is a 3 billion parameter, decoder-only, instruction-tuned code model pre-trained on a diverse set of programming languages that topped the Stack Overflow developer survey.
+
+ ## Usage
+ The model is intended to follow instructions to generate code. The dataset used to train the model is formatted in the Alpaca format.
+ Get started generating code with `StableCode-Instruct-Alpha-3B` by using the following code snippet:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
+ model = AutoModelForCausalLM.from_pretrained(
+     "stabilityai/stablecode-instruct-alpha-3b",
+     trust_remote_code=True,
+     torch_dtype="auto",
+ )
+ model.cuda()
+ inputs = tokenizer("###Instruction\nGenerate a python function to find number of CPU cores###Response\n", return_tensors="pt").to("cuda")
+ tokens = model.generate(
+     **inputs,
+     max_new_tokens=48,
+     temperature=0.2,
+     do_sample=True,
+ )
+ print(tokenizer.decode(tokens[0], skip_special_tokens=True))
+ ```
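The prompt in the snippet above wraps the instruction between an `###Instruction` marker and a `###Response` marker. A minimal sketch of helpers that reproduce that exact layout and pull the generated part back out of the decoded text follows. The names `build_prompt` and `extract_response` are hypothetical and are not part of the model card or of `transformers`; the marker strings are copied from the README snippet as-is.

```python
# Markers copied verbatim from the README usage snippet.
INSTRUCTION_MARKER = "###Instruction\n"
RESPONSE_MARKER = "###Response\n"


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the same layout the README snippet uses."""
    return f"{INSTRUCTION_MARKER}{instruction}{RESPONSE_MARKER}"


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker.

    If the marker is absent, the decoded text is returned stripped.
    """
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if found else decoded.strip()


prompt = build_prompt("Generate a python function to find number of CPU cores")
print(repr(prompt))
```

The string returned by `build_prompt` can be passed to the tokenizer in place of the hard-coded prompt, and `extract_response` can be applied to the output of `tokenizer.decode(...)`, which otherwise still contains the prompt.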
+
+ ## Model Details
+
+ * **Developed by**: [Stability AI](https://stability.ai/)
+ * **Model type**: `StableCode-Instruct-Alpha-3B` models are auto-regressive language models based on the transformer decoder architecture.
+ * **Language(s)**: Code
+ * **Library**: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
+ * **License**: Model checkpoints are licensed under the [StableCode Research License](https://huggingface.co/stabilityai/stablecode-instruct-alpha-3b/blob/main/LICENSE.md) Copyright (c) Stability AI Ltd. All Rights Reserved
+ * **Contact**: For questions and comments about the model, please email `lm@stability.ai`
+
+ ### Model Architecture
+
+ | Parameters     | Hidden Size | Layers | Heads | Sequence Length |
+ |----------------|-------------|--------|-------|-----------------|
+ | 2,796,431,360  | 2560        | 32     | 32    | 4096            |
+
+
+ * **Decoder Layer**: Parallel Attention and MLP residuals with a single input LayerNorm ([Wang & Komatsuzaki, 2021](https://github.com/kingoflolz/mesh-transformer-jax/tree/master))
+ * **Position Embeddings**: Rotary Position Embeddings ([Su et al., 2021](https://arxiv.org/abs/2104.09864))
+ * **Bias**: LayerNorm bias terms only
+
+ ## Training
+
+ `StableCode-Instruct-Alpha-3B` is the instruction-finetuned version of [StableCode-Completion-Alpha-3B](https://huggingface.co/stabilityai/stablecode-completion-alpha-3b), trained on code instruction datasets.
+
+ ## Use and Limitations
+
+ ### Intended Use
+
+ StableCode-Instruct-Alpha-3B independently generates new code completions, but we recommend that you use it together with the [huggingface-vscode](https://github.com/huggingface/huggingface-vscode) code completion extension developed by BigCode and Hugging Face to identify and, if necessary, attribute any outputs that match training code.
+
+ ### Limitations and bias
+
+ This model is intended to be used responsibly. It is not intended to be used to create unlawful content of any kind, to further any unlawful activity, or to engage in activities with a high risk of physical or economic harm.
+
+ ## How to cite
+
+ ```bibtex
+ @misc{StableCodeInstructAlpha,
+       url={https://huggingface.co/stabilityai/stablecode-instruct-alpha-3b},
+       title={Stable Code Instruct Alpha},
+       author={Adithyan, Reshinth and Phung, Duy and Cooper, Nathan and Pinnaparaju, Nikhil and Laforte, Christian}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "/fsx/ckpts/3b_tok=starcoder_data=star_coder_model=stable-code-no-fim-specific-langs/global_step150000_hf",
+   "architectures": [
+     "GPTNeoXForCausalLM"
+   ],
+   "bos_token_id": 0,
+   "classifier_dropout": 0.1,
+   "eos_token_id": 0,
+   "hidden_act": "gelu",
+   "hidden_size": 2560,
+   "initializer_range": 0.02,
+   "intermediate_size": 10240,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 4096,
+   "model_type": "gpt_neox",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "rotary_emb_base": 10000,
+   "rotary_pct": 0.25,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.30.2",
+   "use_cache": false,
+   "use_parallel_residual": true,
+   "vocab_size": 49152
+ }
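A few per-head quantities follow directly from the values in this `config.json`. The sketch below derives them with plain arithmetic; the helper name `derive_shapes` is hypothetical, and the rounding of the rotary dimension via `int(head_dim * rotary_pct)` is an assumption about how the GPT-NeoX implementation sizes the rotary portion of each head, not something stated in this commit.

```python
import json

# Subset of the config.json values shown above; nothing is read from disk.
CONFIG_JSON = """
{
  "hidden_size": 2560,
  "intermediate_size": 10240,
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "rotary_pct": 0.25,
  "max_position_embeddings": 4096
}
"""


def derive_shapes(config: dict) -> dict:
    """Compute per-head sizes implied by the config values."""
    head_dim = config["hidden_size"] // config["num_attention_heads"]
    return {
        "head_dim": head_dim,
        # Assumed rounding: only this many dims per head get rotary embeddings.
        "rotary_dims": int(head_dim * config["rotary_pct"]),
        "mlp_expansion": config["intermediate_size"] // config["hidden_size"],
    }


shapes = derive_shapes(json.loads(CONFIG_JSON))
print(shapes)  # {'head_dim': 80, 'rotary_dims': 20, 'mlp_expansion': 4}
```

Under these assumptions each of the 32 heads is 80-dimensional, rotary embeddings touch 20 of those dimensions, and the MLP widens the hidden state by a factor of 4.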
ctranslate2/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "bos_token": null,
+   "eos_token": "<|endoftext|>",
+   "layer_norm_epsilon": null,
+   "unk_token": null
+ }
ctranslate2/model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6e02c248f9f5aeeb6cf1da4e2d3672a6054d7a56bba7cdb5a9c0aef4abff534
+ size 5538646818
ctranslate2/vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "eos_token_id": 0,
+   "transformers_version": "4.30.2"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7674c4fee28a3bca6cdb6950ad03719b6932845873e169ab2cb26420f91b74ce
+ size 6075550304
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ec4a192da28265510f0407bccbc0205c1b8293488523ca84ecb0255952c4fd8
+ size 6075650877
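The three weight files in this commit are stored as Git LFS pointers: small text stubs with a `version`, an `oid`, and a `size` line. As a rough illustration, the sketch below parses one such pointer into its fields. The function name `parse_lfs_pointer` is hypothetical, and the parser assumes the simple three-line `key value` layout seen in this diff rather than covering the full pointer specification.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer made of 'key value' lines into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents copied from the pytorch_model.bin diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4ec4a192da28265510f0407bccbc0205c1b8293488523ca84ecb0255952c4fd8
size 6075650877
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"], f"{info['size'] / 2**30:.2f} GiB")
```

The `size` field is the byte count of the actual checkpoint, so it can be used to check available disk space before downloading.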
special_tokens_map.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "eos_token": "<|endoftext|>",
+   "pad_token": "<|endoftext|>"
+ }
tabby.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "auto_model": "AutoModelForCausalLM"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "clean_up_tokenization_spaces": true,
+   "model_max_length": 2048,
+   "padding_side": "right",
+   "tokenizer_class": "PreTrainedTokenizerFast"
+ }
trainer_state.json ADDED
@@ -0,0 +1,4327 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 3.0,
+   "global_step": 717,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.0,
+       "learning_rate": 9.090909090909091e-07,
+       "loss": 0.7678,
+       "step": 1
+     },
+     {
+       "epoch": 0.01,
+       "learning_rate": 1.8181818181818183e-06,
+       "loss": 0.7731,
+       "step": 2
+     },
+     {
+       "epoch": 0.01,
+       "learning_rate": 2.7272727272727272e-06,
+       "loss": 0.8476,
+       "step": 3
+     },
+     {
+       "epoch": 0.02,
+       "learning_rate": 3.6363636363636366e-06,
+       "loss": 0.783,
+       "step": 4
+     },
+     {
+       "epoch": 0.02,
+       "learning_rate": 4.5454545454545455e-06,
+       "loss": 0.7332,
+       "step": 5
+     },
+     {
+       "epoch": 0.03,
+       "learning_rate": 5.4545454545454545e-06,
+       "loss": 0.6563,
+       "step": 6
+     },
+     {
+       "epoch": 0.03,
+       "learning_rate": 6.363636363636364e-06,
+       "loss": 0.6574,
+       "step": 7
+     },
+     {
+       "epoch": 0.03,
+       "learning_rate": 7.272727272727273e-06,
+       "loss": 0.6471,
+       "step": 8
+     },
+     {
+       "epoch": 0.04,
+       "learning_rate": 8.181818181818183e-06,
+       "loss": 0.5971,
+       "step": 9
+     },
+     {
+       "epoch": 0.04,
+       "learning_rate": 9.090909090909091e-06,
+       "loss": 0.5837,
+       "step": 10
+     },
+     {
+       "epoch": 0.05,
+       "learning_rate": 1e-05,
+       "loss": 0.5884,
+       "step": 11
+     },
+     {
+       "epoch": 0.05,
+       "learning_rate": 1.0909090909090909e-05,
+       "loss": 0.5916,
+       "step": 12
+     },
+     {
+       "epoch": 0.05,
+       "learning_rate": 1.181818181818182e-05,
+       "loss": 0.5759,
+       "step": 13
+     },
+     {
+       "epoch": 0.06,
+       "learning_rate": 1.2727272727272728e-05,
+       "loss": 0.5464,
+       "step": 14
+     },
+     {
+       "epoch": 0.06,
+       "learning_rate": 1.3636363636363637e-05,
+       "loss": 0.5675,
+       "step": 15
+     },
+     {
+       "epoch": 0.07,
+       "learning_rate": 1.4545454545454546e-05,
+       "loss": 0.5477,
+       "step": 16
+     },
+     {
+       "epoch": 0.07,
+       "learning_rate": 1.5454545454545454e-05,
+       "loss": 0.5162,
+       "step": 17
+     },
+     {
+       "epoch": 0.08,
+       "learning_rate": 1.6363636363636366e-05,
+       "loss": 0.5018,
+       "step": 18
+     },
+     {
+       "epoch": 0.08,
+       "learning_rate": 1.7272727272727274e-05,
+       "loss": 0.5149,
+       "step": 19
+     },
+     {
+       "epoch": 0.08,
+       "learning_rate": 1.8181818181818182e-05,
+       "loss": 0.5485,
+       "step": 20
+     },
+     {
+       "epoch": 0.09,
+       "learning_rate": 1.9090909090909094e-05,
+       "loss": 0.5174,
+       "step": 21
+     },
+     {
+       "epoch": 0.09,
+       "learning_rate": 2e-05,
+       "loss": 0.5476,
+       "step": 22
+     },
+     {
+       "epoch": 0.1,
+       "learning_rate": 1.9999897835644166e-05,
+       "loss": 0.5322,
+       "step": 23
+     },
+     {
+       "epoch": 0.1,
+       "learning_rate": 1.9999591344664163e-05,
+       "loss": 0.4921,
+       "step": 24
+     },
+     {
+       "epoch": 0.1,
+       "learning_rate": 1.9999080533322486e-05,
+       "loss": 0.5514,
+       "step": 25
+     },
+     {
+       "epoch": 0.11,
+       "learning_rate": 1.9998365412056476e-05,
+       "loss": 0.4966,
+       "step": 26
+     },
+     {
+       "epoch": 0.11,
+       "learning_rate": 1.999744599547812e-05,
+       "loss": 0.4834,
+       "step": 27
+     },
+     {
+       "epoch": 0.12,
+       "learning_rate": 1.999632230237373e-05,
+       "loss": 0.5289,
+       "step": 28
+     },
+     {
+       "epoch": 0.12,
+       "learning_rate": 1.999499435570359e-05,
+       "loss": 0.4914,
+       "step": 29
+     },
+     {
+       "epoch": 0.13,
+       "learning_rate": 1.999346218260146e-05,
+       "loss": 0.5273,
+       "step": 30
+     },
+     {
+       "epoch": 0.13,
+       "learning_rate": 1.999172581437403e-05,
+       "loss": 0.4999,
+       "step": 31
+     },
+     {
+       "epoch": 0.13,
+       "learning_rate": 1.9989785286500294e-05,
+       "loss": 0.4953,
+       "step": 32
+     },
+     {
+       "epoch": 0.14,
+       "learning_rate": 1.9987640638630812e-05,
+       "loss": 0.4888,
+       "step": 33
+     },
+     {
+       "epoch": 0.14,
+       "learning_rate": 1.998529191458689e-05,
+       "loss": 0.4968,
+       "step": 34
+     },
+     {
+       "epoch": 0.15,
+       "learning_rate": 1.9982739162359707e-05,
+       "loss": 0.4915,
+       "step": 35
+     },
+     {
+       "epoch": 0.15,
+       "learning_rate": 1.997998243410932e-05,
+       "loss": 0.4958,
+       "step": 36
+     },
+     {
+       "epoch": 0.15,
+       "learning_rate": 1.99770217861636e-05,
+       "loss": 0.5136,
+       "step": 37
+     },
+     {
+       "epoch": 0.16,
+       "learning_rate": 1.9973857279017092e-05,
+       "loss": 0.4989,
+       "step": 38
+     },
+     {
+       "epoch": 0.16,
+       "learning_rate": 1.9970488977329757e-05,
+       "loss": 0.4797,
+       "step": 39
+     },
+     {
+       "epoch": 0.17,
+       "learning_rate": 1.996691694992567e-05,
+       "loss": 0.4806,
+       "step": 40
+     },
+     {
+       "epoch": 0.17,
+       "learning_rate": 1.9963141269791606e-05,
+       "loss": 0.4641,
+       "step": 41
+     },
+     {
+       "epoch": 0.18,
+       "learning_rate": 1.9959162014075553e-05,
+       "loss": 0.4734,
+       "step": 42
+     },
+     {
+       "epoch": 0.18,
+       "learning_rate": 1.995497926408513e-05,
+       "loss": 0.4716,
+       "step": 43
+     },
+     {
+       "epoch": 0.18,
+       "learning_rate": 1.9950593105285927e-05,
+       "loss": 0.4856,
+       "step": 44
+     },
+     {
+       "epoch": 0.19,
+       "learning_rate": 1.9946003627299766e-05,
+       "loss": 0.4658,
+       "step": 45
+     },
+     {
+       "epoch": 0.19,
+       "learning_rate": 1.9941210923902853e-05,
+       "loss": 0.4677,
+       "step": 46
+     },
+     {
+       "epoch": 0.2,
+       "learning_rate": 1.9936215093023884e-05,
+       "loss": 0.4633,
+       "step": 47
+     },
+     {
+       "epoch": 0.2,
+       "learning_rate": 1.9931016236742026e-05,
+       "loss": 0.4699,
+       "step": 48
+     },
+     {
+       "epoch": 0.21,
+       "learning_rate": 1.992561446128484e-05,
+       "loss": 0.44,
+       "step": 49
+     },
+     {
+       "epoch": 0.21,
+       "learning_rate": 1.9920009877026106e-05,
+       "loss": 0.4759,
+       "step": 50
+     },
+     {
+       "epoch": 0.21,
+       "learning_rate": 1.9914202598483576e-05,
+       "loss": 0.4693,
+       "step": 51
+     },
+     {
+       "epoch": 0.22,
+       "learning_rate": 1.990819274431662e-05,
+       "loss": 0.4647,
+       "step": 52
+     },
+     {
+       "epoch": 0.22,
+       "learning_rate": 1.9901980437323818e-05,
+       "loss": 0.4502,
+       "step": 53
+     },
+     {
+       "epoch": 0.23,
+       "learning_rate": 1.9895565804440435e-05,
+       "loss": 0.5081,
+       "step": 54
+     },
+     {
+       "epoch": 0.23,
+       "learning_rate": 1.9888948976735843e-05,
+       "loss": 0.4724,
+       "step": 55
+     },
+     {
+       "epoch": 0.23,
+       "learning_rate": 1.9882130089410822e-05,
+       "loss": 0.4941,
+       "step": 56
+     },
+     {
+       "epoch": 0.24,
+       "learning_rate": 1.9875109281794828e-05,
+       "loss": 0.4827,
+       "step": 57
+     },
+     {
+       "epoch": 0.24,
+       "learning_rate": 1.986788669734311e-05,
+       "loss": 0.4633,
+       "step": 58
+     },
+     {
+       "epoch": 0.25,
+       "learning_rate": 1.986046248363381e-05,
+       "loss": 0.4893,
+       "step": 59
+     },
+     {
+       "epoch": 0.25,
+       "learning_rate": 1.985283679236493e-05,
+       "loss": 0.4672,
+       "step": 60
+     },
+     {
+       "epoch": 0.26,
+       "learning_rate": 1.9845009779351235e-05,
+       "loss": 0.4666,
+       "step": 61
+     },
+     {
+       "epoch": 0.26,
+       "learning_rate": 1.9836981604521077e-05,
379
+ "loss": 0.4673,
380
+ "step": 62
381
+ },
382
+ {
383
+ "epoch": 0.26,
384
+ "learning_rate": 1.9828752431913116e-05,
385
+ "loss": 0.4754,
386
+ "step": 63
387
+ },
388
+ {
389
+ "epoch": 0.27,
390
+ "learning_rate": 1.9820322429672978e-05,
391
+ "loss": 0.5032,
392
+ "step": 64
393
+ },
394
+ {
395
+ "epoch": 0.27,
396
+ "learning_rate": 1.9811691770049806e-05,
397
+ "loss": 0.4799,
398
+ "step": 65
399
+ },
400
+ {
401
+ "epoch": 0.28,
402
+ "learning_rate": 1.9802860629392765e-05,
403
+ "loss": 0.4699,
404
+ "step": 66
405
+ },
406
+ {
407
+ "epoch": 0.28,
408
+ "learning_rate": 1.9793829188147406e-05,
409
+ "loss": 0.472,
410
+ "step": 67
411
+ },
412
+ {
413
+ "epoch": 0.28,
414
+ "learning_rate": 1.9784597630852008e-05,
415
+ "loss": 0.5005,
416
+ "step": 68
417
+ },
418
+ {
419
+ "epoch": 0.29,
420
+ "learning_rate": 1.97751661461338e-05,
421
+ "loss": 0.4388,
422
+ "step": 69
423
+ },
424
+ {
425
+ "epoch": 0.29,
426
+ "learning_rate": 1.9765534926705082e-05,
427
+ "loss": 0.4659,
428
+ "step": 70
429
+ },
430
+ {
431
+ "epoch": 0.3,
432
+ "learning_rate": 1.975570416935932e-05,
433
+ "loss": 0.4816,
434
+ "step": 71
435
+ },
436
+ {
437
+ "epoch": 0.3,
438
+ "learning_rate": 1.974567407496712e-05,
439
+ "loss": 0.4659,
440
+ "step": 72
441
+ },
442
+ {
443
+ "epoch": 0.31,
444
+ "learning_rate": 1.9735444848472108e-05,
445
+ "loss": 0.472,
446
+ "step": 73
447
+ },
448
+ {
449
+ "epoch": 0.31,
450
+ "learning_rate": 1.9725016698886748e-05,
451
+ "loss": 0.468,
452
+ "step": 74
453
+ },
454
+ {
455
+ "epoch": 0.31,
456
+ "learning_rate": 1.9714389839288073e-05,
457
+ "loss": 0.4568,
458
+ "step": 75
459
+ },
460
+ {
461
+ "epoch": 0.32,
462
+ "learning_rate": 1.9703564486813342e-05,
463
+ "loss": 0.4627,
464
+ "step": 76
465
+ },
466
+ {
467
+ "epoch": 0.32,
468
+ "learning_rate": 1.9692540862655587e-05,
469
+ "loss": 0.497,
470
+ "step": 77
471
+ },
472
+ {
473
+ "epoch": 0.33,
474
+ "learning_rate": 1.96813191920591e-05,
475
+ "loss": 0.496,
476
+ "step": 78
477
+ },
478
+ {
479
+ "epoch": 0.33,
480
+ "learning_rate": 1.9669899704314828e-05,
481
+ "loss": 0.4655,
482
+ "step": 79
483
+ },
484
+ {
485
+ "epoch": 0.33,
486
+ "learning_rate": 1.9658282632755694e-05,
487
+ "loss": 0.4611,
488
+ "step": 80
489
+ },
490
+ {
491
+ "epoch": 0.34,
492
+ "learning_rate": 1.964646821475183e-05,
493
+ "loss": 0.4795,
494
+ "step": 81
495
+ },
496
+ {
497
+ "epoch": 0.34,
498
+ "learning_rate": 1.9634456691705705e-05,
499
+ "loss": 0.4558,
500
+ "step": 82
501
+ },
502
+ {
503
+ "epoch": 0.35,
504
+ "learning_rate": 1.9622248309047233e-05,
505
+ "loss": 0.476,
506
+ "step": 83
507
+ },
508
+ {
509
+ "epoch": 0.35,
510
+ "learning_rate": 1.960984331622872e-05,
511
+ "loss": 0.4417,
512
+ "step": 84
513
+ },
514
+ {
515
+ "epoch": 0.36,
516
+ "learning_rate": 1.959724196671978e-05,
517
+ "loss": 0.4682,
518
+ "step": 85
519
+ },
520
+ {
521
+ "epoch": 0.36,
522
+ "learning_rate": 1.9584444518002178e-05,
523
+ "loss": 0.485,
524
+ "step": 86
525
+ },
526
+ {
527
+ "epoch": 0.36,
528
+ "learning_rate": 1.9571451231564523e-05,
529
+ "loss": 0.4674,
530
+ "step": 87
531
+ },
532
+ {
533
+ "epoch": 0.37,
534
+ "learning_rate": 1.955826237289697e-05,
535
+ "loss": 0.4464,
536
+ "step": 88
537
+ },
538
+ {
539
+ "epoch": 0.37,
540
+ "learning_rate": 1.9544878211485763e-05,
541
+ "loss": 0.4677,
542
+ "step": 89
543
+ },
544
+ {
545
+ "epoch": 0.38,
546
+ "learning_rate": 1.9531299020807752e-05,
547
+ "loss": 0.4343,
548
+ "step": 90
549
+ },
550
+ {
551
+ "epoch": 0.38,
552
+ "learning_rate": 1.9517525078324787e-05,
553
+ "loss": 0.4506,
554
+ "step": 91
555
+ },
556
+ {
557
+ "epoch": 0.38,
558
+ "learning_rate": 1.9503556665478066e-05,
559
+ "loss": 0.4669,
560
+ "step": 92
561
+ },
562
+ {
563
+ "epoch": 0.39,
564
+ "learning_rate": 1.9489394067682365e-05,
565
+ "loss": 0.4587,
566
+ "step": 93
567
+ },
568
+ {
569
+ "epoch": 0.39,
570
+ "learning_rate": 1.9475037574320217e-05,
571
+ "loss": 0.4337,
572
+ "step": 94
573
+ },
574
+ {
575
+ "epoch": 0.4,
576
+ "learning_rate": 1.946048747873601e-05,
577
+ "loss": 0.4547,
578
+ "step": 95
579
+ },
580
+ {
581
+ "epoch": 0.4,
582
+ "learning_rate": 1.9445744078229967e-05,
583
+ "loss": 0.4605,
584
+ "step": 96
585
+ },
586
+ {
587
+ "epoch": 0.41,
588
+ "learning_rate": 1.9430807674052092e-05,
589
+ "loss": 0.4508,
590
+ "step": 97
591
+ },
592
+ {
593
+ "epoch": 0.41,
594
+ "learning_rate": 1.9415678571396006e-05,
595
+ "loss": 0.4642,
596
+ "step": 98
597
+ },
598
+ {
599
+ "epoch": 0.41,
600
+ "learning_rate": 1.9400357079392714e-05,
601
+ "loss": 0.4528,
602
+ "step": 99
603
+ },
604
+ {
605
+ "epoch": 0.42,
606
+ "learning_rate": 1.9384843511104294e-05,
607
+ "loss": 0.4679,
608
+ "step": 100
609
+ },
610
+ {
611
+ "epoch": 0.42,
612
+ "learning_rate": 1.936913818351748e-05,
613
+ "loss": 0.4425,
614
+ "step": 101
615
+ },
616
+ {
617
+ "epoch": 0.43,
618
+ "learning_rate": 1.9353241417537216e-05,
619
+ "loss": 0.4758,
620
+ "step": 102
621
+ },
622
+ {
623
+ "epoch": 0.43,
624
+ "learning_rate": 1.933715353798006e-05,
625
+ "loss": 0.4516,
626
+ "step": 103
627
+ },
628
+ {
629
+ "epoch": 0.44,
630
+ "learning_rate": 1.9320874873567598e-05,
631
+ "loss": 0.4519,
632
+ "step": 104
633
+ },
634
+ {
635
+ "epoch": 0.44,
636
+ "learning_rate": 1.930440575691967e-05,
637
+ "loss": 0.4861,
638
+ "step": 105
639
+ },
640
+ {
641
+ "epoch": 0.44,
642
+ "learning_rate": 1.9287746524547627e-05,
643
+ "loss": 0.4635,
644
+ "step": 106
645
+ },
646
+ {
647
+ "epoch": 0.45,
648
+ "learning_rate": 1.9270897516847406e-05,
649
+ "loss": 0.4607,
650
+ "step": 107
651
+ },
652
+ {
653
+ "epoch": 0.45,
654
+ "learning_rate": 1.9253859078092616e-05,
655
+ "loss": 0.4538,
656
+ "step": 108
657
+ },
658
+ {
659
+ "epoch": 0.46,
660
+ "learning_rate": 1.923663155642748e-05,
661
+ "loss": 0.4647,
662
+ "step": 109
663
+ },
664
+ {
665
+ "epoch": 0.46,
666
+ "learning_rate": 1.9219215303859732e-05,
667
+ "loss": 0.4524,
668
+ "step": 110
669
+ },
670
+ {
671
+ "epoch": 0.46,
672
+ "learning_rate": 1.9201610676253412e-05,
673
+ "loss": 0.4296,
674
+ "step": 111
675
+ },
676
+ {
677
+ "epoch": 0.47,
678
+ "learning_rate": 1.9183818033321612e-05,
679
+ "loss": 0.4378,
680
+ "step": 112
681
+ },
682
+ {
683
+ "epoch": 0.47,
684
+ "learning_rate": 1.916583773861911e-05,
685
+ "loss": 0.4793,
686
+ "step": 113
687
+ },
688
+ {
689
+ "epoch": 0.48,
690
+ "learning_rate": 1.9147670159534953e-05,
691
+ "loss": 0.4423,
692
+ "step": 114
693
+ },
694
+ {
695
+ "epoch": 0.48,
696
+ "learning_rate": 1.912931566728494e-05,
697
+ "loss": 0.4824,
698
+ "step": 115
699
+ },
700
+ {
701
+ "epoch": 0.49,
702
+ "learning_rate": 1.9110774636904052e-05,
703
+ "loss": 0.4399,
704
+ "step": 116
705
+ },
706
+ {
707
+ "epoch": 0.49,
708
+ "learning_rate": 1.9092047447238775e-05,
709
+ "loss": 0.478,
710
+ "step": 117
711
+ },
712
+ {
713
+ "epoch": 0.49,
714
+ "learning_rate": 1.9073134480939353e-05,
715
+ "loss": 0.4661,
716
+ "step": 118
717
+ },
718
+ {
719
+ "epoch": 0.5,
720
+ "learning_rate": 1.9054036124452e-05,
721
+ "loss": 0.4806,
722
+ "step": 119
723
+ },
724
+ {
725
+ "epoch": 0.5,
726
+ "learning_rate": 1.9034752768010965e-05,
727
+ "loss": 0.4437,
728
+ "step": 120
729
+ },
730
+ {
731
+ "epoch": 0.51,
732
+ "learning_rate": 1.901528480563059e-05,
733
+ "loss": 0.4863,
734
+ "step": 121
735
+ },
736
+ {
737
+ "epoch": 0.51,
738
+ "learning_rate": 1.899563263509725e-05,
739
+ "loss": 0.4434,
740
+ "step": 122
741
+ },
742
+ {
743
+ "epoch": 0.51,
744
+ "learning_rate": 1.89757966579612e-05,
745
+ "loss": 0.443,
746
+ "step": 123
747
+ },
748
+ {
749
+ "epoch": 0.52,
750
+ "learning_rate": 1.8955777279528414e-05,
751
+ "loss": 0.4813,
752
+ "step": 124
753
+ },
754
+ {
755
+ "epoch": 0.52,
756
+ "learning_rate": 1.8935574908852272e-05,
757
+ "loss": 0.4609,
758
+ "step": 125
759
+ },
760
+ {
761
+ "epoch": 0.53,
762
+ "learning_rate": 1.8915189958725207e-05,
763
+ "loss": 0.4542,
764
+ "step": 126
765
+ },
766
+ {
767
+ "epoch": 0.53,
768
+ "learning_rate": 1.8894622845670282e-05,
769
+ "loss": 0.4285,
770
+ "step": 127
771
+ },
772
+ {
773
+ "epoch": 0.54,
774
+ "learning_rate": 1.8873873989932666e-05,
775
+ "loss": 0.4887,
776
+ "step": 128
777
+ },
778
+ {
779
+ "epoch": 0.54,
780
+ "learning_rate": 1.8852943815471058e-05,
781
+ "loss": 0.4553,
782
+ "step": 129
783
+ },
784
+ {
785
+ "epoch": 0.54,
786
+ "learning_rate": 1.8831832749949015e-05,
787
+ "loss": 0.4411,
788
+ "step": 130
789
+ },
790
+ {
791
+ "epoch": 0.55,
792
+ "learning_rate": 1.8810541224726217e-05,
793
+ "loss": 0.4653,
794
+ "step": 131
795
+ },
796
+ {
797
+ "epoch": 0.55,
798
+ "learning_rate": 1.878906967484966e-05,
799
+ "loss": 0.4594,
800
+ "step": 132
801
+ },
802
+ {
803
+ "epoch": 0.56,
804
+ "learning_rate": 1.8767418539044753e-05,
805
+ "loss": 0.4602,
806
+ "step": 133
807
+ },
808
+ {
809
+ "epoch": 0.56,
810
+ "learning_rate": 1.8745588259706366e-05,
811
+ "loss": 0.4504,
812
+ "step": 134
813
+ },
814
+ {
815
+ "epoch": 0.56,
816
+ "learning_rate": 1.8723579282889784e-05,
817
+ "loss": 0.4721,
818
+ "step": 135
819
+ },
820
+ {
821
+ "epoch": 0.57,
822
+ "learning_rate": 1.8701392058301595e-05,
823
+ "loss": 0.4561,
824
+ "step": 136
825
+ },
826
+ {
827
+ "epoch": 0.57,
828
+ "learning_rate": 1.86790270392905e-05,
829
+ "loss": 0.4376,
830
+ "step": 137
831
+ },
832
+ {
833
+ "epoch": 0.58,
834
+ "learning_rate": 1.865648468283805e-05,
835
+ "loss": 0.4513,
836
+ "step": 138
837
+ },
838
+ {
839
+ "epoch": 0.58,
840
+ "learning_rate": 1.863376544954931e-05,
841
+ "loss": 0.4416,
842
+ "step": 139
843
+ },
844
+ {
845
+ "epoch": 0.59,
846
+ "learning_rate": 1.8610869803643454e-05,
847
+ "loss": 0.4298,
848
+ "step": 140
849
+ },
850
+ {
851
+ "epoch": 0.59,
852
+ "learning_rate": 1.8587798212944255e-05,
853
+ "loss": 0.4596,
854
+ "step": 141
855
+ },
856
+ {
857
+ "epoch": 0.59,
858
+ "learning_rate": 1.856455114887056e-05,
859
+ "loss": 0.4627,
860
+ "step": 142
861
+ },
862
+ {
863
+ "epoch": 0.6,
864
+ "learning_rate": 1.854112908642663e-05,
865
+ "loss": 0.472,
866
+ "step": 143
867
+ },
868
+ {
869
+ "epoch": 0.6,
870
+ "learning_rate": 1.8517532504192456e-05,
871
+ "loss": 0.4671,
872
+ "step": 144
873
+ },
874
+ {
875
+ "epoch": 0.61,
876
+ "learning_rate": 1.849376188431396e-05,
877
+ "loss": 0.4807,
878
+ "step": 145
879
+ },
880
+ {
881
+ "epoch": 0.61,
882
+ "learning_rate": 1.8469817712493148e-05,
883
+ "loss": 0.4428,
884
+ "step": 146
885
+ },
886
+ {
887
+ "epoch": 0.62,
888
+ "learning_rate": 1.8445700477978207e-05,
889
+ "loss": 0.4461,
890
+ "step": 147
891
+ },
892
+ {
893
+ "epoch": 0.62,
894
+ "learning_rate": 1.8421410673553475e-05,
895
+ "loss": 0.4507,
896
+ "step": 148
897
+ },
898
+ {
899
+ "epoch": 0.62,
900
+ "learning_rate": 1.8396948795529405e-05,
901
+ "loss": 0.4594,
902
+ "step": 149
903
+ },
904
+ {
905
+ "epoch": 0.63,
906
+ "learning_rate": 1.8372315343732395e-05,
907
+ "loss": 0.4423,
908
+ "step": 150
909
+ },
910
+ {
911
+ "epoch": 0.63,
912
+ "learning_rate": 1.8347510821494593e-05,
913
+ "loss": 0.4668,
914
+ "step": 151
915
+ },
916
+ {
917
+ "epoch": 0.64,
918
+ "learning_rate": 1.8322535735643604e-05,
919
+ "loss": 0.4286,
920
+ "step": 152
921
+ },
922
+ {
923
+ "epoch": 0.64,
924
+ "learning_rate": 1.8297390596492143e-05,
925
+ "loss": 0.4313,
926
+ "step": 153
927
+ },
928
+ {
929
+ "epoch": 0.64,
930
+ "learning_rate": 1.8272075917827597e-05,
931
+ "loss": 0.4507,
932
+ "step": 154
933
+ },
934
+ {
935
+ "epoch": 0.65,
936
+ "learning_rate": 1.824659221690153e-05,
937
+ "loss": 0.4631,
938
+ "step": 155
939
+ },
940
+ {
941
+ "epoch": 0.65,
942
+ "learning_rate": 1.822094001441913e-05,
943
+ "loss": 0.4839,
944
+ "step": 156
945
+ },
946
+ {
947
+ "epoch": 0.66,
948
+ "learning_rate": 1.8195119834528535e-05,
949
+ "loss": 0.4634,
950
+ "step": 157
951
+ },
952
+ {
953
+ "epoch": 0.66,
954
+ "learning_rate": 1.8169132204810157e-05,
955
+ "loss": 0.442,
956
+ "step": 158
957
+ },
958
+ {
959
+ "epoch": 0.67,
960
+ "learning_rate": 1.814297765626589e-05,
961
+ "loss": 0.4671,
962
+ "step": 159
963
+ },
964
+ {
965
+ "epoch": 0.67,
966
+ "learning_rate": 1.8116656723308253e-05,
967
+ "loss": 0.4622,
968
+ "step": 160
969
+ },
970
+ {
971
+ "epoch": 0.67,
972
+ "learning_rate": 1.8090169943749477e-05,
973
+ "loss": 0.4721,
974
+ "step": 161
975
+ },
976
+ {
977
+ "epoch": 0.68,
978
+ "learning_rate": 1.8063517858790517e-05,
979
+ "loss": 0.4728,
980
+ "step": 162
981
+ },
982
+ {
983
+ "epoch": 0.68,
984
+ "learning_rate": 1.8036701013009988e-05,
985
+ "loss": 0.4662,
986
+ "step": 163
987
+ },
988
+ {
989
+ "epoch": 0.69,
990
+ "learning_rate": 1.800971995435305e-05,
991
+ "loss": 0.4916,
992
+ "step": 164
993
+ },
994
+ {
995
+ "epoch": 0.69,
996
+ "learning_rate": 1.7982575234120196e-05,
997
+ "loss": 0.4598,
998
+ "step": 165
999
+ },
1000
+ {
1001
+ "epoch": 0.69,
1002
+ "learning_rate": 1.7955267406955997e-05,
1003
+ "loss": 0.4434,
1004
+ "step": 166
1005
+ },
1006
+ {
1007
+ "epoch": 0.7,
1008
+ "learning_rate": 1.792779703083777e-05,
1009
+ "loss": 0.4565,
1010
+ "step": 167
1011
+ },
1012
+ {
1013
+ "epoch": 0.7,
1014
+ "learning_rate": 1.790016466706417e-05,
1015
+ "loss": 0.4633,
1016
+ "step": 168
1017
+ },
1018
+ {
1019
+ "epoch": 0.71,
1020
+ "learning_rate": 1.787237088024372e-05,
1021
+ "loss": 0.4623,
1022
+ "step": 169
1023
+ },
1024
+ {
1025
+ "epoch": 0.71,
1026
+ "learning_rate": 1.78444162382833e-05,
1027
+ "loss": 0.4559,
1028
+ "step": 170
1029
+ },
1030
+ {
1031
+ "epoch": 0.72,
1032
+ "learning_rate": 1.781630131237649e-05,
1033
+ "loss": 0.4421,
1034
+ "step": 171
1035
+ },
1036
+ {
1037
+ "epoch": 0.72,
1038
+ "learning_rate": 1.778802667699196e-05,
1039
+ "loss": 0.4357,
1040
+ "step": 172
1041
+ },
1042
+ {
1043
+ "epoch": 0.72,
1044
+ "learning_rate": 1.7759592909861694e-05,
1045
+ "loss": 0.4538,
1046
+ "step": 173
1047
+ },
1048
+ {
1049
+ "epoch": 0.73,
1050
+ "learning_rate": 1.7731000591969182e-05,
1051
+ "loss": 0.4342,
1052
+ "step": 174
1053
+ },
1054
+ {
1055
+ "epoch": 0.73,
1056
+ "learning_rate": 1.7702250307537583e-05,
1057
+ "loss": 0.4581,
1058
+ "step": 175
1059
+ },
1060
+ {
1061
+ "epoch": 0.74,
1062
+ "learning_rate": 1.7673342644017744e-05,
1063
+ "loss": 0.4485,
1064
+ "step": 176
1065
+ },
1066
+ {
1067
+ "epoch": 0.74,
1068
+ "learning_rate": 1.764427819207624e-05,
1069
+ "loss": 0.4574,
1070
+ "step": 177
1071
+ },
1072
+ {
1073
+ "epoch": 0.74,
1074
+ "learning_rate": 1.761505754558327e-05,
1075
+ "loss": 0.4474,
1076
+ "step": 178
1077
+ },
1078
+ {
1079
+ "epoch": 0.75,
1080
+ "learning_rate": 1.758568130160053e-05,
1081
+ "loss": 0.4478,
1082
+ "step": 179
1083
+ },
1084
+ {
1085
+ "epoch": 0.75,
1086
+ "learning_rate": 1.755615006036904e-05,
1087
+ "loss": 0.4543,
1088
+ "step": 180
1089
+ },
1090
+ {
1091
+ "epoch": 0.76,
1092
+ "learning_rate": 1.7526464425296846e-05,
1093
+ "loss": 0.4528,
1094
+ "step": 181
1095
+ },
1096
+ {
1097
+ "epoch": 0.76,
1098
+ "learning_rate": 1.7496625002946702e-05,
1099
+ "loss": 0.4313,
1100
+ "step": 182
1101
+ },
1102
+ {
1103
+ "epoch": 0.77,
1104
+ "learning_rate": 1.746663240302368e-05,
1105
+ "loss": 0.4869,
1106
+ "step": 183
1107
+ },
1108
+ {
1109
+ "epoch": 0.77,
1110
+ "learning_rate": 1.743648723836271e-05,
1111
+ "loss": 0.472,
1112
+ "step": 184
1113
+ },
1114
+ {
1115
+ "epoch": 0.77,
1116
+ "learning_rate": 1.7406190124916064e-05,
1117
+ "loss": 0.4585,
1118
+ "step": 185
1119
+ },
1120
+ {
1121
+ "epoch": 0.78,
1122
+ "learning_rate": 1.737574168174075e-05,
1123
+ "loss": 0.4215,
1124
+ "step": 186
1125
+ },
1126
+ {
1127
+ "epoch": 0.78,
1128
+ "learning_rate": 1.734514253098589e-05,
1129
+ "loss": 0.452,
1130
+ "step": 187
1131
+ },
1132
+ {
1133
+ "epoch": 0.79,
1134
+ "learning_rate": 1.7314393297879982e-05,
1135
+ "loss": 0.456,
1136
+ "step": 188
1137
+ },
1138
+ {
1139
+ "epoch": 0.79,
1140
+ "learning_rate": 1.7283494610718153e-05,
1141
+ "loss": 0.4557,
1142
+ "step": 189
1143
+ },
1144
+ {
1145
+ "epoch": 0.79,
1146
+ "learning_rate": 1.7252447100849294e-05,
1147
+ "loss": 0.4679,
1148
+ "step": 190
1149
+ },
1150
+ {
1151
+ "epoch": 0.8,
1152
+ "learning_rate": 1.7221251402663176e-05,
1153
+ "loss": 0.4364,
1154
+ "step": 191
1155
+ },
1156
+ {
1157
+ "epoch": 0.8,
1158
+ "learning_rate": 1.7189908153577473e-05,
1159
+ "loss": 0.4641,
1160
+ "step": 192
1161
+ },
1162
+ {
1163
+ "epoch": 0.81,
1164
+ "learning_rate": 1.7158417994024766e-05,
1165
+ "loss": 0.4387,
1166
+ "step": 193
1167
+ },
1168
+ {
1169
+ "epoch": 0.81,
1170
+ "learning_rate": 1.7126781567439418e-05,
1171
+ "loss": 0.4487,
1172
+ "step": 194
1173
+ },
1174
+ {
1175
+ "epoch": 0.82,
1176
+ "learning_rate": 1.709499952024447e-05,
1177
+ "loss": 0.4381,
1178
+ "step": 195
1179
+ },
1180
+ {
1181
+ "epoch": 0.82,
1182
+ "learning_rate": 1.7063072501838388e-05,
1183
+ "loss": 0.4622,
1184
+ "step": 196
1185
+ },
1186
+ {
1187
+ "epoch": 0.82,
1188
+ "learning_rate": 1.7031001164581828e-05,
1189
+ "loss": 0.4528,
1190
+ "step": 197
1191
+ },
1192
+ {
1193
+ "epoch": 0.83,
1194
+ "learning_rate": 1.6998786163784295e-05,
1195
+ "loss": 0.4374,
1196
+ "step": 198
1197
+ },
1198
+ {
1199
+ "epoch": 0.83,
1200
+ "learning_rate": 1.696642815769075e-05,
1201
+ "loss": 0.4468,
1202
+ "step": 199
1203
+ },
1204
+ {
1205
+ "epoch": 0.84,
1206
+ "learning_rate": 1.6933927807468155e-05,
1207
+ "loss": 0.4716,
1208
+ "step": 200
1209
+ },
1210
+ {
1211
+ "epoch": 0.84,
1212
+ "learning_rate": 1.690128577719199e-05,
1213
+ "loss": 0.4599,
1214
+ "step": 201
1215
+ },
1216
+ {
1217
+ "epoch": 0.85,
1218
+ "learning_rate": 1.6868502733832647e-05,
1219
+ "loss": 0.4494,
1220
+ "step": 202
1221
+ },
1222
+ {
1223
+ "epoch": 0.85,
1224
+ "learning_rate": 1.683557934724183e-05,
1225
+ "loss": 0.4539,
1226
+ "step": 203
1227
+ },
1228
+ {
1229
+ "epoch": 0.85,
1230
+ "learning_rate": 1.680251629013885e-05,
1231
+ "loss": 0.445,
1232
+ "step": 204
1233
+ },
1234
+ {
1235
+ "epoch": 0.86,
1236
+ "learning_rate": 1.6769314238096906e-05,
1237
+ "loss": 0.4293,
1238
+ "step": 205
1239
+ },
1240
+ {
1241
+ "epoch": 0.86,
1242
+ "learning_rate": 1.673597386952924e-05,
1243
+ "loss": 0.4578,
1244
+ "step": 206
1245
+ },
1246
+ {
1247
+ "epoch": 0.87,
1248
+ "learning_rate": 1.670249586567531e-05,
1249
+ "loss": 0.434,
1250
+ "step": 207
1251
+ },
1252
+ {
1253
+ "epoch": 0.87,
1254
+ "learning_rate": 1.6668880910586853e-05,
1255
+ "loss": 0.4491,
1256
+ "step": 208
1257
+ },
1258
+ {
1259
+ "epoch": 0.87,
1260
+ "learning_rate": 1.663512969111392e-05,
1261
+ "loss": 0.4461,
1262
+ "step": 209
1263
+ },
1264
+ {
1265
+ "epoch": 0.88,
1266
+ "learning_rate": 1.6601242896890832e-05,
1267
+ "loss": 0.4375,
1268
+ "step": 210
1269
+ },
1270
+ {
1271
+ "epoch": 0.88,
1272
+ "learning_rate": 1.6567221220322082e-05,
1273
+ "loss": 0.4456,
1274
+ "step": 211
1275
+ },
1276
+ {
1277
+ "epoch": 0.89,
1278
+ "learning_rate": 1.6533065356568206e-05,
1279
+ "loss": 0.4541,
1280
+ "step": 212
1281
+ },
1282
+ {
1283
+ "epoch": 0.89,
1284
+ "learning_rate": 1.6498776003531575e-05,
1285
+ "loss": 0.4385,
1286
+ "step": 213
1287
+ },
1288
+ {
1289
+ "epoch": 0.9,
1290
+ "learning_rate": 1.6464353861842115e-05,
1291
+ "loss": 0.4304,
1292
+ "step": 214
1293
+ },
1294
+ {
1295
+ "epoch": 0.9,
1296
+ "learning_rate": 1.6429799634843012e-05,
1297
+ "loss": 0.4517,
1298
+ "step": 215
1299
+ },
1300
+ {
1301
+ "epoch": 0.9,
1302
+ "learning_rate": 1.6395114028576344e-05,
1303
+ "loss": 0.4442,
1304
+ "step": 216
1305
+ },
1306
+ {
1307
+ "epoch": 0.91,
1308
+ "learning_rate": 1.636029775176862e-05,
1309
+ "loss": 0.437,
1310
+ "step": 217
1311
+ },
1312
+ {
1313
+ "epoch": 0.91,
1314
+ "learning_rate": 1.6325351515816353e-05,
1315
+ "loss": 0.4193,
1316
+ "step": 218
1317
+ },
1318
+ {
1319
+ "epoch": 0.92,
1320
+ "learning_rate": 1.629027603477147e-05,
1321
+ "loss": 0.4421,
1322
+ "step": 219
1323
+ },
1324
+ {
1325
+ "epoch": 0.92,
1326
+ "learning_rate": 1.6255072025326763e-05,
1327
+ "loss": 0.4511,
1328
+ "step": 220
1329
+ },
1330
+ {
1331
+ "epoch": 0.92,
1332
+ "learning_rate": 1.621974020680122e-05,
1333
+ "loss": 0.4409,
1334
+ "step": 221
1335
+ },
1336
+ {
1337
+ "epoch": 0.93,
1338
+ "learning_rate": 1.618428130112533e-05,
1339
+ "loss": 0.4239,
1340
+ "step": 222
1341
+ },
1342
+ {
1343
+ "epoch": 0.93,
1344
+ "learning_rate": 1.6148696032826354e-05,
1345
+ "loss": 0.4479,
1346
+ "step": 223
1347
+ },
1348
+ {
1349
+ "epoch": 0.94,
1350
+ "learning_rate": 1.611298512901349e-05,
1351
+ "loss": 0.4601,
1352
+ "step": 224
1353
+ },
1354
+ {
1355
+ "epoch": 0.94,
1356
+ "learning_rate": 1.6077149319363035e-05,
1357
+ "loss": 0.435,
1358
+ "step": 225
1359
+ },
1360
+ {
1361
+ "epoch": 0.95,
1362
+ "learning_rate": 1.6041189336103475e-05,
1363
+ "loss": 0.4369,
1364
+ "step": 226
1365
+ },
1366
+ {
1367
+ "epoch": 0.95,
1368
+ "learning_rate": 1.6005105914000508e-05,
1369
+ "loss": 0.452,
1370
+ "step": 227
1371
+ },
1372
+ {
1373
+ "epoch": 0.95,
1374
+ "learning_rate": 1.596889979034205e-05,
1375
+ "loss": 0.4772,
1376
+ "step": 228
1377
+ },
1378
+ {
1379
+ "epoch": 0.96,
1380
+ "learning_rate": 1.5932571704923168e-05,
1381
+ "loss": 0.4528,
1382
+ "step": 229
1383
+ },
1384
+ {
1385
+ "epoch": 0.96,
1386
+ "learning_rate": 1.589612240003095e-05,
1387
+ "loss": 0.441,
1388
+ "step": 230
1389
+ },
1390
+ {
1391
+ "epoch": 0.97,
1392
+ "learning_rate": 1.585955262042934e-05,
1393
+ "loss": 0.445,
1394
+ "step": 231
1395
+ },
1396
+ {
1397
+ "epoch": 0.97,
1398
+ "learning_rate": 1.5822863113343934e-05,
1399
+ "loss": 0.4712,
1400
+ "step": 232
1401
+ },
1402
+ {
1403
+ "epoch": 0.97,
1404
+ "learning_rate": 1.5786054628446712e-05,
1405
+ "loss": 0.4349,
1406
+ "step": 233
1407
+ },
1408
+ {
1409
+ "epoch": 0.98,
1410
+ "learning_rate": 1.57491279178407e-05,
1411
+ "loss": 0.4509,
1412
+ "step": 234
1413
+ },
1414
+ {
1415
+ "epoch": 0.98,
1416
+ "learning_rate": 1.5712083736044613e-05,
1417
+ "loss": 0.435,
1418
+ "step": 235
1419
+ },
1420
+ {
1421
+ "epoch": 0.99,
1422
+ "learning_rate": 1.5674922839977446e-05,
1423
+ "loss": 0.4306,
1424
+ "step": 236
1425
+ },
1426
+ {
1427
+ "epoch": 0.99,
1428
+ "learning_rate": 1.5637645988943008e-05,
1429
+ "loss": 0.4595,
1430
+ "step": 237
1431
+ },
1432
+ {
1433
+ "epoch": 1.0,
1434
+ "learning_rate": 1.560025394461439e-05,
1435
+ "loss": 0.4597,
1436
+ "step": 238
1437
+ },
1438
+ {
1439
+ "epoch": 1.0,
1440
+ "learning_rate": 1.5562747471018415e-05,
1441
+ "loss": 0.4572,
1442
+ "step": 239
1443
+ },
1444
+ {
1445
+ "epoch": 1.0,
1446
+ "learning_rate": 1.552512733452003e-05,
1447
+ "loss": 0.376,
1448
+ "step": 240
1449
+ },
1450
+ {
1451
+ "epoch": 1.01,
1452
+ "learning_rate": 1.5487394303806632e-05,
1453
+ "loss": 0.3759,
1454
+ "step": 241
1455
+ },
1456
+ {
1457
+ "epoch": 1.01,
1458
+ "learning_rate": 1.544954914987238e-05,
1459
+ "loss": 0.3626,
1460
+ "step": 242
1461
+ },
1462
+ {
1463
+ "epoch": 1.02,
1464
+ "learning_rate": 1.541159264600242e-05,
1465
+ "loss": 0.397,
1466
+ "step": 243
1467
+ },
1468
+ {
1469
+ "epoch": 1.02,
1470
+ "learning_rate": 1.5373525567757124e-05,
1471
+ "loss": 0.3663,
1472
+ "step": 244
1473
+ },
1474
+ {
1475
+ "epoch": 1.03,
1476
+ "learning_rate": 1.5335348692956177e-05,
1477
+ "loss": 0.3769,
1478
+ "step": 245
1479
+ },
1480
+ {
1481
+ "epoch": 1.03,
1482
+ "learning_rate": 1.529706280166276e-05,
1483
+ "loss": 0.3705,
1484
+ "step": 246
1485
+ },
1486
+ {
1487
+ "epoch": 1.03,
1488
+ "learning_rate": 1.5258668676167548e-05,
1489
+ "loss": 0.3748,
1490
+ "step": 247
1491
+ },
1492
+ {
1493
+ "epoch": 1.04,
1494
+ "learning_rate": 1.5220167100972763e-05,
1495
+ "loss": 0.3654,
1496
+ "step": 248
1497
+ },
1498
+ {
1499
+ "epoch": 1.04,
1500
+ "learning_rate": 1.518155886277613e-05,
1501
+ "loss": 0.3639,
1502
+ "step": 249
1503
+ },
1504
+ {
1505
+ "epoch": 1.05,
1506
+ "learning_rate": 1.5142844750454807e-05,
1507
+ "loss": 0.3645,
1508
+ "step": 250
1509
+ },
1510
+ {
1511
+ "epoch": 1.05,
1512
+ "learning_rate": 1.5104025555049262e-05,
1513
+ "loss": 0.3606,
1514
+ "step": 251
1515
+ },
1516
+ {
1517
+ "epoch": 1.05,
1518
+ "learning_rate": 1.5065102069747117e-05,
1519
+ "loss": 0.3607,
1520
+ "step": 252
1521
+ },
1522
+ {
1523
+ "epoch": 1.06,
1524
+ "learning_rate": 1.502607508986693e-05,
1525
+ "loss": 0.35,
1526
+ "step": 253
1527
+ },
1528
+ {
1529
+ "epoch": 1.06,
1530
+ "learning_rate": 1.498694541284195e-05,
1531
+ "loss": 0.3839,
1532
+ "step": 254
1533
+ },
1534
+ {
1535
+ "epoch": 1.07,
1536
+ "learning_rate": 1.4947713838203835e-05,
1537
+ "loss": 0.3718,
1538
+ "step": 255
1539
+ },
1540
+ {
1541
+ "epoch": 1.07,
1542
+ "learning_rate": 1.4908381167566286e-05,
1543
+ "loss": 0.3666,
1544
+ "step": 256
1545
+ },
1546
+ {
1547
+ "epoch": 1.08,
1548
+ "learning_rate": 1.48689482046087e-05,
1549
+ "loss": 0.3587,
1550
+ "step": 257
1551
+ },
1552
+ {
1553
+ "epoch": 1.08,
1554
+ "learning_rate": 1.4829415755059726e-05,
1555
+ "loss": 0.3605,
1556
+ "step": 258
1557
+ },
1558
+ {
1559
+ "epoch": 1.08,
1560
+ "learning_rate": 1.4789784626680819e-05,
1561
+ "loss": 0.3575,
1562
+ "step": 259
1563
+ },
1564
+ {
1565
+ "epoch": 1.09,
1566
+ "learning_rate": 1.475005562924971e-05,
1567
+ "loss": 0.3515,
1568
+ "step": 260
1569
+ },
1570
+ {
1571
+ "epoch": 1.09,
1572
+ "learning_rate": 1.4710229574543893e-05,
1573
+ "loss": 0.3763,
1574
+ "step": 261
1575
+ },
1576
+ {
1577
+ "epoch": 1.1,
1578
+ "learning_rate": 1.467030727632401e-05,
1579
+ "loss": 0.3653,
1580
+ "step": 262
1581
+ },
1582
+ {
1583
+ "epoch": 1.1,
1584
+ "learning_rate": 1.4630289550317234e-05,
1585
+ "loss": 0.3454,
1586
+ "step": 263
1587
+ },
1588
+ {
1589
+ "epoch": 1.1,
1590
+ "learning_rate": 1.4590177214200609e-05,
1591
+ "loss": 0.3394,
1592
+ "step": 264
1593
+ },
1594
+ {
1595
+ "epoch": 1.11,
1596
+ "learning_rate": 1.4549971087584329e-05,
1597
+ "loss": 0.3479,
1598
+ "step": 265
1599
+ },
1600
+ {
1601
+ "epoch": 1.11,
1602
+ "learning_rate": 1.4509671991995003e-05,
1603
+ "loss": 0.3624,
1604
+ "step": 266
1605
+ },
1606
+ {
1607
+ "epoch": 1.12,
1608
+ "learning_rate": 1.4469280750858854e-05,
1609
+ "loss": 0.383,
1610
+ "step": 267
1611
+ },
1612
+ {
1613
+ "epoch": 1.12,
1614
+ "learning_rate": 1.4428798189484914e-05,
1615
+ "loss": 0.3767,
1616
+ "step": 268
1617
+ },
1618
+ {
1619
+ "epoch": 1.13,
1620
+ "learning_rate": 1.4388225135048137e-05,
1621
+ "loss": 0.3593,
1622
+ "step": 269
1623
+ },
1624
+ {
1625
+ "epoch": 1.13,
1626
+ "learning_rate": 1.4347562416572525e-05,
1627
+ "loss": 0.3576,
1628
+ "step": 270
1629
+ },
1630
+ {
1631
+ "epoch": 1.13,
1632
+ "learning_rate": 1.430681086491416e-05,
1633
+ "loss": 0.3578,
1634
+ "step": 271
1635
+ },
1636
+ {
1637
+ "epoch": 1.14,
1638
+ "learning_rate": 1.4265971312744252e-05,
1639
+ "loss": 0.3413,
1640
+ "step": 272
1641
+ },
1642
+ {
1643
+ "epoch": 1.14,
1644
+ "learning_rate": 1.4225044594532104e-05,
1645
+ "loss": 0.3641,
1646
+ "step": 273
1647
+ },
1648
+ {
1649
+ "epoch": 1.15,
1650
+ "learning_rate": 1.4184031546528077e-05,
1651
+ "loss": 0.3536,
1652
+ "step": 274
1653
+ },
1654
+ {
1655
+ "epoch": 1.15,
1656
+ "learning_rate": 1.4142933006746502e-05,
1657
+ "loss": 0.3691,
1658
+ "step": 275
1659
+ },
1660
+ {
1661
+ "epoch": 1.15,
1662
+ "learning_rate": 1.4101749814948544e-05,
1663
+ "loss": 0.3735,
1664
+ "step": 276
1665
+ },
1666
+ {
1667
+ "epoch": 1.16,
1668
+ "learning_rate": 1.4060482812625055e-05,
1669
+ "loss": 0.3634,
1670
+ "step": 277
1671
+ },
1672
+ {
1673
+ "epoch": 1.16,
1674
+ "learning_rate": 1.4019132842979375e-05,
1675
+ "loss": 0.3649,
1676
+ "step": 278
1677
+ },
1678
+ {
1679
+ "epoch": 1.17,
1680
+ "learning_rate": 1.3977700750910112e-05,
1681
+ "loss": 0.3546,
1682
+ "step": 279
1683
+ },
1684
+ {
1685
+ "epoch": 1.17,
1686
+ "learning_rate": 1.3936187382993862e-05,
1687
+ "loss": 0.3672,
1688
+ "step": 280
1689
+ },
1690
+ {
1691
+ "epoch": 1.18,
1692
+ "learning_rate": 1.3894593587467924e-05,
1693
+ "loss": 0.3381,
1694
+ "step": 281
1695
+ },
1696
+ {
1697
+ "epoch": 1.18,
1698
+ "learning_rate": 1.3852920214212966e-05,
1699
+ "loss": 0.3708,
1700
+ "step": 282
1701
+ },
1702
+ {
1703
+ "epoch": 1.18,
1704
+ "learning_rate": 1.3811168114735647e-05,
1705
+ "loss": 0.3776,
1706
+ "step": 283
1707
+ },
1708
+ {
1709
+ "epoch": 1.19,
1710
+ "learning_rate": 1.3769338142151245e-05,
1711
+ "loss": 0.3588,
1712
+ "step": 284
1713
+ },
1714
+ {
1715
+ "epoch": 1.19,
1716
+ "learning_rate": 1.3727431151166196e-05,
1717
+ "loss": 0.3656,
1718
+ "step": 285
1719
+ },
1720
+ {
1721
+ "epoch": 1.2,
1722
+ "learning_rate": 1.3685447998060651e-05,
1723
+ "loss": 0.378,
1724
+ "step": 286
1725
+ },
1726
+ {
1727
+ "epoch": 1.2,
1728
+ "learning_rate": 1.3643389540670963e-05,
1729
+ "loss": 0.3652,
1730
+ "step": 287
1731
+ },
1732
+ {
1733
+ "epoch": 1.21,
1734
+ "learning_rate": 1.3601256638372182e-05,
1735
+ "loss": 0.3688,
1736
+ "step": 288
1737
+ },
1738
+ {
1739
+ "epoch": 1.21,
1740
+ "learning_rate": 1.3559050152060465e-05,
1741
+ "loss": 0.3577,
1742
+ "step": 289
1743
+ },
1744
+ {
1745
+ "epoch": 1.21,
1746
+ "learning_rate": 1.3516770944135514e-05,
1747
+ "loss": 0.3593,
1748
+ "step": 290
1749
+ },
1750
+ {
1751
+ "epoch": 1.22,
1752
+ "learning_rate": 1.3474419878482935e-05,
1753
+ "loss": 0.3425,
1754
+ "step": 291
1755
+ },
1756
+ {
1757
+ "epoch": 1.22,
1758
+ "learning_rate": 1.3431997820456592e-05,
1759
+ "loss": 0.38,
1760
+ "step": 292
1761
+ },
1762
+ {
1763
+ "epoch": 1.23,
1764
+ "learning_rate": 1.3389505636860944e-05,
1765
+ "loss": 0.3717,
1766
+ "step": 293
1767
+ },
1768
+ {
1769
+ "epoch": 1.23,
1770
+ "learning_rate": 1.3346944195933294e-05,
1771
+ "loss": 0.3476,
1772
+ "step": 294
1773
+ },
1774
+ {
1775
+ "epoch": 1.23,
1776
+ "learning_rate": 1.330431436732608e-05,
1777
+ "loss": 0.3779,
1778
+ "step": 295
1779
+ },
1780
+ {
1781
+ "epoch": 1.24,
1782
+ "learning_rate": 1.3261617022089103e-05,
1783
+ "loss": 0.3492,
1784
+ "step": 296
1785
+ },
1786
+ {
1787
+ "epoch": 1.24,
1788
+ "learning_rate": 1.3218853032651719e-05,
1789
+ "loss": 0.3657,
1790
+ "step": 297
1791
+ },
1792
+ {
1793
+ "epoch": 1.25,
1794
+ "learning_rate": 1.3176023272805008e-05,
1795
+ "loss": 0.3524,
1796
+ "step": 298
1797
+ },
1798
+ {
1799
+ "epoch": 1.25,
1800
+ "learning_rate": 1.313312861768394e-05,
1801
+ "loss": 0.3704,
1802
+ "step": 299
1803
+ },
1804
+ {
1805
+ "epoch": 1.26,
1806
+ "learning_rate": 1.3090169943749475e-05,
1807
+ "loss": 0.3631,
1808
+ "step": 300
1809
+ },
1810
+ {
1811
+ "epoch": 1.26,
1812
+ "learning_rate": 1.3047148128770664e-05,
1813
+ "loss": 0.3806,
1814
+ "step": 301
1815
+ },
1816
+ {
1817
+ "epoch": 1.26,
1818
+ "learning_rate": 1.3004064051806712e-05,
1819
+ "loss": 0.3577,
1820
+ "step": 302
1821
+ },
1822
+ {
1823
+ "epoch": 1.27,
1824
+ "learning_rate": 1.2960918593189005e-05,
1825
+ "loss": 0.3629,
1826
+ "step": 303
1827
+ },
1828
+ {
1829
+ "epoch": 1.27,
1830
+ "learning_rate": 1.2917712634503148e-05,
1831
+ "loss": 0.3581,
1832
+ "step": 304
1833
+ },
1834
+ {
1835
+ "epoch": 1.28,
1836
+ "learning_rate": 1.2874447058570927e-05,
1837
+ "loss": 0.3801,
1838
+ "step": 305
1839
+ },
1840
+ {
1841
+ "epoch": 1.28,
1842
+ "learning_rate": 1.2831122749432278e-05,
1843
+ "loss": 0.3409,
1844
+ "step": 306
1845
+ },
1846
+ {
1847
+ "epoch": 1.28,
1848
+ "learning_rate": 1.2787740592327232e-05,
1849
+ "loss": 0.3563,
1850
+ "step": 307
1851
+ },
1852
+ {
1853
+ "epoch": 1.29,
1854
+ "learning_rate": 1.2744301473677814e-05,
1855
+ "loss": 0.3664,
1856
+ "step": 308
1857
+ },
1858
+ {
1859
+ "epoch": 1.29,
1860
+ "learning_rate": 1.2700806281069942e-05,
1861
+ "loss": 0.3702,
1862
+ "step": 309
1863
+ },
1864
+ {
1865
+ "epoch": 1.3,
1866
+ "learning_rate": 1.2657255903235278e-05,
1867
+ "loss": 0.3679,
1868
+ "step": 310
1869
+ },
1870
+ {
1871
+ "epoch": 1.3,
1872
+ "learning_rate": 1.2613651230033085e-05,
1873
+ "loss": 0.3626,
1874
+ "step": 311
1875
+ },
1876
+ {
1877
+ "epoch": 1.31,
1878
+ "learning_rate": 1.2569993152432028e-05,
1879
+ "loss": 0.3574,
1880
+ "step": 312
1881
+ },
1882
+ {
1883
+ "epoch": 1.31,
1884
+ "learning_rate": 1.2526282562491991e-05,
1885
+ "loss": 0.3684,
1886
+ "step": 313
1887
+ },
1888
+ {
1889
+ "epoch": 1.31,
1890
+ "learning_rate": 1.2482520353345819e-05,
1891
+ "loss": 0.3683,
1892
+ "step": 314
1893
+ },
1894
+ {
1895
+ "epoch": 1.32,
1896
+ "learning_rate": 1.2438707419181097e-05,
1897
+ "loss": 0.3704,
1898
+ "step": 315
1899
+ },
1900
+ {
1901
+ "epoch": 1.32,
1902
+ "learning_rate": 1.2394844655221863e-05,
1903
+ "loss": 0.3782,
1904
+ "step": 316
1905
+ },
1906
+ {
1907
+ "epoch": 1.33,
1908
+ "learning_rate": 1.2350932957710322e-05,
1909
+ "loss": 0.3704,
1910
+ "step": 317
1911
+ },
1912
+ {
1913
+ "epoch": 1.33,
1914
+ "learning_rate": 1.2306973223888535e-05,
1915
+ "loss": 0.3752,
1916
+ "step": 318
1917
+ },
1918
+ {
1919
+ "epoch": 1.33,
1920
+ "learning_rate": 1.2262966351980075e-05,
1921
+ "loss": 0.3592,
1922
+ "step": 319
1923
+ },
1924
+ {
1925
+ "epoch": 1.34,
1926
+ "learning_rate": 1.2218913241171691e-05,
1927
+ "loss": 0.3506,
1928
+ "step": 320
1929
+ },
1930
+ {
1931
+ "epoch": 1.34,
1932
+ "learning_rate": 1.2174814791594913e-05,
1933
+ "loss": 0.3413,
1934
+ "step": 321
1935
+ },
1936
+ {
1937
+ "epoch": 1.35,
1938
+ "learning_rate": 1.2130671904307692e-05,
1939
+ "loss": 0.3435,
1940
+ "step": 322
1941
+ },
1942
+ {
1943
+ "epoch": 1.35,
1944
+ "learning_rate": 1.2086485481275943e-05,
1945
+ "loss": 0.3685,
1946
+ "step": 323
1947
+ },
1948
+ {
1949
+ "epoch": 1.36,
1950
+ "learning_rate": 1.2042256425355165e-05,
1951
+ "loss": 0.3643,
1952
+ "step": 324
1953
+ },
1954
+ {
1955
+ "epoch": 1.36,
1956
+ "learning_rate": 1.1997985640271956e-05,
1957
+ "loss": 0.3758,
1958
+ "step": 325
1959
+ },
1960
+ {
1961
+ "epoch": 1.36,
1962
+ "learning_rate": 1.1953674030605568e-05,
1963
+ "loss": 0.3781,
1964
+ "step": 326
1965
+ },
1966
+ {
1967
+ "epoch": 1.37,
1968
+ "learning_rate": 1.1909322501769407e-05,
1969
+ "loss": 0.3549,
1970
+ "step": 327
1971
+ },
1972
+ {
1973
+ "epoch": 1.37,
1974
+ "learning_rate": 1.186493195999255e-05,
1975
+ "loss": 0.363,
1976
+ "step": 328
1977
+ },
1978
+ {
1979
+ "epoch": 1.38,
1980
+ "learning_rate": 1.1820503312301218e-05,
1981
+ "loss": 0.3435,
1982
+ "step": 329
1983
+ },
1984
+ {
1985
+ "epoch": 1.38,
1986
+ "learning_rate": 1.1776037466500245e-05,
1987
+ "loss": 0.3626,
1988
+ "step": 330
1989
+ },
1990
+ {
1991
+ "epoch": 1.38,
1992
+ "learning_rate": 1.1731535331154532e-05,
1993
+ "loss": 0.3714,
1994
+ "step": 331
1995
+ },
1996
+ {
1997
+ "epoch": 1.39,
1998
+ "learning_rate": 1.1686997815570473e-05,
1999
+ "loss": 0.358,
2000
+ "step": 332
2001
+ },
2002
+ {
2003
+ "epoch": 1.39,
2004
+ "learning_rate": 1.1642425829777391e-05,
2005
+ "loss": 0.3592,
2006
+ "step": 333
2007
+ },
2008
+ {
2009
+ "epoch": 1.4,
2010
+ "learning_rate": 1.1597820284508927e-05,
2011
+ "loss": 0.3675,
2012
+ "step": 334
2013
+ },
2014
+ {
2015
+ "epoch": 1.4,
2016
+ "learning_rate": 1.1553182091184439e-05,
2017
+ "loss": 0.3545,
2018
+ "step": 335
2019
+ },
2020
+ {
2021
+ "epoch": 1.41,
2022
+ "learning_rate": 1.1508512161890381e-05,
2023
+ "loss": 0.3593,
2024
+ "step": 336
2025
+ },
2026
+ {
2027
+ "epoch": 1.41,
2028
+ "learning_rate": 1.1463811409361667e-05,
2029
+ "loss": 0.3663,
2030
+ "step": 337
2031
+ },
2032
+ {
2033
+ "epoch": 1.41,
2034
+ "learning_rate": 1.1419080746963012e-05,
2035
+ "loss": 0.3528,
2036
+ "step": 338
2037
+ },
2038
+ {
2039
+ "epoch": 1.42,
2040
+ "learning_rate": 1.1374321088670277e-05,
2041
+ "loss": 0.3664,
2042
+ "step": 339
2043
+ },
2044
+ {
2045
+ "epoch": 1.42,
2046
+ "learning_rate": 1.1329533349051794e-05,
2047
+ "loss": 0.3702,
2048
+ "step": 340
2049
+ },
2050
+ {
2051
+ "epoch": 1.43,
2052
+ "learning_rate": 1.1284718443249676e-05,
2053
+ "loss": 0.3712,
2054
+ "step": 341
2055
+ },
2056
+ {
2057
+ "epoch": 1.43,
2058
+ "learning_rate": 1.1239877286961123e-05,
2059
+ "loss": 0.3572,
2060
+ "step": 342
2061
+ },
2062
+ {
2063
+ "epoch": 1.44,
2064
+ "learning_rate": 1.11950107964197e-05,
2065
+ "loss": 0.3646,
2066
+ "step": 343
2067
+ },
2068
+ {
2069
+ "epoch": 1.44,
2070
+ "learning_rate": 1.1150119888376631e-05,
2071
+ "loss": 0.3399,
2072
+ "step": 344
2073
+ },
2074
+ {
2075
+ "epoch": 1.44,
2076
+ "learning_rate": 1.1105205480082052e-05,
2077
+ "loss": 0.3683,
2078
+ "step": 345
2079
+ },
2080
+ {
2081
+ "epoch": 1.45,
2082
+ "learning_rate": 1.106026848926629e-05,
2083
+ "loss": 0.356,
2084
+ "step": 346
2085
+ },
2086
+ {
2087
+ "epoch": 1.45,
2088
+ "learning_rate": 1.1015309834121083e-05,
2089
+ "loss": 0.3749,
2090
+ "step": 347
2091
+ },
2092
+ {
2093
+ "epoch": 1.46,
2094
+ "learning_rate": 1.0970330433280838e-05,
2095
+ "loss": 0.3572,
2096
+ "step": 348
2097
+ },
2098
+ {
2099
+ "epoch": 1.46,
2100
+ "learning_rate": 1.0925331205803861e-05,
2101
+ "loss": 0.3536,
2102
+ "step": 349
2103
+ },
2104
+ {
2105
+ "epoch": 1.46,
2106
+ "learning_rate": 1.0880313071153568e-05,
2107
+ "loss": 0.3611,
2108
+ "step": 350
2109
+ },
2110
+ {
2111
+ "epoch": 1.47,
2112
+ "learning_rate": 1.0835276949179707e-05,
2113
+ "loss": 0.3735,
2114
+ "step": 351
2115
+ },
2116
+ {
2117
+ "epoch": 1.47,
2118
+ "learning_rate": 1.079022376009955e-05,
2119
+ "loss": 0.3728,
2120
+ "step": 352
2121
+ },
2122
+ {
2123
+ "epoch": 1.48,
2124
+ "learning_rate": 1.0745154424479112e-05,
2125
+ "loss": 0.3526,
2126
+ "step": 353
2127
+ },
2128
+ {
2129
+ "epoch": 1.48,
2130
+ "learning_rate": 1.0700069863214317e-05,
2131
+ "loss": 0.3574,
2132
+ "step": 354
2133
+ },
2134
+ {
2135
+ "epoch": 1.49,
2136
+ "learning_rate": 1.0654970997512201e-05,
2137
+ "loss": 0.3753,
2138
+ "step": 355
2139
+ },
2140
+ {
2141
+ "epoch": 1.49,
2142
+ "learning_rate": 1.0609858748872073e-05,
2143
+ "loss": 0.3666,
2144
+ "step": 356
2145
+ },
2146
+ {
2147
+ "epoch": 1.49,
2148
+ "learning_rate": 1.05647340390667e-05,
2149
+ "loss": 0.346,
2150
+ "step": 357
2151
+ },
2152
+ {
2153
+ "epoch": 1.5,
2154
+ "learning_rate": 1.0519597790123463e-05,
2155
+ "loss": 0.3615,
2156
+ "step": 358
2157
+ },
2158
+ {
2159
+ "epoch": 1.5,
2160
+ "learning_rate": 1.047445092430552e-05,
2161
+ "loss": 0.3601,
2162
+ "step": 359
2163
+ },
2164
+ {
2165
+ "epoch": 1.51,
2166
+ "learning_rate": 1.0429294364092968e-05,
2167
+ "loss": 0.3692,
2168
+ "step": 360
2169
+ },
2170
+ {
2171
+ "epoch": 1.51,
2172
+ "learning_rate": 1.0384129032163976e-05,
2173
+ "loss": 0.3886,
2174
+ "step": 361
2175
+ },
2176
+ {
2177
+ "epoch": 1.51,
2178
+ "learning_rate": 1.0338955851375962e-05,
2179
+ "loss": 0.3539,
2180
+ "step": 362
2181
+ },
2182
+ {
2183
+ "epoch": 1.52,
2184
+ "learning_rate": 1.0293775744746705e-05,
2185
+ "loss": 0.3746,
2186
+ "step": 363
2187
+ },
2188
+ {
2189
+ "epoch": 1.52,
2190
+ "learning_rate": 1.0248589635435505e-05,
2191
+ "loss": 0.3851,
2192
+ "step": 364
2193
+ },
2194
+ {
2195
+ "epoch": 1.53,
2196
+ "learning_rate": 1.0203398446724306e-05,
2197
+ "loss": 0.3546,
2198
+ "step": 365
2199
+ },
2200
+ {
2201
+ "epoch": 1.53,
2202
+ "learning_rate": 1.0158203101998854e-05,
2203
+ "loss": 0.3599,
2204
+ "step": 366
2205
+ },
2206
+ {
2207
+ "epoch": 1.54,
2208
+ "learning_rate": 1.01130045247298e-05,
2209
+ "loss": 0.3639,
2210
+ "step": 367
2211
+ },
2212
+ {
2213
+ "epoch": 1.54,
2214
+ "learning_rate": 1.0067803638453847e-05,
2215
+ "loss": 0.3592,
2216
+ "step": 368
2217
+ },
2218
+ {
2219
+ "epoch": 1.54,
2220
+ "learning_rate": 1.0022601366754889e-05,
2221
+ "loss": 0.3565,
2222
+ "step": 369
2223
+ },
2224
+ {
2225
+ "epoch": 1.55,
2226
+ "learning_rate": 9.977398633245116e-06,
2227
+ "loss": 0.3535,
2228
+ "step": 370
2229
+ },
2230
+ {
2231
+ "epoch": 1.55,
2232
+ "learning_rate": 9.932196361546156e-06,
2233
+ "loss": 0.354,
2234
+ "step": 371
2235
+ },
2236
+ {
2237
+ "epoch": 1.56,
2238
+ "learning_rate": 9.886995475270205e-06,
2239
+ "loss": 0.3416,
2240
+ "step": 372
2241
+ },
2242
+ {
2243
+ "epoch": 1.56,
2244
+ "learning_rate": 9.84179689800115e-06,
2245
+ "loss": 0.3462,
2246
+ "step": 373
2247
+ },
2248
+ {
2249
+ "epoch": 1.56,
2250
+ "learning_rate": 9.796601553275697e-06,
2251
+ "loss": 0.3709,
2252
+ "step": 374
2253
+ },
2254
+ {
2255
+ "epoch": 1.57,
2256
+ "learning_rate": 9.751410364564499e-06,
2257
+ "loss": 0.3574,
2258
+ "step": 375
2259
+ },
2260
+ {
2261
+ "epoch": 1.57,
2262
+ "learning_rate": 9.706224255253297e-06,
2263
+ "loss": 0.3694,
2264
+ "step": 376
2265
+ },
2266
+ {
2267
+ "epoch": 1.58,
2268
+ "learning_rate": 9.661044148624038e-06,
2269
+ "loss": 0.354,
2270
+ "step": 377
2271
+ },
2272
+ {
2273
+ "epoch": 1.58,
2274
+ "learning_rate": 9.615870967836026e-06,
2275
+ "loss": 0.3834,
2276
+ "step": 378
2277
+ },
2278
+ {
2279
+ "epoch": 1.59,
2280
+ "learning_rate": 9.570705635907038e-06,
2281
+ "loss": 0.3805,
2282
+ "step": 379
2283
+ },
2284
+ {
2285
+ "epoch": 1.59,
2286
+ "learning_rate": 9.525549075694484e-06,
2287
+ "loss": 0.3694,
2288
+ "step": 380
2289
+ },
2290
+ {
2291
+ "epoch": 1.59,
2292
+ "learning_rate": 9.48040220987654e-06,
2293
+ "loss": 0.3533,
2294
+ "step": 381
2295
+ },
2296
+ {
2297
+ "epoch": 1.6,
2298
+ "learning_rate": 9.435265960933304e-06,
2299
+ "loss": 0.3654,
2300
+ "step": 382
2301
+ },
2302
+ {
2303
+ "epoch": 1.6,
2304
+ "learning_rate": 9.39014125112793e-06,
2305
+ "loss": 0.3586,
2306
+ "step": 383
2307
+ },
2308
+ {
2309
+ "epoch": 1.61,
2310
+ "learning_rate": 9.3450290024878e-06,
2311
+ "loss": 0.3707,
2312
+ "step": 384
2313
+ },
2314
+ {
2315
+ "epoch": 1.61,
2316
+ "learning_rate": 9.299930136785685e-06,
2317
+ "loss": 0.3409,
2318
+ "step": 385
2319
+ },
2320
+ {
2321
+ "epoch": 1.62,
2322
+ "learning_rate": 9.25484557552089e-06,
2323
+ "loss": 0.3458,
2324
+ "step": 386
2325
+ },
2326
+ {
2327
+ "epoch": 1.62,
2328
+ "learning_rate": 9.209776239900453e-06,
2329
+ "loss": 0.3624,
2330
+ "step": 387
2331
+ },
2332
+ {
2333
+ "epoch": 1.62,
2334
+ "learning_rate": 9.164723050820298e-06,
2335
+ "loss": 0.3684,
2336
+ "step": 388
2337
+ },
2338
+ {
2339
+ "epoch": 1.63,
2340
+ "learning_rate": 9.119686928846437e-06,
2341
+ "loss": 0.3856,
2342
+ "step": 389
2343
+ },
2344
+ {
2345
+ "epoch": 1.63,
2346
+ "learning_rate": 9.074668794196142e-06,
2347
+ "loss": 0.3841,
2348
+ "step": 390
2349
+ },
2350
+ {
2351
+ "epoch": 1.64,
2352
+ "learning_rate": 9.029669566719165e-06,
2353
+ "loss": 0.3658,
2354
+ "step": 391
2355
+ },
2356
+ {
2357
+ "epoch": 1.64,
2358
+ "learning_rate": 8.98469016587892e-06,
2359
+ "loss": 0.3515,
2360
+ "step": 392
2361
+ },
2362
+ {
2363
+ "epoch": 1.64,
2364
+ "learning_rate": 8.939731510733711e-06,
2365
+ "loss": 0.3672,
2366
+ "step": 393
2367
+ },
2368
+ {
2369
+ "epoch": 1.65,
2370
+ "learning_rate": 8.894794519917947e-06,
2371
+ "loss": 0.386,
2372
+ "step": 394
2373
+ },
2374
+ {
2375
+ "epoch": 1.65,
2376
+ "learning_rate": 8.849880111623374e-06,
2377
+ "loss": 0.3662,
2378
+ "step": 395
2379
+ },
2380
+ {
2381
+ "epoch": 1.66,
2382
+ "learning_rate": 8.804989203580303e-06,
2383
+ "loss": 0.3821,
2384
+ "step": 396
2385
+ },
2386
+ {
2387
+ "epoch": 1.66,
2388
+ "learning_rate": 8.76012271303888e-06,
2389
+ "loss": 0.3603,
2390
+ "step": 397
2391
+ },
2392
+ {
2393
+ "epoch": 1.67,
2394
+ "learning_rate": 8.715281556750327e-06,
2395
+ "loss": 0.3587,
2396
+ "step": 398
2397
+ },
2398
+ {
2399
+ "epoch": 1.67,
2400
+ "learning_rate": 8.670466650948208e-06,
2401
+ "loss": 0.3516,
2402
+ "step": 399
2403
+ },
2404
+ {
2405
+ "epoch": 1.67,
2406
+ "learning_rate": 8.625678911329727e-06,
2407
+ "loss": 0.3782,
2408
+ "step": 400
2409
+ },
2410
+ {
2411
+ "epoch": 1.68,
2412
+ "learning_rate": 8.580919253036991e-06,
2413
+ "loss": 0.3443,
2414
+ "step": 401
2415
+ },
2416
+ {
2417
+ "epoch": 1.68,
2418
+ "learning_rate": 8.536188590638334e-06,
2419
+ "loss": 0.3498,
2420
+ "step": 402
2421
+ },
2422
+ {
2423
+ "epoch": 1.69,
2424
+ "learning_rate": 8.491487838109622e-06,
2425
+ "loss": 0.3547,
2426
+ "step": 403
2427
+ },
2428
+ {
2429
+ "epoch": 1.69,
2430
+ "learning_rate": 8.446817908815566e-06,
2431
+ "loss": 0.3432,
2432
+ "step": 404
2433
+ },
2434
+ {
2435
+ "epoch": 1.69,
2436
+ "learning_rate": 8.402179715491078e-06,
2437
+ "loss": 0.3499,
2438
+ "step": 405
2439
+ },
2440
+ {
2441
+ "epoch": 1.7,
2442
+ "learning_rate": 8.357574170222612e-06,
2443
+ "loss": 0.3522,
2444
+ "step": 406
2445
+ },
2446
+ {
2447
+ "epoch": 1.7,
2448
+ "learning_rate": 8.313002184429529e-06,
2449
+ "loss": 0.3561,
2450
+ "step": 407
2451
+ },
2452
+ {
2453
+ "epoch": 1.71,
2454
+ "learning_rate": 8.268464668845471e-06,
2455
+ "loss": 0.345,
2456
+ "step": 408
2457
+ },
2458
+ {
2459
+ "epoch": 1.71,
2460
+ "learning_rate": 8.223962533499757e-06,
2461
+ "loss": 0.358,
2462
+ "step": 409
2463
+ },
2464
+ {
2465
+ "epoch": 1.72,
2466
+ "learning_rate": 8.179496687698785e-06,
2467
+ "loss": 0.3514,
2468
+ "step": 410
2469
+ },
2470
+ {
2471
+ "epoch": 1.72,
2472
+ "learning_rate": 8.135068040007452e-06,
2473
+ "loss": 0.3495,
2474
+ "step": 411
2475
+ },
2476
+ {
2477
+ "epoch": 1.72,
2478
+ "learning_rate": 8.090677498230598e-06,
2479
+ "loss": 0.3693,
2480
+ "step": 412
2481
+ },
2482
+ {
2483
+ "epoch": 1.73,
2484
+ "learning_rate": 8.046325969394437e-06,
2485
+ "loss": 0.3638,
2486
+ "step": 413
2487
+ },
2488
+ {
2489
+ "epoch": 1.73,
2490
+ "learning_rate": 8.002014359728046e-06,
2491
+ "loss": 0.363,
2492
+ "step": 414
2493
+ },
2494
+ {
2495
+ "epoch": 1.74,
2496
+ "learning_rate": 7.957743574644837e-06,
2497
+ "loss": 0.3701,
2498
+ "step": 415
2499
+ },
2500
+ {
2501
+ "epoch": 1.74,
2502
+ "learning_rate": 7.913514518724059e-06,
2503
+ "loss": 0.3653,
2504
+ "step": 416
2505
+ },
2506
+ {
2507
+ "epoch": 1.74,
2508
+ "learning_rate": 7.869328095692313e-06,
2509
+ "loss": 0.3685,
2510
+ "step": 417
2511
+ },
2512
+ {
2513
+ "epoch": 1.75,
2514
+ "learning_rate": 7.825185208405089e-06,
2515
+ "loss": 0.3587,
2516
+ "step": 418
2517
+ },
2518
+ {
2519
+ "epoch": 1.75,
2520
+ "learning_rate": 7.781086758828314e-06,
2521
+ "loss": 0.3902,
2522
+ "step": 419
2523
+ },
2524
+ {
2525
+ "epoch": 1.76,
2526
+ "learning_rate": 7.73703364801993e-06,
2527
+ "loss": 0.3687,
2528
+ "step": 420
2529
+ },
2530
+ {
2531
+ "epoch": 1.76,
2532
+ "learning_rate": 7.69302677611147e-06,
2533
+ "loss": 0.3711,
2534
+ "step": 421
2535
+ },
2536
+ {
2537
+ "epoch": 1.77,
2538
+ "learning_rate": 7.649067042289681e-06,
2539
+ "loss": 0.3389,
2540
+ "step": 422
2541
+ },
2542
+ {
2543
+ "epoch": 1.77,
2544
+ "learning_rate": 7.6051553447781415e-06,
2545
+ "loss": 0.3502,
2546
+ "step": 423
2547
+ },
2548
+ {
2549
+ "epoch": 1.77,
2550
+ "learning_rate": 7.561292580818906e-06,
2551
+ "loss": 0.3586,
2552
+ "step": 424
2553
+ },
2554
+ {
2555
+ "epoch": 1.78,
2556
+ "learning_rate": 7.517479646654184e-06,
2557
+ "loss": 0.3741,
2558
+ "step": 425
2559
+ },
2560
+ {
2561
+ "epoch": 1.78,
2562
+ "learning_rate": 7.47371743750801e-06,
2563
+ "loss": 0.3612,
2564
+ "step": 426
2565
+ },
2566
+ {
2567
+ "epoch": 1.79,
2568
+ "learning_rate": 7.430006847567972e-06,
2569
+ "loss": 0.3683,
2570
+ "step": 427
2571
+ },
2572
+ {
2573
+ "epoch": 1.79,
2574
+ "learning_rate": 7.386348769966918e-06,
2575
+ "loss": 0.3602,
2576
+ "step": 428
2577
+ },
2578
+ {
2579
+ "epoch": 1.79,
2580
+ "learning_rate": 7.342744096764727e-06,
2581
+ "loss": 0.3432,
2582
+ "step": 429
2583
+ },
2584
+ {
2585
+ "epoch": 1.8,
2586
+ "learning_rate": 7.299193718930062e-06,
2587
+ "loss": 0.3879,
2588
+ "step": 430
2589
+ },
2590
+ {
2591
+ "epoch": 1.8,
2592
+ "learning_rate": 7.255698526322188e-06,
2593
+ "loss": 0.3776,
2594
+ "step": 431
2595
+ },
2596
+ {
2597
+ "epoch": 1.81,
2598
+ "learning_rate": 7.2122594076727705e-06,
2599
+ "loss": 0.3681,
2600
+ "step": 432
2601
+ },
2602
+ {
2603
+ "epoch": 1.81,
2604
+ "learning_rate": 7.1688772505677225e-06,
2605
+ "loss": 0.3502,
2606
+ "step": 433
2607
+ },
2608
+ {
2609
+ "epoch": 1.82,
2610
+ "learning_rate": 7.125552941429077e-06,
2611
+ "loss": 0.3609,
2612
+ "step": 434
2613
+ },
2614
+ {
2615
+ "epoch": 1.82,
2616
+ "learning_rate": 7.082287365496852e-06,
2617
+ "loss": 0.3782,
2618
+ "step": 435
2619
+ },
2620
+ {
2621
+ "epoch": 1.82,
2622
+ "learning_rate": 7.0390814068109965e-06,
2623
+ "loss": 0.3778,
2624
+ "step": 436
2625
+ },
2626
+ {
2627
+ "epoch": 1.83,
2628
+ "learning_rate": 6.995935948193294e-06,
2629
+ "loss": 0.3512,
2630
+ "step": 437
2631
+ },
2632
+ {
2633
+ "epoch": 1.83,
2634
+ "learning_rate": 6.9528518712293405e-06,
2635
+ "loss": 0.371,
2636
+ "step": 438
2637
+ },
2638
+ {
2639
+ "epoch": 1.84,
2640
+ "learning_rate": 6.909830056250527e-06,
2641
+ "loss": 0.3671,
2642
+ "step": 439
2643
+ },
2644
+ {
2645
+ "epoch": 1.84,
2646
+ "learning_rate": 6.866871382316063e-06,
2647
+ "loss": 0.3607,
2648
+ "step": 440
2649
+ },
2650
+ {
2651
+ "epoch": 1.85,
2652
+ "learning_rate": 6.823976727194994e-06,
2653
+ "loss": 0.3655,
2654
+ "step": 441
2655
+ },
2656
+ {
2657
+ "epoch": 1.85,
2658
+ "learning_rate": 6.781146967348283e-06,
2659
+ "loss": 0.3604,
2660
+ "step": 442
2661
+ },
2662
+ {
2663
+ "epoch": 1.85,
2664
+ "learning_rate": 6.738382977910898e-06,
2665
+ "loss": 0.3622,
2666
+ "step": 443
2667
+ },
2668
+ {
2669
+ "epoch": 1.86,
2670
+ "learning_rate": 6.695685632673919e-06,
2671
+ "loss": 0.3552,
2672
+ "step": 444
2673
+ },
2674
+ {
2675
+ "epoch": 1.86,
2676
+ "learning_rate": 6.653055804066712e-06,
2677
+ "loss": 0.3643,
2678
+ "step": 445
2679
+ },
2680
+ {
2681
+ "epoch": 1.87,
2682
+ "learning_rate": 6.6104943631390596e-06,
2683
+ "loss": 0.351,
2684
+ "step": 446
2685
+ },
2686
+ {
2687
+ "epoch": 1.87,
2688
+ "learning_rate": 6.568002179543409e-06,
2689
+ "loss": 0.3676,
2690
+ "step": 447
2691
+ },
2692
+ {
2693
+ "epoch": 1.87,
2694
+ "learning_rate": 6.525580121517069e-06,
2695
+ "loss": 0.3563,
2696
+ "step": 448
2697
+ },
2698
+ {
2699
+ "epoch": 1.88,
2700
+ "learning_rate": 6.48322905586449e-06,
2701
+ "loss": 0.3502,
2702
+ "step": 449
2703
+ },
2704
+ {
2705
+ "epoch": 1.88,
2706
+ "learning_rate": 6.440949847939538e-06,
2707
+ "loss": 0.3845,
2708
+ "step": 450
2709
+ },
2710
+ {
2711
+ "epoch": 1.89,
2712
+ "learning_rate": 6.39874336162782e-06,
2713
+ "loss": 0.3567,
2714
+ "step": 451
2715
+ },
2716
+ {
2717
+ "epoch": 1.89,
2718
+ "learning_rate": 6.356610459329038e-06,
2719
+ "loss": 0.3552,
2720
+ "step": 452
2721
+ },
2722
+ {
2723
+ "epoch": 1.9,
2724
+ "learning_rate": 6.314552001939351e-06,
2725
+ "loss": 0.353,
2726
+ "step": 453
2727
+ },
2728
+ {
2729
+ "epoch": 1.9,
2730
+ "learning_rate": 6.272568848833809e-06,
2731
+ "loss": 0.3603,
2732
+ "step": 454
2733
+ },
2734
+ {
2735
+ "epoch": 1.9,
2736
+ "learning_rate": 6.230661857848759e-06,
2737
+ "loss": 0.3501,
2738
+ "step": 455
2739
+ },
2740
+ {
2741
+ "epoch": 1.91,
2742
+ "learning_rate": 6.188831885264357e-06,
2743
+ "loss": 0.3576,
2744
+ "step": 456
2745
+ },
2746
+ {
2747
+ "epoch": 1.91,
2748
+ "learning_rate": 6.147079785787038e-06,
2749
+ "loss": 0.3458,
2750
+ "step": 457
2751
+ },
2752
+ {
2753
+ "epoch": 1.92,
2754
+ "learning_rate": 6.105406412532078e-06,
2755
+ "loss": 0.3639,
2756
+ "step": 458
2757
+ },
2758
+ {
2759
+ "epoch": 1.92,
2760
+ "learning_rate": 6.06381261700614e-06,
2761
+ "loss": 0.3604,
2762
+ "step": 459
2763
+ },
2764
+ {
2765
+ "epoch": 1.92,
2766
+ "learning_rate": 6.022299249089889e-06,
2767
+ "loss": 0.358,
2768
+ "step": 460
2769
+ },
2770
+ {
2771
+ "epoch": 1.93,
2772
+ "learning_rate": 5.980867157020624e-06,
2773
+ "loss": 0.3592,
2774
+ "step": 461
2775
+ },
2776
+ {
2777
+ "epoch": 1.93,
2778
+ "learning_rate": 5.93951718737495e-06,
2779
+ "loss": 0.3664,
2780
+ "step": 462
2781
+ },
2782
+ {
2783
+ "epoch": 1.94,
2784
+ "learning_rate": 5.8982501850514614e-06,
2785
+ "loss": 0.351,
2786
+ "step": 463
2787
+ },
2788
+ {
2789
+ "epoch": 1.94,
2790
+ "learning_rate": 5.857066993253501e-06,
2791
+ "loss": 0.3587,
2792
+ "step": 464
2793
+ },
2794
+ {
2795
+ "epoch": 1.95,
2796
+ "learning_rate": 5.815968453471923e-06,
2797
+ "loss": 0.3763,
2798
+ "step": 465
2799
+ },
2800
+ {
2801
+ "epoch": 1.95,
2802
+ "learning_rate": 5.7749554054679015e-06,
2803
+ "loss": 0.3431,
2804
+ "step": 466
2805
+ },
2806
+ {
2807
+ "epoch": 1.95,
2808
+ "learning_rate": 5.7340286872557515e-06,
2809
+ "loss": 0.3664,
2810
+ "step": 467
2811
+ },
2812
+ {
2813
+ "epoch": 1.96,
2814
+ "learning_rate": 5.693189135085839e-06,
2815
+ "loss": 0.3623,
2816
+ "step": 468
2817
+ },
2818
+ {
2819
+ "epoch": 1.96,
2820
+ "learning_rate": 5.652437583427478e-06,
2821
+ "loss": 0.3609,
2822
+ "step": 469
2823
+ },
2824
+ {
2825
+ "epoch": 1.97,
2826
+ "learning_rate": 5.6117748649518665e-06,
2827
+ "loss": 0.3651,
2828
+ "step": 470
2829
+ },
2830
+ {
2831
+ "epoch": 1.97,
2832
+ "learning_rate": 5.5712018105150914e-06,
2833
+ "loss": 0.3439,
2834
+ "step": 471
2835
+ },
2836
+ {
2837
+ "epoch": 1.97,
2838
+ "learning_rate": 5.530719249141148e-06,
2839
+ "loss": 0.3526,
2840
+ "step": 472
2841
+ },
2842
+ {
2843
+ "epoch": 1.98,
2844
+ "learning_rate": 5.490328008005002e-06,
2845
+ "loss": 0.3275,
2846
+ "step": 473
2847
+ },
2848
+ {
2849
+ "epoch": 1.98,
2850
+ "learning_rate": 5.450028912415672e-06,
2851
+ "loss": 0.3615,
2852
+ "step": 474
2853
+ },
2854
+ {
2855
+ "epoch": 1.99,
2856
+ "learning_rate": 5.409822785799393e-06,
2857
+ "loss": 0.3678,
2858
+ "step": 475
2859
+ },
2860
+ {
2861
+ "epoch": 1.99,
2862
+ "learning_rate": 5.369710449682767e-06,
2863
+ "loss": 0.3561,
2864
+ "step": 476
2865
+ },
2866
+ {
2867
+ "epoch": 2.0,
2868
+ "learning_rate": 5.329692723675994e-06,
2869
+ "loss": 0.37,
2870
+ "step": 477
2871
+ },
2872
+ {
2873
+ "epoch": 2.0,
2874
+ "learning_rate": 5.289770425456109e-06,
2875
+ "loss": 0.3358,
2876
+ "step": 478
2877
+ },
2878
+ {
2879
+ "epoch": 2.0,
2880
+ "learning_rate": 5.249944370750293e-06,
2881
+ "loss": 0.2953,
2882
+ "step": 479
2883
+ },
2884
+ {
2885
+ "epoch": 2.01,
2886
+ "learning_rate": 5.210215373319183e-06,
2887
+ "loss": 0.282,
2888
+ "step": 480
2889
+ },
2890
+ {
2891
+ "epoch": 2.01,
2892
+ "learning_rate": 5.170584244940275e-06,
2893
+ "loss": 0.2803,
2894
+ "step": 481
2895
+ },
2896
+ {
2897
+ "epoch": 2.02,
2898
+ "learning_rate": 5.131051795391302e-06,
2899
+ "loss": 0.3133,
2900
+ "step": 482
2901
+ },
2902
+ {
2903
+ "epoch": 2.02,
2904
+ "learning_rate": 5.091618832433716e-06,
2905
+ "loss": 0.2831,
2906
+ "step": 483
2907
+ },
2908
+ {
2909
+ "epoch": 2.03,
2910
+ "learning_rate": 5.0522861617961694e-06,
2911
+ "loss": 0.2965,
2912
+ "step": 484
2913
+ },
2914
+ {
2915
+ "epoch": 2.03,
2916
+ "learning_rate": 5.0130545871580504e-06,
2917
+ "loss": 0.2828,
2918
+ "step": 485
2919
+ },
2920
+ {
2921
+ "epoch": 2.03,
2922
+ "learning_rate": 4.973924910133071e-06,
2923
+ "loss": 0.277,
2924
+ "step": 486
2925
+ },
2926
+ {
2927
+ "epoch": 2.04,
2928
+ "learning_rate": 4.934897930252887e-06,
2929
+ "loss": 0.2744,
2930
+ "step": 487
2931
+ },
2932
+ {
2933
+ "epoch": 2.04,
2934
+ "learning_rate": 4.895974444950743e-06,
2935
+ "loss": 0.2799,
2936
+ "step": 488
2937
+ },
2938
+ {
2939
+ "epoch": 2.05,
2940
+ "learning_rate": 4.857155249545197e-06,
2941
+ "loss": 0.2943,
2942
+ "step": 489
2943
+ },
2944
+ {
2945
+ "epoch": 2.05,
2946
+ "learning_rate": 4.8184411372238724e-06,
2947
+ "loss": 0.2936,
2948
+ "step": 490
2949
+ },
2950
+ {
2951
+ "epoch": 2.05,
2952
+ "learning_rate": 4.779832899027243e-06,
2953
+ "loss": 0.2746,
2954
+ "step": 491
2955
+ },
2956
+ {
2957
+ "epoch": 2.06,
2958
+ "learning_rate": 4.7413313238324556e-06,
2959
+ "loss": 0.2795,
2960
+ "step": 492
2961
+ },
2962
+ {
2963
+ "epoch": 2.06,
2964
+ "learning_rate": 4.702937198337241e-06,
2965
+ "loss": 0.2914,
2966
+ "step": 493
2967
+ },
2968
+ {
2969
+ "epoch": 2.07,
2970
+ "learning_rate": 4.66465130704382e-06,
2971
+ "loss": 0.2793,
2972
+ "step": 494
2973
+ },
2974
+ {
2975
+ "epoch": 2.07,
2976
+ "learning_rate": 4.626474432242879e-06,
2977
+ "loss": 0.2842,
2978
+ "step": 495
2979
+ },
2980
+ {
2981
+ "epoch": 2.08,
2982
+ "learning_rate": 4.58840735399758e-06,
2983
+ "loss": 0.2617,
2984
+ "step": 496
2985
+ },
2986
+ {
2987
+ "epoch": 2.08,
2988
+ "learning_rate": 4.550450850127626e-06,
2989
+ "loss": 0.2662,
2990
+ "step": 497
2991
+ },
2992
+ {
2993
+ "epoch": 2.08,
2994
+ "learning_rate": 4.512605696193371e-06,
2995
+ "loss": 0.2748,
2996
+ "step": 498
2997
+ },
2998
+ {
2999
+ "epoch": 2.09,
3000
+ "learning_rate": 4.474872665479974e-06,
3001
+ "loss": 0.2757,
3002
+ "step": 499
3003
+ },
3004
+ {
3005
+ "epoch": 2.09,
3006
+ "learning_rate": 4.437252528981586e-06,
3007
+ "loss": 0.2774,
3008
+ "step": 500
3009
+ },
3010
+ {
3011
+ "epoch": 2.1,
3012
+ "learning_rate": 4.3997460553856095e-06,
3013
+ "loss": 0.2918,
3014
+ "step": 501
3015
+ },
3016
+ {
3017
+ "epoch": 2.1,
3018
+ "learning_rate": 4.3623540110569935e-06,
3019
+ "loss": 0.2629,
3020
+ "step": 502
3021
+ },
3022
+ {
3023
+ "epoch": 2.1,
3024
+ "learning_rate": 4.3250771600225536e-06,
3025
+ "loss": 0.2864,
3026
+ "step": 503
3027
+ },
3028
+ {
3029
+ "epoch": 2.11,
3030
+ "learning_rate": 4.2879162639553925e-06,
3031
+ "loss": 0.2888,
3032
+ "step": 504
3033
+ },
3034
+ {
3035
+ "epoch": 2.11,
3036
+ "learning_rate": 4.250872082159305e-06,
3037
+ "loss": 0.2739,
3038
+ "step": 505
3039
+ },
3040
+ {
3041
+ "epoch": 2.12,
3042
+ "learning_rate": 4.213945371553292e-06,
3043
+ "loss": 0.2943,
3044
+ "step": 506
3045
+ },
3046
+ {
3047
+ "epoch": 2.12,
3048
+ "learning_rate": 4.177136886656067e-06,
3049
+ "loss": 0.2941,
3050
+ "step": 507
3051
+ },
3052
+ {
3053
+ "epoch": 2.13,
3054
+ "learning_rate": 4.140447379570663e-06,
3055
+ "loss": 0.2793,
3056
+ "step": 508
3057
+ },
3058
+ {
3059
+ "epoch": 2.13,
3060
+ "learning_rate": 4.103877599969056e-06,
3061
+ "loss": 0.2721,
3062
+ "step": 509
3063
+ },
3064
+ {
3065
+ "epoch": 2.13,
3066
+ "learning_rate": 4.067428295076833e-06,
3067
+ "loss": 0.2887,
3068
+ "step": 510
3069
+ },
3070
+ {
3071
+ "epoch": 2.14,
3072
+ "learning_rate": 4.0311002096579486e-06,
3073
+ "loss": 0.2767,
3074
+ "step": 511
3075
+ },
3076
+ {
3077
+ "epoch": 2.14,
3078
+ "learning_rate": 3.9948940859994964e-06,
3079
+ "loss": 0.2794,
3080
+ "step": 512
3081
+ },
3082
+ {
3083
+ "epoch": 2.15,
3084
+ "learning_rate": 3.958810663896531e-06,
3085
+ "loss": 0.2671,
3086
+ "step": 513
3087
+ },
3088
+ {
3089
+ "epoch": 2.15,
3090
+ "learning_rate": 3.922850680636968e-06,
3091
+ "loss": 0.2853,
3092
+ "step": 514
3093
+ },
3094
+ {
3095
+ "epoch": 2.15,
3096
+ "learning_rate": 3.8870148709865115e-06,
3097
+ "loss": 0.2874,
3098
+ "step": 515
3099
+ },
3100
+ {
3101
+ "epoch": 2.16,
3102
+ "learning_rate": 3.851303967173647e-06,
3103
+ "loss": 0.285,
3104
+ "step": 516
3105
+ },
3106
+ {
3107
+ "epoch": 2.16,
3108
+ "learning_rate": 3.815718698874672e-06,
3109
+ "loss": 0.2774,
3110
+ "step": 517
3111
+ },
3112
+ {
3113
+ "epoch": 2.17,
3114
+ "learning_rate": 3.780259793198784e-06,
3115
+ "loss": 0.2798,
3116
+ "step": 518
3117
+ },
3118
+ {
3119
+ "epoch": 2.17,
3120
+ "learning_rate": 3.744927974673237e-06,
3121
+ "loss": 0.2784,
3122
+ "step": 519
3123
+ },
3124
+ {
3125
+ "epoch": 2.18,
3126
+ "learning_rate": 3.709723965228531e-06,
3127
+ "loss": 0.2739,
3128
+ "step": 520
3129
+ },
3130
+ {
3131
+ "epoch": 2.18,
3132
+ "learning_rate": 3.6746484841836516e-06,
3133
+ "loss": 0.2825,
3134
+ "step": 521
3135
+ },
3136
+ {
3137
+ "epoch": 2.18,
3138
+ "learning_rate": 3.6397022482313804e-06,
3139
+ "loss": 0.2929,
3140
+ "step": 522
3141
+ },
3142
+ {
3143
+ "epoch": 2.19,
3144
+ "learning_rate": 3.6048859714236597e-06,
3145
+ "loss": 0.2661,
3146
+ "step": 523
3147
+ },
3148
+ {
3149
+ "epoch": 2.19,
3150
+ "learning_rate": 3.5702003651569883e-06,
3151
+ "loss": 0.2726,
3152
+ "step": 524
3153
+ },
3154
+ {
3155
+ "epoch": 2.2,
3156
+ "learning_rate": 3.5356461381578865e-06,
3157
+ "loss": 0.2776,
3158
+ "step": 525
3159
+ },
3160
+ {
3161
+ "epoch": 2.2,
3162
+ "learning_rate": 3.501223996468426e-06,
3163
+ "loss": 0.2857,
3164
+ "step": 526
3165
+ },
3166
+ {
3167
+ "epoch": 2.21,
3168
+ "learning_rate": 3.466934643431795e-06,
3169
+ "loss": 0.2654,
3170
+ "step": 527
3171
+ },
3172
+ {
3173
+ "epoch": 2.21,
3174
+ "learning_rate": 3.432778779677921e-06,
3175
+ "loss": 0.2733,
3176
+ "step": 528
3177
+ },
3178
+ {
3179
+ "epoch": 2.21,
3180
+ "learning_rate": 3.3987571031091735e-06,
3181
+ "loss": 0.2708,
3182
+ "step": 529
3183
+ },
3184
+ {
3185
+ "epoch": 2.22,
3186
+ "learning_rate": 3.36487030888608e-06,
3187
+ "loss": 0.2654,
3188
+ "step": 530
3189
+ },
3190
+ {
3191
+ "epoch": 2.22,
3192
+ "learning_rate": 3.3311190894131495e-06,
3193
+ "loss": 0.2674,
3194
+ "step": 531
3195
+ },
3196
+ {
3197
+ "epoch": 2.23,
3198
+ "learning_rate": 3.2975041343246937e-06,
3199
+ "loss": 0.2899,
3200
+ "step": 532
3201
+ },
3202
+ {
3203
+ "epoch": 2.23,
3204
+ "learning_rate": 3.264026130470762e-06,
3205
+ "loss": 0.2919,
3206
+ "step": 533
3207
+ },
3208
+ {
3209
+ "epoch": 2.23,
3210
+ "learning_rate": 3.230685761903094e-06,
3211
+ "loss": 0.2797,
3212
+ "step": 534
3213
+ },
3214
+ {
3215
+ "epoch": 2.24,
3216
+ "learning_rate": 3.1974837098611487e-06,
3217
+ "loss": 0.2717,
3218
+ "step": 535
3219
+ },
3220
+ {
3221
+ "epoch": 2.24,
3222
+ "learning_rate": 3.1644206527581734e-06,
3223
+ "loss": 0.2965,
3224
+ "step": 536
3225
+ },
3226
+ {
3227
+ "epoch": 2.25,
3228
+ "learning_rate": 3.1314972661673572e-06,
3229
+ "loss": 0.2806,
3230
+ "step": 537
3231
+ },
3232
+ {
3233
+ "epoch": 2.25,
3234
+ "learning_rate": 3.0987142228080137e-06,
3235
+ "loss": 0.2781,
3236
+ "step": 538
3237
+ },
3238
+ {
3239
+ "epoch": 2.26,
3240
+ "learning_rate": 3.0660721925318483e-06,
3241
+ "loss": 0.2628,
3242
+ "step": 539
3243
+ },
3244
+ {
3245
+ "epoch": 2.26,
3246
+ "learning_rate": 3.0335718423092553e-06,
3247
+ "loss": 0.2812,
3248
+ "step": 540
3249
+ },
3250
+ {
3251
+ "epoch": 2.26,
3252
+ "learning_rate": 3.0012138362157062e-06,
3253
+ "loss": 0.2826,
3254
+ "step": 541
3255
+ },
3256
+ {
3257
+ "epoch": 2.27,
3258
+ "learning_rate": 2.9689988354181742e-06,
3259
+ "loss": 0.2956,
3260
+ "step": 542
3261
+ },
3262
+ {
3263
+ "epoch": 2.27,
3264
+ "learning_rate": 2.9369274981616137e-06,
3265
+ "loss": 0.276,
3266
+ "step": 543
3267
+ },
3268
+ {
3269
+ "epoch": 2.28,
3270
+ "learning_rate": 2.905000479755531e-06,
3271
+ "loss": 0.2887,
3272
+ "step": 544
3273
+ },
3274
+ {
3275
+ "epoch": 2.28,
3276
+ "learning_rate": 2.8732184325605815e-06,
3277
+ "loss": 0.2927,
3278
+ "step": 545
3279
+ },
3280
+ {
3281
+ "epoch": 2.28,
3282
+ "learning_rate": 2.8415820059752397e-06,
3283
+ "loss": 0.3025,
3284
+ "step": 546
3285
+ },
3286
+ {
3287
+ "epoch": 2.29,
3288
+ "learning_rate": 2.8100918464225304e-06,
3289
+ "loss": 0.2713,
3290
+ "step": 547
3291
+ },
3292
+ {
3293
+ "epoch": 2.29,
3294
+ "learning_rate": 2.7787485973368288e-06,
3295
+ "loss": 0.2926,
3296
+ "step": 548
3297
+ },
3298
+ {
3299
+ "epoch": 2.3,
3300
+ "learning_rate": 2.7475528991507106e-06,
3301
+ "loss": 0.2858,
3302
+ "step": 549
3303
+ },
3304
+ {
3305
+ "epoch": 2.3,
3306
+ "learning_rate": 2.7165053892818495e-06,
3307
+ "loss": 0.2805,
3308
+ "step": 550
3309
+ },
3310
+ {
3311
+ "epoch": 2.31,
3312
+ "learning_rate": 2.685606702120019e-06,
3313
+ "loss": 0.2966,
3314
+ "step": 551
3315
+ },
3316
+ {
3317
+ "epoch": 2.31,
3318
+ "learning_rate": 2.654857469014113e-06,
3319
+ "loss": 0.2799,
3320
+ "step": 552
3321
+ },
3322
+ {
3323
+ "epoch": 2.31,
3324
+ "learning_rate": 2.624258318259253e-06,
3325
+ "loss": 0.2879,
3326
+ "step": 553
3327
+ },
3328
+ {
3329
+ "epoch": 2.32,
3330
+ "learning_rate": 2.5938098750839414e-06,
3331
+ "loss": 0.2838,
3332
+ "step": 554
3333
+ },
3334
+ {
3335
+ "epoch": 2.32,
3336
+ "learning_rate": 2.563512761637291e-06,
3337
+ "loss": 0.287,
3338
+ "step": 555
3339
+ },
3340
+ {
3341
+ "epoch": 2.33,
3342
+ "learning_rate": 2.5333675969763215e-06,
3343
+ "loss": 0.2933,
3344
+ "step": 556
3345
+ },
3346
+ {
3347
+ "epoch": 2.33,
3348
+ "learning_rate": 2.5033749970533015e-06,
3349
+ "loss": 0.2679,
3350
+ "step": 557
3351
+ },
3352
+ {
3353
+ "epoch": 2.33,
3354
+ "learning_rate": 2.4735355747031566e-06,
3355
+ "loss": 0.2697,
3356
+ "step": 558
3357
+ },
3358
+ {
3359
+ "epoch": 2.34,
3360
+ "learning_rate": 2.443849939630959e-06,
3361
+ "loss": 0.2877,
3362
+ "step": 559
3363
+ },
3364
+ {
3365
+ "epoch": 2.34,
3366
+ "learning_rate": 2.4143186983994715e-06,
3367
+ "loss": 0.2805,
3368
+ "step": 560
3369
+ },
3370
+ {
3371
+ "epoch": 2.35,
3372
+ "learning_rate": 2.384942454416734e-06,
3373
+ "loss": 0.2653,
3374
+ "step": 561
3375
+ },
3376
+ {
3377
+ "epoch": 2.35,
3378
+ "learning_rate": 2.3557218079237608e-06,
3379
+ "loss": 0.2629,
3380
+ "step": 562
3381
+ },
3382
+ {
3383
+ "epoch": 2.36,
3384
+ "learning_rate": 2.3266573559822568e-06,
3385
+ "loss": 0.2791,
3386
+ "step": 563
3387
+ },
3388
+ {
3389
+ "epoch": 2.36,
3390
+ "learning_rate": 2.2977496924624223e-06,
3391
+ "loss": 0.2759,
3392
+ "step": 564
3393
+ },
3394
+ {
3395
+ "epoch": 2.36,
3396
+ "learning_rate": 2.26899940803082e-06,
3397
+ "loss": 0.299,
3398
+ "step": 565
3399
+ },
3400
+ {
3401
+ "epoch": 2.37,
3402
+ "learning_rate": 2.240407090138309e-06,
3403
+ "loss": 0.2782,
3404
+ "step": 566
3405
+ },
3406
+ {
3407
+ "epoch": 2.37,
3408
+ "learning_rate": 2.211973323008041e-06,
3409
+ "loss": 0.2876,
3410
+ "step": 567
3411
+ },
3412
+ {
3413
+ "epoch": 2.38,
3414
+ "learning_rate": 2.183698687623511e-06,
3415
+ "loss": 0.2667,
3416
+ "step": 568
3417
+ },
3418
+ {
3419
+ "epoch": 2.38,
3420
+ "learning_rate": 2.155583761716703e-06,
3421
+ "loss": 0.2801,
3422
+ "step": 569
3423
+ },
3424
+ {
3425
+ "epoch": 2.38,
3426
+ "learning_rate": 2.1276291197562772e-06,
3427
+ "loss": 0.2823,
3428
+ "step": 570
3429
+ },
3430
+ {
3431
+ "epoch": 2.39,
3432
+ "learning_rate": 2.0998353329358355e-06,
3433
+ "loss": 0.2687,
3434
+ "step": 571
3435
+ },
3436
+ {
3437
+ "epoch": 2.39,
3438
+ "learning_rate": 2.072202969162234e-06,
3439
+ "loss": 0.2793,
3440
+ "step": 572
3441
+ },
3442
+ {
3443
+ "epoch": 2.4,
3444
+ "learning_rate": 2.0447325930440043e-06,
3445
+ "loss": 0.2905,
3446
+ "step": 573
3447
+ },
3448
+ {
3449
+ "epoch": 2.4,
3450
+ "learning_rate": 2.0174247658798054e-06,
3451
+ "loss": 0.2711,
3452
+ "step": 574
3453
+ },
3454
+ {
3455
+ "epoch": 2.41,
3456
+ "learning_rate": 1.990280045646954e-06,
3457
+ "loss": 0.2812,
3458
+ "step": 575
3459
+ },
3460
+ {
3461
+ "epoch": 2.41,
3462
+ "learning_rate": 1.9632989869900145e-06,
3463
+ "loss": 0.2889,
3464
+ "step": 576
3465
+ },
3466
+ {
3467
+ "epoch": 2.41,
3468
+ "learning_rate": 1.936482141209486e-06,
3469
+ "loss": 0.2894,
3470
+ "step": 577
3471
+ },
3472
+ {
3473
+ "epoch": 2.42,
3474
+ "learning_rate": 1.9098300562505266e-06,
3475
+ "loss": 0.287,
3476
+ "step": 578
3477
+ },
3478
+ {
3479
+ "epoch": 2.42,
3480
+ "learning_rate": 1.8833432766917514e-06,
3481
+ "loss": 0.265,
3482
+ "step": 579
3483
+ },
3484
+ {
3485
+ "epoch": 2.43,
3486
+ "learning_rate": 1.8570223437341118e-06,
3487
+ "loss": 0.267,
3488
+ "step": 580
3489
+ },
3490
+ {
3491
+ "epoch": 2.43,
3492
+ "learning_rate": 1.8308677951898435e-06,
3493
+ "loss": 0.2806,
3494
+ "step": 581
3495
+ },
3496
+ {
3497
+ "epoch": 2.44,
3498
+ "learning_rate": 1.8048801654714687e-06,
3499
+ "loss": 0.2855,
3500
+ "step": 582
3501
+ },
3502
+ {
3503
+ "epoch": 2.44,
3504
+ "learning_rate": 1.7790599855808732e-06,
3505
+ "loss": 0.2801,
3506
+ "step": 583
3507
+ },
3508
+ {
3509
+ "epoch": 2.44,
3510
+ "learning_rate": 1.7534077830984697e-06,
3511
+ "loss": 0.3075,
3512
+ "step": 584
3513
+ },
3514
+ {
3515
+ "epoch": 2.45,
3516
+ "learning_rate": 1.7279240821724063e-06,
3517
+ "loss": 0.2642,
3518
+ "step": 585
3519
+ },
3520
+ {
3521
+ "epoch": 2.45,
3522
+ "learning_rate": 1.7026094035078589e-06,
3523
+ "loss": 0.2687,
3524
+ "step": 586
3525
+ },
3526
+ {
3527
+ "epoch": 2.46,
3528
+ "learning_rate": 1.6774642643563955e-06,
3529
+ "loss": 0.292,
3530
+ "step": 587
3531
+ },
3532
+ {
3533
+ "epoch": 2.46,
3534
+ "learning_rate": 1.6524891785054097e-06,
3535
+ "loss": 0.2845,
3536
+ "step": 588
3537
+ },
3538
+ {
3539
+ "epoch": 2.46,
3540
+ "learning_rate": 1.6276846562676085e-06,
3541
+ "loss": 0.2811,
3542
+ "step": 589
3543
+ },
3544
+ {
3545
+ "epoch": 2.47,
3546
+ "learning_rate": 1.603051204470597e-06,
3547
+ "loss": 0.2693,
3548
+ "step": 590
3549
+ },
3550
+ {
3551
+ "epoch": 2.47,
3552
+ "learning_rate": 1.5785893264465257e-06,
3553
+ "loss": 0.2862,
3554
+ "step": 591
3555
+ },
3556
+ {
3557
+ "epoch": 2.48,
3558
+ "learning_rate": 1.5542995220217961e-06,
3559
+ "loss": 0.2855,
3560
+ "step": 592
3561
+ },
3562
+ {
3563
+ "epoch": 2.48,
3564
+ "learning_rate": 1.530182287506855e-06,
3565
+ "loss": 0.2739,
3566
+ "step": 593
3567
+ },
3568
+ {
3569
+ "epoch": 2.49,
3570
+ "learning_rate": 1.506238115686044e-06,
3571
+ "loss": 0.2747,
3572
+ "step": 594
3573
+ },
3574
+ {
3575
+ "epoch": 2.49,
3576
+ "learning_rate": 1.4824674958075436e-06,
3577
+ "loss": 0.2855,
3578
+ "step": 595
3579
+ },
3580
+ {
3581
+ "epoch": 2.49,
3582
+ "learning_rate": 1.458870913573368e-06,
3583
+ "loss": 0.2549,
3584
+ "step": 596
3585
+ },
3586
+ {
3587
+ "epoch": 2.5,
3588
+ "learning_rate": 1.4354488511294418e-06,
3589
+ "loss": 0.2839,
3590
+ "step": 597
3591
+ },
3592
+ {
3593
+ "epoch": 2.5,
3594
+ "learning_rate": 1.4122017870557458e-06,
3595
+ "loss": 0.2761,
3596
+ "step": 598
3597
+ },
3598
+ {
3599
+ "epoch": 2.51,
3600
+ "learning_rate": 1.3891301963565473e-06,
3601
+ "loss": 0.2748,
3602
+ "step": 599
3603
+ },
3604
+ {
3605
+ "epoch": 2.51,
3606
+ "learning_rate": 1.3662345504506903e-06,
3607
+ "loss": 0.2749,
3608
+ "step": 600
3609
+ },
3610
+ {
3611
+ "epoch": 2.51,
3612
+ "learning_rate": 1.343515317161952e-06,
3613
+ "loss": 0.2668,
3614
+ "step": 601
3615
+ },
3616
+ {
3617
+ "epoch": 2.52,
3618
+ "learning_rate": 1.3209729607095022e-06,
3619
+ "loss": 0.269,
3620
+ "step": 602
3621
+ },
3622
+ {
3623
+ "epoch": 2.52,
3624
+ "learning_rate": 1.2986079416984088e-06,
3625
+ "loss": 0.262,
3626
+ "step": 603
3627
+ },
3628
+ {
3629
+ "epoch": 2.53,
3630
+ "learning_rate": 1.2764207171102206e-06,
3631
+ "loss": 0.2749,
3632
+ "step": 604
3633
+ },
3634
+ {
3635
+ "epoch": 2.53,
3636
+ "learning_rate": 1.2544117402936373e-06,
3637
+ "loss": 0.2641,
3638
+ "step": 605
3639
+ },
3640
+ {
3641
+ "epoch": 2.54,
3642
+ "learning_rate": 1.232581460955249e-06,
3643
+ "loss": 0.2747,
3644
+ "step": 606
3645
+ },
3646
+ {
3647
+ "epoch": 2.54,
3648
+ "learning_rate": 1.2109303251503434e-06,
3649
+ "loss": 0.2792,
3650
+ "step": 607
3651
+ },
3652
+ {
3653
+ "epoch": 2.54,
3654
+ "learning_rate": 1.189458775273784e-06,
3655
+ "loss": 0.294,
3656
+ "step": 608
3657
+ },
3658
+ {
3659
+ "epoch": 2.55,
3660
+ "learning_rate": 1.1681672500509866e-06,
3661
+ "loss": 0.2683,
3662
+ "step": 609
3663
+ },
3664
+ {
3665
+ "epoch": 2.55,
3666
+ "learning_rate": 1.147056184528943e-06,
3667
+ "loss": 0.2803,
3668
+ "step": 610
3669
+ },
3670
+ {
3671
+ "epoch": 2.56,
3672
+ "learning_rate": 1.1261260100673355e-06,
3673
+ "loss": 0.2867,
3674
+ "step": 611
3675
+ },
3676
+ {
3677
+ "epoch": 2.56,
3678
+ "learning_rate": 1.1053771543297198e-06,
3679
+ "loss": 0.2863,
3680
+ "step": 612
3681
+ },
3682
+ {
3683
+ "epoch": 2.56,
3684
+ "learning_rate": 1.0848100412747954e-06,
3685
+ "loss": 0.2759,
3686
+ "step": 613
3687
+ },
3688
+ {
3689
+ "epoch": 2.57,
3690
+ "learning_rate": 1.0644250911477306e-06,
3691
+ "loss": 0.269,
3692
+ "step": 614
3693
+ },
3694
+ {
3695
+ "epoch": 2.57,
3696
+ "learning_rate": 1.0442227204715872e-06,
3697
+ "loss": 0.2931,
3698
+ "step": 615
3699
+ },
3700
+ {
3701
+ "epoch": 2.58,
3702
+ "learning_rate": 1.0242033420388008e-06,
3703
+ "loss": 0.2823,
3704
+ "step": 616
3705
+ },
3706
+ {
3707
+ "epoch": 2.58,
3708
+ "learning_rate": 1.0043673649027519e-06,
3709
+ "loss": 0.2825,
3710
+ "step": 617
3711
+ },
3712
+ {
3713
+ "epoch": 2.59,
3714
+ "learning_rate": 9.8471519436941e-07,
3715
+ "loss": 0.2799,
3716
+ "step": 618
3717
+ },
3718
+ {
3719
+ "epoch": 2.59,
3720
+ "learning_rate": 9.652472319890372e-07,
3721
+ "loss": 0.2811,
3722
+ "step": 619
3723
+ },
3724
+ {
3725
+ "epoch": 2.59,
3726
+ "learning_rate": 9.459638755480038e-07,
3727
+ "loss": 0.271,
3728
+ "step": 620
3729
+ },
3730
+ {
3731
+ "epoch": 2.6,
3732
+ "learning_rate": 9.268655190606501e-07,
3733
+ "loss": 0.2781,
3734
+ "step": 621
3735
+ },
3736
+ {
3737
+ "epoch": 2.6,
3738
+ "learning_rate": 9.079525527612321e-07,
3739
+ "loss": 0.275,
3740
+ "step": 622
3741
+ },
3742
+ {
3743
+ "epoch": 2.61,
3744
+ "learning_rate": 8.892253630959502e-07,
3745
+ "loss": 0.2699,
3746
+ "step": 623
3747
+ },
3748
+ {
3749
+ "epoch": 2.61,
3750
+ "learning_rate": 8.706843327150605e-07,
3751
+ "loss": 0.2765,
3752
+ "step": 624
3753
+ },
3754
+ {
3755
+ "epoch": 2.62,
3756
+ "learning_rate": 8.523298404650504e-07,
3757
+ "loss": 0.2713,
3758
+ "step": 625
3759
+ },
3760
+ {
3761
+ "epoch": 2.62,
3762
+ "learning_rate": 8.34162261380892e-07,
3763
+ "loss": 0.286,
3764
+ "step": 626
3765
+ },
3766
+ {
3767
+ "epoch": 2.62,
3768
+ "learning_rate": 8.161819666783888e-07,
3769
+ "loss": 0.2642,
3770
+ "step": 627
3771
+ },
3772
+ {
3773
+ "epoch": 2.63,
3774
+ "learning_rate": 7.983893237465878e-07,
3775
+ "loss": 0.2822,
3776
+ "step": 628
3777
+ },
3778
+ {
3779
+ "epoch": 2.63,
3780
+ "learning_rate": 7.807846961402699e-07,
3781
+ "loss": 0.2743,
3782
+ "step": 629
3783
+ },
3784
+ {
3785
+ "epoch": 2.64,
3786
+ "learning_rate": 7.633684435725208e-07,
3787
+ "loss": 0.2573,
3788
+ "step": 630
3789
+ },
3790
+ {
3791
+ "epoch": 2.64,
3792
+ "learning_rate": 7.461409219073857e-07,
3793
+ "loss": 0.2813,
3794
+ "step": 631
3795
+ },
3796
+ {
3797
+ "epoch": 2.64,
3798
+ "learning_rate": 7.291024831525961e-07,
3799
+ "loss": 0.2854,
3800
+ "step": 632
3801
+ },
3802
+ {
3803
+ "epoch": 2.65,
3804
+ "learning_rate": 7.122534754523768e-07,
3805
+ "loss": 0.2888,
3806
+ "step": 633
3807
+ },
3808
+ {
3809
+ "epoch": 2.65,
3810
+ "learning_rate": 6.955942430803298e-07,
3811
+ "loss": 0.27,
3812
+ "step": 634
3813
+ },
3814
+ {
3815
+ "epoch": 2.66,
3816
+ "learning_rate": 6.791251264324039e-07,
3817
+ "loss": 0.2792,
3818
+ "step": 635
3819
+ },
3820
+ {
3821
+ "epoch": 2.66,
3822
+ "learning_rate": 6.628464620199404e-07,
3823
+ "loss": 0.2774,
3824
+ "step": 636
3825
+ },
3826
+ {
3827
+ "epoch": 2.67,
3828
+ "learning_rate": 6.467585824627886e-07,
3829
+ "loss": 0.2874,
3830
+ "step": 637
3831
+ },
3832
+ {
3833
+ "epoch": 2.67,
3834
+ "learning_rate": 6.30861816482522e-07,
3835
+ "loss": 0.2792,
3836
+ "step": 638
3837
+ },
3838
+ {
3839
+ "epoch": 2.67,
3840
+ "learning_rate": 6.151564888957084e-07,
3841
+ "loss": 0.2688,
3842
+ "step": 639
3843
+ },
3844
+ {
3845
+ "epoch": 2.68,
3846
+ "learning_rate": 5.996429206072874e-07,
3847
+ "loss": 0.2653,
3848
+ "step": 640
3849
+ },
3850
+ {
3851
+ "epoch": 2.68,
3852
+ "learning_rate": 5.843214286039956e-07,
3853
+ "loss": 0.2846,
3854
+ "step": 641
3855
+ },
3856
+ {
3857
+ "epoch": 2.69,
3858
+ "learning_rate": 5.691923259479093e-07,
3859
+ "loss": 0.275,
3860
+ "step": 642
3861
+ },
3862
+ {
3863
+ "epoch": 2.69,
3864
+ "learning_rate": 5.542559217700339e-07,
3865
+ "loss": 0.2796,
3866
+ "step": 643
3867
+ },
3868
+ {
3869
+ "epoch": 2.69,
3870
+ "learning_rate": 5.395125212639895e-07,
3871
+ "loss": 0.2788,
3872
+ "step": 644
3873
+ },
3874
+ {
3875
+ "epoch": 2.7,
3876
+ "learning_rate": 5.249624256797803e-07,
3877
+ "loss": 0.2688,
3878
+ "step": 645
3879
+ },
3880
+ {
3881
+ "epoch": 2.7,
3882
+ "learning_rate": 5.106059323176371e-07,
3883
+ "loss": 0.2817,
3884
+ "step": 646
3885
+ },
3886
+ {
3887
+ "epoch": 2.71,
3888
+ "learning_rate": 4.964433345219354e-07,
3889
+ "loss": 0.2784,
3890
+ "step": 647
3891
+ },
3892
+ {
3893
+ "epoch": 2.71,
3894
+ "learning_rate": 4.824749216752134e-07,
3895
+ "loss": 0.2789,
3896
+ "step": 648
3897
+ },
3898
+ {
3899
+ "epoch": 2.72,
3900
+ "learning_rate": 4.6870097919224923e-07,
3901
+ "loss": 0.2691,
3902
+ "step": 649
3903
+ },
3904
+ {
3905
+ "epoch": 2.72,
3906
+ "learning_rate": 4.551217885142378e-07,
3907
+ "loss": 0.2759,
3908
+ "step": 650
3909
+ },
3910
+ {
3911
+ "epoch": 2.72,
3912
+ "learning_rate": 4.417376271030327e-07,
3913
+ "loss": 0.2658,
3914
+ "step": 651
3915
+ },
3916
+ {
3917
+ "epoch": 2.73,
3918
+ "learning_rate": 4.285487684354772e-07,
3919
+ "loss": 0.2678,
3920
+ "step": 652
3921
+ },
3922
+ {
3923
+ "epoch": 2.73,
3924
+ "learning_rate": 4.1555548199782357e-07,
3925
+ "loss": 0.2867,
3926
+ "step": 653
3927
+ },
3928
+ {
3929
+ "epoch": 2.74,
3930
+ "learning_rate": 4.0275803328021946e-07,
3931
+ "loss": 0.2882,
3932
+ "step": 654
3933
+ },
3934
+ {
3935
+ "epoch": 2.74,
3936
+ "learning_rate": 3.9015668377128446e-07,
3937
+ "loss": 0.2879,
3938
+ "step": 655
3939
+ },
3940
+ {
3941
+ "epoch": 2.74,
3942
+ "learning_rate": 3.777516909527701e-07,
3943
+ "loss": 0.2676,
3944
+ "step": 656
3945
+ },
3946
+ {
3947
+ "epoch": 2.75,
3948
+ "learning_rate": 3.6554330829429716e-07,
3949
+ "loss": 0.2806,
3950
+ "step": 657
3951
+ },
3952
+ {
3953
+ "epoch": 2.75,
3954
+ "learning_rate": 3.5353178524817566e-07,
3955
+ "loss": 0.2793,
3956
+ "step": 658
3957
+ },
3958
+ {
3959
+ "epoch": 2.76,
3960
+ "learning_rate": 3.417173672443075e-07,
3961
+ "loss": 0.28,
3962
+ "step": 659
3963
+ },
3964
+ {
3965
+ "epoch": 2.76,
3966
+ "learning_rate": 3.30100295685174e-07,
3967
+ "loss": 0.281,
3968
+ "step": 660
3969
+ },
3970
+ {
3971
+ "epoch": 2.77,
3972
+ "learning_rate": 3.1868080794090316e-07,
3973
+ "loss": 0.288,
3974
+ "step": 661
3975
+ },
3976
+ {
3977
+ "epoch": 2.77,
3978
+ "learning_rate": 3.0745913734441357e-07,
3979
+ "loss": 0.2726,
3980
+ "step": 662
3981
+ },
3982
+ {
3983
+ "epoch": 2.77,
3984
+ "learning_rate": 2.9643551318665917e-07,
3985
+ "loss": 0.2707,
3986
+ "step": 663
3987
+ },
3988
+ {
3989
+ "epoch": 2.78,
3990
+ "learning_rate": 2.8561016071192884e-07,
3991
+ "loss": 0.2706,
3992
+ "step": 664
3993
+ },
3994
+ {
3995
+ "epoch": 2.78,
3996
+ "learning_rate": 2.7498330111325635e-07,
3997
+ "loss": 0.2862,
3998
+ "step": 665
3999
+ },
4000
+ {
4001
+ "epoch": 2.79,
4002
+ "learning_rate": 2.6455515152789435e-07,
4003
+ "loss": 0.2803,
4004
+ "step": 666
4005
+ },
4006
+ {
4007
+ "epoch": 2.79,
4008
+ "learning_rate": 2.5432592503288e-07,
4009
+ "loss": 0.288,
4010
+ "step": 667
4011
+ },
4012
+ {
4013
+ "epoch": 2.79,
4014
+ "learning_rate": 2.442958306406795e-07,
4015
+ "loss": 0.2819,
4016
+ "step": 668
4017
+ },
4018
+ {
4019
+ "epoch": 2.8,
4020
+ "learning_rate": 2.3446507329492274e-07,
4021
+ "loss": 0.2679,
4022
+ "step": 669
4023
+ },
4024
+ {
4025
+ "epoch": 2.8,
4026
+ "learning_rate": 2.2483385386620317e-07,
4027
+ "loss": 0.2654,
4028
+ "step": 670
4029
+ },
4030
+ {
4031
+ "epoch": 2.81,
4032
+ "learning_rate": 2.1540236914799116e-07,
4033
+ "loss": 0.2812,
4034
+ "step": 671
4035
+ },
4036
+ {
4037
+ "epoch": 2.81,
4038
+ "learning_rate": 2.0617081185259512e-07,
4039
+ "loss": 0.2898,
4040
+ "step": 672
4041
+ },
4042
+ {
4043
+ "epoch": 2.82,
4044
+ "learning_rate": 1.9713937060723887e-07,
4045
+ "loss": 0.2848,
4046
+ "step": 673
4047
+ },
4048
+ {
4049
+ "epoch": 2.82,
4050
+ "learning_rate": 1.8830822995019593e-07,
4051
+ "loss": 0.29,
4052
+ "step": 674
4053
+ },
4054
+ {
4055
+ "epoch": 2.82,
4056
+ "learning_rate": 1.7967757032702481e-07,
4057
+ "loss": 0.2836,
4058
+ "step": 675
4059
+ },
4060
+ {
4061
+ "epoch": 2.83,
4062
+ "learning_rate": 1.7124756808688525e-07,
4063
+ "loss": 0.2878,
4064
+ "step": 676
4065
+ },
4066
+ {
4067
+ "epoch": 2.83,
4068
+ "learning_rate": 1.630183954789233e-07,
4069
+ "loss": 0.2695,
4070
+ "step": 677
4071
+ },
4072
+ {
4073
+ "epoch": 2.84,
4074
+ "learning_rate": 1.5499022064876412e-07,
4075
+ "loss": 0.2855,
4076
+ "step": 678
4077
+ },
4078
+ {
4079
+ "epoch": 2.84,
4080
+ "learning_rate": 1.4716320763507152e-07,
4081
+ "loss": 0.2836,
4082
+ "step": 679
4083
+ },
4084
+ {
4085
+ "epoch": 2.85,
4086
+ "learning_rate": 1.3953751636619162e-07,
4087
+ "loss": 0.2726,
4088
+ "step": 680
4089
+ },
4090
+ {
4091
+ "epoch": 2.85,
4092
+ "learning_rate": 1.3211330265689104e-07,
4093
+ "loss": 0.2808,
4094
+ "step": 681
4095
+ },
4096
+ {
4097
+ "epoch": 2.85,
4098
+ "learning_rate": 1.2489071820517394e-07,
4099
+ "loss": 0.2708,
4100
+ "step": 682
4101
+ },
4102
+ {
4103
+ "epoch": 2.86,
4104
+ "learning_rate": 1.1786991058917785e-07,
4105
+ "loss": 0.2905,
4106
+ "step": 683
4107
+ },
4108
+ {
4109
+ "epoch": 2.86,
4110
+ "learning_rate": 1.1105102326415929e-07,
4111
+ "loss": 0.2977,
4112
+ "step": 684
4113
+ },
4114
+ {
4115
+ "epoch": 2.87,
4116
+ "learning_rate": 1.0443419555956402e-07,
4117
+ "loss": 0.2726,
4118
+ "step": 685
4119
+ },
4120
+ {
4121
+ "epoch": 2.87,
4122
+ "learning_rate": 9.801956267618262e-08,
4123
+ "loss": 0.2702,
4124
+ "step": 686
4125
+ },
4126
+ {
4127
+ "epoch": 2.87,
4128
+ "learning_rate": 9.180725568338045e-08,
4129
+ "loss": 0.2987,
4130
+ "step": 687
4131
+ },
4132
+ {
4133
+ "epoch": 2.88,
4134
+ "learning_rate": 8.579740151642534e-08,
4135
+ "loss": 0.2706,
4136
+ "step": 688
4137
+ },
4138
+ {
4139
+ "epoch": 2.88,
4140
+ "learning_rate": 7.999012297389419e-08,
4141
+ "loss": 0.2755,
4142
+ "step": 689
4143
+ },
4144
+ {
4145
+ "epoch": 2.89,
4146
+ "learning_rate": 7.438553871516152e-08,
4147
+ "loss": 0.28,
4148
+ "step": 690
4149
+ },
4150
+ {
4151
+ "epoch": 2.89,
4152
+ "learning_rate": 6.898376325797596e-08,
4153
+ "loss": 0.2785,
4154
+ "step": 691
4155
+ },
4156
+ {
4157
+ "epoch": 2.9,
4158
+ "learning_rate": 6.378490697611761e-08,
4159
+ "loss": 0.2775,
4160
+ "step": 692
4161
+ },
4162
+ {
4163
+ "epoch": 2.9,
4164
+ "learning_rate": 5.878907609714879e-08,
4165
+ "loss": 0.2705,
4166
+ "step": 693
4167
+ },
4168
+ {
4169
+ "epoch": 2.9,
4170
+ "learning_rate": 5.399637270023683e-08,
4171
+ "loss": 0.2546,
4172
+ "step": 694
4173
+ },
4174
+ {
4175
+ "epoch": 2.91,
4176
+ "learning_rate": 4.940689471407356e-08,
4177
+ "loss": 0.2658,
4178
+ "step": 695
4179
+ },
4180
+ {
4181
+ "epoch": 2.91,
4182
+ "learning_rate": 4.502073591487244e-08,
4183
+ "loss": 0.2786,
4184
+ "step": 696
4185
+ },
4186
+ {
4187
+ "epoch": 2.92,
4188
+ "learning_rate": 4.083798592444899e-08,
4189
+ "loss": 0.2753,
4190
+ "step": 697
4191
+ },
4192
+ {
4193
+ "epoch": 2.92,
4194
+ "learning_rate": 3.68587302083967e-08,
4195
+ "loss": 0.2914,
4196
+ "step": 698
4197
+ },
4198
+ {
4199
+ "epoch": 2.92,
4200
+ "learning_rate": 3.308305007433399e-08,
4201
+ "loss": 0.2827,
4202
+ "step": 699
4203
+ },
4204
+ {
4205
+ "epoch": 2.93,
4206
+ "learning_rate": 2.9511022670246635e-08,
4207
+ "loss": 0.2731,
4208
+ "step": 700
4209
+ },
4210
+ {
4211
+ "epoch": 2.93,
4212
+ "learning_rate": 2.6142720982911264e-08,
4213
+ "loss": 0.2901,
4214
+ "step": 701
4215
+ },
4216
+ {
4217
+ "epoch": 2.94,
4218
+ "learning_rate": 2.2978213836400974e-08,
4219
+ "loss": 0.2711,
4220
+ "step": 702
4221
+ },
4222
+ {
4223
+ "epoch": 2.94,
4224
+ "learning_rate": 2.0017565890683154e-08,
4225
+ "loss": 0.2774,
4226
+ "step": 703
4227
+ },
4228
+ {
4229
+ "epoch": 2.95,
4230
+ "learning_rate": 1.726083764029607e-08,
4231
+ "loss": 0.2903,
4232
+ "step": 704
4233
+ },
4234
+ {
4235
+ "epoch": 2.95,
4236
+ "learning_rate": 1.4708085413113194e-08,
4237
+ "loss": 0.2848,
4238
+ "step": 705
4239
+ },
4240
+ {
4241
+ "epoch": 2.95,
4242
+ "learning_rate": 1.2359361369190804e-08,
4243
+ "loss": 0.2837,
4244
+ "step": 706
4245
+ },
4246
+ {
4247
+ "epoch": 2.96,
4248
+ "learning_rate": 1.0214713499706596e-08,
4249
+ "loss": 0.2665,
4250
+ "step": 707
4251
+ },
4252
+ {
4253
+ "epoch": 2.96,
4254
+ "learning_rate": 8.274185625971598e-09,
4255
+ "loss": 0.2779,
4256
+ "step": 708
4257
+ },
4258
+ {
4259
+ "epoch": 2.97,
4260
+ "learning_rate": 6.5378173985441994e-09,
4261
+ "loss": 0.2707,
4262
+ "step": 709
4263
+ },
4264
+ {
4265
+ "epoch": 2.97,
4266
+ "learning_rate": 5.0056442964119265e-09,
4267
+ "loss": 0.2554,
4268
+ "step": 710
4269
+ },
4270
+ {
4271
+ "epoch": 2.97,
4272
+ "learning_rate": 3.6776976262697937e-09,
4273
+ "loss": 0.2767,
4274
+ "step": 711
4275
+ },
4276
+ {
4277
+ "epoch": 2.98,
4278
+ "learning_rate": 2.5540045218819256e-09,
4279
+ "loss": 0.2892,
4280
+ "step": 712
4281
+ },
4282
+ {
4283
+ "epoch": 2.98,
4284
+ "learning_rate": 1.6345879435231138e-09,
4285
+ "loss": 0.2843,
4286
+ "step": 713
4287
+ },
4288
+ {
4289
+ "epoch": 2.99,
4290
+ "learning_rate": 9.194666775158567e-10,
4291
+ "loss": 0.2735,
4292
+ "step": 714
4293
+ },
4294
+ {
4295
+ "epoch": 2.99,
4296
+ "learning_rate": 4.086553358395584e-10,
4297
+ "loss": 0.2775,
4298
+ "step": 715
4299
+ },
4300
+ {
4301
+ "epoch": 3.0,
4302
+ "learning_rate": 1.0216435583743079e-10,
4303
+ "loss": 0.2786,
4304
+ "step": 716
4305
+ },
4306
+ {
4307
+ "epoch": 3.0,
4308
+ "learning_rate": 0.0,
4309
+ "loss": 0.2579,
4310
+ "step": 717
4311
+ },
4312
+ {
4313
+ "epoch": 3.0,
4314
+ "step": 717,
4315
+ "total_flos": 1.9663471898819297e+18,
4316
+ "train_loss": 0.3724504075788554,
4317
+ "train_runtime": 6685.3329,
4318
+ "train_samples_per_second": 54.728,
4319
+ "train_steps_per_second": 0.107
4320
+ }
4321
+ ],
4322
+ "max_steps": 717,
4323
+ "num_train_epochs": 3,
4324
+ "total_flos": 1.9663471898819297e+18,
4325
+ "trial_name": null,
4326
+ "trial_params": null
4327
+ }
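The logged `learning_rate` values above look consistent with a linear-warmup, cosine-decay schedule. The sketch below reproduces a few of them as a sanity check. The peak rate (`2e-5`) and warmup length (22 steps) are assumptions inferred from the logged numbers; they are not stated anywhere in this log. Only `max_steps` (717) is taken from the file itself, and `cosine_lr` is a hypothetical helper, not a library function.

```python
import math

# Assumed values, inferred from the logged learning rates (not stated in the log).
PEAK_LR = 2e-5
WARMUP_STEPS = 22
# Taken from "max_steps" in the log.
MAX_STEPS = 717


def cosine_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# (step, logged learning_rate) pairs copied from the log entries above.
logged = {
    578: 1.9098300562505266e-06,
    716: 1.0216435583743079e-10,
    717: 0.0,
}

for step, lr in logged.items():
    # Print the reconstructed value next to the logged one for comparison.
    print(step, cosine_lr(step), lr)
```

If the assumed peak rate or warmup length were wrong, the reconstructed values would drift away from the logged ones, so agreement at several steps is reasonable (though not conclusive) evidence for the schedule shape.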
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:142478985dadb6f14a40dcb1885e8cd0d0a543b018a64abaa6a968bfe45980f1
3
+ size 4155