ccore committed on
Commit
f86447e
1 Parent(s): 822d3b9

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,78 +1,59 @@
  ---
  license: other
- base_model: facebook/opt-1.3b
  tags:
  - generated_from_trainer
- - qa
- - open data
- - opt
- - opt-1.3b
  metrics:
  - accuracy
- widget:
- - text: |-
-     # [PAPER]
-     Pope John Paul II (Latin: Ioannes Paulus II; Italian: Giovanni Paolo II; Polish: Jan Paweł II; born Karol Józef Wojtyła [ˈkarɔl ˈjuzɛv vɔjˈtɨwa];[b] 18 May 1920 – 2 April 2005) was head of the Catholic Church and sovereign of the Vatican City State from 1978 until his death in 2005. He was later canonised as Pope Saint John Paul II. In his youth, Wojtyła dabbled in stage acting. He graduated with excellent grades from an all-boys high school in Wadowice, Poland, shortly before the start of World War II in 1938. During the war, to avoid being kidnapped and sent off to a German slave labor camp, he signed up for work in harsh conditions in a quarry. Wojtyła eventually took up acting and developed a love for the profession and participated at a local theater. The linguistically skilled Wojtyła wanted to study Polish at university. Encouraged by a conversation with Adam Stefan Sapieha, he decided to study theology and become a priest. Eventually, Wojtyła rose to the position of Archbishop of Kraków and then a cardinal, both positions held by his mentor. Wojtyła was elected pope on the third day of the second papal conclave of 1978 (becoming one of the youngest popes in history), which was called after John Paul I, who had been elected in the first papal conclave of 1978 earlier in August to succeed Pope Paul VI, died after 33 days. Wojtyła adopted the name of his predecessor in tribute to him.[20] John Paul II was the first non-Italian pope since Adrian VI in the 16th century, as well as the third-longest-serving pope in history after Pius IX and St. Peter. John Paul II attempted to improve the Catholic Church's relations with Judaism, Islam, and the Eastern Orthodox Church in the spirit of ecumenism, holding atheism as the greatest threat. He maintained the Church's previous positions on such matters as abortion, artificial contraception, the ordination of women, and a celibate clergy, and although he supported the reforms of the Second Vatican Council, he was seen as generally conservative in their interpretation.[21][22] He put emphasis on family and identity, while questioning consumerism, hedonism and the pursuit of wealth. He was one of the most travelled world leaders in history, visiting 129 countries during his pontificate. As part of his special emphasis on the universal call to holiness, he beatified 1,344,[23] and also canonised 483 people, more than the combined tally of his predecessors during the preceding five centuries. By the time of his death, he had named most of the College of Cardinals, consecrated or co-consecrated many of the world's bishops, and ordained many priests.[24] He has been credited with fighting against dictatorships for democracy and with helping to end Communist rule in his native Poland and the rest of Europe.[25] Under John Paul II, the Catholic Church greatly expanded its influence in Africa and Latin America, and retained its influence in Europe and the rest of the world.
-
-     ## [UNDERSTANDING]
-     This section presents a brief account
- datasets:
- - ccore/open_data_understanding
- pipeline_tag: text-generation
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # OPT_1.3b_open_data_understanding

- ## Description

- This model has been trained to understand and respond to any content inserted after the `[PAPER]` tag. It uses advanced language modeling techniques to understand the context, structure, and underlying goals of the input text.

- ## How to use

- To interact with this template, place your text after the `[PAPER]` tag. The model will process the text and respond accordingly. For example:

- [PAPER]
- Your text here...

- ## Example

- [PAPER]
- We present a scalable method to build a high-quality instruction-following language model...

- The model will understand and respond to your text according to its context and content.

- ## Comprehension Sections

- ### [UNDERSTANDING]
- This section provides a detailed analysis and decomposition of the inserted text, facilitating the understanding of the content.

- ### [QUESTIONS AND ANSWERS]
- This section addresses questions and answers that could arise based on the text provided.

- ### [OBJECTION AND REPLY]
- This section addresses any objections and responses that could arise from analysis of the text.

- ## Common questions

- - **What can this model do?**
- - This model can understand and respond to any text placed after the `[PAPER]` tag.

- - **Is a specific format necessary?**
- - No, the model is quite flexible regarding the text format.

- - **How does this model perform?**
- - The model outperforms other LLaMa-based models on the Alpaca leaderboard, demonstrating a highly effective alignment.

- ## Warnings

- - This model was trained on a diverse corpus, but may still have bias or limitations.
- - Continuous validation of the model and its output is essential.

- ## Contact and Support

- For more information, visit [Hugging Face](https://huggingface.co/).
 
  ---
  license: other
+ base_model: facebook/opt-1.3B
  tags:
  - generated_from_trainer
  metrics:
  - accuracy
+ model-index:
+ - name: mini3
+   results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

+ # mini3

+ This model is a fine-tuned version of [facebook/opt-1.3B](https://huggingface.co/facebook/opt-1.3B) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 4.5592
+ - Accuracy: 0.4112

+ ## Model description

+ More information needed

+ ## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

+ ## Training procedure

+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 1
+ - eval_batch_size: 8
+ - seed: 42
+ - gradient_accumulation_steps: 32
+ - total_train_batch_size: 32
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: constant
+ - num_epochs: 35.0

+ ### Training results

+ ### Framework versions

+ - Transformers 4.34.0.dev0
+ - Pytorch 2.0.1+cu117
+ - Datasets 2.14.5
+ - Tokenizers 0.14.0
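The derived values in the hyperparameter list above can be sanity-checked from the raw ones. A minimal sketch, assuming the usual HF Trainer accounting (effective batch size = per-device batch size x gradient accumulation steps, and whole optimizer steps scheduled per epoch):

```python
# Effective batch size: per-device batch size x gradient accumulation steps.
train_batch_size = 1
gradient_accumulation_steps = 32
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 32

# With 2070 training samples, each optimizer step consumes 32 samples.
# trainer_state.json below records global_step 2240, which corresponds to:
train_samples = 2070
global_step = 2240
epoch = global_step * total_train_batch_size / train_samples
print(round(epoch, 6))  # 34.628019

# Why training ends at epoch ~34.63 rather than 35.0: the step budget is
# 35 epochs x floor(2070 / 32) = 35 x 64 = 2240 whole optimizer steps,
# which covers slightly less than 35 full passes over the data.
print(35 * (train_samples // total_train_batch_size))  # 2240
```

This matches the `"epoch": 34.628019323671495` and `"global_step": 2240` fields recorded in trainer_state.json below.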
all_results.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "epoch": 34.63,
+   "eval_accuracy": 0.4111726584844864,
+   "eval_loss": 4.55915641784668,
+   "eval_runtime": 19.5979,
+   "eval_samples": 243,
+   "eval_samples_per_second": 12.399,
+   "eval_steps_per_second": 1.582,
+   "perplexity": 95.50288131284444,
+   "train_loss": 0.14071606248617172,
+   "train_runtime": 19826.8664,
+   "train_samples": 2070,
+   "train_samples_per_second": 3.654,
+   "train_steps_per_second": 0.113
+ }
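The `perplexity` field is not independent of `eval_loss`: for a causal LM evaluated with mean cross-entropy, perplexity is just the exponential of the loss, and the eval throughput follows from the sample count and runtime. A quick check against the values above:

```python
import math

# Perplexity = exp(mean cross-entropy eval loss).
eval_loss = 4.55915641784668
perplexity = math.exp(eval_loss)
print(perplexity)  # ~95.50288131284444

# Eval throughput = samples evaluated / wall-clock eval time.
eval_samples_per_second = round(243 / 19.5979, 3)
print(eval_samples_per_second)  # 12.399
```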
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "_name_or_path": "facebook/opt-1.3B",
+   "_remove_final_layer_norm": false,
+   "activation_dropout": 0.0,
+   "activation_function": "relu",
+   "architectures": [
+     "OPTForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 2,
+   "do_layer_norm_before": true,
+   "dropout": 0.1,
+   "enable_bias": true,
+   "eos_token_id": 2,
+   "ffn_dim": 8192,
+   "hidden_size": 2048,
+   "init_std": 0.02,
+   "layer_norm_elementwise_affine": true,
+   "layerdrop": 0.0,
+   "max_position_embeddings": 2048,
+   "model_type": "opt",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 24,
+   "pad_token_id": 1,
+   "prefix": "</s>",
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.34.0.dev0",
+   "use_cache": true,
+   "vocab_size": 50272,
+   "word_embed_proj_dim": 2048
+ }
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "epoch": 34.63,
+   "eval_accuracy": 0.4111726584844864,
+   "eval_loss": 4.55915641784668,
+   "eval_runtime": 19.5979,
+   "eval_samples": 243,
+   "eval_samples_per_second": 12.399,
+   "eval_steps_per_second": 1.582,
+   "perplexity": 95.50288131284444
+ }
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 2,
+   "eos_token_id": 2,
+   "pad_token_id": 1,
+   "transformers_version": "4.34.0.dev0"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4fe5e1ca87b29cf773cfe06a3711e5fcd0c102f6cb8e037d887f23e1596726d8
+ size 2631647709
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "bos_token": "</s>",
+   "eos_token": "</s>",
+   "pad_token": "<pad>",
+   "unk_token": "</s>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "add_bos_token": true,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [],
+   "bos_token": "</s>",
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "</s>",
+   "errors": "replace",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<pad>",
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "</s>"
+ }
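The special-token ids declared across the added files are mutually consistent: `added_tokens_decoder` in tokenizer_config.json maps id 1 to `<pad>` and id 2 to `</s>`, matching `pad_token_id`/`eos_token_id` in config.json and generation_config.json. A consistency sketch with the values inlined (not a load of the real files):

```python
# Token ids as declared in tokenizer_config.json's added_tokens_decoder.
added_tokens_decoder = {"1": {"content": "<pad>"}, "2": {"content": "</s>"}}
# Ids as declared in config.json / generation_config.json.
model_config = {"bos_token_id": 2, "eos_token_id": 2, "pad_token_id": 1}

assert added_tokens_decoder[str(model_config["pad_token_id"])]["content"] == "<pad>"
assert added_tokens_decoder[str(model_config["eos_token_id"])]["content"] == "</s>"
print("special-token ids consistent")
```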
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 34.63,
+   "train_loss": 0.14071606248617172,
+   "train_runtime": 19826.8664,
+   "train_samples": 2070,
+   "train_samples_per_second": 3.654,
+   "train_steps_per_second": 0.113
+ }
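The throughput figures above can be reproduced from the other fields. A minimal sketch, assuming the Trainer's speed-metrics-style accounting (planned totals divided by wall-clock training time; `max_steps` here is the 2240 from trainer_state.json):

```python
train_samples = 2070
num_epochs = 35.0
train_runtime = 19826.8664  # seconds
max_steps = 2240            # total scheduled optimizer steps

# Assumed formulas: planned sample/step totals over wall-clock runtime.
samples_per_second = round(train_samples * num_epochs / train_runtime, 3)
steps_per_second = round(max_steps / train_runtime, 3)

print(samples_per_second)  # 3.654
print(steps_per_second)    # 0.113
```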
trainer_state.json ADDED
@@ -0,0 +1,2716 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 34.628019323671495,
5
+ "eval_steps": 500,
6
+ "global_step": 2240,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.08,
13
+ "learning_rate": 0.0001,
14
+ "loss": 2.2092,
15
+ "step": 5
16
+ },
17
+ {
18
+ "epoch": 0.15,
19
+ "learning_rate": 0.0001,
20
+ "loss": 1.4086,
21
+ "step": 10
22
+ },
23
+ {
24
+ "epoch": 0.23,
25
+ "learning_rate": 0.0001,
26
+ "loss": 1.1906,
27
+ "step": 15
28
+ },
29
+ {
30
+ "epoch": 0.31,
31
+ "learning_rate": 0.0001,
32
+ "loss": 1.1447,
33
+ "step": 20
34
+ },
35
+ {
36
+ "epoch": 0.39,
37
+ "learning_rate": 0.0001,
38
+ "loss": 1.1125,
39
+ "step": 25
40
+ },
41
+ {
42
+ "epoch": 0.46,
43
+ "learning_rate": 0.0001,
44
+ "loss": 1.1011,
45
+ "step": 30
46
+ },
47
+ {
48
+ "epoch": 0.54,
49
+ "learning_rate": 0.0001,
50
+ "loss": 1.0439,
51
+ "step": 35
52
+ },
53
+ {
54
+ "epoch": 0.62,
55
+ "learning_rate": 0.0001,
56
+ "loss": 1.0341,
57
+ "step": 40
58
+ },
59
+ {
60
+ "epoch": 0.7,
61
+ "learning_rate": 0.0001,
62
+ "loss": 1.0454,
63
+ "step": 45
64
+ },
65
+ {
66
+ "epoch": 0.77,
67
+ "learning_rate": 0.0001,
68
+ "loss": 1.0261,
69
+ "step": 50
70
+ },
71
+ {
72
+ "epoch": 0.85,
73
+ "learning_rate": 0.0001,
74
+ "loss": 1.0393,
75
+ "step": 55
76
+ },
77
+ {
78
+ "epoch": 0.93,
79
+ "learning_rate": 0.0001,
80
+ "loss": 1.0407,
81
+ "step": 60
82
+ },
83
+ {
84
+ "epoch": 1.0,
85
+ "learning_rate": 0.0001,
86
+ "loss": 0.9819,
87
+ "step": 65
88
+ },
89
+ {
90
+ "epoch": 1.08,
91
+ "learning_rate": 0.0001,
92
+ "loss": 0.7776,
93
+ "step": 70
94
+ },
95
+ {
96
+ "epoch": 1.16,
97
+ "learning_rate": 0.0001,
98
+ "loss": 0.742,
99
+ "step": 75
100
+ },
101
+ {
102
+ "epoch": 1.24,
103
+ "learning_rate": 0.0001,
104
+ "loss": 0.7434,
105
+ "step": 80
106
+ },
107
+ {
108
+ "epoch": 1.31,
109
+ "learning_rate": 0.0001,
110
+ "loss": 0.7963,
111
+ "step": 85
112
+ },
113
+ {
114
+ "epoch": 1.39,
115
+ "learning_rate": 0.0001,
116
+ "loss": 0.7459,
117
+ "step": 90
118
+ },
119
+ {
120
+ "epoch": 1.47,
121
+ "learning_rate": 0.0001,
122
+ "loss": 0.7714,
123
+ "step": 95
124
+ },
125
+ {
126
+ "epoch": 1.55,
127
+ "learning_rate": 0.0001,
128
+ "loss": 0.7663,
129
+ "step": 100
130
+ },
131
+ {
132
+ "epoch": 1.62,
133
+ "learning_rate": 0.0001,
134
+ "loss": 0.7458,
135
+ "step": 105
136
+ },
137
+ {
138
+ "epoch": 1.7,
139
+ "learning_rate": 0.0001,
140
+ "loss": 0.7591,
141
+ "step": 110
142
+ },
143
+ {
144
+ "epoch": 1.78,
145
+ "learning_rate": 0.0001,
146
+ "loss": 0.7567,
147
+ "step": 115
148
+ },
149
+ {
150
+ "epoch": 1.86,
151
+ "learning_rate": 0.0001,
152
+ "loss": 0.7394,
153
+ "step": 120
154
+ },
155
+ {
156
+ "epoch": 1.93,
157
+ "learning_rate": 0.0001,
158
+ "loss": 0.7631,
159
+ "step": 125
160
+ },
161
+ {
162
+ "epoch": 2.01,
163
+ "learning_rate": 0.0001,
164
+ "loss": 0.7383,
165
+ "step": 130
166
+ },
167
+ {
168
+ "epoch": 2.09,
169
+ "learning_rate": 0.0001,
170
+ "loss": 0.5741,
171
+ "step": 135
172
+ },
173
+ {
174
+ "epoch": 2.16,
175
+ "learning_rate": 0.0001,
176
+ "loss": 0.5493,
177
+ "step": 140
178
+ },
179
+ {
180
+ "epoch": 2.24,
181
+ "learning_rate": 0.0001,
182
+ "loss": 0.5863,
183
+ "step": 145
184
+ },
185
+ {
186
+ "epoch": 2.32,
187
+ "learning_rate": 0.0001,
188
+ "loss": 0.5505,
189
+ "step": 150
190
+ },
191
+ {
192
+ "epoch": 2.4,
193
+ "learning_rate": 0.0001,
194
+ "loss": 0.545,
195
+ "step": 155
196
+ },
197
+ {
198
+ "epoch": 2.47,
199
+ "learning_rate": 0.0001,
200
+ "loss": 0.5583,
201
+ "step": 160
202
+ },
203
+ {
204
+ "epoch": 2.55,
205
+ "learning_rate": 0.0001,
206
+ "loss": 0.5454,
207
+ "step": 165
208
+ },
209
+ {
210
+ "epoch": 2.63,
211
+ "learning_rate": 0.0001,
212
+ "loss": 0.551,
213
+ "step": 170
214
+ },
215
+ {
216
+ "epoch": 2.71,
217
+ "learning_rate": 0.0001,
218
+ "loss": 0.5609,
219
+ "step": 175
220
+ },
221
+ {
222
+ "epoch": 2.78,
223
+ "learning_rate": 0.0001,
224
+ "loss": 0.5414,
225
+ "step": 180
226
+ },
227
+ {
228
+ "epoch": 2.86,
229
+ "learning_rate": 0.0001,
230
+ "loss": 0.6485,
231
+ "step": 185
232
+ },
233
+ {
234
+ "epoch": 2.94,
235
+ "learning_rate": 0.0001,
236
+ "loss": 0.5638,
237
+ "step": 190
238
+ },
239
+ {
240
+ "epoch": 3.01,
241
+ "learning_rate": 0.0001,
242
+ "loss": 0.5406,
243
+ "step": 195
244
+ },
245
+ {
246
+ "epoch": 3.09,
247
+ "learning_rate": 0.0001,
248
+ "loss": 0.4256,
249
+ "step": 200
250
+ },
251
+ {
252
+ "epoch": 3.17,
253
+ "learning_rate": 0.0001,
254
+ "loss": 0.3931,
255
+ "step": 205
256
+ },
257
+ {
258
+ "epoch": 3.25,
259
+ "learning_rate": 0.0001,
260
+ "loss": 0.4069,
261
+ "step": 210
262
+ },
263
+ {
264
+ "epoch": 3.32,
265
+ "learning_rate": 0.0001,
266
+ "loss": 0.4141,
267
+ "step": 215
268
+ },
269
+ {
270
+ "epoch": 3.4,
271
+ "learning_rate": 0.0001,
272
+ "loss": 0.4716,
273
+ "step": 220
274
+ },
275
+ {
276
+ "epoch": 3.48,
277
+ "learning_rate": 0.0001,
278
+ "loss": 0.4219,
279
+ "step": 225
280
+ },
281
+ {
282
+ "epoch": 3.56,
283
+ "learning_rate": 0.0001,
284
+ "loss": 0.4143,
285
+ "step": 230
286
+ },
287
+ {
288
+ "epoch": 3.63,
289
+ "learning_rate": 0.0001,
290
+ "loss": 0.4139,
291
+ "step": 235
292
+ },
293
+ {
294
+ "epoch": 3.71,
295
+ "learning_rate": 0.0001,
296
+ "loss": 0.4412,
297
+ "step": 240
298
+ },
299
+ {
300
+ "epoch": 3.79,
301
+ "learning_rate": 0.0001,
302
+ "loss": 0.4283,
303
+ "step": 245
304
+ },
305
+ {
306
+ "epoch": 3.86,
307
+ "learning_rate": 0.0001,
308
+ "loss": 0.4261,
309
+ "step": 250
310
+ },
311
+ {
312
+ "epoch": 3.94,
313
+ "learning_rate": 0.0001,
314
+ "loss": 0.4304,
315
+ "step": 255
316
+ },
317
+ {
318
+ "epoch": 4.02,
319
+ "learning_rate": 0.0001,
320
+ "loss": 0.4093,
321
+ "step": 260
322
+ },
323
+ {
324
+ "epoch": 4.1,
325
+ "learning_rate": 0.0001,
326
+ "loss": 0.3088,
327
+ "step": 265
328
+ },
329
+ {
330
+ "epoch": 4.17,
331
+ "learning_rate": 0.0001,
332
+ "loss": 0.3017,
333
+ "step": 270
334
+ },
335
+ {
336
+ "epoch": 4.25,
337
+ "learning_rate": 0.0001,
338
+ "loss": 0.3169,
339
+ "step": 275
340
+ },
341
+ {
342
+ "epoch": 4.33,
343
+ "learning_rate": 0.0001,
344
+ "loss": 0.3077,
345
+ "step": 280
346
+ },
347
+ {
348
+ "epoch": 4.41,
349
+ "learning_rate": 0.0001,
350
+ "loss": 0.3131,
351
+ "step": 285
352
+ },
353
+ {
354
+ "epoch": 4.48,
355
+ "learning_rate": 0.0001,
356
+ "loss": 0.3239,
357
+ "step": 290
358
+ },
359
+ {
360
+ "epoch": 4.56,
361
+ "learning_rate": 0.0001,
362
+ "loss": 0.3428,
363
+ "step": 295
364
+ },
365
+ {
366
+ "epoch": 4.64,
367
+ "learning_rate": 0.0001,
368
+ "loss": 0.3267,
369
+ "step": 300
370
+ },
371
+ {
372
+ "epoch": 4.71,
373
+ "learning_rate": 0.0001,
374
+ "loss": 0.3194,
375
+ "step": 305
376
+ },
377
+ {
378
+ "epoch": 4.79,
379
+ "learning_rate": 0.0001,
380
+ "loss": 0.3276,
381
+ "step": 310
382
+ },
383
+ {
384
+ "epoch": 4.87,
385
+ "learning_rate": 0.0001,
386
+ "loss": 0.3314,
387
+ "step": 315
388
+ },
389
+ {
390
+ "epoch": 4.95,
391
+ "learning_rate": 0.0001,
392
+ "loss": 0.4401,
393
+ "step": 320
394
+ },
395
+ {
396
+ "epoch": 5.02,
397
+ "learning_rate": 0.0001,
398
+ "loss": 0.3017,
399
+ "step": 325
400
+ },
401
+ {
402
+ "epoch": 5.1,
403
+ "learning_rate": 0.0001,
404
+ "loss": 0.2653,
405
+ "step": 330
406
+ },
407
+ {
408
+ "epoch": 5.18,
409
+ "learning_rate": 0.0001,
410
+ "loss": 0.293,
411
+ "step": 335
412
+ },
413
+ {
414
+ "epoch": 5.26,
415
+ "learning_rate": 0.0001,
416
+ "loss": 0.2381,
417
+ "step": 340
418
+ },
419
+ {
420
+ "epoch": 5.33,
421
+ "learning_rate": 0.0001,
422
+ "loss": 0.245,
423
+ "step": 345
424
+ },
425
+ {
426
+ "epoch": 5.41,
427
+ "learning_rate": 0.0001,
428
+ "loss": 0.2307,
429
+ "step": 350
430
+ },
431
+ {
432
+ "epoch": 5.49,
433
+ "learning_rate": 0.0001,
434
+ "loss": 0.2469,
435
+ "step": 355
436
+ },
437
+ {
438
+ "epoch": 5.57,
439
+ "learning_rate": 0.0001,
440
+ "loss": 0.241,
441
+ "step": 360
442
+ },
443
+ {
444
+ "epoch": 5.64,
445
+ "learning_rate": 0.0001,
446
+ "loss": 0.2494,
447
+ "step": 365
448
+ },
449
+ {
450
+ "epoch": 5.72,
451
+ "learning_rate": 0.0001,
452
+ "loss": 0.2509,
453
+ "step": 370
454
+ },
455
+ {
456
+ "epoch": 5.8,
457
+ "learning_rate": 0.0001,
458
+ "loss": 0.2459,
459
+ "step": 375
460
+ },
461
+ {
462
+ "epoch": 5.87,
463
+ "learning_rate": 0.0001,
464
+ "loss": 0.2524,
465
+ "step": 380
466
+ },
467
+ {
468
+ "epoch": 5.95,
469
+ "learning_rate": 0.0001,
470
+ "loss": 0.2613,
471
+ "step": 385
472
+ },
473
+ {
474
+ "epoch": 6.03,
475
+ "learning_rate": 0.0001,
476
+ "loss": 0.2321,
477
+ "step": 390
478
+ },
479
+ {
480
+ "epoch": 6.11,
481
+ "learning_rate": 0.0001,
482
+ "loss": 0.1857,
483
+ "step": 395
484
+ },
485
+ {
486
+ "epoch": 6.18,
487
+ "learning_rate": 0.0001,
488
+ "loss": 0.1834,
489
+ "step": 400
490
+ },
491
+ {
492
+ "epoch": 6.26,
493
+ "learning_rate": 0.0001,
494
+ "loss": 0.1888,
495
+ "step": 405
496
+ },
497
+ {
498
+ "epoch": 6.34,
499
+ "learning_rate": 0.0001,
500
+ "loss": 0.1857,
501
+ "step": 410
502
+ },
503
+ {
504
+ "epoch": 6.42,
505
+ "learning_rate": 0.0001,
506
+ "loss": 0.1901,
507
+ "step": 415
508
+ },
509
+ {
510
+ "epoch": 6.49,
511
+ "learning_rate": 0.0001,
512
+ "loss": 0.2473,
513
+ "step": 420
514
+ },
515
+ {
516
+ "epoch": 6.57,
517
+ "learning_rate": 0.0001,
518
+ "loss": 0.1957,
519
+ "step": 425
520
+ },
521
+ {
522
+ "epoch": 6.65,
523
+ "learning_rate": 0.0001,
524
+ "loss": 0.1991,
525
+ "step": 430
526
+ },
527
+ {
528
+ "epoch": 6.72,
529
+ "learning_rate": 0.0001,
530
+ "loss": 0.2053,
531
+ "step": 435
532
+ },
533
+ {
534
+ "epoch": 6.8,
535
+ "learning_rate": 0.0001,
536
+ "loss": 0.196,
537
+ "step": 440
538
+ },
539
+ {
540
+ "epoch": 6.88,
541
+ "learning_rate": 0.0001,
542
+ "loss": 0.2074,
543
+ "step": 445
544
+ },
545
+ {
546
+ "epoch": 6.96,
547
+ "learning_rate": 0.0001,
548
+ "loss": 0.2103,
549
+ "step": 450
550
+ },
551
+ {
552
+ "epoch": 7.03,
553
+ "learning_rate": 0.0001,
554
+ "loss": 0.2051,
555
+ "step": 455
556
+ },
557
+ {
558
+ "epoch": 7.11,
559
+ "learning_rate": 0.0001,
560
+ "loss": 0.1453,
561
+ "step": 460
562
+ },
563
+ {
564
+ "epoch": 7.19,
565
+ "learning_rate": 0.0001,
566
+ "loss": 0.1502,
567
+ "step": 465
568
+ },
569
+ {
570
+ "epoch": 7.27,
571
+ "learning_rate": 0.0001,
572
+ "loss": 0.1432,
573
+ "step": 470
574
+ },
575
+ {
576
+ "epoch": 7.34,
577
+ "learning_rate": 0.0001,
578
+ "loss": 0.1494,
579
+ "step": 475
580
+ },
581
+ {
582
+ "epoch": 7.42,
583
+ "learning_rate": 0.0001,
584
+ "loss": 0.1475,
585
+ "step": 480
586
+ },
587
+ {
588
+ "epoch": 7.5,
589
+ "learning_rate": 0.0001,
590
+ "loss": 0.153,
591
+ "step": 485
592
+ },
593
+ {
594
+ "epoch": 7.57,
595
+ "learning_rate": 0.0001,
596
+ "loss": 0.1525,
597
+ "step": 490
598
+ },
599
+ {
600
+ "epoch": 7.65,
601
+ "learning_rate": 0.0001,
602
+ "loss": 0.1604,
603
+ "step": 495
604
+ },
605
+ {
606
+ "epoch": 7.73,
607
+ "learning_rate": 0.0001,
608
+ "loss": 0.206,
609
+ "step": 500
610
+ },
611
+ {
612
+ "epoch": 7.81,
613
+ "learning_rate": 0.0001,
614
+ "loss": 0.1656,
615
+ "step": 505
616
+ },
617
+ {
618
+ "epoch": 7.88,
619
+ "learning_rate": 0.0001,
620
+ "loss": 0.2155,
621
+ "step": 510
622
+ },
623
+ {
624
+ "epoch": 7.96,
625
+ "learning_rate": 0.0001,
626
+ "loss": 0.1727,
627
+ "step": 515
628
+ },
629
+ {
630
+ "epoch": 8.04,
631
+ "learning_rate": 0.0001,
632
+ "loss": 0.1458,
633
+ "step": 520
634
+ },
635
+ {
636
+ "epoch": 8.12,
637
+ "learning_rate": 0.0001,
638
+ "loss": 0.1598,
639
+ "step": 525
640
+ },
641
+ {
642
+ "epoch": 8.19,
643
+ "learning_rate": 0.0001,
644
+ "loss": 0.1173,
645
+ "step": 530
646
+ },
647
+ {
648
+ "epoch": 8.27,
649
+ "learning_rate": 0.0001,
650
+ "loss": 0.12,
651
+ "step": 535
652
+ },
653
+ {
654
+ "epoch": 8.35,
655
+ "learning_rate": 0.0001,
656
+ "loss": 0.1216,
657
+ "step": 540
658
+ },
659
+ {
660
+ "epoch": 8.43,
661
+ "learning_rate": 0.0001,
662
+ "loss": 0.1242,
663
+ "step": 545
664
+ },
665
+ {
666
+ "epoch": 8.5,
667
+ "learning_rate": 0.0001,
668
+ "loss": 0.126,
669
+ "step": 550
670
+ },
671
+ {
672
+ "epoch": 8.58,
673
+ "learning_rate": 0.0001,
674
+ "loss": 0.1706,
675
+ "step": 555
676
+ },
677
+ {
678
+ "epoch": 8.66,
679
+ "learning_rate": 0.0001,
680
+ "loss": 0.1386,
681
+ "step": 560
682
+ },
683
+ {
684
+ "epoch": 8.73,
685
+ "learning_rate": 0.0001,
686
+ "loss": 0.1341,
687
+ "step": 565
688
+ },
689
+ {
690
+ "epoch": 8.81,
691
+ "learning_rate": 0.0001,
692
+ "loss": 0.1466,
693
+ "step": 570
694
+ },
695
+ {
696
+ "epoch": 8.89,
697
+ "learning_rate": 0.0001,
698
+ "loss": 0.1395,
699
+ "step": 575
700
+ },
701
+ {
702
+ "epoch": 8.97,
703
+ "learning_rate": 0.0001,
704
+ "loss": 0.1403,
705
+ "step": 580
706
+ },
707
+ {
708
+ "epoch": 9.04,
709
+ "learning_rate": 0.0001,
710
+ "loss": 0.1172,
711
+ "step": 585
712
+ },
713
+ {
714
+ "epoch": 9.12,
715
+ "learning_rate": 0.0001,
716
+ "loss": 0.0994,
717
+ "step": 590
718
+ },
719
+ {
720
+ "epoch": 9.2,
721
+ "learning_rate": 0.0001,
722
+ "loss": 0.1263,
723
+ "step": 595
724
+ },
725
+ {
726
+ "epoch": 9.28,
727
+ "learning_rate": 0.0001,
728
+ "loss": 0.1073,
729
+ "step": 600
730
+ },
731
+ {
732
+ "epoch": 9.35,
733
+ "learning_rate": 0.0001,
734
+ "loss": 0.1062,
735
+ "step": 605
736
+ },
737
+ {
738
+ "epoch": 9.43,
739
+ "learning_rate": 0.0001,
740
+ "loss": 0.1072,
+ "step": 610
+ },
+ {
+ "epoch": 9.51,
+ "learning_rate": 0.0001,
+ "loss": 0.129,
+ "step": 615
+ },
+ {
+ "epoch": 9.58,
+ "learning_rate": 0.0001,
+ "loss": 0.1103,
+ "step": 620
+ },
+ {
+ "epoch": 9.66,
+ "learning_rate": 0.0001,
+ "loss": 0.14,
+ "step": 625
+ },
+ {
+ "epoch": 9.74,
+ "learning_rate": 0.0001,
+ "loss": 0.1138,
+ "step": 630
+ },
+ {
+ "epoch": 9.82,
+ "learning_rate": 0.0001,
+ "loss": 0.1136,
+ "step": 635
+ },
+ {
+ "epoch": 9.89,
+ "learning_rate": 0.0001,
+ "loss": 0.1161,
+ "step": 640
+ },
+ {
+ "epoch": 9.97,
+ "learning_rate": 0.0001,
+ "loss": 0.1126,
+ "step": 645
+ },
+ {
+ "epoch": 10.05,
+ "learning_rate": 0.0001,
+ "loss": 0.0942,
+ "step": 650
+ },
+ {
+ "epoch": 10.13,
+ "learning_rate": 0.0001,
+ "loss": 0.1245,
+ "step": 655
+ },
+ {
+ "epoch": 10.2,
+ "learning_rate": 0.0001,
+ "loss": 0.0892,
+ "step": 660
+ },
+ {
+ "epoch": 10.28,
+ "learning_rate": 0.0001,
+ "loss": 0.1198,
+ "step": 665
+ },
+ {
+ "epoch": 10.36,
+ "learning_rate": 0.0001,
+ "loss": 0.0929,
+ "step": 670
+ },
+ {
+ "epoch": 10.43,
+ "learning_rate": 0.0001,
+ "loss": 0.0923,
+ "step": 675
+ },
+ {
+ "epoch": 10.51,
+ "learning_rate": 0.0001,
+ "loss": 0.0962,
+ "step": 680
+ },
+ {
+ "epoch": 10.59,
+ "learning_rate": 0.0001,
+ "loss": 0.0945,
+ "step": 685
+ },
+ {
+ "epoch": 10.67,
+ "learning_rate": 0.0001,
+ "loss": 0.0986,
+ "step": 690
+ },
+ {
+ "epoch": 10.74,
+ "learning_rate": 0.0001,
+ "loss": 0.0982,
+ "step": 695
+ },
+ {
+ "epoch": 10.82,
+ "learning_rate": 0.0001,
+ "loss": 0.0982,
+ "step": 700
+ },
+ {
+ "epoch": 10.9,
+ "learning_rate": 0.0001,
+ "loss": 0.1011,
+ "step": 705
+ },
+ {
+ "epoch": 10.98,
+ "learning_rate": 0.0001,
+ "loss": 0.1037,
+ "step": 710
+ },
+ {
+ "epoch": 11.05,
+ "learning_rate": 0.0001,
+ "loss": 0.0817,
+ "step": 715
+ },
+ {
+ "epoch": 11.13,
+ "learning_rate": 0.0001,
+ "loss": 0.0979,
+ "step": 720
+ },
+ {
+ "epoch": 11.21,
+ "learning_rate": 0.0001,
+ "loss": 0.0738,
+ "step": 725
+ },
+ {
+ "epoch": 11.29,
+ "learning_rate": 0.0001,
+ "loss": 0.0757,
+ "step": 730
+ },
+ {
+ "epoch": 11.36,
+ "learning_rate": 0.0001,
+ "loss": 0.0795,
+ "step": 735
+ },
+ {
+ "epoch": 11.44,
+ "learning_rate": 0.0001,
+ "loss": 0.0762,
+ "step": 740
+ },
+ {
+ "epoch": 11.52,
+ "learning_rate": 0.0001,
+ "loss": 0.0863,
+ "step": 745
+ },
+ {
+ "epoch": 11.59,
+ "learning_rate": 0.0001,
+ "loss": 0.0784,
+ "step": 750
+ },
+ {
+ "epoch": 11.67,
+ "learning_rate": 0.0001,
+ "loss": 0.0828,
+ "step": 755
+ },
+ {
+ "epoch": 11.75,
+ "learning_rate": 0.0001,
+ "loss": 0.0812,
+ "step": 760
+ },
+ {
+ "epoch": 11.83,
+ "learning_rate": 0.0001,
+ "loss": 0.0838,
+ "step": 765
+ },
+ {
+ "epoch": 11.9,
+ "learning_rate": 0.0001,
+ "loss": 0.089,
+ "step": 770
+ },
+ {
+ "epoch": 11.98,
+ "learning_rate": 0.0001,
+ "loss": 0.1106,
+ "step": 775
+ },
+ {
+ "epoch": 12.06,
+ "learning_rate": 0.0001,
+ "loss": 0.0737,
+ "step": 780
+ },
+ {
+ "epoch": 12.14,
+ "learning_rate": 0.0001,
+ "loss": 0.0666,
+ "step": 785
+ },
+ {
+ "epoch": 12.21,
+ "learning_rate": 0.0001,
+ "loss": 0.0675,
+ "step": 790
+ },
+ {
+ "epoch": 12.29,
+ "learning_rate": 0.0001,
+ "loss": 0.0851,
+ "step": 795
+ },
+ {
+ "epoch": 12.37,
+ "learning_rate": 0.0001,
+ "loss": 0.0681,
+ "step": 800
+ },
+ {
+ "epoch": 12.44,
+ "learning_rate": 0.0001,
+ "loss": 0.0708,
+ "step": 805
+ },
+ {
+ "epoch": 12.52,
+ "learning_rate": 0.0001,
+ "loss": 0.0693,
+ "step": 810
+ },
+ {
+ "epoch": 12.6,
+ "learning_rate": 0.0001,
+ "loss": 0.0693,
+ "step": 815
+ },
+ {
+ "epoch": 12.68,
+ "learning_rate": 0.0001,
+ "loss": 0.1068,
+ "step": 820
+ },
+ {
+ "epoch": 12.75,
+ "learning_rate": 0.0001,
+ "loss": 0.07,
+ "step": 825
+ },
+ {
+ "epoch": 12.83,
+ "learning_rate": 0.0001,
+ "loss": 0.0871,
+ "step": 830
+ },
+ {
+ "epoch": 12.91,
+ "learning_rate": 0.0001,
+ "loss": 0.0743,
+ "step": 835
+ },
+ {
+ "epoch": 12.99,
+ "learning_rate": 0.0001,
+ "loss": 0.0722,
+ "step": 840
+ },
+ {
+ "epoch": 13.06,
+ "learning_rate": 0.0001,
+ "loss": 0.0593,
+ "step": 845
+ },
+ {
+ "epoch": 13.14,
+ "learning_rate": 0.0001,
+ "loss": 0.0549,
+ "step": 850
+ },
+ {
+ "epoch": 13.22,
+ "learning_rate": 0.0001,
+ "loss": 0.0548,
+ "step": 855
+ },
+ {
+ "epoch": 13.29,
+ "learning_rate": 0.0001,
+ "loss": 0.0582,
+ "step": 860
+ },
+ {
+ "epoch": 13.37,
+ "learning_rate": 0.0001,
+ "loss": 0.0579,
+ "step": 865
+ },
+ {
+ "epoch": 13.45,
+ "learning_rate": 0.0001,
+ "loss": 0.0601,
+ "step": 870
+ },
+ {
+ "epoch": 13.53,
+ "learning_rate": 0.0001,
+ "loss": 0.0576,
+ "step": 875
+ },
+ {
+ "epoch": 13.6,
+ "learning_rate": 0.0001,
+ "loss": 0.0594,
+ "step": 880
+ },
+ {
+ "epoch": 13.68,
+ "learning_rate": 0.0001,
+ "loss": 0.0605,
+ "step": 885
+ },
+ {
+ "epoch": 13.76,
+ "learning_rate": 0.0001,
+ "loss": 0.0732,
+ "step": 890
+ },
+ {
+ "epoch": 13.84,
+ "learning_rate": 0.0001,
+ "loss": 0.0652,
+ "step": 895
+ },
+ {
+ "epoch": 13.91,
+ "learning_rate": 0.0001,
+ "loss": 0.0628,
+ "step": 900
+ },
+ {
+ "epoch": 13.99,
+ "learning_rate": 0.0001,
+ "loss": 0.09,
+ "step": 905
+ },
+ {
+ "epoch": 14.07,
+ "learning_rate": 0.0001,
+ "loss": 0.0564,
+ "step": 910
+ },
+ {
+ "epoch": 14.14,
+ "learning_rate": 0.0001,
+ "loss": 0.0481,
+ "step": 915
+ },
+ {
+ "epoch": 14.22,
+ "learning_rate": 0.0001,
+ "loss": 0.048,
+ "step": 920
+ },
+ {
+ "epoch": 14.3,
+ "learning_rate": 0.0001,
+ "loss": 0.0468,
+ "step": 925
+ },
+ {
+ "epoch": 14.38,
+ "learning_rate": 0.0001,
+ "loss": 0.0989,
+ "step": 930
+ },
+ {
+ "epoch": 14.45,
+ "learning_rate": 0.0001,
+ "loss": 0.0497,
+ "step": 935
+ },
+ {
+ "epoch": 14.53,
+ "learning_rate": 0.0001,
+ "loss": 0.0495,
+ "step": 940
+ },
+ {
+ "epoch": 14.61,
+ "learning_rate": 0.0001,
+ "loss": 0.0499,
+ "step": 945
+ },
+ {
+ "epoch": 14.69,
+ "learning_rate": 0.0001,
+ "loss": 0.049,
+ "step": 950
+ },
+ {
+ "epoch": 14.76,
+ "learning_rate": 0.0001,
+ "loss": 0.0629,
+ "step": 955
+ },
+ {
+ "epoch": 14.84,
+ "learning_rate": 0.0001,
+ "loss": 0.0536,
+ "step": 960
+ },
+ {
+ "epoch": 14.92,
+ "learning_rate": 0.0001,
+ "loss": 0.0515,
+ "step": 965
+ },
+ {
+ "epoch": 15.0,
+ "learning_rate": 0.0001,
+ "loss": 0.0679,
+ "step": 970
+ },
+ {
+ "epoch": 15.07,
+ "learning_rate": 0.0001,
+ "loss": 0.04,
+ "step": 975
+ },
+ {
+ "epoch": 15.15,
+ "learning_rate": 0.0001,
+ "loss": 0.0596,
+ "step": 980
+ },
+ {
+ "epoch": 15.23,
+ "learning_rate": 0.0001,
+ "loss": 0.0742,
+ "step": 985
+ },
+ {
+ "epoch": 15.3,
+ "learning_rate": 0.0001,
+ "loss": 0.0693,
+ "step": 990
+ },
+ {
+ "epoch": 15.38,
+ "learning_rate": 0.0001,
+ "loss": 0.0414,
+ "step": 995
+ },
+ {
+ "epoch": 15.46,
+ "learning_rate": 0.0001,
+ "loss": 0.0442,
+ "step": 1000
+ },
+ {
+ "epoch": 15.54,
+ "learning_rate": 0.0001,
+ "loss": 0.0409,
+ "step": 1005
+ },
+ {
+ "epoch": 15.61,
+ "learning_rate": 0.0001,
+ "loss": 0.04,
+ "step": 1010
+ },
+ {
+ "epoch": 15.69,
+ "learning_rate": 0.0001,
+ "loss": 0.0414,
+ "step": 1015
+ },
+ {
+ "epoch": 15.77,
+ "learning_rate": 0.0001,
+ "loss": 0.0393,
+ "step": 1020
+ },
+ {
+ "epoch": 15.85,
+ "learning_rate": 0.0001,
+ "loss": 0.0398,
+ "step": 1025
+ },
+ {
+ "epoch": 15.92,
+ "learning_rate": 0.0001,
+ "loss": 0.0411,
+ "step": 1030
+ },
+ {
+ "epoch": 16.0,
+ "learning_rate": 0.0001,
+ "loss": 0.0407,
+ "step": 1035
+ },
+ {
+ "epoch": 16.08,
+ "learning_rate": 0.0001,
+ "loss": 0.0305,
+ "step": 1040
+ },
+ {
+ "epoch": 16.15,
+ "learning_rate": 0.0001,
+ "loss": 0.0338,
+ "step": 1045
+ },
+ {
+ "epoch": 16.23,
+ "learning_rate": 0.0001,
+ "loss": 0.0324,
+ "step": 1050
+ },
+ {
+ "epoch": 16.31,
+ "learning_rate": 0.0001,
+ "loss": 0.0326,
+ "step": 1055
+ },
+ {
+ "epoch": 16.39,
+ "learning_rate": 0.0001,
+ "loss": 0.0314,
+ "step": 1060
+ },
+ {
+ "epoch": 16.46,
+ "learning_rate": 0.0001,
+ "loss": 0.0332,
+ "step": 1065
+ },
+ {
+ "epoch": 16.54,
+ "learning_rate": 0.0001,
+ "loss": 0.0329,
+ "step": 1070
+ },
+ {
+ "epoch": 16.62,
+ "learning_rate": 0.0001,
+ "loss": 0.0406,
+ "step": 1075
+ },
+ {
+ "epoch": 16.7,
+ "learning_rate": 0.0001,
+ "loss": 0.0324,
+ "step": 1080
+ },
+ {
+ "epoch": 16.77,
+ "learning_rate": 0.0001,
+ "loss": 0.0324,
+ "step": 1085
+ },
+ {
+ "epoch": 16.85,
+ "learning_rate": 0.0001,
+ "loss": 0.0721,
+ "step": 1090
+ },
+ {
+ "epoch": 16.93,
+ "learning_rate": 0.0001,
+ "loss": 0.0344,
+ "step": 1095
+ },
+ {
+ "epoch": 17.0,
+ "learning_rate": 0.0001,
+ "loss": 0.0333,
+ "step": 1100
+ },
+ {
+ "epoch": 17.08,
+ "learning_rate": 0.0001,
+ "loss": 0.029,
+ "step": 1105
+ },
+ {
+ "epoch": 17.16,
+ "learning_rate": 0.0001,
+ "loss": 0.028,
+ "step": 1110
+ },
+ {
+ "epoch": 17.24,
+ "learning_rate": 0.0001,
+ "loss": 0.0288,
+ "step": 1115
+ },
+ {
+ "epoch": 17.31,
+ "learning_rate": 0.0001,
+ "loss": 0.0274,
+ "step": 1120
+ },
+ {
+ "epoch": 17.39,
+ "learning_rate": 0.0001,
+ "loss": 0.0278,
+ "step": 1125
+ },
+ {
+ "epoch": 17.47,
+ "learning_rate": 0.0001,
+ "loss": 0.0288,
+ "step": 1130
+ },
+ {
+ "epoch": 17.55,
+ "learning_rate": 0.0001,
+ "loss": 0.0414,
+ "step": 1135
+ },
+ {
+ "epoch": 17.62,
+ "learning_rate": 0.0001,
+ "loss": 0.0322,
+ "step": 1140
+ },
+ {
+ "epoch": 17.7,
+ "learning_rate": 0.0001,
+ "loss": 0.0303,
+ "step": 1145
+ },
+ {
+ "epoch": 17.78,
+ "learning_rate": 0.0001,
+ "loss": 0.0291,
+ "step": 1150
+ },
+ {
+ "epoch": 17.86,
+ "learning_rate": 0.0001,
+ "loss": 0.0411,
+ "step": 1155
+ },
+ {
+ "epoch": 17.93,
+ "learning_rate": 0.0001,
+ "loss": 0.0282,
+ "step": 1160
+ },
+ {
+ "epoch": 18.01,
+ "learning_rate": 0.0001,
+ "loss": 0.0285,
+ "step": 1165
+ },
+ {
+ "epoch": 18.09,
+ "learning_rate": 0.0001,
+ "loss": 0.0227,
+ "step": 1170
+ },
+ {
+ "epoch": 18.16,
+ "learning_rate": 0.0001,
+ "loss": 0.0232,
+ "step": 1175
+ },
+ {
+ "epoch": 18.24,
+ "learning_rate": 0.0001,
+ "loss": 0.0228,
+ "step": 1180
+ },
+ {
+ "epoch": 18.32,
+ "learning_rate": 0.0001,
+ "loss": 0.0332,
+ "step": 1185
+ },
+ {
+ "epoch": 18.4,
+ "learning_rate": 0.0001,
+ "loss": 0.0226,
+ "step": 1190
+ },
+ {
+ "epoch": 18.47,
+ "learning_rate": 0.0001,
+ "loss": 0.0367,
+ "step": 1195
+ },
+ {
+ "epoch": 18.55,
+ "learning_rate": 0.0001,
+ "loss": 0.0255,
+ "step": 1200
+ },
+ {
+ "epoch": 18.63,
+ "learning_rate": 0.0001,
+ "loss": 0.0354,
+ "step": 1205
+ },
+ {
+ "epoch": 18.71,
+ "learning_rate": 0.0001,
+ "loss": 0.0238,
+ "step": 1210
+ },
+ {
+ "epoch": 18.78,
+ "learning_rate": 0.0001,
+ "loss": 0.0262,
+ "step": 1215
+ },
+ {
+ "epoch": 18.86,
+ "learning_rate": 0.0001,
+ "loss": 0.0243,
+ "step": 1220
+ },
+ {
+ "epoch": 18.94,
+ "learning_rate": 0.0001,
+ "loss": 0.0241,
+ "step": 1225
+ },
+ {
+ "epoch": 19.01,
+ "learning_rate": 0.0001,
+ "loss": 0.0237,
+ "step": 1230
+ },
+ {
+ "epoch": 19.09,
+ "learning_rate": 0.0001,
+ "loss": 0.0207,
+ "step": 1235
+ },
+ {
+ "epoch": 19.17,
+ "learning_rate": 0.0001,
+ "loss": 0.0223,
+ "step": 1240
+ },
+ {
+ "epoch": 19.25,
+ "learning_rate": 0.0001,
+ "loss": 0.02,
+ "step": 1245
+ },
+ {
+ "epoch": 19.32,
+ "learning_rate": 0.0001,
+ "loss": 0.0205,
+ "step": 1250
+ },
+ {
+ "epoch": 19.4,
+ "learning_rate": 0.0001,
+ "loss": 0.0204,
+ "step": 1255
+ },
+ {
+ "epoch": 19.48,
+ "learning_rate": 0.0001,
+ "loss": 0.0195,
+ "step": 1260
+ },
+ {
+ "epoch": 19.56,
+ "learning_rate": 0.0001,
+ "loss": 0.0205,
+ "step": 1265
+ },
+ {
+ "epoch": 19.63,
+ "learning_rate": 0.0001,
+ "loss": 0.0222,
+ "step": 1270
+ },
+ {
+ "epoch": 19.71,
+ "learning_rate": 0.0001,
+ "loss": 0.0473,
+ "step": 1275
+ },
+ {
+ "epoch": 19.79,
+ "learning_rate": 0.0001,
+ "loss": 0.0216,
+ "step": 1280
+ },
+ {
+ "epoch": 19.86,
+ "learning_rate": 0.0001,
+ "loss": 0.0242,
+ "step": 1285
+ },
+ {
+ "epoch": 19.94,
+ "learning_rate": 0.0001,
+ "loss": 0.0209,
+ "step": 1290
+ },
+ {
+ "epoch": 20.02,
+ "learning_rate": 0.0001,
+ "loss": 0.0212,
+ "step": 1295
+ },
+ {
+ "epoch": 20.1,
+ "learning_rate": 0.0001,
+ "loss": 0.0243,
+ "step": 1300
+ },
+ {
+ "epoch": 20.17,
+ "learning_rate": 0.0001,
+ "loss": 0.0205,
+ "step": 1305
+ },
+ {
+ "epoch": 20.25,
+ "learning_rate": 0.0001,
+ "loss": 0.0197,
+ "step": 1310
+ },
+ {
+ "epoch": 20.33,
+ "learning_rate": 0.0001,
+ "loss": 0.0191,
+ "step": 1315
+ },
+ {
+ "epoch": 20.41,
+ "learning_rate": 0.0001,
+ "loss": 0.0186,
+ "step": 1320
+ },
+ {
+ "epoch": 20.48,
+ "learning_rate": 0.0001,
+ "loss": 0.0264,
+ "step": 1325
+ },
+ {
+ "epoch": 20.56,
+ "learning_rate": 0.0001,
+ "loss": 0.0194,
+ "step": 1330
+ },
+ {
+ "epoch": 20.64,
+ "learning_rate": 0.0001,
+ "loss": 0.0206,
+ "step": 1335
+ },
+ {
+ "epoch": 20.71,
+ "learning_rate": 0.0001,
+ "loss": 0.0203,
+ "step": 1340
+ },
+ {
+ "epoch": 20.79,
+ "learning_rate": 0.0001,
+ "loss": 0.019,
+ "step": 1345
+ },
+ {
+ "epoch": 20.87,
+ "learning_rate": 0.0001,
+ "loss": 0.0191,
+ "step": 1350
+ },
+ {
+ "epoch": 20.95,
+ "learning_rate": 0.0001,
+ "loss": 0.0262,
+ "step": 1355
+ },
+ {
+ "epoch": 21.02,
+ "learning_rate": 0.0001,
+ "loss": 0.0293,
+ "step": 1360
+ },
+ {
+ "epoch": 21.1,
+ "learning_rate": 0.0001,
+ "loss": 0.0169,
+ "step": 1365
+ },
+ {
+ "epoch": 21.18,
+ "learning_rate": 0.0001,
+ "loss": 0.0175,
+ "step": 1370
+ },
+ {
+ "epoch": 21.26,
+ "learning_rate": 0.0001,
+ "loss": 0.0175,
+ "step": 1375
+ },
+ {
+ "epoch": 21.33,
+ "learning_rate": 0.0001,
+ "loss": 0.0179,
+ "step": 1380
+ },
+ {
+ "epoch": 21.41,
+ "learning_rate": 0.0001,
+ "loss": 0.017,
+ "step": 1385
+ },
+ {
+ "epoch": 21.49,
+ "learning_rate": 0.0001,
+ "loss": 0.0211,
+ "step": 1390
+ },
+ {
+ "epoch": 21.57,
+ "learning_rate": 0.0001,
+ "loss": 0.0169,
+ "step": 1395
+ },
+ {
+ "epoch": 21.64,
+ "learning_rate": 0.0001,
+ "loss": 0.0168,
+ "step": 1400
+ },
+ {
+ "epoch": 21.72,
+ "learning_rate": 0.0001,
+ "loss": 0.0164,
+ "step": 1405
+ },
+ {
+ "epoch": 21.8,
+ "learning_rate": 0.0001,
+ "loss": 0.0298,
+ "step": 1410
+ },
+ {
+ "epoch": 21.87,
+ "learning_rate": 0.0001,
+ "loss": 0.02,
+ "step": 1415
+ },
+ {
+ "epoch": 21.95,
+ "learning_rate": 0.0001,
+ "loss": 0.0235,
+ "step": 1420
+ },
+ {
+ "epoch": 22.03,
+ "learning_rate": 0.0001,
+ "loss": 0.018,
+ "step": 1425
+ },
+ {
+ "epoch": 22.11,
+ "learning_rate": 0.0001,
+ "loss": 0.0164,
+ "step": 1430
+ },
+ {
+ "epoch": 22.18,
+ "learning_rate": 0.0001,
+ "loss": 0.0225,
+ "step": 1435
+ },
+ {
+ "epoch": 22.26,
+ "learning_rate": 0.0001,
+ "loss": 0.0167,
+ "step": 1440
+ },
+ {
+ "epoch": 22.34,
+ "learning_rate": 0.0001,
+ "loss": 0.024,
+ "step": 1445
+ },
+ {
+ "epoch": 22.42,
+ "learning_rate": 0.0001,
+ "loss": 0.0161,
+ "step": 1450
+ },
+ {
+ "epoch": 22.49,
+ "learning_rate": 0.0001,
+ "loss": 0.0224,
+ "step": 1455
+ },
+ {
+ "epoch": 22.57,
+ "learning_rate": 0.0001,
+ "loss": 0.0203,
+ "step": 1460
+ },
+ {
+ "epoch": 22.65,
+ "learning_rate": 0.0001,
+ "loss": 0.0169,
+ "step": 1465
+ },
+ {
+ "epoch": 22.72,
+ "learning_rate": 0.0001,
+ "loss": 0.0166,
+ "step": 1470
+ },
+ {
+ "epoch": 22.8,
+ "learning_rate": 0.0001,
+ "loss": 0.0163,
+ "step": 1475
+ },
+ {
+ "epoch": 22.88,
+ "learning_rate": 0.0001,
+ "loss": 0.0165,
+ "step": 1480
+ },
+ {
+ "epoch": 22.96,
+ "learning_rate": 0.0001,
+ "loss": 0.0155,
+ "step": 1485
+ },
+ {
+ "epoch": 23.03,
+ "learning_rate": 0.0001,
+ "loss": 0.0164,
+ "step": 1490
+ },
+ {
+ "epoch": 23.11,
+ "learning_rate": 0.0001,
+ "loss": 0.0148,
+ "step": 1495
+ },
+ {
+ "epoch": 23.19,
+ "learning_rate": 0.0001,
+ "loss": 0.0151,
+ "step": 1500
+ },
+ {
+ "epoch": 23.27,
+ "learning_rate": 0.0001,
+ "loss": 0.0176,
+ "step": 1505
+ },
+ {
+ "epoch": 23.34,
+ "learning_rate": 0.0001,
+ "loss": 0.0418,
+ "step": 1510
+ },
+ {
+ "epoch": 23.42,
+ "learning_rate": 0.0001,
+ "loss": 0.0155,
+ "step": 1515
+ },
+ {
+ "epoch": 23.5,
+ "learning_rate": 0.0001,
+ "loss": 0.0163,
+ "step": 1520
+ },
+ {
+ "epoch": 23.57,
+ "learning_rate": 0.0001,
+ "loss": 0.0161,
+ "step": 1525
+ },
+ {
+ "epoch": 23.65,
+ "learning_rate": 0.0001,
+ "loss": 0.0157,
+ "step": 1530
+ },
+ {
+ "epoch": 23.73,
+ "learning_rate": 0.0001,
+ "loss": 0.0219,
+ "step": 1535
+ },
+ {
+ "epoch": 23.81,
+ "learning_rate": 0.0001,
+ "loss": 0.0154,
+ "step": 1540
+ },
+ {
+ "epoch": 23.88,
+ "learning_rate": 0.0001,
+ "loss": 0.0149,
+ "step": 1545
+ },
+ {
+ "epoch": 23.96,
+ "learning_rate": 0.0001,
+ "loss": 0.0151,
+ "step": 1550
+ },
+ {
+ "epoch": 24.04,
+ "learning_rate": 0.0001,
+ "loss": 0.0138,
+ "step": 1555
+ },
+ {
+ "epoch": 24.12,
+ "learning_rate": 0.0001,
+ "loss": 0.0134,
+ "step": 1560
+ },
+ {
+ "epoch": 24.19,
+ "learning_rate": 0.0001,
+ "loss": 0.0349,
+ "step": 1565
+ },
+ {
+ "epoch": 24.27,
+ "learning_rate": 0.0001,
+ "loss": 0.0139,
+ "step": 1570
+ },
+ {
+ "epoch": 24.35,
+ "learning_rate": 0.0001,
+ "loss": 0.0138,
+ "step": 1575
+ },
+ {
+ "epoch": 24.43,
+ "learning_rate": 0.0001,
+ "loss": 0.014,
+ "step": 1580
+ },
+ {
+ "epoch": 24.5,
+ "learning_rate": 0.0001,
+ "loss": 0.0139,
+ "step": 1585
+ },
+ {
+ "epoch": 24.58,
+ "learning_rate": 0.0001,
+ "loss": 0.0137,
+ "step": 1590
+ },
+ {
+ "epoch": 24.66,
+ "learning_rate": 0.0001,
+ "loss": 0.0132,
+ "step": 1595
+ },
+ {
+ "epoch": 24.73,
+ "learning_rate": 0.0001,
+ "loss": 0.0133,
+ "step": 1600
+ },
+ {
+ "epoch": 24.81,
+ "learning_rate": 0.0001,
+ "loss": 0.027,
+ "step": 1605
+ },
+ {
+ "epoch": 24.89,
+ "learning_rate": 0.0001,
+ "loss": 0.0135,
+ "step": 1610
+ },
+ {
+ "epoch": 24.97,
+ "learning_rate": 0.0001,
+ "loss": 0.0141,
+ "step": 1615
+ },
+ {
+ "epoch": 25.04,
+ "learning_rate": 0.0001,
+ "loss": 0.0135,
+ "step": 1620
+ },
+ {
+ "epoch": 25.12,
+ "learning_rate": 0.0001,
+ "loss": 0.0119,
+ "step": 1625
+ },
+ {
+ "epoch": 25.2,
+ "learning_rate": 0.0001,
+ "loss": 0.0197,
+ "step": 1630
+ },
+ {
+ "epoch": 25.28,
+ "learning_rate": 0.0001,
+ "loss": 0.012,
+ "step": 1635
+ },
+ {
+ "epoch": 25.35,
+ "learning_rate": 0.0001,
+ "loss": 0.0119,
+ "step": 1640
+ },
+ {
+ "epoch": 25.43,
+ "learning_rate": 0.0001,
+ "loss": 0.0124,
+ "step": 1645
+ },
+ {
+ "epoch": 25.51,
+ "learning_rate": 0.0001,
+ "loss": 0.0121,
+ "step": 1650
+ },
+ {
+ "epoch": 25.58,
+ "learning_rate": 0.0001,
+ "loss": 0.0173,
+ "step": 1655
+ },
+ {
+ "epoch": 25.66,
+ "learning_rate": 0.0001,
+ "loss": 0.0153,
+ "step": 1660
+ },
+ {
+ "epoch": 25.74,
+ "learning_rate": 0.0001,
+ "loss": 0.0127,
+ "step": 1665
+ },
+ {
+ "epoch": 25.82,
+ "learning_rate": 0.0001,
+ "loss": 0.0125,
+ "step": 1670
+ },
+ {
+ "epoch": 25.89,
+ "learning_rate": 0.0001,
+ "loss": 0.0124,
+ "step": 1675
+ },
+ {
+ "epoch": 25.97,
+ "learning_rate": 0.0001,
+ "loss": 0.0146,
+ "step": 1680
+ },
+ {
+ "epoch": 26.05,
+ "learning_rate": 0.0001,
+ "loss": 0.0169,
+ "step": 1685
+ },
+ {
+ "epoch": 26.13,
+ "learning_rate": 0.0001,
+ "loss": 0.0137,
+ "step": 1690
+ },
+ {
+ "epoch": 26.2,
+ "learning_rate": 0.0001,
+ "loss": 0.0147,
+ "step": 1695
+ },
+ {
+ "epoch": 26.28,
+ "learning_rate": 0.0001,
+ "loss": 0.0116,
+ "step": 1700
+ },
+ {
+ "epoch": 26.36,
+ "learning_rate": 0.0001,
+ "loss": 0.0135,
+ "step": 1705
+ },
+ {
+ "epoch": 26.43,
+ "learning_rate": 0.0001,
+ "loss": 0.0129,
+ "step": 1710
+ },
+ {
+ "epoch": 26.51,
+ "learning_rate": 0.0001,
+ "loss": 0.0124,
+ "step": 1715
+ },
+ {
+ "epoch": 26.59,
+ "learning_rate": 0.0001,
+ "loss": 0.0115,
+ "step": 1720
+ },
+ {
+ "epoch": 26.67,
+ "learning_rate": 0.0001,
+ "loss": 0.0119,
+ "step": 1725
+ },
+ {
+ "epoch": 26.74,
+ "learning_rate": 0.0001,
+ "loss": 0.0118,
+ "step": 1730
+ },
+ {
+ "epoch": 26.82,
+ "learning_rate": 0.0001,
+ "loss": 0.0123,
+ "step": 1735
+ },
+ {
+ "epoch": 26.9,
+ "learning_rate": 0.0001,
+ "loss": 0.0114,
+ "step": 1740
+ },
+ {
+ "epoch": 26.98,
+ "learning_rate": 0.0001,
+ "loss": 0.0119,
+ "step": 1745
+ },
+ {
+ "epoch": 27.05,
+ "learning_rate": 0.0001,
+ "loss": 0.011,
+ "step": 1750
+ },
+ {
+ "epoch": 27.13,
+ "learning_rate": 0.0001,
+ "loss": 0.0122,
+ "step": 1755
+ },
+ {
+ "epoch": 27.21,
+ "learning_rate": 0.0001,
+ "loss": 0.0104,
+ "step": 1760
+ },
+ {
+ "epoch": 27.29,
+ "learning_rate": 0.0001,
+ "loss": 0.011,
+ "step": 1765
+ },
+ {
+ "epoch": 27.36,
+ "learning_rate": 0.0001,
+ "loss": 0.0104,
+ "step": 1770
+ },
+ {
+ "epoch": 27.44,
+ "learning_rate": 0.0001,
+ "loss": 0.0105,
+ "step": 1775
+ },
+ {
+ "epoch": 27.52,
+ "learning_rate": 0.0001,
+ "loss": 0.0106,
+ "step": 1780
+ },
+ {
+ "epoch": 27.59,
+ "learning_rate": 0.0001,
+ "loss": 0.0138,
+ "step": 1785
+ },
+ {
+ "epoch": 27.67,
+ "learning_rate": 0.0001,
+ "loss": 0.0129,
+ "step": 1790
+ },
+ {
+ "epoch": 27.75,
+ "learning_rate": 0.0001,
+ "loss": 0.0109,
+ "step": 1795
+ },
+ {
+ "epoch": 27.83,
+ "learning_rate": 0.0001,
+ "loss": 0.0108,
+ "step": 1800
+ },
+ {
+ "epoch": 27.9,
+ "learning_rate": 0.0001,
+ "loss": 0.0108,
+ "step": 1805
+ },
+ {
+ "epoch": 27.98,
+ "learning_rate": 0.0001,
+ "loss": 0.0109,
+ "step": 1810
+ },
+ {
+ "epoch": 28.06,
+ "learning_rate": 0.0001,
+ "loss": 0.0098,
+ "step": 1815
+ },
+ {
+ "epoch": 28.14,
+ "learning_rate": 0.0001,
+ "loss": 0.0127,
+ "step": 1820
+ },
+ {
+ "epoch": 28.21,
+ "learning_rate": 0.0001,
+ "loss": 0.0095,
+ "step": 1825
+ },
+ {
+ "epoch": 28.29,
+ "learning_rate": 0.0001,
+ "loss": 0.0095,
+ "step": 1830
+ },
+ {
+ "epoch": 28.37,
+ "learning_rate": 0.0001,
+ "loss": 0.0095,
+ "step": 1835
+ },
+ {
+ "epoch": 28.44,
+ "learning_rate": 0.0001,
+ "loss": 0.0101,
+ "step": 1840
+ },
+ {
+ "epoch": 28.52,
+ "learning_rate": 0.0001,
+ "loss": 0.0105,
+ "step": 1845
+ },
+ {
+ "epoch": 28.6,
+ "learning_rate": 0.0001,
+ "loss": 0.0096,
+ "step": 1850
+ },
+ {
+ "epoch": 28.68,
+ "learning_rate": 0.0001,
+ "loss": 0.0101,
+ "step": 1855
+ },
+ {
+ "epoch": 28.75,
+ "learning_rate": 0.0001,
+ "loss": 0.0103,
+ "step": 1860
+ },
+ {
+ "epoch": 28.83,
+ "learning_rate": 0.0001,
+ "loss": 0.0103,
+ "step": 1865
+ },
+ {
+ "epoch": 28.91,
+ "learning_rate": 0.0001,
+ "loss": 0.0139,
+ "step": 1870
+ },
+ {
+ "epoch": 28.99,
+ "learning_rate": 0.0001,
+ "loss": 0.0104,
+ "step": 1875
+ },
+ {
+ "epoch": 29.06,
+ "learning_rate": 0.0001,
+ "loss": 0.0096,
+ "step": 1880
+ },
+ {
+ "epoch": 29.14,
+ "learning_rate": 0.0001,
+ "loss": 0.01,
+ "step": 1885
+ },
+ {
+ "epoch": 29.22,
+ "learning_rate": 0.0001,
+ "loss": 0.009,
+ "step": 1890
+ },
+ {
+ "epoch": 29.29,
+ "learning_rate": 0.0001,
+ "loss": 0.0097,
+ "step": 1895
+ },
+ {
+ "epoch": 29.37,
+ "learning_rate": 0.0001,
+ "loss": 0.0094,
+ "step": 1900
+ },
+ {
+ "epoch": 29.45,
+ "learning_rate": 0.0001,
+ "loss": 0.0311,
+ "step": 1905
+ },
+ {
+ "epoch": 29.53,
+ "learning_rate": 0.0001,
+ "loss": 0.0101,
+ "step": 1910
+ },
+ {
+ "epoch": 29.6,
+ "learning_rate": 0.0001,
+ "loss": 0.0103,
+ "step": 1915
+ },
+ {
+ "epoch": 29.68,
+ "learning_rate": 0.0001,
+ "loss": 0.012,
+ "step": 1920
+ },
+ {
+ "epoch": 29.76,
+ "learning_rate": 0.0001,
+ "loss": 0.01,
+ "step": 1925
+ },
+ {
+ "epoch": 29.84,
+ "learning_rate": 0.0001,
+ "loss": 0.0112,
+ "step": 1930
+ },
+ {
+ "epoch": 29.91,
+ "learning_rate": 0.0001,
+ "loss": 0.0101,
+ "step": 1935
+ },
+ {
+ "epoch": 29.99,
+ "learning_rate": 0.0001,
+ "loss": 0.0105,
+ "step": 1940
+ },
+ {
+ "epoch": 30.07,
+ "learning_rate": 0.0001,
+ "loss": 0.0093,
+ "step": 1945
+ },
+ {
+ "epoch": 30.14,
+ "learning_rate": 0.0001,
+ "loss": 0.0095,
+ "step": 1950
+ },
+ {
+ "epoch": 30.22,
+ "learning_rate": 0.0001,
+ "loss": 0.0089,
+ "step": 1955
+ },
+ {
+ "epoch": 30.3,
+ "learning_rate": 0.0001,
+ "loss": 0.0101,
+ "step": 1960
+ },
+ {
+ "epoch": 30.38,
+ "learning_rate": 0.0001,
+ "loss": 0.0094,
+ "step": 1965
+ },
+ {
+ "epoch": 30.45,
+ "learning_rate": 0.0001,
+ "loss": 0.0093,
+ "step": 1970
+ },
+ {
+ "epoch": 30.53,
+ "learning_rate": 0.0001,
+ "loss": 0.0094,
+ "step": 1975
+ },
+ {
+ "epoch": 30.61,
+ "learning_rate": 0.0001,
+ "loss": 0.0197,
+ "step": 1980
+ },
+ {
+ "epoch": 30.69,
+ "learning_rate": 0.0001,
+ "loss": 0.0103,
+ "step": 1985
+ },
+ {
+ "epoch": 30.76,
+ "learning_rate": 0.0001,
+ "loss": 0.0107,
+ "step": 1990
+ },
+ {
+ "epoch": 30.84,
+ "learning_rate": 0.0001,
+ "loss": 0.01,
+ "step": 1995
+ },
+ {
+ "epoch": 30.92,
+ "learning_rate": 0.0001,
+ "loss": 0.0112,
+ "step": 2000
+ },
+ {
+ "epoch": 31.0,
+ "learning_rate": 0.0001,
+ "loss": 0.0104,
+ "step": 2005
+ },
+ {
+ "epoch": 31.07,
+ "learning_rate": 0.0001,
+ "loss": 0.0096,
+ "step": 2010
+ },
+ {
+ "epoch": 31.15,
+ "learning_rate": 0.0001,
+ "loss": 0.0089,
+ "step": 2015
+ },
+ {
+ "epoch": 31.23,
+ "learning_rate": 0.0001,
+ "loss": 0.0085,
+ "step": 2020
+ },
+ {
+ "epoch": 31.3,
+ "learning_rate": 0.0001,
+ "loss": 0.0092,
+ "step": 2025
+ },
+ {
+ "epoch": 31.38,
+ "learning_rate": 0.0001,
+ "loss": 0.0087,
+ "step": 2030
+ },
+ {
+ "epoch": 31.46,
+ "learning_rate": 0.0001,
+ "loss": 0.0093,
+ "step": 2035
+ },
+ {
+ "epoch": 31.54,
+ "learning_rate": 0.0001,
+ "loss": 0.01,
+ "step": 2040
+ },
+ {
+ "epoch": 31.61,
+ "learning_rate": 0.0001,
+ "loss": 0.0088,
+ "step": 2045
+ },
+ {
+ "epoch": 31.69,
+ "learning_rate": 0.0001,
+ "loss": 0.0099,
+ "step": 2050
+ },
+ {
+ "epoch": 31.77,
+ "learning_rate": 0.0001,
+ "loss": 0.0211,
+ "step": 2055
+ },
+ {
+ "epoch": 31.85,
+ "learning_rate": 0.0001,
+ "loss": 0.0096,
+ "step": 2060
+ },
+ {
+ "epoch": 31.92,
+ "learning_rate": 0.0001,
+ "loss": 0.0093,
+ "step": 2065
+ },
+ {
+ "epoch": 32.0,
+ "learning_rate": 0.0001,
+ "loss": 0.01,
+ "step": 2070
+ },
+ {
+ "epoch": 32.08,
+ "learning_rate": 0.0001,
+ "loss": 0.0089,
+ "step": 2075
+ },
+ {
+ "epoch": 32.15,
+ "learning_rate": 0.0001,
+ "loss": 0.0088,
+ "step": 2080
+ },
+ {
+ "epoch": 32.23,
+ "learning_rate": 0.0001,
+ "loss": 0.0082,
+ "step": 2085
+ },
+ {
+ "epoch": 32.31,
+ "learning_rate": 0.0001,
+ "loss": 0.0084,
+ "step": 2090
+ },
+ {
+ "epoch": 32.39,
+ "learning_rate": 0.0001,
+ "loss": 0.0084,
+ "step": 2095
+ },
+ {
+ "epoch": 32.46,
+ "learning_rate": 0.0001,
+ "loss": 0.0088,
+ "step": 2100
+ },
+ {
+ "epoch": 32.54,
+ "learning_rate": 0.0001,
+ "loss": 0.0103,
+ "step": 2105
+ },
+ {
+ "epoch": 32.62,
+ "learning_rate": 0.0001,
+ "loss": 0.0087,
+ "step": 2110
+ },
+ {
+ "epoch": 32.7,
+ "learning_rate": 0.0001,
+ "loss": 0.0154,
+ "step": 2115
+ },
+ {
+ "epoch": 32.77,
+ "learning_rate": 0.0001,
+ "loss": 0.0093,
+ "step": 2120
+ },
+ {
+ "epoch": 32.85,
+ "learning_rate": 0.0001,
+ "loss": 0.009,
+ "step": 2125
+ },
+ {
+ "epoch": 32.93,
+ "learning_rate": 0.0001,
+ "loss": 0.0095,
+ "step": 2130
+ },
+ {
+ "epoch": 33.0,
+ "learning_rate": 0.0001,
+ "loss": 0.0093,
+ "step": 2135
+ },
+ {
+ "epoch": 33.08,
+ "learning_rate": 0.0001,
+ "loss": 0.0261,
+ "step": 2140
+ },
+ {
+ "epoch": 33.16,
+ "learning_rate": 0.0001,
+ "loss": 0.0095,
+ "step": 2145
+ },
+ {
+ "epoch": 33.24,
+ "learning_rate": 0.0001,
+ "loss": 0.0088,
+ "step": 2150
+ },
+ {
+ "epoch": 33.31,
+ "learning_rate": 0.0001,
+ "loss": 0.0091,
+ "step": 2155
+ },
+ {
+ "epoch": 33.39,
+ "learning_rate": 0.0001,
+ "loss": 0.0091,
+ "step": 2160
+ },
+ {
+ "epoch": 33.47,
+ "learning_rate": 0.0001,
+ "loss": 0.0092,
+ "step": 2165
+ },
+ {
2610
+ "epoch": 33.55,
2611
+ "learning_rate": 0.0001,
2612
+ "loss": 0.0093,
2613
+ "step": 2170
2614
+ },
2615
+ {
2616
+ "epoch": 33.62,
2617
+ "learning_rate": 0.0001,
2618
+ "loss": 0.009,
2619
+ "step": 2175
2620
+ },
2621
+ {
2622
+ "epoch": 33.7,
2623
+ "learning_rate": 0.0001,
2624
+ "loss": 0.0093,
2625
+ "step": 2180
2626
+ },
2627
+ {
2628
+ "epoch": 33.78,
2629
+ "learning_rate": 0.0001,
2630
+ "loss": 0.0096,
2631
+ "step": 2185
2632
+ },
2633
+ {
2634
+ "epoch": 33.86,
2635
+ "learning_rate": 0.0001,
2636
+ "loss": 0.0089,
2637
+ "step": 2190
2638
+ },
2639
+ {
2640
+ "epoch": 33.93,
2641
+ "learning_rate": 0.0001,
2642
+ "loss": 0.0091,
2643
+ "step": 2195
2644
+ },
2645
+ {
2646
+ "epoch": 34.01,
2647
+ "learning_rate": 0.0001,
2648
+ "loss": 0.0098,
2649
+ "step": 2200
2650
+ },
2651
+ {
2652
+ "epoch": 34.09,
2653
+ "learning_rate": 0.0001,
2654
+ "loss": 0.0084,
2655
+ "step": 2205
2656
+ },
2657
+ {
2658
+ "epoch": 34.16,
2659
+ "learning_rate": 0.0001,
2660
+ "loss": 0.008,
2661
+ "step": 2210
2662
+ },
2663
+ {
2664
+ "epoch": 34.24,
2665
+ "learning_rate": 0.0001,
2666
+ "loss": 0.0085,
2667
+ "step": 2215
2668
+ },
2669
+ {
2670
+ "epoch": 34.32,
2671
+ "learning_rate": 0.0001,
2672
+ "loss": 0.0082,
2673
+ "step": 2220
2674
+ },
2675
+ {
2676
+ "epoch": 34.4,
2677
+ "learning_rate": 0.0001,
2678
+ "loss": 0.0077,
2679
+ "step": 2225
2680
+ },
2681
+ {
2682
+ "epoch": 34.47,
2683
+ "learning_rate": 0.0001,
2684
+ "loss": 0.0099,
2685
+ "step": 2230
2686
+ },
2687
+ {
2688
+ "epoch": 34.55,
2689
+ "learning_rate": 0.0001,
2690
+ "loss": 0.0192,
2691
+ "step": 2235
2692
+ },
2693
+ {
2694
+ "epoch": 34.63,
2695
+ "learning_rate": 0.0001,
2696
+ "loss": 0.0087,
2697
+ "step": 2240
2698
+ },
2699
+ {
2700
+ "epoch": 34.63,
2701
+ "step": 2240,
2702
+ "total_flos": 5.322709161266381e+17,
2703
+ "train_loss": 0.14071606248617172,
2704
+ "train_runtime": 19826.8664,
2705
+ "train_samples_per_second": 3.654,
2706
+ "train_steps_per_second": 0.113
2707
+ }
2708
+ ],
2709
+ "logging_steps": 5,
2710
+ "max_steps": 2240,
2711
+ "num_train_epochs": 35,
2712
+ "save_steps": -2240,
2713
+ "total_flos": 5.322709161266381e+17,
2714
+ "trial_name": null,
2715
+ "trial_params": null
2716
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:376b907197074e491952f3bad2a20f0b5d24eddaf217a39d8cd6f3b0b02b4eba
+ size 4027
vocab.json ADDED
The diff for this file is too large to render. See raw diff