fats-fme committed
Commit 2110316 · verified · 1 Parent(s): 51cf674

Training in progress, step 337, checkpoint

last-checkpoint/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: NousResearch/CodeLlama-7b-hf-flash
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.13.2
last-checkpoint/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "NousResearch/CodeLlama-7b-hf-flash",
+   "bias": "none",
+   "fan_in_fan_out": null,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "q_proj",
+     "down_proj",
+     "gate_proj",
+     "k_proj",
+     "up_proj",
+     "v_proj",
+     "o_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": false
+ }
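
The adapter_config.json above describes a LoRA adapter (r=16, lora_alpha=32, dropout 0.05) over every attention and MLP projection of the CodeLlama-7B base model. A minimal loading sketch, assuming the files in this commit have been downloaded into a local `last-checkpoint/` directory (that path and the float16 choice are assumptions, not part of the commit):

```python
# Sketch: load the base model, then attach this checkpoint's LoRA adapter.
# PeftModel.from_pretrained reads adapter_config.json and adapter_model.safetensors.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "NousResearch/CodeLlama-7b-hf-flash",  # base_model_name_or_path above
    torch_dtype=torch.float16,             # assumed; the dtype is not recorded here
)
model = PeftModel.from_pretrained(base, "last-checkpoint")  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained("last-checkpoint")
model.eval()  # consistent with "inference_mode": true in the config
```
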
last-checkpoint/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:846080621f8c2fad8047631e5deaf95cb5a406d93376a4e5703974a46c8534da
+ size 159967880
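
This file and the other binaries below are stored as Git LFS pointers rather than raw blobs: three key/value lines giving the spec version, a sha256 object id, and the byte size. A small sketch of parsing and verifying such a pointer against a downloaded blob (the helper names are hypothetical, not a git-lfs API):

```python
import hashlib
from pathlib import Path

def parse_lfs_pointer(text: str) -> dict:
    """Split the three 'key value' lines of a git-lfs spec/v1 pointer."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

def matches_pointer(blob_path: str, pointer: dict) -> bool:
    """Check a downloaded blob against the pointer's size and sha256 digest."""
    data = Path(blob_path).read_bytes()
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])
```
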
last-checkpoint/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee74ff18cd85a6446b8aebaa3aa7fe72deeec28aa32f32d843b1dac8bb31c63d
+ size 320194002
last-checkpoint/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2cbe1cf5d3e5ed7c6c03e84f40f13e5dca16eaaf082c0ed8b2bb400bf38cb009
+ size 14512
last-checkpoint/rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:af37c374f3977bc5284dcdcccd9aadc82ecdcffd787f194b56904295429736fb
+ size 14512
last-checkpoint/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15dbdc8f8389ac30c57464ff4ee036eeb1fdb179e276b07965ff7588e23a83a5
+ size 1064
last-checkpoint/special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
last-checkpoint/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
last-checkpoint/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45ccb9c8b6b561889acea59191d66986d314e7cbd6a78abc6e49b139ca91c1e6
+ size 500058
last-checkpoint/tokenizer_config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": null,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
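
The `chat_template` field above is a Jinja template that wraps each message in `<|start_header_id|>role<|end_header_id|>` markers, prepends `bos_token` to the first message, and can append an assistant header. A minimal sketch of rendering it with the standard `apply_chat_template` API (the local checkpoint path is an assumption):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("last-checkpoint")  # hypothetical local path
messages = [{"role": "user", "content": "Write hello world in C."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # "<s><|start_header_id|>user<|end_header_id|>..." per the template
```
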
last-checkpoint/trainer_state.json ADDED
@@ -0,0 +1,2408 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 0.2502784998143335,
+   "eval_steps": 337,
+   "global_step": 337,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.0007426661715558856,
+       "grad_norm": 1.4470734596252441,
+       "learning_rate": 2.0000000000000003e-06,
+       "loss": 1.3568,
+       "step": 1
+     },
+     {
+       "epoch": 0.0007426661715558856,
+       "eval_loss": 0.4649902582168579,
+       "eval_runtime": 189.2701,
+       "eval_samples_per_second": 5.991,
+       "eval_steps_per_second": 2.996,
+       "step": 1
+     },
+     {
+       "epoch": 0.0014853323431117712,
+       "grad_norm": 1.335021734237671,
+       "learning_rate": 4.000000000000001e-06,
+       "loss": 1.7148,
+       "step": 2
+     },
+     {
+       "epoch": 0.0022279985146676567,
+       "grad_norm": 2.0106494426727295,
+       "learning_rate": 6e-06,
+       "loss": 1.9387,
+       "step": 3
+     },
+     {
+       "epoch": 0.0029706646862235424,
+       "grad_norm": 2.0463790893554688,
+       "learning_rate": 8.000000000000001e-06,
+       "loss": 2.1194,
+       "step": 4
+     },
+     {
+       "epoch": 0.003713330857779428,
+       "grad_norm": 2.6946020126342773,
+       "learning_rate": 1e-05,
+       "loss": 2.2371,
+       "step": 5
+     },
+     {
+       "epoch": 0.004455997029335313,
+       "grad_norm": 2.0213823318481445,
+       "learning_rate": 1.2e-05,
+       "loss": 2.4407,
+       "step": 6
+     },
+     {
+       "epoch": 0.0051986632008912,
+       "grad_norm": 2.055884599685669,
+       "learning_rate": 1.4000000000000001e-05,
+       "loss": 2.5741,
+       "step": 7
+     },
+     {
+       "epoch": 0.005941329372447085,
+       "grad_norm": 2.562527656555176,
+       "learning_rate": 1.6000000000000003e-05,
+       "loss": 2.7273,
+       "step": 8
+     },
+     {
+       "epoch": 0.006683995544002971,
+       "grad_norm": 2.70515775680542,
+       "learning_rate": 1.8e-05,
+       "loss": 2.9015,
+       "step": 9
+     },
+     {
+       "epoch": 0.007426661715558856,
+       "grad_norm": 2.6423349380493164,
+       "learning_rate": 2e-05,
+       "loss": 2.9069,
+       "step": 10
+     },
+     {
+       "epoch": 0.008169327887114742,
+       "grad_norm": 2.7331981658935547,
+       "learning_rate": 2.2000000000000003e-05,
+       "loss": 2.8755,
+       "step": 11
+     },
+     {
+       "epoch": 0.008911994058670627,
+       "grad_norm": 4.324583053588867,
+       "learning_rate": 2.4e-05,
+       "loss": 3.0139,
+       "step": 12
+     },
+     {
+       "epoch": 0.009654660230226514,
+       "grad_norm": 3.278507947921753,
+       "learning_rate": 2.6000000000000002e-05,
+       "loss": 3.1708,
+       "step": 13
+     },
+     {
+       "epoch": 0.0103973264017824,
+       "grad_norm": 3.9470252990722656,
+       "learning_rate": 2.8000000000000003e-05,
+       "loss": 3.2871,
+       "step": 14
+     },
+     {
+       "epoch": 0.011139992573338284,
+       "grad_norm": 4.76015567779541,
+       "learning_rate": 3e-05,
+       "loss": 2.9986,
+       "step": 15
+     },
+     {
+       "epoch": 0.01188265874489417,
+       "grad_norm": 6.300633907318115,
+       "learning_rate": 3.2000000000000005e-05,
+       "loss": 3.4458,
+       "step": 16
+     },
+     {
+       "epoch": 0.012625324916450055,
+       "grad_norm": 4.9842095375061035,
+       "learning_rate": 3.4000000000000007e-05,
+       "loss": 3.2977,
+       "step": 17
+     },
+     {
+       "epoch": 0.013367991088005942,
+       "grad_norm": 4.979720592498779,
+       "learning_rate": 3.6e-05,
+       "loss": 3.285,
+       "step": 18
+     },
+     {
+       "epoch": 0.014110657259561827,
+       "grad_norm": 5.599627494812012,
+       "learning_rate": 3.8e-05,
+       "loss": 3.0219,
+       "step": 19
+     },
+     {
+       "epoch": 0.014853323431117713,
+       "grad_norm": 5.2525153160095215,
+       "learning_rate": 4e-05,
+       "loss": 2.9258,
+       "step": 20
+     },
+     {
+       "epoch": 0.015595989602673598,
+       "grad_norm": 5.435408115386963,
+       "learning_rate": 4.2e-05,
+       "loss": 2.7997,
+       "step": 21
+     },
+     {
+       "epoch": 0.016338655774229483,
+       "grad_norm": 7.129375457763672,
+       "learning_rate": 4.4000000000000006e-05,
+       "loss": 2.8325,
+       "step": 22
+     },
+     {
+       "epoch": 0.01708132194578537,
+       "grad_norm": 6.04606819152832,
+       "learning_rate": 4.600000000000001e-05,
+       "loss": 2.4109,
+       "step": 23
+     },
+     {
+       "epoch": 0.017823988117341254,
+       "grad_norm": 7.30807638168335,
+       "learning_rate": 4.8e-05,
+       "loss": 2.5344,
+       "step": 24
+     },
+     {
+       "epoch": 0.01856665428889714,
+       "grad_norm": 6.1737775802612305,
+       "learning_rate": 5e-05,
+       "loss": 2.2762,
+       "step": 25
+     },
+     {
+       "epoch": 0.019309320460453028,
+       "grad_norm": 1.2856624126434326,
+       "learning_rate": 5.2000000000000004e-05,
+       "loss": 0.4641,
+       "step": 26
+     },
+     {
+       "epoch": 0.02005198663200891,
+       "grad_norm": 1.5805637836456299,
+       "learning_rate": 5.4000000000000005e-05,
+       "loss": 0.4788,
+       "step": 27
+     },
+     {
+       "epoch": 0.0207946528035648,
+       "grad_norm": 1.7802543640136719,
+       "learning_rate": 5.6000000000000006e-05,
+       "loss": 0.4178,
+       "step": 28
+     },
+     {
+       "epoch": 0.021537318975120682,
+       "grad_norm": 1.6685826778411865,
+       "learning_rate": 5.8e-05,
+       "loss": 0.3613,
+       "step": 29
+     },
+     {
+       "epoch": 0.02227998514667657,
+       "grad_norm": 1.2964603900909424,
+       "learning_rate": 6e-05,
+       "loss": 0.2742,
+       "step": 30
+     },
+     {
+       "epoch": 0.023022651318232456,
+       "grad_norm": 1.162868618965149,
+       "learning_rate": 6.2e-05,
+       "loss": 0.1826,
+       "step": 31
+     },
+     {
+       "epoch": 0.02376531748978834,
+       "grad_norm": 1.0783743858337402,
+       "learning_rate": 6.400000000000001e-05,
+       "loss": 0.1383,
+       "step": 32
+     },
+     {
+       "epoch": 0.024507983661344226,
+       "grad_norm": 0.7907594442367554,
+       "learning_rate": 6.6e-05,
+       "loss": 0.0578,
+       "step": 33
+     },
+     {
+       "epoch": 0.02525064983290011,
+       "grad_norm": 0.45891255140304565,
+       "learning_rate": 6.800000000000001e-05,
+       "loss": 0.0416,
+       "step": 34
+     },
+     {
+       "epoch": 0.025993316004455997,
+       "grad_norm": 0.2138753980398178,
+       "learning_rate": 7e-05,
+       "loss": 0.0117,
+       "step": 35
+     },
+     {
+       "epoch": 0.026735982176011884,
+       "grad_norm": 0.09297079592943192,
+       "learning_rate": 7.2e-05,
+       "loss": 0.0046,
+       "step": 36
+     },
+     {
+       "epoch": 0.027478648347567768,
+       "grad_norm": 0.27470022439956665,
+       "learning_rate": 7.4e-05,
+       "loss": 0.0102,
+       "step": 37
+     },
+     {
+       "epoch": 0.028221314519123655,
+       "grad_norm": 0.05123934894800186,
+       "learning_rate": 7.6e-05,
+       "loss": 0.0022,
+       "step": 38
+     },
+     {
+       "epoch": 0.028963980690679538,
+       "grad_norm": 0.22415509819984436,
+       "learning_rate": 7.800000000000001e-05,
+       "loss": 0.0058,
+       "step": 39
+     },
+     {
+       "epoch": 0.029706646862235425,
+       "grad_norm": 1.0602823495864868,
+       "learning_rate": 8e-05,
+       "loss": 0.0308,
+       "step": 40
+     },
+     {
+       "epoch": 0.030449313033791312,
+       "grad_norm": 0.019605087116360664,
+       "learning_rate": 8.2e-05,
+       "loss": 0.0008,
+       "step": 41
+     },
+     {
+       "epoch": 0.031191979205347196,
+       "grad_norm": 0.020324615761637688,
+       "learning_rate": 8.4e-05,
+       "loss": 0.001,
+       "step": 42
+     },
+     {
+       "epoch": 0.03193464537690308,
+       "grad_norm": 0.0388132706284523,
+       "learning_rate": 8.6e-05,
+       "loss": 0.0009,
+       "step": 43
+     },
+     {
+       "epoch": 0.032677311548458966,
+       "grad_norm": 0.11963042616844177,
+       "learning_rate": 8.800000000000001e-05,
+       "loss": 0.0016,
+       "step": 44
+     },
+     {
+       "epoch": 0.03341997772001486,
+       "grad_norm": 0.04200240597128868,
+       "learning_rate": 9e-05,
+       "loss": 0.0007,
+       "step": 45
+     },
+     {
+       "epoch": 0.03416264389157074,
+       "grad_norm": 1.10586416721344,
+       "learning_rate": 9.200000000000001e-05,
+       "loss": 0.0202,
+       "step": 46
+     },
+     {
+       "epoch": 0.034905310063126624,
+       "grad_norm": 0.007396018132567406,
+       "learning_rate": 9.4e-05,
+       "loss": 0.0004,
+       "step": 47
+     },
+     {
+       "epoch": 0.03564797623468251,
+       "grad_norm": 0.012978550978004932,
+       "learning_rate": 9.6e-05,
+       "loss": 0.0005,
+       "step": 48
+     },
+     {
+       "epoch": 0.0363906424062384,
+       "grad_norm": 0.006396067328751087,
+       "learning_rate": 9.8e-05,
+       "loss": 0.0002,
+       "step": 49
+     },
+     {
+       "epoch": 0.03713330857779428,
+       "grad_norm": 0.024195007979869843,
+       "learning_rate": 0.0001,
+       "loss": 0.0007,
+       "step": 50
+     },
+     {
+       "epoch": 0.037875974749350165,
+       "grad_norm": 1.803991436958313,
+       "learning_rate": 9.999985309738107e-05,
+       "loss": 0.0337,
+       "step": 51
+     },
+     {
+       "epoch": 0.038618640920906055,
+       "grad_norm": 3.9776086807250977,
+       "learning_rate": 9.999941239038748e-05,
+       "loss": 0.0281,
+       "step": 52
+     },
+     {
+       "epoch": 0.03936130709246194,
+       "grad_norm": 0.039939988404512405,
+       "learning_rate": 9.999867788160888e-05,
+       "loss": 0.0005,
+       "step": 53
+     },
+     {
+       "epoch": 0.04010397326401782,
+       "grad_norm": 2.4416584968566895,
+       "learning_rate": 9.999764957536131e-05,
+       "loss": 0.0216,
+       "step": 54
+     },
+     {
+       "epoch": 0.04084663943557371,
+       "grad_norm": 0.005858874414116144,
+       "learning_rate": 9.999632747768722e-05,
+       "loss": 0.0003,
+       "step": 55
+     },
+     {
+       "epoch": 0.0415893056071296,
+       "grad_norm": 0.004351081792265177,
+       "learning_rate": 9.999471159635539e-05,
+       "loss": 0.0003,
+       "step": 56
+     },
+     {
+       "epoch": 0.04233197177868548,
+       "grad_norm": 0.37215837836265564,
+       "learning_rate": 9.999280194086089e-05,
+       "loss": 0.0037,
+       "step": 57
+     },
+     {
+       "epoch": 0.043074637950241364,
+       "grad_norm": 0.005829016678035259,
+       "learning_rate": 9.999059852242507e-05,
+       "loss": 0.0004,
+       "step": 58
+     },
+     {
+       "epoch": 0.043817304121797254,
+       "grad_norm": 0.11440680176019669,
+       "learning_rate": 9.998810135399546e-05,
+       "loss": 0.0011,
+       "step": 59
+     },
+     {
+       "epoch": 0.04455997029335314,
+       "grad_norm": 0.023830680176615715,
+       "learning_rate": 9.998531045024566e-05,
+       "loss": 0.0006,
+       "step": 60
+     },
+     {
+       "epoch": 0.04530263646490902,
+       "grad_norm": 5.839324474334717,
+       "learning_rate": 9.998222582757533e-05,
+       "loss": 0.0621,
+       "step": 61
+     },
+     {
+       "epoch": 0.04604530263646491,
+       "grad_norm": 0.17139233648777008,
+       "learning_rate": 9.997884750411005e-05,
+       "loss": 0.0021,
+       "step": 62
+     },
+     {
+       "epoch": 0.046787968808020795,
+       "grad_norm": 0.030187880620360374,
+       "learning_rate": 9.997517549970115e-05,
+       "loss": 0.0007,
+       "step": 63
+     },
+     {
+       "epoch": 0.04753063497957668,
+       "grad_norm": 0.015891535207629204,
+       "learning_rate": 9.997120983592574e-05,
+       "loss": 0.0004,
+       "step": 64
+     },
+     {
+       "epoch": 0.04827330115113257,
+       "grad_norm": 0.001739410450682044,
+       "learning_rate": 9.996695053608651e-05,
+       "loss": 0.0002,
+       "step": 65
+     },
+     {
+       "epoch": 0.04901596732268845,
+       "grad_norm": 0.010511704720556736,
+       "learning_rate": 9.996239762521151e-05,
+       "loss": 0.0003,
+       "step": 66
+     },
+     {
+       "epoch": 0.049758633494244336,
+       "grad_norm": 0.0017766759265214205,
+       "learning_rate": 9.995755113005414e-05,
+       "loss": 0.0002,
+       "step": 67
+     },
+     {
+       "epoch": 0.05050129966580022,
+       "grad_norm": 0.004469791427254677,
+       "learning_rate": 9.99524110790929e-05,
+       "loss": 0.0003,
+       "step": 68
+     },
+     {
+       "epoch": 0.05124396583735611,
+       "grad_norm": 0.0036557905841618776,
+       "learning_rate": 9.994697750253127e-05,
+       "loss": 0.0004,
+       "step": 69
+     },
+     {
+       "epoch": 0.051986632008911994,
+       "grad_norm": 0.0024416742380708456,
+       "learning_rate": 9.994125043229752e-05,
+       "loss": 0.0003,
+       "step": 70
+     },
+     {
+       "epoch": 0.05272929818046788,
+       "grad_norm": 0.015205912292003632,
+       "learning_rate": 9.993522990204453e-05,
+       "loss": 0.0004,
+       "step": 71
+     },
+     {
+       "epoch": 0.05347196435202377,
+       "grad_norm": 0.014475185424089432,
+       "learning_rate": 9.992891594714954e-05,
+       "loss": 0.0005,
+       "step": 72
+     },
+     {
+       "epoch": 0.05421463052357965,
+       "grad_norm": 0.007128004450351,
+       "learning_rate": 9.992230860471402e-05,
+       "loss": 0.0004,
+       "step": 73
+     },
+     {
+       "epoch": 0.054957296695135535,
+       "grad_norm": 0.014512602239847183,
+       "learning_rate": 9.991540791356342e-05,
+       "loss": 0.0004,
+       "step": 74
+     },
+     {
+       "epoch": 0.055699962866691426,
+       "grad_norm": 0.006873926613479853,
+       "learning_rate": 9.990821391424689e-05,
+       "loss": 0.0003,
+       "step": 75
+     },
+     {
+       "epoch": 0.05644262903824731,
+       "grad_norm": 0.4753952920436859,
+       "learning_rate": 9.990072664903717e-05,
+       "loss": 0.0152,
+       "step": 76
+     },
+     {
+       "epoch": 0.05718529520980319,
+       "grad_norm": 0.006181332748383284,
+       "learning_rate": 9.989294616193017e-05,
+       "loss": 0.0004,
+       "step": 77
+     },
+     {
+       "epoch": 0.057927961381359076,
+       "grad_norm": 0.005821248050779104,
+       "learning_rate": 9.988487249864489e-05,
+       "loss": 0.0004,
+       "step": 78
+     },
+     {
+       "epoch": 0.05867062755291497,
+       "grad_norm": 0.005324830766767263,
+       "learning_rate": 9.9876505706623e-05,
+       "loss": 0.0003,
+       "step": 79
+     },
+     {
+       "epoch": 0.05941329372447085,
+       "grad_norm": 0.0035147082526236773,
+       "learning_rate": 9.986784583502862e-05,
+       "loss": 0.0003,
+       "step": 80
+     },
+     {
+       "epoch": 0.060155959896026734,
+       "grad_norm": 0.011637026444077492,
+       "learning_rate": 9.98588929347481e-05,
+       "loss": 0.0006,
+       "step": 81
+     },
+     {
+       "epoch": 0.060898626067582624,
+       "grad_norm": 0.006236548535525799,
+       "learning_rate": 9.98496470583896e-05,
+       "loss": 0.0006,
+       "step": 82
+     },
+     {
+       "epoch": 0.06164129223913851,
+       "grad_norm": 0.006345892325043678,
+       "learning_rate": 9.984010826028288e-05,
+       "loss": 0.0003,
+       "step": 83
+     },
+     {
+       "epoch": 0.06238395841069439,
+       "grad_norm": 0.016584008932113647,
+       "learning_rate": 9.98302765964789e-05,
+       "loss": 0.0006,
+       "step": 84
+     },
+     {
+       "epoch": 0.06312662458225028,
+       "grad_norm": 0.0018910560756921768,
+       "learning_rate": 9.982015212474955e-05,
+       "loss": 0.0002,
+       "step": 85
+     },
+     {
+       "epoch": 0.06386929075380617,
+       "grad_norm": 0.0022837521973997355,
+       "learning_rate": 9.980973490458728e-05,
+       "loss": 0.0002,
+       "step": 86
+     },
+     {
+       "epoch": 0.06461195692536205,
+       "grad_norm": 1.3271163702011108,
+       "learning_rate": 9.979902499720477e-05,
+       "loss": 0.0071,
+       "step": 87
+     },
+     {
+       "epoch": 0.06535462309691793,
+       "grad_norm": 0.011062121950089931,
+       "learning_rate": 9.978802246553459e-05,
+       "loss": 0.0003,
+       "step": 88
+     },
+     {
+       "epoch": 0.06609728926847382,
+       "grad_norm": 0.007717825006693602,
+       "learning_rate": 9.97767273742287e-05,
+       "loss": 0.0004,
+       "step": 89
+     },
+     {
+       "epoch": 0.06683995544002971,
+       "grad_norm": 0.010918798856437206,
+       "learning_rate": 9.976513978965829e-05,
+       "loss": 0.0006,
+       "step": 90
+     },
+     {
+       "epoch": 0.0675826216115856,
+       "grad_norm": 0.006601768545806408,
+       "learning_rate": 9.975325977991322e-05,
+       "loss": 0.0003,
+       "step": 91
+     },
+     {
+       "epoch": 0.06832528778314148,
+       "grad_norm": 0.009449873119592667,
+       "learning_rate": 9.974108741480165e-05,
+       "loss": 0.0004,
+       "step": 92
+     },
+     {
+       "epoch": 0.06906795395469736,
+       "grad_norm": 0.015310431830585003,
+       "learning_rate": 9.97286227658497e-05,
+       "loss": 0.0004,
+       "step": 93
+     },
+     {
+       "epoch": 0.06981062012625325,
+       "grad_norm": 0.09934096038341522,
+       "learning_rate": 9.971586590630093e-05,
+       "loss": 0.0019,
+       "step": 94
+     },
+     {
+       "epoch": 0.07055328629780913,
+       "grad_norm": 0.1965462863445282,
+       "learning_rate": 9.970281691111598e-05,
+       "loss": 0.0012,
+       "step": 95
+     },
+     {
+       "epoch": 0.07129595246936501,
+       "grad_norm": 0.05416925996541977,
+       "learning_rate": 9.968947585697214e-05,
+       "loss": 0.0007,
+       "step": 96
+     },
+     {
+       "epoch": 0.07203861864092091,
+       "grad_norm": 0.012609624303877354,
+       "learning_rate": 9.967584282226281e-05,
+       "loss": 0.0005,
+       "step": 97
+     },
+     {
+       "epoch": 0.0727812848124768,
+       "grad_norm": 0.4876333177089691,
+       "learning_rate": 9.966191788709716e-05,
+       "loss": 0.0031,
+       "step": 98
+     },
+     {
+       "epoch": 0.07352395098403268,
+       "grad_norm": 0.01990601420402527,
+       "learning_rate": 9.964770113329953e-05,
+       "loss": 0.0009,
+       "step": 99
+     },
+     {
+       "epoch": 0.07426661715558856,
+       "grad_norm": 0.00828948151320219,
+       "learning_rate": 9.96331926444091e-05,
+       "loss": 0.0004,
+       "step": 100
+     },
+     {
+       "epoch": 0.07500928332714445,
+       "grad_norm": 0.005279685370624065,
+       "learning_rate": 9.961839250567924e-05,
+       "loss": 0.0004,
+       "step": 101
+     },
+     {
+       "epoch": 0.07575194949870033,
+       "grad_norm": 1.349811315536499,
+       "learning_rate": 9.960330080407711e-05,
+       "loss": 0.1017,
+       "step": 102
+     },
+     {
+       "epoch": 0.07649461567025621,
+       "grad_norm": 0.006720814388245344,
+       "learning_rate": 9.958791762828317e-05,
+       "loss": 0.0005,
+       "step": 103
+     },
+     {
+       "epoch": 0.07723728184181211,
+       "grad_norm": 0.026319369673728943,
+       "learning_rate": 9.957224306869053e-05,
+       "loss": 0.0007,
+       "step": 104
+     },
+     {
+       "epoch": 0.077979948013368,
+       "grad_norm": 0.013718683272600174,
+       "learning_rate": 9.955627721740454e-05,
+       "loss": 0.0007,
+       "step": 105
+     },
+     {
+       "epoch": 0.07872261418492388,
+       "grad_norm": 0.030150871723890305,
+       "learning_rate": 9.954002016824227e-05,
+       "loss": 0.0008,
+       "step": 106
+     },
+     {
+       "epoch": 0.07946528035647976,
+       "grad_norm": 0.03413335233926773,
+       "learning_rate": 9.95234720167318e-05,
+       "loss": 0.0009,
+       "step": 107
+     },
+     {
+       "epoch": 0.08020794652803565,
+       "grad_norm": 0.03414030373096466,
+       "learning_rate": 9.950663286011179e-05,
+       "loss": 0.0009,
+       "step": 108
+     },
+     {
+       "epoch": 0.08095061269959153,
+       "grad_norm": 0.7618962526321411,
+       "learning_rate": 9.948950279733093e-05,
+       "loss": 0.0031,
+       "step": 109
+     },
+     {
+       "epoch": 0.08169327887114743,
+       "grad_norm": 0.01711968518793583,
+       "learning_rate": 9.947208192904722e-05,
+       "loss": 0.0005,
+       "step": 110
+     },
+     {
+       "epoch": 0.08243594504270331,
+       "grad_norm": 0.014231199398636818,
+       "learning_rate": 9.945437035762754e-05,
+       "loss": 0.0005,
+       "step": 111
+     },
+     {
+       "epoch": 0.0831786112142592,
+       "grad_norm": 0.08236388117074966,
+       "learning_rate": 9.943636818714695e-05,
+       "loss": 0.0021,
+       "step": 112
+     },
+     {
+       "epoch": 0.08392127738581508,
+       "grad_norm": 0.008401082828640938,
+       "learning_rate": 9.941807552338804e-05,
+       "loss": 0.0004,
+       "step": 113
+     },
+     {
+       "epoch": 0.08466394355737096,
+       "grad_norm": 0.00980079360306263,
+       "learning_rate": 9.939949247384046e-05,
+       "loss": 0.0005,
+       "step": 114
+     },
+     {
+       "epoch": 0.08540660972892684,
+       "grad_norm": 0.01246787328273058,
+       "learning_rate": 9.938061914770012e-05,
+       "loss": 0.0006,
+       "step": 115
+     },
+     {
+       "epoch": 0.08614927590048273,
+       "grad_norm": 0.011044768616557121,
+       "learning_rate": 9.936145565586871e-05,
+       "loss": 0.0005,
+       "step": 116
+     },
+     {
+       "epoch": 0.08689194207203862,
+       "grad_norm": 0.0040915291756391525,
+       "learning_rate": 9.934200211095288e-05,
+       "loss": 0.0003,
+       "step": 117
+     },
+     {
+       "epoch": 0.08763460824359451,
+       "grad_norm": 0.004765264689922333,
+       "learning_rate": 9.93222586272637e-05,
+       "loss": 0.0004,
+       "step": 118
+     },
+     {
+       "epoch": 0.08837727441515039,
+       "grad_norm": 0.0034806388430297375,
+       "learning_rate": 9.930222532081597e-05,
+       "loss": 0.0003,
+       "step": 119
+     },
+     {
+       "epoch": 0.08911994058670628,
+       "grad_norm": 0.005507381167262793,
+       "learning_rate": 9.928190230932746e-05,
+       "loss": 0.0003,
+       "step": 120
+     },
+     {
+       "epoch": 0.08986260675826216,
+       "grad_norm": 0.42908811569213867,
+       "learning_rate": 9.926128971221835e-05,
+       "loss": 0.0026,
+       "step": 121
+     },
+     {
+       "epoch": 0.09060527292981804,
+       "grad_norm": 0.00801977701485157,
+       "learning_rate": 9.924038765061042e-05,
+       "loss": 0.0003,
+       "step": 122
+     },
+     {
+       "epoch": 0.09134793910137393,
+       "grad_norm": 0.007514494471251965,
+       "learning_rate": 9.921919624732635e-05,
+       "loss": 0.0004,
+       "step": 123
+     },
+     {
+       "epoch": 0.09209060527292982,
+       "grad_norm": 0.006028357893228531,
+       "learning_rate": 9.919771562688904e-05,
+       "loss": 0.0002,
+       "step": 124
+     },
+     {
+       "epoch": 0.09283327144448571,
+       "grad_norm": 0.009683290496468544,
+       "learning_rate": 9.917594591552089e-05,
+       "loss": 0.0005,
+       "step": 125
+     },
+     {
+       "epoch": 0.09357593761604159,
+       "grad_norm": 0.01857328973710537,
+       "learning_rate": 9.915388724114301e-05,
+       "loss": 0.0011,
+       "step": 126
+     },
+     {
+       "epoch": 0.09431860378759747,
+       "grad_norm": 0.0045487345196306705,
+       "learning_rate": 9.913153973337446e-05,
+       "loss": 0.0004,
+       "step": 127
+     },
+     {
+       "epoch": 0.09506126995915336,
+       "grad_norm": 0.019089125096797943,
+       "learning_rate": 9.910890352353153e-05,
+       "loss": 0.0009,
+       "step": 128
+     },
+     {
+       "epoch": 0.09580393613070924,
+       "grad_norm": 0.01162141002714634,
+       "learning_rate": 9.908597874462699e-05,
+       "loss": 0.0004,
+       "step": 129
+     },
+     {
+       "epoch": 0.09654660230226514,
+       "grad_norm": 0.0069759683683514595,
+       "learning_rate": 9.906276553136923e-05,
+       "loss": 0.0005,
+       "step": 130
+     },
+     {
+       "epoch": 0.09728926847382102,
+       "grad_norm": 1.388071894645691,
+       "learning_rate": 9.903926402016153e-05,
+       "loss": 0.0127,
+       "step": 131
+     },
+     {
+       "epoch": 0.0980319346453769,
+       "grad_norm": 0.006189362611621618,
+       "learning_rate": 9.901547434910122e-05,
+       "loss": 0.0004,
+       "step": 132
+     },
+     {
+       "epoch": 0.09877460081693279,
+       "grad_norm": 0.0009308147127740085,
+       "learning_rate": 9.899139665797887e-05,
+       "loss": 0.0001,
+       "step": 133
+     },
+     {
+       "epoch": 0.09951726698848867,
+       "grad_norm": 0.003263165010139346,
+       "learning_rate": 9.896703108827759e-05,
+       "loss": 0.0002,
+       "step": 134
+     },
+     {
+       "epoch": 0.10025993316004456,
+       "grad_norm": 0.0018793240888044238,
+       "learning_rate": 9.894237778317195e-05,
+       "loss": 0.0002,
+       "step": 135
+     },
+     {
+       "epoch": 0.10100259933160044,
+       "grad_norm": 0.2546728551387787,
+       "learning_rate": 9.891743688752738e-05,
+       "loss": 0.0021,
+       "step": 136
+     },
+     {
+       "epoch": 0.10174526550315634,
+       "grad_norm": 0.01736506260931492,
+       "learning_rate": 9.88922085478992e-05,
+       "loss": 0.0004,
+       "step": 137
+     },
+     {
+       "epoch": 0.10248793167471222,
+       "grad_norm": 0.003645472228527069,
+       "learning_rate": 9.88666929125318e-05,
+       "loss": 0.0003,
+       "step": 138
+     },
+     {
+       "epoch": 0.1032305978462681,
+       "grad_norm": 0.002554230624809861,
+       "learning_rate": 9.884089013135766e-05,
+       "loss": 0.0002,
+       "step": 139
+     },
+     {
+       "epoch": 0.10397326401782399,
+       "grad_norm": 0.003567540319636464,
+       "learning_rate": 9.881480035599667e-05,
+       "loss": 0.0002,
+       "step": 140
+     },
+     {
+       "epoch": 0.10471593018937987,
+       "grad_norm": 0.0011336462339386344,
+       "learning_rate": 9.87884237397551e-05,
+       "loss": 0.0002,
+       "step": 141
+     },
+     {
+       "epoch": 0.10545859636093576,
+       "grad_norm": 0.0014857546193525195,
+       "learning_rate": 9.876176043762467e-05,
+       "loss": 0.0001,
+       "step": 142
+     },
+     {
+       "epoch": 0.10620126253249164,
+       "grad_norm": 0.022715391591191292,
+       "learning_rate": 9.873481060628174e-05,
+       "loss": 0.0006,
+       "step": 143
+     },
+     {
+       "epoch": 0.10694392870404754,
+       "grad_norm": 1.0306154489517212,
+       "learning_rate": 9.870757440408638e-05,
+       "loss": 0.0073,
+       "step": 144
+     },
+     {
+       "epoch": 0.10768659487560342,
+       "grad_norm": 0.002907106187194586,
+       "learning_rate": 9.868005199108133e-05,
+       "loss": 0.0002,
+       "step": 145
+     },
+     {
+       "epoch": 0.1084292610471593,
+       "grad_norm": 0.0016110404394567013,
+       "learning_rate": 9.865224352899119e-05,
+       "loss": 0.0002,
+       "step": 146
+     },
+     {
+       "epoch": 0.10917192721871519,
+       "grad_norm": 0.0025813078973442316,
+       "learning_rate": 9.862414918122141e-05,
+       "loss": 0.0002,
+       "step": 147
+     },
+     {
+       "epoch": 0.10991459339027107,
+       "grad_norm": 0.0018323366530239582,
+       "learning_rate": 9.859576911285728e-05,
+       "loss": 0.0002,
+       "step": 148
+     },
+     {
+       "epoch": 0.11065725956182695,
+       "grad_norm": 0.7255179286003113,
+       "learning_rate": 9.856710349066307e-05,
+       "loss": 0.0081,
+       "step": 149
+     },
+     {
+       "epoch": 0.11139992573338285,
+       "grad_norm": 3.8413281440734863,
+       "learning_rate": 9.853815248308101e-05,
+       "loss": 0.2887,
+       "step": 150
+     },
+     {
+       "epoch": 0.11214259190493873,
+       "grad_norm": 0.01481606438755989,
+       "learning_rate": 9.850891626023022e-05,
+       "loss": 0.0009,
+       "step": 151
+     },
+     {
+       "epoch": 0.11288525807649462,
+       "grad_norm": 0.005497956182807684,
+       "learning_rate": 9.84793949939058e-05,
+       "loss": 0.0004,
+       "step": 152
+     },
+     {
+       "epoch": 0.1136279242480505,
+       "grad_norm": 0.05774915963411331,
+       "learning_rate": 9.844958885757784e-05,
+       "loss": 0.0008,
+       "step": 153
+     },
+     {
+       "epoch": 0.11437059041960639,
+       "grad_norm": 0.22279635071754456,
+       "learning_rate": 9.84194980263903e-05,
+       "loss": 0.0023,
+       "step": 154
+     },
+     {
+       "epoch": 0.11511325659116227,
+       "grad_norm": 0.0037247207947075367,
+       "learning_rate": 9.838912267716005e-05,
+       "loss": 0.0003,
+       "step": 155
+     },
+     {
+       "epoch": 0.11585592276271815,
+       "grad_norm": 0.0031422695610672235,
+       "learning_rate": 9.835846298837584e-05,
+       "loss": 0.0003,
+       "step": 156
+     },
+     {
+       "epoch": 0.11659858893427405,
+       "grad_norm": 0.004694859962910414,
+       "learning_rate": 9.83275191401972e-05,
+       "loss": 0.0003,
+       "step": 157
+     },
+     {
+       "epoch": 0.11734125510582993,
+       "grad_norm": 0.009304534643888474,
+       "learning_rate": 9.829629131445342e-05,
+       "loss": 0.0004,
+       "step": 158
+     },
+     {
+       "epoch": 0.11808392127738582,
+       "grad_norm": 0.0023734685964882374,
+       "learning_rate": 9.826477969464249e-05,
+       "loss": 0.0003,
+       "step": 159
+     },
+     {
+       "epoch": 0.1188265874489417,
+       "grad_norm": 0.004084714688360691,
+       "learning_rate": 9.823298446592998e-05,
+       "loss": 0.0003,
+       "step": 160
+     },
+     {
+       "epoch": 0.11956925362049758,
+       "grad_norm": 0.005245341453701258,
+       "learning_rate": 9.820090581514797e-05,
+       "loss": 0.0003,
+       "step": 161
+     },
+     {
+       "epoch": 0.12031191979205347,
+       "grad_norm": 0.00829974003136158,
+       "learning_rate": 9.816854393079403e-05,
+       "loss": 0.0004,
+       "step": 162
+     },
+     {
+       "epoch": 0.12105458596360935,
+       "grad_norm": 0.07217549532651901,
+       "learning_rate": 9.81358990030299e-05,
+       "loss": 0.001,
+       "step": 163
+     },
+     {
+       "epoch": 0.12179725213516525,
+       "grad_norm": 0.0019713877700269222,
+       "learning_rate": 9.810297122368067e-05,
+       "loss": 0.0002,
+       "step": 164
+     },
+     {
+       "epoch": 0.12253991830672113,
+       "grad_norm": 0.008508375845849514,
+       "learning_rate": 9.806976078623337e-05,
+       "loss": 0.0005,
+       "step": 165
+     },
+     {
+       "epoch": 0.12328258447827702,
+       "grad_norm": 0.0026160285342484713,
+       "learning_rate": 9.803626788583603e-05,
+       "loss": 0.0002,
+       "step": 166
+     },
+     {
+       "epoch": 0.1240252506498329,
+       "grad_norm": 0.004115263931453228,
+       "learning_rate": 9.800249271929645e-05,
+       "loss": 0.0003,
+       "step": 167
+     },
+     {
+       "epoch": 0.12476791682138878,
+       "grad_norm": 0.009765752591192722,
+       "learning_rate": 9.796843548508101e-05,
+       "loss": 0.0004,
+       "step": 168
+     },
+     {
+       "epoch": 0.12551058299294468,
+       "grad_norm": 0.032703742384910583,
+       "learning_rate": 9.793409638331363e-05,
+       "loss": 0.0009,
+       "step": 169
+     },
+     {
+       "epoch": 0.12625324916450056,
+       "grad_norm": 0.005391250364482403,
+       "learning_rate": 9.789947561577445e-05,
+       "loss": 0.0003,
+       "step": 170
+     },
+     {
+       "epoch": 0.12699591533605645,
+       "grad_norm": 0.2329409420490265,
+       "learning_rate": 9.786457338589872e-05,
+       "loss": 0.0022,
+       "step": 171
+     },
+     {
+       "epoch": 0.12773858150761233,
+       "grad_norm": 0.006565611343830824,
+       "learning_rate": 9.782938989877562e-05,
+       "loss": 0.0003,
+       "step": 172
+     },
+     {
+       "epoch": 0.12848124767916821,
+       "grad_norm": 0.00805259495973587,
+       "learning_rate": 9.779392536114698e-05,
+       "loss": 0.0005,
+       "step": 173
+     },
+     {
+       "epoch": 0.1292239138507241,
+       "grad_norm": 0.05728701129555702,
+       "learning_rate": 9.775817998140616e-05,
+       "loss": 0.0008,
+       "step": 174
+     },
+     {
+       "epoch": 0.12996658002227998,
+       "grad_norm": 0.030414534732699394,
+       "learning_rate": 9.772215396959674e-05,
+       "loss": 0.0006,
+       "step": 175
+     },
+     {
+       "epoch": 0.13070924619383587,
+       "grad_norm": 0.003174105193465948,
+       "learning_rate": 9.768584753741134e-05,
+       "loss": 0.0003,
+       "step": 176
+     },
+     {
+       "epoch": 0.13145191236539175,
+       "grad_norm": 0.005474573001265526,
+       "learning_rate": 9.764926089819038e-05,
+       "loss": 0.0005,
+       "step": 177
+     },
+     {
+       "epoch": 0.13219457853694763,
+       "grad_norm": 0.01835116557776928,
+       "learning_rate": 9.761239426692077e-05,
+       "loss": 0.0006,
+       "step": 178
+     },
+     {
+       "epoch": 0.13293724470850352,
+       "grad_norm": 0.07570798695087433,
+       "learning_rate": 9.757524786023468e-05,
+       "loss": 0.0025,
+       "step": 179
+     },
+     {
+       "epoch": 0.13367991088005943,
+       "grad_norm": 0.0024187075905501842,
+       "learning_rate": 9.753782189640834e-05,
+       "loss": 0.0002,
+       "step": 180
+     },
+     {
+       "epoch": 0.1344225770516153,
+       "grad_norm": 0.014368316158652306,
+       "learning_rate": 9.750011659536058e-05,
+       "loss": 0.0006,
+       "step": 181
+     },
+     {
+       "epoch": 0.1351652432231712,
+       "grad_norm": 0.013278312981128693,
+       "learning_rate": 9.74621321786517e-05,
+       "loss": 0.0005,
+       "step": 182
+     },
+     {
+       "epoch": 0.13590790939472708,
+       "grad_norm": 0.01532573252916336,
+       "learning_rate": 9.742386886948213e-05,
+       "loss": 0.0003,
+       "step": 183
+     },
+     {
+       "epoch": 0.13665057556628296,
+       "grad_norm": 0.0070452457293868065,
+       "learning_rate": 9.738532689269112e-05,
+       "loss": 0.0003,
+       "step": 184
+     },
+     {
+       "epoch": 0.13739324173783884,
+       "grad_norm": 0.5086126923561096,
+       "learning_rate": 9.73465064747553e-05,
+       "loss": 0.0046,
+       "step": 185
+     },
+     {
+       "epoch": 0.13813590790939473,
+       "grad_norm": 0.00677888048812747,
+       "learning_rate": 9.730740784378753e-05,
+       "loss": 0.0004,
+       "step": 186
+     },
+     {
+       "epoch": 0.1388785740809506,
+       "grad_norm": 0.014153935015201569,
+       "learning_rate": 9.726803122953547e-05,
+       "loss": 0.0005,
+       "step": 187
+     },
+     {
+       "epoch": 0.1396212402525065,
+       "grad_norm": 0.005835146643221378,
+       "learning_rate": 9.722837686338025e-05,
+       "loss": 0.0002,
+       "step": 188
+     },
+     {
+       "epoch": 0.14036390642406238,
+       "grad_norm": 0.004742791876196861,
+       "learning_rate": 9.718844497833504e-05,
+       "loss": 0.0003,
+       "step": 189
+     },
+     {
+       "epoch": 0.14110657259561826,
+       "grad_norm": 0.002392592839896679,
+       "learning_rate": 9.71482358090438e-05,
+       "loss": 0.0002,
+       "step": 190
+     },
+     {
+       "epoch": 0.14184923876717415,
+       "grad_norm": 0.009971565566956997,
+       "learning_rate": 9.710774959177983e-05,
+       "loss": 0.0004,
+       "step": 191
+     },
+     {
+       "epoch": 0.14259190493873003,
+       "grad_norm": 0.000985423568636179,
+       "learning_rate": 9.706698656444438e-05,
+       "loss": 0.0001,
+       "step": 192
+     },
+     {
+       "epoch": 0.1433345711102859,
+       "grad_norm": 0.002976580522954464,
+       "learning_rate": 9.702594696656524e-05,
+       "loss": 0.0003,
+       "step": 193
+     },
+     {
+       "epoch": 0.14407723728184182,
+       "grad_norm": 0.004122724290937185,
+       "learning_rate": 9.698463103929542e-05,
+       "loss": 0.0003,
+       "step": 194
+     },
+     {
+       "epoch": 0.1448199034533977,
+       "grad_norm": 0.004222078714519739,
+       "learning_rate": 9.694303902541163e-05,
+       "loss": 0.0002,
+       "step": 195
+     },
+     {
+       "epoch": 0.1455625696249536,
+       "grad_norm": 0.002287927782163024,
+       "learning_rate": 9.69011711693129e-05,
+       "loss": 0.0002,
+       "step": 196
+     },
+     {
+       "epoch": 0.14630523579650948,
+       "grad_norm": 0.001492491108365357,
+       "learning_rate": 9.685902771701913e-05,
+       "loss": 0.0002,
+       "step": 197
+     },
+     {
+       "epoch": 0.14704790196806536,
+       "grad_norm": 0.00485377898439765,
+       "learning_rate": 9.681660891616966e-05,
+       "loss": 0.0003,
+       "step": 198
+     },
+     {
+       "epoch": 0.14779056813962124,
+       "grad_norm": 0.014361650682985783,
+       "learning_rate": 9.677391501602182e-05,
+       "loss": 0.0005,
+       "step": 199
+     },
+     {
+       "epoch": 0.14853323431117713,
+       "grad_norm": 1.621951699256897,
+       "learning_rate": 9.673094626744942e-05,
+       "loss": 0.0662,
+       "step": 200
+     },
+     {
+       "epoch": 0.149275900482733,
+       "grad_norm": 0.0026435288600623608,
+       "learning_rate": 9.668770292294136e-05,
+       "loss": 0.0002,
+       "step": 201
+     },
+     {
+       "epoch": 0.1500185666542889,
+       "grad_norm": 0.0024962294846773148,
+       "learning_rate": 9.664418523660004e-05,
+       "loss": 0.0002,
+       "step": 202
+     },
+     {
+       "epoch": 0.15076123282584478,
+       "grad_norm": 0.002211767714470625,
+       "learning_rate": 9.660039346413994e-05,
+       "loss": 0.0002,
+       "step": 203
+     },
+     {
+       "epoch": 0.15150389899740066,
+       "grad_norm": 0.002533277263864875,
+       "learning_rate": 9.65563278628861e-05,
+       "loss": 0.0003,
+       "step": 204
+     },
+     {
+       "epoch": 0.15224656516895654,
+       "grad_norm": 0.006425884552299976,
+       "learning_rate": 9.651198869177263e-05,
+       "loss": 0.0004,
+       "step": 205
+     },
+     {
+       "epoch": 0.15298923134051243,
+       "grad_norm": 0.0048479656688869,
+       "learning_rate": 9.646737621134112e-05,
+       "loss": 0.0003,
+       "step": 206
+     },
+     {
+       "epoch": 0.15373189751206834,
+       "grad_norm": 0.004739306401461363,
+       "learning_rate": 9.642249068373921e-05,
+       "loss": 0.0004,
+       "step": 207
+     },
+     {
+       "epoch": 0.15447456368362422,
+       "grad_norm": 0.0035690851509571075,
+       "learning_rate": 9.637733237271894e-05,
+       "loss": 0.0002,
+       "step": 208
+     },
+     {
+       "epoch": 0.1552172298551801,
+       "grad_norm": 0.001971122808754444,
+       "learning_rate": 9.633190154363527e-05,
+       "loss": 0.0002,
+       "step": 209
+     },
+     {
+       "epoch": 0.155959896026736,
+       "grad_norm": 0.0017060886602848768,
+       "learning_rate": 9.628619846344454e-05,
+       "loss": 0.0002,
+       "step": 210
+     },
+     {
+       "epoch": 0.15670256219829187,
+       "grad_norm": 0.011329671368002892,
+       "learning_rate": 9.624022340070279e-05,
+       "loss": 0.0003,
+       "step": 211
+     },
+     {
+       "epoch": 0.15744522836984776,
+       "grad_norm": 0.002570141339674592,
+       "learning_rate": 9.619397662556435e-05,
+       "loss": 0.0002,
+       "step": 212
+     },
+     {
+       "epoch": 0.15818789454140364,
+       "grad_norm": 0.0023831925354897976,
+       "learning_rate": 9.614745840978008e-05,
+       "loss": 0.0002,
+       "step": 213
+     },
+     {
+       "epoch": 0.15893056071295952,
+       "grad_norm": 0.016878001391887665,
+       "learning_rate": 9.610066902669592e-05,
+       "loss": 0.0007,
+       "step": 214
+     },
+     {
+       "epoch": 0.1596732268845154,
+       "grad_norm": 0.008881067857146263,
+       "learning_rate": 9.605360875125117e-05,
+       "loss": 0.0002,
+       "step": 215
+     },
+     {
+       "epoch": 0.1604158930560713,
+       "grad_norm": 0.0010246317833662033,
+       "learning_rate": 9.600627785997696e-05,
+       "loss": 0.0001,
+       "step": 216
+     },
+     {
+       "epoch": 0.16115855922762717,
+       "grad_norm": 0.0019209448946639895,
+       "learning_rate": 9.595867663099453e-05,
+       "loss": 0.0002,
+       "step": 217
+     },
+     {
+       "epoch": 0.16190122539918306,
+       "grad_norm": 0.0022903543431311846,
+       "learning_rate": 9.591080534401371e-05,
+       "loss": 0.0002,
+       "step": 218
+     },
+     {
+       "epoch": 0.16264389157073894,
+       "grad_norm": 0.009357557632029057,
+       "learning_rate": 9.586266428033119e-05,
+       "loss": 0.0003,
+       "step": 219
+     },
+     {
+       "epoch": 0.16338655774229485,
+       "grad_norm": 0.003212034935131669,
+       "learning_rate": 9.581425372282891e-05,
+       "loss": 0.0002,
+       "step": 220
+     },
+     {
+       "epoch": 0.16412922391385074,
+       "grad_norm": 0.006715564522892237,
+       "learning_rate": 9.576557395597236e-05,
+       "loss": 0.0005,
+       "step": 221
+     },
+     {
+       "epoch": 0.16487189008540662,
+       "grad_norm": 0.009416691958904266,
+       "learning_rate": 9.571662526580898e-05,
+       "loss": 0.0007,
+       "step": 222
+     },
+     {
+       "epoch": 0.1656145562569625,
+       "grad_norm": 0.005933799315243959,
+       "learning_rate": 9.566740793996637e-05,
+       "loss": 0.0003,
+       "step": 223
+     },
+     {
+       "epoch": 0.1663572224285184,
+       "grad_norm": 0.0062180450186133385,
+       "learning_rate": 9.561792226765072e-05,
+       "loss": 0.0004,
+       "step": 224
+     },
+     {
+       "epoch": 0.16709988860007427,
+       "grad_norm": 0.007876376621425152,
+       "learning_rate": 9.5568168539645e-05,
+       "loss": 0.0004,
+       "step": 225
+     },
+     {
+       "epoch": 0.16784255477163015,
+       "grad_norm": 0.06271418929100037,
+       "learning_rate": 9.551814704830734e-05,
+       "loss": 0.0017,
+       "step": 226
+     },
+     {
+       "epoch": 0.16858522094318604,
+       "grad_norm": 0.002005163347348571,
+       "learning_rate": 9.546785808756926e-05,
+       "loss": 0.0002,
+       "step": 227
+     },
+     {
+       "epoch": 0.16932788711474192,
+       "grad_norm": 0.022918762639164925,
+       "learning_rate": 9.541730195293397e-05,
+       "loss": 0.0004,
+       "step": 228
+     },
+     {
+       "epoch": 0.1700705532862978,
+       "grad_norm": 0.018031733110547066,
+       "learning_rate": 9.53664789414746e-05,
+       "loss": 0.0004,
+       "step": 229
+     },
+     {
+       "epoch": 0.1708132194578537,
+       "grad_norm": 0.0034406071063131094,
+       "learning_rate": 9.53153893518325e-05,
+       "loss": 0.0002,
+       "step": 230
+     },
+     {
+       "epoch": 0.17155588562940957,
+       "grad_norm": 0.03680207580327988,
+       "learning_rate": 9.526403348421544e-05,
+       "loss": 0.0004,
+       "step": 231
+     },
+     {
+       "epoch": 0.17229855180096545,
+       "grad_norm": 0.001635130844078958,
+       "learning_rate": 9.521241164039589e-05,
+       "loss": 0.0002,
+       "step": 232
+     },
+     {
+       "epoch": 0.17304121797252134,
+       "grad_norm": 0.012099578976631165,
+       "learning_rate": 9.516052412370921e-05,
+       "loss": 0.0004,
+       "step": 233
+     },
+     {
+       "epoch": 0.17378388414407725,
+       "grad_norm": 0.001356737338937819,
+       "learning_rate": 9.51083712390519e-05,
+       "loss": 0.0001,
+       "step": 234
+     },
+     {
+       "epoch": 0.17452655031563313,
+       "grad_norm": 0.003389824880287051,
+       "learning_rate": 9.505595329287972e-05,
+       "loss": 0.0001,
+       "step": 235
+     },
+     {
+       "epoch": 0.17526921648718902,
+       "grad_norm": 0.004536811728030443,
+       "learning_rate": 9.500327059320606e-05,
+       "loss": 0.0003,
+       "step": 236
+     },
+     {
+       "epoch": 0.1760118826587449,
+       "grad_norm": 0.2518163323402405,
+       "learning_rate": 9.495032344959998e-05,
+       "loss": 0.0022,
+       "step": 237
+     },
+     {
+       "epoch": 0.17675454883030078,
+       "grad_norm": 0.0022397220600396395,
+       "learning_rate": 9.48971121731844e-05,
+       "loss": 0.0002,
+       "step": 238
+     },
+     {
+       "epoch": 0.17749721500185667,
+       "grad_norm": 0.0010416822042316198,
+       "learning_rate": 9.484363707663442e-05,
+       "loss": 0.0001,
+       "step": 239
+     },
+     {
+       "epoch": 0.17823988117341255,
+       "grad_norm": 0.0013076276518404484,
+       "learning_rate": 9.478989847417526e-05,
+       "loss": 0.0001,
+       "step": 240
+     },
+     {
+       "epoch": 0.17898254734496843,
+       "grad_norm": 0.0012382504064589739,
+       "learning_rate": 9.473589668158061e-05,
+       "loss": 0.0002,
+       "step": 241
+     },
+     {
+       "epoch": 0.17972521351652432,
+       "grad_norm": 0.0214553065598011,
+       "learning_rate": 9.468163201617062e-05,
+       "loss": 0.0006,
+       "step": 242
+     },
+     {
+       "epoch": 0.1804678796880802,
+       "grad_norm": 0.002527498174458742,
+       "learning_rate": 9.462710479681019e-05,
+       "loss": 0.0002,
+       "step": 243
+     },
+     {
+       "epoch": 0.18121054585963609,
+       "grad_norm": 0.011280801147222519,
+       "learning_rate": 9.457231534390694e-05,
1724
+ "loss": 0.0003,
1725
+ "step": 244
1726
+ },
1727
+ {
1728
+ "epoch": 0.18195321203119197,
1729
+ "grad_norm": 0.0018357493681833148,
1730
+ "learning_rate": 9.451726397940945e-05,
1731
+ "loss": 0.0002,
1732
+ "step": 245
1733
+ },
1734
+ {
1735
+ "epoch": 0.18269587820274785,
1736
+ "grad_norm": 0.08694746345281601,
1737
+ "learning_rate": 9.446195102680531e-05,
1738
+ "loss": 0.0009,
1739
+ "step": 246
1740
+ },
1741
+ {
1742
+ "epoch": 0.18343854437430376,
1743
+ "grad_norm": 0.04296145588159561,
1744
+ "learning_rate": 9.440637681111922e-05,
1745
+ "loss": 0.0006,
1746
+ "step": 247
1747
+ },
1748
+ {
1749
+ "epoch": 0.18418121054585965,
1750
+ "grad_norm": 0.006709881592541933,
1751
+ "learning_rate": 9.435054165891109e-05,
1752
+ "loss": 0.0004,
1753
+ "step": 248
1754
+ },
1755
+ {
1756
+ "epoch": 0.18492387671741553,
1757
+ "grad_norm": 0.003305165795609355,
1758
+ "learning_rate": 9.429444589827412e-05,
1759
+ "loss": 0.0002,
1760
+ "step": 249
1761
+ },
1762
+ {
1763
+ "epoch": 0.18566654288897141,
1764
+ "grad_norm": 0.0106744933873415,
1765
+ "learning_rate": 9.423808985883289e-05,
1766
+ "loss": 0.0004,
1767
+ "step": 250
1768
+ },
1769
+ {
1770
+ "epoch": 0.1864092090605273,
1771
+ "grad_norm": 0.2720184326171875,
1772
+ "learning_rate": 9.418147387174139e-05,
1773
+ "loss": 0.0074,
1774
+ "step": 251
1775
+ },
1776
+ {
1777
+ "epoch": 0.18715187523208318,
1778
+ "grad_norm": 0.005303042940795422,
1779
+ "learning_rate": 9.412459826968108e-05,
1780
+ "loss": 0.0003,
1781
+ "step": 252
1782
+ },
1783
+ {
1784
+ "epoch": 0.18789454140363906,
1785
+ "grad_norm": 0.0023732127156108618,
1786
+ "learning_rate": 9.406746338685895e-05,
1787
+ "loss": 0.0001,
1788
+ "step": 253
1789
+ },
1790
+ {
1791
+ "epoch": 0.18863720757519495,
1792
+ "grad_norm": 0.006592648569494486,
1793
+ "learning_rate": 9.401006955900556e-05,
1794
+ "loss": 0.0002,
1795
+ "step": 254
1796
+ },
1797
+ {
1798
+ "epoch": 0.18937987374675083,
1799
+ "grad_norm": 0.0008024009293876588,
1800
+ "learning_rate": 9.395241712337307e-05,
1801
+ "loss": 0.0001,
1802
+ "step": 255
1803
+ },
1804
+ {
1805
+ "epoch": 0.19012253991830672,
1806
+ "grad_norm": 0.002137925708666444,
1807
+ "learning_rate": 9.389450641873323e-05,
1808
+ "loss": 0.0002,
1809
+ "step": 256
1810
+ },
1811
+ {
1812
+ "epoch": 0.1908652060898626,
1813
+ "grad_norm": 0.005445448216050863,
1814
+ "learning_rate": 9.38363377853754e-05,
1815
+ "loss": 0.0002,
1816
+ "step": 257
1817
+ },
1818
+ {
1819
+ "epoch": 0.19160787226141848,
1820
+ "grad_norm": 0.0010594564955681562,
1821
+ "learning_rate": 9.377791156510455e-05,
1822
+ "loss": 0.0001,
1823
+ "step": 258
1824
+ },
1825
+ {
1826
+ "epoch": 0.19235053843297437,
1827
+ "grad_norm": 0.0036199286114424467,
1828
+ "learning_rate": 9.371922810123929e-05,
1829
+ "loss": 0.0002,
1830
+ "step": 259
1831
+ },
1832
+ {
1833
+ "epoch": 0.19309320460453028,
1834
+ "grad_norm": 0.0007854366558603942,
1835
+ "learning_rate": 9.36602877386098e-05,
1836
+ "loss": 0.0001,
1837
+ "step": 260
1838
+ },
1839
+ {
1840
+ "epoch": 0.19383587077608616,
1841
+ "grad_norm": 0.0012193581787869334,
1842
+ "learning_rate": 9.360109082355582e-05,
1843
+ "loss": 0.0001,
1844
+ "step": 261
1845
+ },
1846
+ {
1847
+ "epoch": 0.19457853694764204,
1848
+ "grad_norm": 0.009622081182897091,
1849
+ "learning_rate": 9.354163770392461e-05,
1850
+ "loss": 0.0005,
1851
+ "step": 262
1852
+ },
1853
+ {
1854
+ "epoch": 0.19532120311919793,
1855
+ "grad_norm": 0.0011145096505060792,
1856
+ "learning_rate": 9.348192872906896e-05,
1857
+ "loss": 0.0001,
1858
+ "step": 263
1859
+ },
1860
+ {
1861
+ "epoch": 0.1960638692907538,
1862
+ "grad_norm": 0.004288391210138798,
1863
+ "learning_rate": 9.342196424984504e-05,
1864
+ "loss": 0.0001,
1865
+ "step": 264
1866
+ },
1867
+ {
1868
+ "epoch": 0.1968065354623097,
1869
+ "grad_norm": 0.0054808189161121845,
1870
+ "learning_rate": 9.33617446186104e-05,
1871
+ "loss": 0.0003,
1872
+ "step": 265
1873
+ },
1874
+ {
1875
+ "epoch": 0.19754920163386558,
1876
+ "grad_norm": 0.0028313531074672937,
1877
+ "learning_rate": 9.330127018922194e-05,
1878
+ "loss": 0.0002,
1879
+ "step": 266
1880
+ },
1881
+ {
1882
+ "epoch": 0.19829186780542146,
1883
+ "grad_norm": 0.008807774633169174,
1884
+ "learning_rate": 9.324054131703371e-05,
1885
+ "loss": 0.0004,
1886
+ "step": 267
1887
+ },
1888
+ {
1889
+ "epoch": 0.19903453397697735,
1890
+ "grad_norm": 0.008937092497944832,
1891
+ "learning_rate": 9.317955835889494e-05,
1892
+ "loss": 0.0002,
1893
+ "step": 268
1894
+ },
1895
+ {
1896
+ "epoch": 0.19977720014853323,
1897
+ "grad_norm": 0.001885921461507678,
1898
+ "learning_rate": 9.311832167314787e-05,
1899
+ "loss": 0.0002,
1900
+ "step": 269
1901
+ },
1902
+ {
1903
+ "epoch": 0.2005198663200891,
1904
+ "grad_norm": 0.0014201418962329626,
1905
+ "learning_rate": 9.305683161962569e-05,
1906
+ "loss": 0.0001,
1907
+ "step": 270
1908
+ },
1909
+ {
1910
+ "epoch": 0.201262532491645,
1911
+ "grad_norm": 0.013191280886530876,
1912
+ "learning_rate": 9.299508855965039e-05,
1913
+ "loss": 0.0003,
1914
+ "step": 271
1915
+ },
1916
+ {
1917
+ "epoch": 0.20200519866320088,
1918
+ "grad_norm": 1.5947636365890503,
1919
+ "learning_rate": 9.293309285603067e-05,
1920
+ "loss": 0.153,
1921
+ "step": 272
1922
+ },
1923
+ {
1924
+ "epoch": 0.20274786483475676,
1925
+ "grad_norm": 0.005475195590406656,
1926
+ "learning_rate": 9.287084487305975e-05,
1927
+ "loss": 0.0002,
1928
+ "step": 273
1929
+ },
1930
+ {
1931
+ "epoch": 0.20349053100631267,
1932
+ "grad_norm": 0.0020285584032535553,
1933
+ "learning_rate": 9.280834497651334e-05,
1934
+ "loss": 0.0002,
1935
+ "step": 274
1936
+ },
1937
+ {
1938
+ "epoch": 0.20423319717786856,
1939
+ "grad_norm": 0.0018257065676152706,
1940
+ "learning_rate": 9.274559353364734e-05,
1941
+ "loss": 0.0001,
1942
+ "step": 275
1943
+ },
1944
+ {
1945
+ "epoch": 0.20497586334942444,
1946
+ "grad_norm": 0.01402269210666418,
1947
+ "learning_rate": 9.268259091319582e-05,
1948
+ "loss": 0.0006,
1949
+ "step": 276
1950
+ },
1951
+ {
1952
+ "epoch": 0.20571852952098033,
1953
+ "grad_norm": 0.05442607030272484,
1954
+ "learning_rate": 9.261933748536878e-05,
1955
+ "loss": 0.0006,
1956
+ "step": 277
1957
+ },
1958
+ {
1959
+ "epoch": 0.2064611956925362,
1960
+ "grad_norm": 0.002209064783528447,
1961
+ "learning_rate": 9.255583362184999e-05,
1962
+ "loss": 0.0002,
1963
+ "step": 278
1964
+ },
1965
+ {
1966
+ "epoch": 0.2072038618640921,
1967
+ "grad_norm": 0.019024129956960678,
1968
+ "learning_rate": 9.24920796957948e-05,
1969
+ "loss": 0.0003,
1970
+ "step": 279
1971
+ },
1972
+ {
1973
+ "epoch": 0.20794652803564798,
1974
+ "grad_norm": 0.008843375369906425,
1975
+ "learning_rate": 9.242807608182795e-05,
1976
+ "loss": 0.0004,
1977
+ "step": 280
1978
+ },
1979
+ {
1980
+ "epoch": 0.20868919420720386,
1981
+ "grad_norm": 0.11764897406101227,
1982
+ "learning_rate": 9.23638231560414e-05,
1983
+ "loss": 0.0015,
1984
+ "step": 281
1985
+ },
1986
+ {
1987
+ "epoch": 0.20943186037875974,
1988
+ "grad_norm": 0.004572493955492973,
1989
+ "learning_rate": 9.229932129599205e-05,
1990
+ "loss": 0.0002,
1991
+ "step": 282
1992
+ },
1993
+ {
1994
+ "epoch": 0.21017452655031563,
1995
+ "grad_norm": 0.006193062756210566,
1996
+ "learning_rate": 9.223457088069962e-05,
1997
+ "loss": 0.0003,
1998
+ "step": 283
1999
+ },
2000
+ {
2001
+ "epoch": 0.2109171927218715,
2002
+ "grad_norm": 0.028236044570803642,
2003
+ "learning_rate": 9.21695722906443e-05,
2004
+ "loss": 0.0006,
2005
+ "step": 284
2006
+ },
2007
+ {
2008
+ "epoch": 0.2116598588934274,
2009
+ "grad_norm": 0.012708684429526329,
2010
+ "learning_rate": 9.210432590776461e-05,
2011
+ "loss": 0.0008,
2012
+ "step": 285
2013
+ },
2014
+ {
2015
+ "epoch": 0.21240252506498328,
2016
+ "grad_norm": 0.02218322828412056,
2017
+ "learning_rate": 9.203883211545517e-05,
2018
+ "loss": 0.0008,
2019
+ "step": 286
2020
+ },
2021
+ {
2022
+ "epoch": 0.2131451912365392,
2023
+ "grad_norm": 0.015315636061131954,
2024
+ "learning_rate": 9.197309129856433e-05,
2025
+ "loss": 0.0005,
2026
+ "step": 287
2027
+ },
2028
+ {
2029
+ "epoch": 0.21388785740809507,
2030
+ "grad_norm": 0.011535655707120895,
2031
+ "learning_rate": 9.190710384339203e-05,
2032
+ "loss": 0.0005,
2033
+ "step": 288
2034
+ },
2035
+ {
2036
+ "epoch": 0.21463052357965096,
2037
+ "grad_norm": 0.013051263056695461,
2038
+ "learning_rate": 9.184087013768745e-05,
2039
+ "loss": 0.0003,
2040
+ "step": 289
2041
+ },
2042
+ {
2043
+ "epoch": 0.21537318975120684,
2044
+ "grad_norm": 0.003687142627313733,
2045
+ "learning_rate": 9.177439057064683e-05,
2046
+ "loss": 0.0003,
2047
+ "step": 290
2048
+ },
2049
+ {
2050
+ "epoch": 0.21611585592276272,
2051
+ "grad_norm": 0.007237662561237812,
2052
+ "learning_rate": 9.170766553291103e-05,
2053
+ "loss": 0.0003,
2054
+ "step": 291
2055
+ },
2056
+ {
2057
+ "epoch": 0.2168585220943186,
2058
+ "grad_norm": 0.007928196340799332,
2059
+ "learning_rate": 9.164069541656337e-05,
2060
+ "loss": 0.0004,
2061
+ "step": 292
2062
+ },
2063
+ {
2064
+ "epoch": 0.2176011882658745,
2065
+ "grad_norm": 0.04276131093502045,
2066
+ "learning_rate": 9.157348061512727e-05,
2067
+ "loss": 0.0005,
2068
+ "step": 293
2069
+ },
2070
+ {
2071
+ "epoch": 0.21834385443743037,
2072
+ "grad_norm": 0.029039736837148666,
2073
+ "learning_rate": 9.150602152356395e-05,
2074
+ "loss": 0.0006,
2075
+ "step": 294
2076
+ },
2077
+ {
2078
+ "epoch": 0.21908652060898626,
2079
+ "grad_norm": 0.08352446556091309,
2080
+ "learning_rate": 9.143831853827009e-05,
2081
+ "loss": 0.0015,
2082
+ "step": 295
2083
+ },
2084
+ {
2085
+ "epoch": 0.21982918678054214,
2086
+ "grad_norm": 0.12498286366462708,
2087
+ "learning_rate": 9.137037205707552e-05,
2088
+ "loss": 0.0008,
2089
+ "step": 296
2090
+ },
2091
+ {
2092
+ "epoch": 0.22057185295209802,
2093
+ "grad_norm": 0.012141775339841843,
2094
+ "learning_rate": 9.130218247924092e-05,
2095
+ "loss": 0.0005,
2096
+ "step": 297
2097
+ },
2098
+ {
2099
+ "epoch": 0.2213145191236539,
2100
+ "grad_norm": 0.007531680166721344,
2101
+ "learning_rate": 9.123375020545535e-05,
2102
+ "loss": 0.0003,
2103
+ "step": 298
2104
+ },
2105
+ {
2106
+ "epoch": 0.2220571852952098,
2107
+ "grad_norm": 0.009885065257549286,
2108
+ "learning_rate": 9.116507563783403e-05,
2109
+ "loss": 0.0004,
2110
+ "step": 299
2111
+ },
2112
+ {
2113
+ "epoch": 0.2227998514667657,
2114
+ "grad_norm": 0.26482439041137695,
2115
+ "learning_rate": 9.109615917991591e-05,
2116
+ "loss": 0.0014,
2117
+ "step": 300
2118
+ },
2119
+ {
2120
+ "epoch": 0.2235425176383216,
2121
+ "grad_norm": 0.00270358519628644,
2122
+ "learning_rate": 9.102700123666132e-05,
2123
+ "loss": 0.0002,
2124
+ "step": 301
2125
+ },
2126
+ {
2127
+ "epoch": 0.22428518380987747,
2128
+ "grad_norm": 1.3726491928100586,
2129
+ "learning_rate": 9.09576022144496e-05,
2130
+ "loss": 0.0146,
2131
+ "step": 302
2132
+ },
2133
+ {
2134
+ "epoch": 0.22502784998143335,
2135
+ "grad_norm": 0.2558269500732422,
2136
+ "learning_rate": 9.088796252107665e-05,
2137
+ "loss": 0.0018,
2138
+ "step": 303
2139
+ },
2140
+ {
2141
+ "epoch": 0.22577051615298924,
2142
+ "grad_norm": 0.7685271501541138,
2143
+ "learning_rate": 9.08180825657526e-05,
2144
+ "loss": 0.0064,
2145
+ "step": 304
2146
+ },
2147
+ {
2148
+ "epoch": 0.22651318232454512,
2149
+ "grad_norm": 0.010196542367339134,
2150
+ "learning_rate": 9.07479627590994e-05,
2151
+ "loss": 0.0004,
2152
+ "step": 305
2153
+ },
2154
+ {
2155
+ "epoch": 0.227255848496101,
2156
+ "grad_norm": 0.0029280134476721287,
2157
+ "learning_rate": 9.067760351314838e-05,
2158
+ "loss": 0.0002,
2159
+ "step": 306
2160
+ },
2161
+ {
2162
+ "epoch": 0.2279985146676569,
2163
+ "grad_norm": 0.0070107802748680115,
2164
+ "learning_rate": 9.060700524133785e-05,
2165
+ "loss": 0.0003,
2166
+ "step": 307
2167
+ },
2168
+ {
2169
+ "epoch": 0.22874118083921277,
2170
+ "grad_norm": 0.014984749257564545,
2171
+ "learning_rate": 9.053616835851062e-05,
2172
+ "loss": 0.0003,
2173
+ "step": 308
2174
+ },
2175
+ {
2176
+ "epoch": 0.22948384701076865,
2177
+ "grad_norm": 0.0027229287661612034,
2178
+ "learning_rate": 9.046509328091166e-05,
2179
+ "loss": 0.0002,
2180
+ "step": 309
2181
+ },
2182
+ {
2183
+ "epoch": 0.23022651318232454,
2184
+ "grad_norm": 0.006428726948797703,
2185
+ "learning_rate": 9.039378042618556e-05,
2186
+ "loss": 0.0003,
2187
+ "step": 310
2188
+ },
2189
+ {
2190
+ "epoch": 0.23096917935388042,
2191
+ "grad_norm": 0.007419643457978964,
2192
+ "learning_rate": 9.032223021337414e-05,
2193
+ "loss": 0.0004,
2194
+ "step": 311
2195
+ },
2196
+ {
2197
+ "epoch": 0.2317118455254363,
2198
+ "grad_norm": 0.0074731167405843735,
2199
+ "learning_rate": 9.025044306291392e-05,
2200
+ "loss": 0.0002,
2201
+ "step": 312
2202
+ },
2203
+ {
2204
+ "epoch": 0.2324545116969922,
2205
+ "grad_norm": 0.0038742341566830873,
2206
+ "learning_rate": 9.017841939663374e-05,
2207
+ "loss": 0.0002,
2208
+ "step": 313
2209
+ },
2210
+ {
2211
+ "epoch": 0.2331971778685481,
2212
+ "grad_norm": 0.010770438238978386,
2213
+ "learning_rate": 9.01061596377522e-05,
2214
+ "loss": 0.0004,
2215
+ "step": 314
2216
+ },
2217
+ {
2218
+ "epoch": 0.23393984404010398,
2219
+ "grad_norm": 0.009036507457494736,
2220
+ "learning_rate": 9.003366421087521e-05,
2221
+ "loss": 0.0003,
2222
+ "step": 315
2223
+ },
2224
+ {
2225
+ "epoch": 0.23468251021165987,
2226
+ "grad_norm": 0.008864130824804306,
2227
+ "learning_rate": 8.996093354199349e-05,
2228
+ "loss": 0.0003,
2229
+ "step": 316
2230
+ },
2231
+ {
2232
+ "epoch": 0.23542517638321575,
2233
+ "grad_norm": 0.00652578379958868,
2234
+ "learning_rate": 8.988796805848007e-05,
2235
+ "loss": 0.0003,
2236
+ "step": 317
2237
+ },
2238
+ {
2239
+ "epoch": 0.23616784255477163,
2240
+ "grad_norm": 0.006959845311939716,
2241
+ "learning_rate": 8.981476818908778e-05,
2242
+ "loss": 0.0002,
2243
+ "step": 318
2244
+ },
2245
+ {
2246
+ "epoch": 0.23691050872632752,
2247
+ "grad_norm": 0.0059072221629321575,
2248
+ "learning_rate": 8.974133436394673e-05,
2249
+ "loss": 0.0003,
2250
+ "step": 319
2251
+ },
2252
+ {
2253
+ "epoch": 0.2376531748978834,
2254
+ "grad_norm": 0.005236865486949682,
2255
+ "learning_rate": 8.966766701456177e-05,
2256
+ "loss": 0.0003,
2257
+ "step": 320
2258
+ },
2259
+ {
2260
+ "epoch": 0.23839584106943928,
2261
+ "grad_norm": 0.01116922963410616,
2262
+ "learning_rate": 8.959376657380993e-05,
2263
+ "loss": 0.0003,
2264
+ "step": 321
2265
+ },
2266
+ {
2267
+ "epoch": 0.23913850724099517,
2268
+ "grad_norm": 0.02754572220146656,
2269
+ "learning_rate": 8.951963347593797e-05,
2270
+ "loss": 0.0011,
2271
+ "step": 322
2272
+ },
2273
+ {
2274
+ "epoch": 0.23988117341255105,
2275
+ "grad_norm": 0.05620993301272392,
2276
+ "learning_rate": 8.944526815655974e-05,
2277
+ "loss": 0.0008,
2278
+ "step": 323
2279
+ },
2280
+ {
2281
+ "epoch": 0.24062383958410694,
2282
+ "grad_norm": 0.013477266766130924,
2283
+ "learning_rate": 8.937067105265362e-05,
2284
+ "loss": 0.0005,
2285
+ "step": 324
2286
+ },
2287
+ {
2288
+ "epoch": 0.24136650575566282,
2289
+ "grad_norm": 0.008718527853488922,
2290
+ "learning_rate": 8.929584260256004e-05,
2291
+ "loss": 0.0002,
2292
+ "step": 325
2293
+ },
2294
+ {
2295
+ "epoch": 0.2421091719272187,
2296
+ "grad_norm": 0.0029952102340757847,
2297
+ "learning_rate": 8.922078324597879e-05,
2298
+ "loss": 0.0002,
2299
+ "step": 326
2300
+ },
2301
+ {
2302
+ "epoch": 0.24285183809877461,
2303
+ "grad_norm": 0.19847828149795532,
2304
+ "learning_rate": 8.914549342396652e-05,
2305
+ "loss": 0.0012,
2306
+ "step": 327
2307
+ },
2308
+ {
2309
+ "epoch": 0.2435945042703305,
2310
+ "grad_norm": 0.0039433506317436695,
2311
+ "learning_rate": 8.906997357893412e-05,
2312
+ "loss": 0.0002,
2313
+ "step": 328
2314
+ },
2315
+ {
2316
+ "epoch": 0.24433717044188638,
2317
+ "grad_norm": 0.0013001116458326578,
2318
+ "learning_rate": 8.899422415464409e-05,
2319
+ "loss": 0.0001,
2320
+ "step": 329
2321
+ },
2322
+ {
2323
+ "epoch": 0.24507983661344226,
2324
+ "grad_norm": 0.029752757400274277,
2325
+ "learning_rate": 8.891824559620801e-05,
2326
+ "loss": 0.0008,
2327
+ "step": 330
2328
+ },
2329
+ {
2330
+ "epoch": 0.24582250278499815,
2331
+ "grad_norm": 0.009903517551720142,
2332
+ "learning_rate": 8.884203835008382e-05,
2333
+ "loss": 0.0003,
2334
+ "step": 331
2335
+ },
2336
+ {
2337
+ "epoch": 0.24656516895655403,
2338
+ "grad_norm": 0.0017488920129835606,
2339
+ "learning_rate": 8.87656028640733e-05,
2340
+ "loss": 0.0002,
2341
+ "step": 332
2342
+ },
2343
+ {
2344
+ "epoch": 0.24730783512810992,
2345
+ "grad_norm": 0.001140685984864831,
2346
+ "learning_rate": 8.868893958731937e-05,
2347
+ "loss": 0.0001,
2348
+ "step": 333
2349
+ },
2350
+ {
2351
+ "epoch": 0.2480505012996658,
2352
+ "grad_norm": 0.001769506954587996,
2353
+ "learning_rate": 8.861204897030346e-05,
2354
+ "loss": 0.0002,
2355
+ "step": 334
2356
+ },
2357
+ {
2358
+ "epoch": 0.24879316747122168,
2359
+ "grad_norm": 0.034780845046043396,
2360
+ "learning_rate": 8.853493146484291e-05,
2361
+ "loss": 0.0002,
2362
+ "step": 335
2363
+ },
2364
+ {
2365
+ "epoch": 0.24953583364277757,
2366
+ "grad_norm": 0.0031120367348194122,
2367
+ "learning_rate": 8.845758752408826e-05,
2368
+ "loss": 0.0002,
2369
+ "step": 336
2370
+ },
2371
+ {
2372
+ "epoch": 0.2502784998143335,
2373
+ "grad_norm": 0.0008751750574447215,
2374
+ "learning_rate": 8.838001760252059e-05,
2375
+ "loss": 0.0001,
2376
+ "step": 337
2377
+ },
2378
+ {
2379
+ "epoch": 0.2502784998143335,
2380
+ "eval_loss": 8.811052975943312e-05,
2381
+ "eval_runtime": 190.5265,
2382
+ "eval_samples_per_second": 5.952,
2383
+ "eval_steps_per_second": 2.976,
2384
+ "step": 337
2385
+ }
2386
+ ],
2387
+ "logging_steps": 1,
2388
+ "max_steps": 1346,
2389
+ "num_input_tokens_seen": 0,
2390
+ "num_train_epochs": 1,
2391
+ "save_steps": 337,
2392
+ "stateful_callbacks": {
2393
+ "TrainerControl": {
2394
+ "args": {
2395
+ "should_epoch_stop": false,
2396
+ "should_evaluate": false,
2397
+ "should_log": false,
2398
+ "should_save": true,
2399
+ "should_training_stop": false
2400
+ },
2401
+ "attributes": {}
2402
+ }
2403
+ },
2404
+ "total_flos": 4.4043517029462835e+17,
2405
+ "train_batch_size": 1,
2406
+ "trial_name": null,
2407
+ "trial_params": null
2408
+ }
last-checkpoint/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:71eb0b172a5713ebbd9313ea9f77238634c16cba8b27d67301bc94ddfb78d5de
+ size 6776
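
For reference, the log above is the checkpoint's recorded trainer state; a minimal sketch (assuming it is saved as `last-checkpoint/trainer_state.json`, the standard filename in Hugging Face `Trainer` checkpoints) for inspecting the logged loss and gradient-norm curve from the `log_history` list:

```python
import json

# Load the trainer state written alongside this checkpoint.
# The path is an assumption based on the checkpoint layout in this commit.
with open("last-checkpoint/trainer_state.json") as f:
    state = json.load(f)

# log_history holds one record per logging step (logging_steps = 1 here);
# the final record of this checkpoint also carries the eval metrics.
for record in state["log_history"]:
    if "loss" in record:
        print(f'step {record["step"]:>4}  loss {record["loss"]:.4f}  '
              f'grad_norm {record["grad_norm"]:.4f}')
    elif "eval_loss" in record:
        print(f'eval @ step {record["step"]}: eval_loss {record["eval_loss"]:.3e}')
```

Note that `training_args.bin` is stored as a Git LFS pointer (the `version`/`oid`/`size` lines above), so the actual binary must be fetched with `git lfs pull` before it can be loaded.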