Training in progress, step 201, checkpoint

Files changed (13)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state_0.pth +3 -0
last-checkpoint/rng_state_1.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer.model +3 -0
last-checkpoint/tokenizer_config.json +43 -0
last-checkpoint/trainer_state.json +1456 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: unsloth/tinyllama-chat
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/tinyllama-chat",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "o_proj",
+    "down_proj",
+    "q_proj",
+    "up_proj",
+    "gate_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
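
Review note on the adapter config above: with `use_rslora` set to false, LoRA as implemented in PEFT multiplies the low-rank update by `lora_alpha / r`. The sketch below works that out for the values in this file. The `lora_scaling` helper and the trimmed `ADAPTER_CONFIG` string are illustrative only and are not part of the checkpoint.

```python
import json
import math

# Trimmed copy of the fields from adapter_config.json that affect scaling.
ADAPTER_CONFIG = """
{
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "r": 16,
  "use_rslora": false
}
"""


def lora_scaling(cfg):
    """Return the factor applied to the low-rank update B @ A.

    Hypothetical helper: standard LoRA uses lora_alpha / r, while
    rank-stabilized LoRA (use_rslora) uses lora_alpha / sqrt(r).
    """
    r = cfg["r"]
    if cfg.get("use_rslora"):
        return cfg["lora_alpha"] / math.sqrt(r)
    return cfg["lora_alpha"] / r


cfg = json.loads(ADAPTER_CONFIG)
scaling = lora_scaling(cfg)
print(f"effective LoRA scaling: {scaling}")
```

For this checkpoint the factor is 2.0, so the adapter's contribution to each targeted projection is doubled before it is added to the frozen base weight.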

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b3f9e0c9ddf0c19d21538b0dcaa85caed1162f13837ecdc8df6fe456c7832492
+size 50503544

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dba8b46da36cd1da4c37b78c55be581df0ee095e65cbb8239f2a23c0a9f23b18
+size 101184122

last-checkpoint/rng_state_0.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:57b3a1ec45c4bd296b690a371d320cbd885ee364d61c107e3196412b8da59811
+size 14512

last-checkpoint/rng_state_1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:59ee39a969e1aca1d8ab40e93cb569005c329a60cfd3c6febc1ccdf49ae60d91
+size 14512

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5ad26784d3cc7b071c58c9c288ad8b72a7313a78575d8f75e9f52060da4b738e
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+size 499723

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": false,
+  "model_max_length": 2048,
+  "pad_token": "<unk>",
+  "padding_side": "left",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
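
Review note on the `chat_template` above: the Jinja template wraps each message in `<|start_header_id|>` / `<|end_header_id|>` / `<|eot_id|>` markers, prepends `bos_token` to the first message only, and trims each message's content. The plain-Python sketch below mirrors that logic so the resulting prompt string is easy to inspect. `render_chat` is a hypothetical stand-in for illustration, not the tokenizer's own API.

```python
def render_chat(messages, bos_token="<s>", add_generation_prompt=False):
    """Mirror the chat_template from tokenizer_config.json in plain Python.

    Hypothetical helper for inspection only; real code would normally go
    through the tokenizer's chat-template support instead.
    """
    parts = []
    for index, message in enumerate(messages):
        # The template applies `trim` to the content before concatenating.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()
            + "<|eot_id|>"
        )
        # Only the first message is prefixed with the BOS token.
        if index == 0:
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "user", "content": "  Hello there  "}],
    add_generation_prompt=True,
)
print(prompt)
```

The `added_tokens_decoder` shown in this file lists only `<unk>`, `<s>`, and `</s>`. Whether the header and end-of-turn markers are registered as single tokens elsewhere (for example in `tokenizer.json`) cannot be determined from this diff, so it may be worth checking how they tokenize.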

last-checkpoint/trainer_state.json ADDED
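
Review note on this file: every training entry shown in `log_history` reports `"grad_norm": NaN` together with `"loss": 0.0`, and the step-1 evaluation reports `"eval_loss": NaN`. The diff does not reveal the cause, but values like these generally mean the run is not producing a usable loss. A minimal sketch for flagging such entries follows. It assumes Python's `json` module, which accepts the non-standard `NaN` literal by default; `non_finite_entries` and `SAMPLE_STATE` are illustrative names, with the sample abbreviated from the log in this file.

```python
import json
import math

# Abbreviated sample in the same shape as trainer_state.json.
SAMPLE_STATE = """
{
  "global_step": 2,
  "log_history": [
    {"epoch": 0.0012, "grad_norm": NaN, "learning_rate": 4e-06, "loss": 0.0, "step": 1},
    {"epoch": 0.0012, "eval_loss": NaN, "step": 1},
    {"epoch": 0.0025, "grad_norm": NaN, "learning_rate": 8e-06, "loss": 0.0, "step": 2}
  ]
}
"""


def non_finite_entries(state, keys=("grad_norm", "loss", "eval_loss")):
    """Return (step, key) pairs whose logged value is NaN or infinite."""
    bad = []
    for entry in state["log_history"]:
        for key in keys:
            value = entry.get(key)
            if isinstance(value, float) and not math.isfinite(value):
                bad.append((entry["step"], key))
    return bad


state = json.loads(SAMPLE_STATE)  # json.loads parses NaN as float("nan")
bad = non_finite_entries(state)
for step, key in bad:
    print(f"step {step}: {key} is not finite")
```

To run this against the real checkpoint, replace the sample string with the contents of `last-checkpoint/trainer_state.json`.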

	@@ -0,0 +1,1456 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.2507015902712816,
+  "eval_steps": 201,
+  "global_step": 201,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0012472715933894605,
+      "grad_norm": NaN,
+      "learning_rate": 4.000000000000001e-06,
+      "loss": 0.0,
+      "step": 1
+    },
+    {
+      "epoch": 0.0012472715933894605,
+      "eval_loss": NaN,
+      "eval_runtime": 60.7649,
+      "eval_samples_per_second": 22.233,
+      "eval_steps_per_second": 5.562,
+      "step": 1
+    },
+    {
+      "epoch": 0.002494543186778921,
+      "grad_norm": NaN,
+      "learning_rate": 8.000000000000001e-06,
+      "loss": 0.0,
+      "step": 2
+    },
+    {
+      "epoch": 0.0037418147801683817,
+      "grad_norm": NaN,
+      "learning_rate": 1.2e-05,
+      "loss": 0.0,
+      "step": 3
+    },
+    {
+      "epoch": 0.004989086373557842,
+      "grad_norm": NaN,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 0.0,
+      "step": 4
+    },
+    {
+      "epoch": 0.006236357966947303,
+      "grad_norm": NaN,
+      "learning_rate": 2e-05,
+      "loss": 0.0,
+      "step": 5
+    },
+    {
+      "epoch": 0.007483629560336763,
+      "grad_norm": NaN,
+      "learning_rate": 2.4e-05,
+      "loss": 0.0,
+      "step": 6
+    },
+    {
+      "epoch": 0.008730901153726224,
+      "grad_norm": NaN,
+      "learning_rate": 2.8000000000000003e-05,
+      "loss": 0.0,
+      "step": 7
+    },
+    {
+      "epoch": 0.009978172747115684,
+      "grad_norm": NaN,
+      "learning_rate": 3.2000000000000005e-05,
+      "loss": 0.0,
+      "step": 8
+    },
+    {
+      "epoch": 0.011225444340505144,
+      "grad_norm": NaN,
+      "learning_rate": 3.6e-05,
+      "loss": 0.0,
+      "step": 9
+    },
+    {
+      "epoch": 0.012472715933894606,
+      "grad_norm": NaN,
+      "learning_rate": 4e-05,
+      "loss": 0.0,
+      "step": 10
+    },
+    {
+      "epoch": 0.013719987527284067,
+      "grad_norm": NaN,
+      "learning_rate": 4.4000000000000006e-05,
+      "loss": 0.0,
+      "step": 11
+    },
+    {
+      "epoch": 0.014967259120673527,
+      "grad_norm": NaN,
+      "learning_rate": 4.8e-05,
+      "loss": 0.0,
+      "step": 12
+    },
+    {
+      "epoch": 0.016214530714062987,
+      "grad_norm": NaN,
+      "learning_rate": 5.2000000000000004e-05,
+      "loss": 0.0,
+      "step": 13
+    },
+    {
+      "epoch": 0.017461802307452447,
+      "grad_norm": NaN,
+      "learning_rate": 5.6000000000000006e-05,
+      "loss": 0.0,
+      "step": 14
+    },
+    {
+      "epoch": 0.018709073900841908,
+      "grad_norm": NaN,
+      "learning_rate": 6e-05,
+      "loss": 0.0,
+      "step": 15
+    },
+    {
+      "epoch": 0.019956345494231368,
+      "grad_norm": NaN,
+      "learning_rate": 6.400000000000001e-05,
+      "loss": 0.0,
+      "step": 16
+    },
+    {
+      "epoch": 0.021203617087620828,
+      "grad_norm": NaN,
+      "learning_rate": 6.800000000000001e-05,
+      "loss": 0.0,
+      "step": 17
+    },
+    {
+      "epoch": 0.02245088868101029,
+      "grad_norm": NaN,
+      "learning_rate": 7.2e-05,
+      "loss": 0.0,
+      "step": 18
+    },
+    {
+      "epoch": 0.02369816027439975,
+      "grad_norm": NaN,
+      "learning_rate": 7.6e-05,
+      "loss": 0.0,
+      "step": 19
+    },
+    {
+      "epoch": 0.024945431867789213,
+      "grad_norm": NaN,
+      "learning_rate": 8e-05,
+      "loss": 0.0,
+      "step": 20
+    },
+    {
+      "epoch": 0.026192703461178673,
+      "grad_norm": NaN,
+      "learning_rate": 8.4e-05,
+      "loss": 0.0,
+      "step": 21
+    },
+    {
+      "epoch": 0.027439975054568133,
+      "grad_norm": NaN,
+      "learning_rate": 8.800000000000001e-05,
+      "loss": 0.0,
+      "step": 22
+    },
+    {
+      "epoch": 0.028687246647957593,
+      "grad_norm": NaN,
+      "learning_rate": 9.200000000000001e-05,
+      "loss": 0.0,
+      "step": 23
+    },
+    {
+      "epoch": 0.029934518241347054,
+      "grad_norm": NaN,
+      "learning_rate": 9.6e-05,
+      "loss": 0.0,
+      "step": 24
+    },
+    {
+      "epoch": 0.031181789834736514,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001,
+      "loss": 0.0,
+      "step": 25
+    },
+    {
+      "epoch": 0.032429061428125974,
+      "grad_norm": NaN,
+      "learning_rate": 0.00010400000000000001,
+      "loss": 0.0,
+      "step": 26
+    },
+    {
+      "epoch": 0.03367633302151544,
+      "grad_norm": NaN,
+      "learning_rate": 0.00010800000000000001,
+      "loss": 0.0,
+      "step": 27
+    },
+    {
+      "epoch": 0.034923604614904895,
+      "grad_norm": NaN,
+      "learning_rate": 0.00011200000000000001,
+      "loss": 0.0,
+      "step": 28
+    },
+    {
+      "epoch": 0.03617087620829436,
+      "grad_norm": NaN,
+      "learning_rate": 0.000116,
+      "loss": 0.0,
+      "step": 29
+    },
+    {
+      "epoch": 0.037418147801683815,
+      "grad_norm": NaN,
+      "learning_rate": 0.00012,
+      "loss": 0.0,
+      "step": 30
+    },
+    {
+      "epoch": 0.03866541939507328,
+      "grad_norm": NaN,
+      "learning_rate": 0.000124,
+      "loss": 0.0,
+      "step": 31
+    },
+    {
+      "epoch": 0.039912690988462736,
+      "grad_norm": NaN,
+      "learning_rate": 0.00012800000000000002,
+      "loss": 0.0,
+      "step": 32
+    },
+    {
+      "epoch": 0.0411599625818522,
+      "grad_norm": NaN,
+      "learning_rate": 0.000132,
+      "loss": 0.0,
+      "step": 33
+    },
+    {
+      "epoch": 0.042407234175241657,
+      "grad_norm": NaN,
+      "learning_rate": 0.00013600000000000003,
+      "loss": 0.0,
+      "step": 34
+    },
+    {
+      "epoch": 0.04365450576863112,
+      "grad_norm": NaN,
+      "learning_rate": 0.00014,
+      "loss": 0.0,
+      "step": 35
+    },
+    {
+      "epoch": 0.04490177736202058,
+      "grad_norm": NaN,
+      "learning_rate": 0.000144,
+      "loss": 0.0,
+      "step": 36
+    },
+    {
+      "epoch": 0.04614904895541004,
+      "grad_norm": NaN,
+      "learning_rate": 0.000148,
+      "loss": 0.0,
+      "step": 37
+    },
+    {
+      "epoch": 0.0473963205487995,
+      "grad_norm": NaN,
+      "learning_rate": 0.000152,
+      "loss": 0.0,
+      "step": 38
+    },
+    {
+      "epoch": 0.04864359214218896,
+      "grad_norm": NaN,
+      "learning_rate": 0.00015600000000000002,
+      "loss": 0.0,
+      "step": 39
+    },
+    {
+      "epoch": 0.049890863735578425,
+      "grad_norm": NaN,
+      "learning_rate": 0.00016,
+      "loss": 0.0,
+      "step": 40
+    },
+    {
+      "epoch": 0.05113813532896788,
+      "grad_norm": NaN,
+      "learning_rate": 0.000164,
+      "loss": 0.0,
+      "step": 41
+    },
+    {
+      "epoch": 0.052385406922357346,
+      "grad_norm": NaN,
+      "learning_rate": 0.000168,
+      "loss": 0.0,
+      "step": 42
+    },
+    {
+      "epoch": 0.0536326785157468,
+      "grad_norm": NaN,
+      "learning_rate": 0.000172,
+      "loss": 0.0,
+      "step": 43
+    },
+    {
+      "epoch": 0.054879950109136266,
+      "grad_norm": NaN,
+      "learning_rate": 0.00017600000000000002,
+      "loss": 0.0,
+      "step": 44
+    },
+    {
+      "epoch": 0.05612722170252572,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018,
+      "loss": 0.0,
+      "step": 45
+    },
+    {
+      "epoch": 0.05737449329591519,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018400000000000003,
+      "loss": 0.0,
+      "step": 46
+    },
+    {
+      "epoch": 0.058621764889304644,
+      "grad_norm": NaN,
+      "learning_rate": 0.000188,
+      "loss": 0.0,
+      "step": 47
+    },
+    {
+      "epoch": 0.05986903648269411,
+      "grad_norm": NaN,
+      "learning_rate": 0.000192,
+      "loss": 0.0,
+      "step": 48
+    },
+    {
+      "epoch": 0.061116308076083564,
+      "grad_norm": NaN,
+      "learning_rate": 0.000196,
+      "loss": 0.0,
+      "step": 49
+    },
+    {
+      "epoch": 0.06236357966947303,
+      "grad_norm": NaN,
+      "learning_rate": 0.0002,
+      "loss": 0.0,
+      "step": 50
+    },
+    {
+      "epoch": 0.06361085126286249,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019999912503789813,
+      "loss": 0.0,
+      "step": 51
+    },
+    {
+      "epoch": 0.06485812285625195,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019999650016690364,
+      "loss": 0.0,
+      "step": 52
+    },
+    {
+      "epoch": 0.0661053944496414,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001999921254329498,
+      "loss": 0.0,
+      "step": 53
+    },
+    {
+      "epoch": 0.06735266604303088,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019998600091259113,
+      "loss": 0.0,
+      "step": 54
+    },
+    {
+      "epoch": 0.06859993763642033,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019997812671300214,
+      "loss": 0.0,
+      "step": 55
+    },
+    {
+      "epoch": 0.06984720922980979,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001999685029719753,
+      "loss": 0.0,
+      "step": 56
+    },
+    {
+      "epoch": 0.07109448082319925,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001999571298579188,
+      "loss": 0.0,
+      "step": 57
+    },
+    {
+      "epoch": 0.07234175241658872,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001999440075698535,
+      "loss": 0.0,
+      "step": 58
+    },
+    {
+      "epoch": 0.07358902400997817,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019992913633740957,
+      "loss": 0.0,
+      "step": 59
+    },
+    {
+      "epoch": 0.07483629560336763,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001999125164208222,
+      "loss": 0.0,
+      "step": 60
+    },
+    {
+      "epoch": 0.07608356719675709,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001998941481109274,
+      "loss": 0.0,
+      "step": 61
+    },
+    {
+      "epoch": 0.07733083879014656,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019987403172915666,
+      "loss": 0.0,
+      "step": 62
+    },
+    {
+      "epoch": 0.07857811038353602,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019985216762753139,
+      "loss": 0.0,
+      "step": 63
+    },
+    {
+      "epoch": 0.07982538197692547,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001998285561886568,
+      "loss": 0.0,
+      "step": 64
+    },
+    {
+      "epoch": 0.08107265357031494,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019980319782571523,
+      "loss": 0.0,
+      "step": 65
+    },
+    {
+      "epoch": 0.0823199251637044,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019977609298245873,
+      "loss": 0.0,
+      "step": 66
+    },
+    {
+      "epoch": 0.08356719675709386,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019974724213320157,
+      "loss": 0.0,
+      "step": 67
+    },
+    {
+      "epoch": 0.08481446835048331,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019971664578281173,
+      "loss": 0.0,
+      "step": 68
+    },
+    {
+      "epoch": 0.08606173994387278,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019968430446670212,
+      "loss": 0.0,
+      "step": 69
+    },
+    {
+      "epoch": 0.08730901153726224,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001996502187508213,
+      "loss": 0.0,
+      "step": 70
+    },
+    {
+      "epoch": 0.0885562831306517,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019961438923164345,
+      "loss": 0.0,
+      "step": 71
+    },
+    {
+      "epoch": 0.08980355472404115,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019957681653615797,
+      "loss": 0.0,
+      "step": 72
+    },
+    {
+      "epoch": 0.09105082631743062,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001995375013218586,
+      "loss": 0.0,
+      "step": 73
+    },
+    {
+      "epoch": 0.09229809791082008,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019949644427673177,
+      "loss": 0.0,
+      "step": 74
+    },
+    {
+      "epoch": 0.09354536950420954,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019945364611924463,
+      "loss": 0.0,
+      "step": 75
+    },
+    {
+      "epoch": 0.094792641097599,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001994091075983325,
+      "loss": 0.0,
+      "step": 76
+    },
+    {
+      "epoch": 0.09603991269098847,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019936282949338578,
+      "loss": 0.0,
+      "step": 77
+    },
+    {
+      "epoch": 0.09728718428437792,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019931481261423618,
+      "loss": 0.0,
+      "step": 78
+    },
+    {
+      "epoch": 0.09853445587776738,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019926505780114276,
+      "loss": 0.0,
+      "step": 79
+    },
+    {
+      "epoch": 0.09978172747115685,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001992135659247769,
+      "loss": 0.0,
+      "step": 80
+    },
+    {
+      "epoch": 0.10102899906454631,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019916033788620755,
+      "loss": 0.0,
+      "step": 81
+    },
+    {
+      "epoch": 0.10227627065793576,
+      "grad_norm": NaN,
+      "learning_rate": 0.000199105374616885,
+      "loss": 0.0,
+      "step": 82
+    },
+    {
+      "epoch": 0.10352354225132522,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019904867707862476,
+      "loss": 0.0,
+      "step": 83
+    },
+    {
+      "epoch": 0.10477081384471469,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001989902462635908,
+      "loss": 0.0,
+      "step": 84
+    },
+    {
+      "epoch": 0.10601808543810415,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019893008319427812,
+      "loss": 0.0,
+      "step": 85
+    },
+    {
+      "epoch": 0.1072653570314936,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019886818892349482,
+      "loss": 0.0,
+      "step": 86
+    },
+    {
+      "epoch": 0.10851262862488306,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019880456453434369,
+      "loss": 0.0,
+      "step": 87
+    },
+    {
+      "epoch": 0.10975990021827253,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019873921114020333,
+      "loss": 0.0,
+      "step": 88
+    },
+    {
+      "epoch": 0.11100717181166199,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019867212988470864,
+      "loss": 0.0,
+      "step": 89
+    },
+    {
+      "epoch": 0.11225444340505145,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001986033219417307,
+      "loss": 0.0,
+      "step": 90
+    },
+    {
+      "epoch": 0.11350171499844092,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019853278851535638,
+      "loss": 0.0,
+      "step": 91
+    },
+    {
+      "epoch": 0.11474898659183037,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019846053083986717,
+      "loss": 0.0,
+      "step": 92
+    },
+    {
+      "epoch": 0.11599625818521983,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019838655017971767,
+      "loss": 0.0,
+      "step": 93
+    },
+    {
+      "epoch": 0.11724352977860929,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019831084782951326,
+      "loss": 0.0,
+      "step": 94
+    },
+    {
+      "epoch": 0.11849080137199876,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019823342511398776,
+      "loss": 0.0,
+      "step": 95
+    },
+    {
+      "epoch": 0.11973807296538821,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019815428338798002,
+      "loss": 0.0,
+      "step": 96
+    },
+    {
+      "epoch": 0.12098534455877767,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001980734240364102,
+      "loss": 0.0,
+      "step": 97
+    },
+    {
+      "epoch": 0.12223261615216713,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019799084847425572,
+      "loss": 0.0,
+      "step": 98
+    },
+    {
+      "epoch": 0.1234798877455566,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001979065581465263,
+      "loss": 0.0,
+      "step": 99
+    },
+    {
+      "epoch": 0.12472715933894606,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019782055452823878,
+      "loss": 0.0,
+      "step": 100
+    },
+    {
+      "epoch": 0.1259744309323355,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019773283912439133,
+      "loss": 0.0,
+      "step": 101
+    },
+    {
+      "epoch": 0.12722170252572498,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019764341346993698,
+      "loss": 0.0,
+      "step": 102
+    },
+    {
+      "epoch": 0.12846897411911443,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019755227912975697,
+      "loss": 0.0,
+      "step": 103
+    },
+    {
+      "epoch": 0.1297162457125039,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001974594376986331,
+      "loss": 0.0,
+      "step": 104
+    },
+    {
+      "epoch": 0.13096351730589337,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019736489080122006,
+      "loss": 0.0,
+      "step": 105
+    },
+    {
+      "epoch": 0.1322107888992828,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019726864009201694,
+      "loss": 0.0,
+      "step": 106
+    },
+    {
+      "epoch": 0.13345806049267228,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019717068725533818,
+      "loss": 0.0,
+      "step": 107
+    },
+    {
+      "epoch": 0.13470533208606175,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019707103400528415,
+      "loss": 0.0,
+      "step": 108
+    },
+    {
+      "epoch": 0.1359526036794512,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001969696820857112,
+      "loss": 0.0,
+      "step": 109
+    },
+    {
+      "epoch": 0.13719987527284067,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001968666332702011,
+      "loss": 0.0,
+      "step": 110
+    },
+    {
+      "epoch": 0.1384471468662301,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019676188936203006,
+      "loss": 0.0,
+      "step": 111
+    },
+    {
+      "epoch": 0.13969441845961958,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019665545219413701,
+      "loss": 0.0,
+      "step": 112
+    },
+    {
+      "epoch": 0.14094169005300905,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019654732362909177,
+      "loss": 0.0,
+      "step": 113
+    },
+    {
+      "epoch": 0.1421889616463985,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019643750555906224,
+      "loss": 0.0,
+      "step": 114
+    },
+    {
+      "epoch": 0.14343623323978796,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019632599990578143,
+      "loss": 0.0,
+      "step": 115
+    },
+    {
+      "epoch": 0.14468350483317743,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019621280862051373,
+      "loss": 0.0,
+      "step": 116
+    },
+    {
+      "epoch": 0.14593077642656688,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019609793368402086,
+      "loss": 0.0,
+      "step": 117
+    },
+    {
+      "epoch": 0.14717804801995635,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001959813771065271,
+      "loss": 0.0,
+      "step": 118
+    },
+    {
+      "epoch": 0.14842531961334582,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019586314092768424,
+      "loss": 0.0,
+      "step": 119
+    },
+    {
+      "epoch": 0.14967259120673526,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019574322721653583,
+      "loss": 0.0,
+      "step": 120
+    },
+    {
+      "epoch": 0.15091986280012473,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019562163807148084,
+      "loss": 0.0,
+      "step": 121
+    },
+    {
+      "epoch": 0.15216713439351418,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001954983756202372,
+      "loss": 0.0,
+      "step": 122
+    },
+    {
+      "epoch": 0.15341440598690365,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001953734420198044,
+      "loss": 0.0,
+      "step": 123
+    },
+    {
+      "epoch": 0.15466167758029312,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001952468394564257,
+      "loss": 0.0,
+      "step": 124
+    },
+    {
+      "epoch": 0.15590894917368256,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019511857014555,
+      "loss": 0.0,
+      "step": 125
+    },
+    {
+      "epoch": 0.15715622076707203,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019498863633179308,
+      "loss": 0.0,
+      "step": 126
+    },
+    {
+      "epoch": 0.1584034923604615,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019485704028889813,
+      "loss": 0.0,
+      "step": 127
+    },
+    {
+      "epoch": 0.15965076395385094,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001947237843196962,
+      "loss": 0.0,
+      "step": 128
+    },
+    {
+      "epoch": 0.16089803554724041,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001945888707560657,
+      "loss": 0.0,
+      "step": 129
+    },
+    {
+      "epoch": 0.16214530714062989,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001944523019588918,
+      "loss": 0.0,
+      "step": 130
+    },
+    {
+      "epoch": 0.16339257873401933,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019431408031802486,
+      "loss": 0.0,
+      "step": 131
+    },
+    {
+      "epoch": 0.1646398503274088,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019417420825223891,
+      "loss": 0.0,
+      "step": 132
+    },
+    {
+      "epoch": 0.16588712192079824,
+      "grad_norm": NaN,
+      "learning_rate": 0.000194032688209189,
+      "loss": 0.0,
+      "step": 133
+    },
+    {
+      "epoch": 0.1671343935141877,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019388952266536868,
+      "loss": 0.0,
+      "step": 134
+    },
+    {
+      "epoch": 0.16838166510757718,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019374471412606642,
+      "loss": 0.0,
+      "step": 135
+    },
+    {
+      "epoch": 0.16962893670096663,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019359826512532194,
+      "loss": 0.0,
+      "step": 136
+    },
+    {
+      "epoch": 0.1708762082943561,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019345017822588168,
+      "loss": 0.0,
+      "step": 137
+    },
+    {
+      "epoch": 0.17212347988774557,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001933004560191542,
+      "loss": 0.0,
+      "step": 138
+    },
+    {
+      "epoch": 0.173370751481135,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019314910112516463,
+      "loss": 0.0,
+      "step": 139
+    },
+    {
+      "epoch": 0.17461802307452448,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019299611619250881,
+      "loss": 0.0,
+      "step": 140
+    },
+    {
+      "epoch": 0.17586529466791395,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019284150389830721,
+      "loss": 0.0,
+      "step": 141
+    },
+    {
+      "epoch": 0.1771125662613034,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019268526694815773,
+      "loss": 0.0,
+      "step": 142
+    },
+    {
+      "epoch": 0.17835983785469287,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001925274080760886,
+      "loss": 0.0,
+      "step": 143
+    },
+    {
+      "epoch": 0.1796071094480823,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019236793004451044,
+      "loss": 0.0,
+      "step": 144
+    },
+    {
+      "epoch": 0.18085438104147178,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019220683564416787,
+      "loss": 0.0,
+      "step": 145
+    },
+    {
+      "epoch": 0.18210165263486125,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019204412769409086,
+      "loss": 0.0,
+      "step": 146
+    },
+    {
+      "epoch": 0.1833489242282507,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019187980904154515,
+      "loss": 0.0,
+      "step": 147
+    },
+    {
+      "epoch": 0.18459619582164016,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019171388256198268,
+      "loss": 0.0,
+      "step": 148
+    },
+    {
+      "epoch": 0.18584346741502963,
+      "grad_norm": NaN,
+      "learning_rate": 0.000191546351158991,
+      "loss": 0.0,
+      "step": 149
+    },
+    {
+      "epoch": 0.18709073900841908,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019137721776424274,
+      "loss": 0.0,
+      "step": 150
+    },
+    {
+      "epoch": 0.18833801060180855,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001912064853374441,
+      "loss": 0.0,
+      "step": 151
+    },
+    {
+      "epoch": 0.189585282195198,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001910341568662831,
+      "loss": 0.0,
+      "step": 152
+    },
+    {
+      "epoch": 0.19083255378858746,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019086023536637737,
+      "loss": 0.0,
+      "step": 153
+    },
+    {
+      "epoch": 0.19207982538197693,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001906847238812214,
+      "loss": 0.0,
+      "step": 154
+    },
+    {
+      "epoch": 0.19332709697536637,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001905076254821331,
+      "loss": 0.0,
+      "step": 155
+    },
+    {
+      "epoch": 0.19457436856875585,
+      "grad_norm": NaN,
+      "learning_rate": 0.00019032894326820023,
+      "loss": 0.0,
+      "step": 156
+    },
+    {
+      "epoch": 0.19582164016214532,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001901486803662261,
+      "loss": 0.0,
+      "step": 157
+    },
+    {
+      "epoch": 0.19706891175553476,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018996683993067483,
+      "loss": 0.0,
+      "step": 158
+    },
+    {
+      "epoch": 0.19831618334892423,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018978342514361626,
+      "loss": 0.0,
+      "step": 159
+    },
+    {
+      "epoch": 0.1995634549423137,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018959843921467014,
+      "loss": 0.0,
+      "step": 160
+    },
+    {
+      "epoch": 0.20081072653570314,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018941188538094999,
+      "loss": 0.0,
+      "step": 161
+    },
+    {
+      "epoch": 0.20205799812909261,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001892237669070065,
+      "loss": 0.0,
+      "step": 162
+    },
+    {
+      "epoch": 0.20330526972248206,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001890340870847704,
+      "loss": 0.0,
+      "step": 163
+    },
+    {
+      "epoch": 0.20455254131587153,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018884284923349477,
+      "loss": 0.0,
+      "step": 164
+    },
+    {
+      "epoch": 0.205799812909261,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018865005669969708,
+      "loss": 0.0,
+      "step": 165
+    },
+    {
+      "epoch": 0.20704708450265044,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018845571285710058,
+      "loss": 0.0,
+      "step": 166
+    },
+    {
+      "epoch": 0.2082943560960399,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018825982110657515,
+      "loss": 0.0,
+      "step": 167
+    },
+    {
+      "epoch": 0.20954162768942938,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018806238487607794,
+      "loss": 0.0,
+      "step": 168
+    },
+    {
+      "epoch": 0.21078889928281883,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001878634076205934,
+      "loss": 0.0,
+      "step": 169
+    },
+    {
+      "epoch": 0.2120361708762083,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018766289282207263,
+      "loss": 0.0,
+      "step": 170
+    },
+    {
+      "epoch": 0.21328344246959777,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018746084398937266,
+      "loss": 0.0,
+      "step": 171
+    },
+    {
+      "epoch": 0.2145307140629872,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018725726465819488,
+      "loss": 0.0,
+      "step": 172
+    },
+    {
+      "epoch": 0.21577798565637668,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018705215839102328,
+      "loss": 0.0,
+      "step": 173
+    },
+    {
+      "epoch": 0.21702525724976612,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001868455287770621,
+      "loss": 0.0,
+      "step": 174
+    },
+    {
+      "epoch": 0.2182725288431556,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018663737943217296,
+      "loss": 0.0,
+      "step": 175
+    },
+    {
+      "epoch": 0.21951980043654507,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018642771399881162,
+      "loss": 0.0,
+      "step": 176
+    },
+    {
+      "epoch": 0.2207670720299345,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018621653614596425,
+      "loss": 0.0,
+      "step": 177
+    },
+    {
+      "epoch": 0.22201434362332398,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018600384956908323,
+      "loss": 0.0,
+      "step": 178
+    },
+    {
+      "epoch": 0.22326161521671345,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018578965799002236,
+      "loss": 0.0,
+      "step": 179
+    },
+    {
+      "epoch": 0.2245088868101029,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018557396515697202,
+      "loss": 0.0,
+      "step": 180
+    },
+    {
+      "epoch": 0.22575615840349236,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001853567748443933,
+      "loss": 0.0,
+      "step": 181
+    },
+    {
+      "epoch": 0.22700342999688183,
+      "grad_norm": NaN,
+      "learning_rate": 0.000185138090852952,
+      "loss": 0.0,
+      "step": 182
+    },
+    {
+      "epoch": 0.22825070159027128,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001849179170094522,
+      "loss": 0.0,
+      "step": 183
+    },
+    {
+      "epoch": 0.22949797318366075,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018469625716676933,
+      "loss": 0.0,
+      "step": 184
+    },
+    {
+      "epoch": 0.2307452447770502,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018447311520378262,
+      "loss": 0.0,
+      "step": 185
+    },
+    {
+      "epoch": 0.23199251637043966,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001842484950253073,
+      "loss": 0.0,
+      "step": 186
+    },
+    {
+      "epoch": 0.23323978796382913,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018402240056202614,
+      "loss": 0.0,
+      "step": 187
+    },
+    {
+      "epoch": 0.23448705955721857,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018379483577042103,
+      "loss": 0.0,
+      "step": 188
+    },
+    {
+      "epoch": 0.23573433115060805,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018356580463270322,
+      "loss": 0.0,
+      "step": 189
+    },
+    {
+      "epoch": 0.23698160274399752,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018333531115674408,
+      "loss": 0.0,
+      "step": 190
+    },
+    {
+      "epoch": 0.23822887433738696,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001831033593760047,
+      "loss": 0.0,
+      "step": 191
+    },
+    {
+      "epoch": 0.23947614593077643,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018286995334946545,
+      "loss": 0.0,
+      "step": 192
+    },
+    {
+      "epoch": 0.2407234175241659,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001826350971615549,
+      "loss": 0.0,
+      "step": 193
+    },
+    {
+      "epoch": 0.24197068911755534,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018239879492207831,
+      "loss": 0.0,
+      "step": 194
+    },
+    {
+      "epoch": 0.2432179607109448,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018216105076614576,
+      "loss": 0.0,
+      "step": 195
+    },
+    {
+      "epoch": 0.24446523230433426,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018192186885409973,
+      "loss": 0.0,
+      "step": 196
+    },
+    {
+      "epoch": 0.24571250389772373,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001816812533714425,
+      "loss": 0.0,
+      "step": 197
+    },
+    {
+      "epoch": 0.2469597754911132,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018143920852876257,
+      "loss": 0.0,
+      "step": 198
+    },
+    {
+      "epoch": 0.24820704708450264,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001811957385616612,
+      "loss": 0.0,
+      "step": 199
+    },
+    {
+      "epoch": 0.2494543186778921,
+      "grad_norm": NaN,
+      "learning_rate": 0.0001809508477306783,
+      "loss": 0.0,
+      "step": 200
+    },
+    {
+      "epoch": 0.2507015902712816,
+      "grad_norm": NaN,
+      "learning_rate": 0.00018070454032121787,
+      "loss": 0.0,
+      "step": 201
+    },
+    {
+      "epoch": 0.2507015902712816,
+      "eval_loss": NaN,
+      "eval_runtime": 60.4112,
+      "eval_samples_per_second": 22.363,
+      "eval_steps_per_second": 5.595,
+      "step": 201
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 801,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 201,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 8.281271774373478e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:22dc4fd20af497b2522de45c8c9cdbff9070a4e769a7cbfb8b2750bdf1364b0a
+size 6776