Training in progress, step 239, checkpoint
- last-checkpoint/README.md +202 -0
- last-checkpoint/adapter_config.json +40 -0
- last-checkpoint/adapter_model.safetensors +3 -0
- last-checkpoint/added_tokens.json +4 -0
- last-checkpoint/merges.txt +0 -0
- last-checkpoint/optimizer.pt +3 -0
- last-checkpoint/rng_state.pth +3 -0
- last-checkpoint/scheduler.pt +3 -0
- last-checkpoint/special_tokens_map.json +30 -0
- last-checkpoint/tokenizer.json +0 -0
- last-checkpoint/tokenizer_config.json +205 -0
- last-checkpoint/trainer_state.json +1706 -0
- last-checkpoint/training_args.bin +3 -0
- last-checkpoint/vocab.json +0 -0
last-checkpoint/README.md
ADDED
@@ -0,0 +1,202 @@
---
base_model: katuni4ka/tiny-random-dbrx
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.13.2
last-checkpoint/adapter_config.json
ADDED
@@ -0,0 +1,40 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "katuni4ka/tiny-random-dbrx",
  "bias": "none",
  "fan_in_fan_out": null,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 64,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": ["embed_tokens", "lm_head"],
  "peft_type": "LORA",
  "r": 32,
  "rank_pattern": {},
  "revision": null,
  "target_modules": ["q_proj", "layer", "k_proj", "Wqkv", "up_proj", "down_proj", "v_proj", "o_proj", "out_proj", "gate_proj"],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
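The adapter_config.json above describes a LoRA adapter (r=32, lora_alpha=64, dropout 0.05) over the katuni4ka/tiny-random-dbrx base model, with the embedding and LM head also saved via modules_to_save. Below is a minimal loading sketch, not the author's verified usage: the local "last-checkpoint" path is an assumption, and transformers/peft versions with DBRX support are required.

```python
# Minimal sketch: attach this LoRA checkpoint to its base model.
# Assumes the checkpoint directory has been downloaded locally as "last-checkpoint".
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Base model named in adapter_config.json ("base_model_name_or_path").
base = AutoModelForCausalLM.from_pretrained("katuni4ka/tiny-random-dbrx")
tokenizer = AutoTokenizer.from_pretrained("last-checkpoint")  # tokenizer files ship with the checkpoint

# Load the adapter weights; embed_tokens and lm_head come from modules_to_save.
model = PeftModel.from_pretrained(base, "last-checkpoint")
model.eval()
```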
last-checkpoint/adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:666bd9787a4664d35d5142dd6760d4c207f349bc72fe1fe4357ec1a4ad279364
size 1623800
last-checkpoint/added_tokens.json
ADDED
@@ -0,0 +1,4 @@
{
  "<|im_end|>": 100279,
  "<|im_start|>": 100278
}
last-checkpoint/merges.txt
ADDED
The diff for this file is too large to render.
last-checkpoint/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2e27fc1dce7eb977f35385d642958b3172e70df0a3e96876f908eea2a6cca594
size 3255543
last-checkpoint/rng_state.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:27f8f184937877ad3d41e45182827881783af0fadc7418018ae57bb3b469d8de
size 14244
last-checkpoint/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2d0b056489e836af682bcbf2356e0c80ac614bc3abf91d51e1fe2af8e3b96245
size 1064
last-checkpoint/special_tokens_map.json
ADDED
@@ -0,0 +1,30 @@
{
  "bos_token": { "content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false },
  "eos_token": { "content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false },
  "pad_token": { "content": "<|pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false },
  "unk_token": { "content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false }
}
last-checkpoint/tokenizer.json
ADDED
The diff for this file is too large to render.
last-checkpoint/tokenizer_config.json
ADDED
@@ -0,0 +1,205 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "100256": { "content": "<||_unused_0_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100257": { "content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100258": { "content": "<|fim_prefix|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100259": { "content": "<|fim_middle|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100260": { "content": "<|fim_suffix|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100261": { "content": "<||_unused_1_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100262": { "content": "<||_unused_2_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100263": { "content": "<||_unused_3_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100264": { "content": "<||_unused_4_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100265": { "content": "<||_unused_5_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100266": { "content": "<||_unused_6_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100267": { "content": "<||_unused_7_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100268": { "content": "<||_unused_8_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100269": { "content": "<||_unused_9_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100270": { "content": "<||_unused_10_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100271": { "content": "<||_unused_11_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100272": { "content": "<||_unused_12_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100273": { "content": "<||_unused_13_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100274": { "content": "<||_unused_14_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100275": { "content": "<||_unused_15_||>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100276": { "content": "<|endofprompt|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100277": { "content": "<|pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100278": { "content": "<|im_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
    "100279": { "content": "<|im_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true }
  },
  "bos_token": "<|endoftext|>",
  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "model_max_length": 32768,
  "pad_token": "<|pad|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
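The tokenizer config above defines a GPT2-class tokenizer with <|im_start|>/<|im_end|> registered as special tokens, while its chat_template emits Llama-3-style <|start_header_id|>/<|eot_id|> markers that are not in the special-token map (they would tokenize as plain text). A minimal sketch of rendering a conversation through this template follows; it is illustrative only, and the local "last-checkpoint" path is an assumption.

```python
# Minimal sketch: render messages with the checkpoint's chat_template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("last-checkpoint")  # assumed local path to this checkpoint
messages = [{"role": "user", "content": "Hello, what can you do?"}]

# tokenize=False returns the formatted string; add_generation_prompt appends the assistant header.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Starts with the bos token "<|endoftext|>" followed by
# "<|start_header_id|>user<|end_header_id|>\n\n...<|eot_id|>" per the template.
```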
last-checkpoint/trainer_state.json
ADDED
@@ -0,0 +1,1706 @@
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 0.03263021366646188,
  "eval_steps": 500,
  "global_step": 239,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    { "epoch": 0.0001365280906546522, "grad_norm": 0.0064392657950520515, "learning_rate": 1.0000000000000002e-06, "loss": 46.0, "step": 1 },
    { "epoch": 0.0002730561813093044, "grad_norm": 0.005096447188407183, "learning_rate": 2.0000000000000003e-06, "loss": 46.0, "step": 2 },
    { "epoch": 0.00040958427196395656, "grad_norm": 0.00540162855759263, "learning_rate": 3e-06, "loss": 46.0, "step": 3 },
    { "epoch": 0.0005461123626186088, "grad_norm": 0.004730381537228823, "learning_rate": 4.000000000000001e-06, "loss": 46.0, "step": 4 },
    { "epoch": 0.0006826404532732609, "grad_norm": 0.005004895851016045, "learning_rate": 5e-06, "loss": 46.0, "step": 5 },
    { "epoch": 0.0008191685439279131, "grad_norm": 0.0047608171589672565, "learning_rate": 6e-06, "loss": 46.0, "step": 6 },
    { "epoch": 0.0009556966345825653, "grad_norm": 0.004943865351378918, "learning_rate": 7.000000000000001e-06, "loss": 46.0, "step": 7 },
    { "epoch": 0.0010922247252372176, "grad_norm": 0.005371114704757929, "learning_rate": 8.000000000000001e-06, "loss": 46.0, "step": 8 },
    { "epoch": 0.0012287528158918697, "grad_norm": 0.005523690953850746, "learning_rate": 9e-06, "loss": 46.0, "step": 9 },
    { "epoch": 0.0013652809065465218, "grad_norm": 0.005615883972495794, "learning_rate": 1e-05, "loss": 46.0, "step": 10 },
    { "epoch": 0.0015018089972011742, "grad_norm": 0.0048523093573749065, "learning_rate": 1.1000000000000001e-05, "loss": 46.0, "step": 11 },
    { "epoch": 0.0016383370878558263, "grad_norm": 0.0064697773195803165, "learning_rate": 1.2e-05, "loss": 46.0, "step": 12 },
    { "epoch": 0.0017748651785104786, "grad_norm": 0.006073013413697481, "learning_rate": 1.3000000000000001e-05, "loss": 46.0, "step": 13 },
    { "epoch": 0.0019113932691651307, "grad_norm": 0.006469789892435074, "learning_rate": 1.4000000000000001e-05, "loss": 46.0, "step": 14 },
    { "epoch": 0.0020479213598197828, "grad_norm": 0.006134071387350559, "learning_rate": 1.5e-05, "loss": 46.0, "step": 15 },
    { "epoch": 0.0021844494504744353, "grad_norm": 0.006042553577572107, "learning_rate": 1.6000000000000003e-05, "loss": 46.0, "step": 16 },
    { "epoch": 0.0023209775411290874, "grad_norm": 0.005889922846108675, "learning_rate": 1.7000000000000003e-05, "loss": 46.0, "step": 17 },
    { "epoch": 0.0024575056317837395, "grad_norm": 0.007019102107733488, "learning_rate": 1.8e-05, "loss": 46.0, "step": 18 },
    { "epoch": 0.0025940337224383916, "grad_norm": 0.006500387564301491, "learning_rate": 1.9e-05, "loss": 46.0, "step": 19 },
    { "epoch": 0.0027305618130930437, "grad_norm": 0.006012055091559887, "learning_rate": 2e-05, "loss": 46.0, "step": 20 },
    { "epoch": 0.0028670899037476962, "grad_norm": 0.00662240432575345, "learning_rate": 2.1e-05, "loss": 46.0, "step": 21 },
    { "epoch": 0.0030036179944023483, "grad_norm": 0.006500392220914364, "learning_rate": 2.2000000000000003e-05, "loss": 46.0, "step": 22 },
    { "epoch": 0.0031401460850570004, "grad_norm": 0.0059815868735313416, "learning_rate": 2.3000000000000003e-05, "loss": 46.0, "step": 23 },
    { "epoch": 0.0032766741757116525, "grad_norm": 0.007568406872451305, "learning_rate": 2.4e-05, "loss": 46.0, "step": 24 },
    { "epoch": 0.003413202266366305, "grad_norm": 0.007171654608100653, "learning_rate": 2.5e-05, "loss": 46.0, "step": 25 },
    { "epoch": 0.003549730357020957, "grad_norm": 0.006286699324846268, "learning_rate": 2.6000000000000002e-05, "loss": 46.0, "step": 26 },
    { "epoch": 0.0036862584476756092, "grad_norm": 0.006256168242543936, "learning_rate": 2.7000000000000002e-05, "loss": 46.0, "step": 27 },
    { "epoch": 0.0038227865383302613, "grad_norm": 0.007476857863366604, "learning_rate": 2.8000000000000003e-05, "loss": 46.0, "step": 28 },
    { "epoch": 0.003959314628984914, "grad_norm": 0.005951044149696827, "learning_rate": 2.9e-05, "loss": 46.0, "step": 29 },
    { "epoch": 0.0040958427196395655, "grad_norm": 0.006622393615543842, "learning_rate": 3e-05, "loss": 46.0, "step": 30 },
    { "epoch": 0.004232370810294218, "grad_norm": 0.007354812230914831, "learning_rate": 3.1e-05, "loss": 46.0, "step": 31 },
    { "epoch": 0.004368898900948871, "grad_norm": 0.0065308124758303165, "learning_rate": 3.2000000000000005e-05, "loss": 46.0, "step": 32 },
    { "epoch": 0.004505426991603522, "grad_norm": 0.0068970052525401115, "learning_rate": 3.3e-05, "loss": 46.0, "step": 33 },
    { "epoch": 0.004641955082258175, "grad_norm": 0.00781263131648302, "learning_rate": 3.4000000000000007e-05, "loss": 46.0, "step": 34 },
    { "epoch": 0.0047784831729128265, "grad_norm": 0.00711080152541399, "learning_rate": 3.5e-05, "loss": 46.0, "step": 35 },
    { "epoch": 0.004915011263567479, "grad_norm": 0.00689734797924757, "learning_rate": 3.6e-05, "loss": 46.0, "step": 36 },
    { "epoch": 0.0050515393542221315, "grad_norm": 0.007202244829386473, "learning_rate": 3.7e-05, "loss": 46.0, "step": 37 },
    { "epoch": 0.005188067444876783, "grad_norm": 0.0074768345803022385, "learning_rate": 3.8e-05, "loss": 46.0, "step": 38 },
    { "epoch": 0.005324595535531436, "grad_norm": 0.007263243198394775, "learning_rate": 3.9000000000000006e-05, "loss": 46.0, "step": 39 },
    { "epoch": 0.005461123626186087, "grad_norm": 0.007598937954753637, "learning_rate": 4e-05, "loss": 46.0, "step": 40 },
    { "epoch": 0.00559765171684074, "grad_norm": 0.006866552401334047, "learning_rate": 4.1e-05, "loss": 46.0, "step": 41 },
    { "epoch": 0.0057341798074953924, "grad_norm": 0.007080144714564085, "learning_rate": 4.2e-05, "loss": 46.0, "step": 42 },
    { "epoch": 0.005870707898150044, "grad_norm": 0.007629483472555876, "learning_rate": 4.3e-05, "loss": 46.0, "step": 43 },
    { "epoch": 0.006007235988804697, "grad_norm": 0.007354784291237593, "learning_rate": 4.4000000000000006e-05, "loss": 46.0, "step": 44 },
    { "epoch": 0.006143764079459349, "grad_norm": 0.008544961921870708, "learning_rate": 4.5e-05, "loss": 46.0, "step": 45 },
    { "epoch": 0.006280292170114001, "grad_norm": 0.008300847373902798, "learning_rate": 4.600000000000001e-05, "loss": 46.0, "step": 46 },
    { "epoch": 0.006416820260768653, "grad_norm": 0.009948834776878357, "learning_rate": 4.7e-05, "loss": 46.0, "step": 47 },
    { "epoch": 0.006553348351423305, "grad_norm": 0.011779936961829662, "learning_rate": 4.8e-05, "loss": 46.0, "step": 48 },
    { "epoch": 0.0066898764420779576, "grad_norm": 0.012878548353910446, "learning_rate": 4.9e-05, "loss": 46.0, "step": 49 },
    { "epoch": 0.00682640453273261, "grad_norm": 0.025269048288464546, "learning_rate": 5e-05, "loss": 46.0, "step": 50 },
    { "epoch": 0.006962932623387262, "grad_norm": 0.0076904622837901115, "learning_rate": 5.1000000000000006e-05, "loss": 46.0, "step": 51 },
    { "epoch": 0.007099460714041914, "grad_norm": 0.005523694213479757, "learning_rate": 5.2000000000000004e-05, "loss": 46.0, "step": 52 },
    { "epoch": 0.007235988804696566, "grad_norm": 0.004943859297782183, "learning_rate": 5.300000000000001e-05, "loss": 46.0, "step": 53 },
    { "epoch": 0.0073725168953512185, "grad_norm": 0.005584749858826399, "learning_rate": 5.4000000000000005e-05, "loss": 46.0, "step": 54 },
    { "epoch": 0.007509044986005871, "grad_norm": 0.005279551260173321, "learning_rate": 5.500000000000001e-05, "loss": 46.0, "step": 55 },
    { "epoch": 0.007645573076660523, "grad_norm": 0.00512697221711278, "learning_rate": 5.6000000000000006e-05, "loss": 46.0, "step": 56 },
    { "epoch": 0.007782101167315175, "grad_norm": 0.005096450448036194, "learning_rate": 5.6999999999999996e-05, "loss": 46.0, "step": 57 },
    { "epoch": 0.007918629257969828, "grad_norm": 0.00512696523219347, "learning_rate": 5.8e-05, "loss": 46.0, "step": 58 },
    { "epoch": 0.00805515734862448, "grad_norm": 0.005279564298689365, "learning_rate": 5.9e-05, "loss": 46.0, "step": 59 },
    { "epoch": 0.008191685439279131, "grad_norm": 0.005493259057402611, "learning_rate": 6e-05, "loss": 46.0, "step": 60 },
    { "epoch": 0.008328213529933784, "grad_norm": 0.005584775935858488, "learning_rate": 6.1e-05, "loss": 46.0, "step": 61 },
    { "epoch": 0.008464741620588436, "grad_norm": 0.004974375944584608, "learning_rate": 6.2e-05, "loss": 46.0, "step": 62 },
    { "epoch": 0.008601269711243089, "grad_norm": 0.005279638338834047, "learning_rate": 6.3e-05, "loss": 46.0, "step": 63 },
    { "epoch": 0.008737797801897741, "grad_norm": 0.005645803641527891, "learning_rate": 6.400000000000001e-05, "loss": 46.0, "step": 64 },
    { "epoch": 0.008874325892552392, "grad_norm": 0.0064697787165641785, "learning_rate": 6.500000000000001e-05, "loss": 46.0, "step": 65 },
    { "epoch": 0.009010853983207045, "grad_norm": 0.005798387341201305, "learning_rate": 6.6e-05, "loss": 46.0, "step": 66 },
    { "epoch": 0.009147382073861697, "grad_norm": 0.006103536579757929, "learning_rate": 6.7e-05, "loss": 46.0, "step": 67 },
    { "epoch": 0.00928391016451635, "grad_norm": 0.007263305597007275, "learning_rate": 6.800000000000001e-05, "loss": 46.0, "step": 68 },
    { "epoch": 0.009420438255171002, "grad_norm": 0.006164577789604664, "learning_rate": 6.9e-05, "loss": 46.0, "step": 69 },
    { "epoch": 0.009556966345825653, "grad_norm": 0.006591958459466696, "learning_rate": 7e-05, "loss": 46.0, "step": 70 },
    { "epoch": 0.009693494436480305, "grad_norm": 0.006713982205837965, "learning_rate": 7.1e-05, "loss": 46.0, "step": 71 },
    { "epoch": 0.009830022527134958, "grad_norm": 0.006500311195850372, "learning_rate": 7.2e-05, "loss": 46.0, "step": 72 },
    { "epoch": 0.00996655061778961, "grad_norm": 0.010498268529772758, "learning_rate": 7.3e-05, "loss": 46.0, "step": 73 },
    { "epoch": 0.010103078708444263, "grad_norm": 0.006866553332656622, "learning_rate": 7.4e-05, "loss": 46.0, "step": 74 },
    { "epoch": 0.010239606799098914, "grad_norm": 0.006439250893890858, "learning_rate": 7.500000000000001e-05, "loss": 46.0, "step": 75 },
    { "epoch": 0.010376134889753566, "grad_norm": 0.006378244608640671, "learning_rate": 7.6e-05, "loss": 46.0, "step": 76 },
    { "epoch": 0.010512662980408219, "grad_norm": 0.006500313989818096, "learning_rate": 7.7e-05, "loss": 46.0, "step": 77 },
    { "epoch": 0.010649191071062871, "grad_norm": 0.006805569399148226, "learning_rate": 7.800000000000001e-05, "loss": 46.0, "step": 78 },
    { "epoch": 0.010785719161717524, "grad_norm": 0.005890019237995148, "learning_rate": 7.900000000000001e-05, "loss": 46.0, "step": 79 },
    { "epoch": 0.010922247252372175, "grad_norm": 0.007568598259240389, "learning_rate": 8e-05, "loss": 46.0, "step": 80 },
    { "epoch": 0.011058775343026827, "grad_norm": 0.006286646705120802, "learning_rate": 8.1e-05, "loss": 46.0, "step": 81 },
    { "epoch": 0.01119530343368148, "grad_norm": 0.007446358446031809, "learning_rate": 8.2e-05, "loss": 46.0, "step": 82 },
    { "epoch": 0.011331831524336132, "grad_norm": 0.00604271562770009, "learning_rate": 8.3e-05, "loss": 46.0, "step": 83 },
    { "epoch": 0.011468359614990785, "grad_norm": 0.007476830389350653, "learning_rate": 8.4e-05, "loss": 46.0, "step": 84 },
    { "epoch": 0.011604887705645437, "grad_norm": 0.0071412562392652035, "learning_rate": 8.5e-05, "loss": 46.0, "step": 85 },
    { "epoch": 0.011741415796300088, "grad_norm": 0.007598943542689085, "learning_rate": 8.6e-05, "loss": 46.0, "step": 86 },
    { "epoch": 0.01187794388695474, "grad_norm": 0.007599305361509323, "learning_rate": 8.7e-05, "loss": 46.0, "step": 87 },
    { "epoch": 0.012014471977609393, "grad_norm": 0.007049628999084234, "learning_rate": 8.800000000000001e-05, "loss": 46.0, "step": 88 },
    { "epoch": 0.012151000068264046, "grad_norm": 0.007233199663460255, "learning_rate": 8.900000000000001e-05, "loss": 46.0, "step": 89 },
    { "epoch": 0.012287528158918698, "grad_norm": 0.007324621547013521, "learning_rate": 9e-05, "loss": 46.0, "step": 90 },
    { "epoch": 0.012424056249573349, "grad_norm": 0.007782080676406622, "learning_rate": 9.1e-05, "loss": 46.0, "step": 91 },
    { "epoch": 0.012560584340228002, "grad_norm": 0.007446406874805689, "learning_rate": 9.200000000000001e-05, "loss": 46.0, "step": 92 },
    { "epoch": 0.012697112430882654, "grad_norm": 0.00799572840332985, "learning_rate": 9.300000000000001e-05, "loss": 46.0, "step": 93 },
    { "epoch": 0.012833640521537307, "grad_norm": 0.007782103028148413, "learning_rate": 9.4e-05, "loss": 46.0, "step": 94 },
    { "epoch": 0.01297016861219196, "grad_norm": 0.00921639148145914, "learning_rate": 9.5e-05, "loss": 46.0, "step": 95 },
    { "epoch": 0.01310669670284661, "grad_norm": 0.00836196169257164, "learning_rate": 9.6e-05, "loss": 46.0, "step": 96 },
    { "epoch": 0.013243224793501263, "grad_norm": 0.009582675993442535, "learning_rate": 9.7e-05, "loss": 46.0, "step": 97 },
    { "epoch": 0.013379752884155915, "grad_norm": 0.009887848980724812, "learning_rate": 9.8e-05, "loss": 46.0, "step": 98 },
    { "epoch": 0.013516280974810568, "grad_norm": 0.01336693949997425, "learning_rate": 9.900000000000001e-05, "loss": 46.0, "step": 99 },
    { "epoch": 0.01365280906546522, "grad_norm": 0.025635506957769394, "learning_rate": 0.0001, "loss": 46.0, "step": 100 },
    { "epoch": 0.013789337156119871, "grad_norm": 0.009216404519975185, "learning_rate": 9.999999527192591e-05, "loss": 46.0, "step": 101 },
    { "epoch": 0.013925865246774524, "grad_norm": 0.005340642761439085, "learning_rate": 9.999998108770457e-05, "loss": 46.0, "step": 102 },
    { "epoch": 0.014062393337429176, "grad_norm": 0.0050964620895683765, "learning_rate": 9.999995744733863e-05, "loss": 46.0, "step": 103 },
    { "epoch": 0.014198921428083829, "grad_norm": 0.005096455104649067, "learning_rate": 9.999992435083259e-05, "loss": 46.0, "step": 104 },
    { "epoch": 0.014335449518738481, "grad_norm": 0.005065944045782089, "learning_rate": 9.999988179819268e-05, "loss": 46.0, "step": 105 },
    { "epoch": 0.014471977609393132, "grad_norm": 0.005523856729269028, "learning_rate": 9.999982978942697e-05, "loss": 46.0, "step": 106 },
    { "epoch": 0.014608505700047784, "grad_norm": 0.005218552425503731, "learning_rate": 9.999976832454529e-05, "loss": 46.0, "step": 107 },
    { "epoch": 0.014745033790702437, "grad_norm": 0.00527958245947957, "learning_rate": 9.999969740355926e-05, "loss": 46.0, "step": 108 },
    { "epoch": 0.01488156188135709, "grad_norm": 0.004852349869906902, "learning_rate": 9.999961702648229e-05, "loss": 46.0, "step": 109 },
    { "epoch": 0.015018089972011742, "grad_norm": 0.004943876527249813, "learning_rate": 9.999952719332959e-05, "loss": 46.0, "step": 110 },
    { "epoch": 0.015154618062666393, "grad_norm": 0.005554288160055876, "learning_rate": 9.999942790411816e-05, "loss": 46.0, "step": 111 },
    { "epoch": 0.015291146153321045, "grad_norm": 0.004821829032152891, "learning_rate": 9.999931915886675e-05, "loss": 46.0, "step": 112 },
    { "epoch": 0.015427674243975698, "grad_norm": 0.005737371277064085, "learning_rate": 9.999920095759594e-05, "loss": 46.0, "step": 113 },
    { "epoch": 0.01556420233463035, "grad_norm": 0.005706910975277424, "learning_rate": 9.999907330032809e-05, "loss": 46.0, "step": 114 },
    { "epoch": 0.015700730425285, "grad_norm": 0.005920883733779192, "learning_rate": 9.999893618708734e-05, "loss": 46.0, "step": 115 },
    { "epoch": 0.015837258515939655, "grad_norm": 0.006409116089344025, "learning_rate": 9.999878961789962e-05, "loss": 46.0, "step": 116 },
    { "epoch": 0.015973786606594306, "grad_norm": 0.005768393166363239, "learning_rate": 9.999863359279264e-05, "loss": 46.0, "step": 117 },
    { "epoch": 0.01611031469724896, "grad_norm": 0.006988877430558205, "learning_rate": 9.999846811179592e-05, "loss": 46.0, "step": 118 },
    { "epoch": 0.01624684278790361, "grad_norm": 0.006043075118213892, "learning_rate": 9.999829317494075e-05, "loss": 46.0, "step": 119 },
    { "epoch": 0.016383370878558262, "grad_norm": 0.006500791292637587, "learning_rate": 9.999810878226022e-05, "loss": 46.0, "step": 120 },
    { "epoch": 0.016519898969212916, "grad_norm": 0.006592370104044676, "learning_rate": 9.999791493378921e-05, "loss": 46.0, "step": 121 },
    { "epoch": 0.016656427059867567, "grad_norm": 0.006134727504104376, "learning_rate": 9.999771162956436e-05, "loss": 46.0, "step": 122 },
    { "epoch": 0.01679295515052222, "grad_norm": 0.006105243694037199, "learning_rate": 9.999749886962413e-05, "loss": 46.0, "step": 123 },
    { "epoch": 0.016929483241176872, "grad_norm": 0.006196007132530212, "learning_rate": 9.999727665400875e-05, "loss": 46.0, "step": 124 },
    { "epoch": 0.017066011331831523, "grad_norm": 0.006256686523556709, "learning_rate": 9.999704498276029e-05, "loss": 46.0, "step": 125 },
    { "epoch": 0.017202539422486177, "grad_norm": 0.006929056718945503, "learning_rate": 9.99968038559225e-05, "loss": 46.0, "step": 126 },
    { "epoch": 0.017339067513140828, "grad_norm": 0.006196240894496441, "learning_rate": 9.999655327354102e-05, "loss": 46.0, "step": 127 },
    { "epoch": 0.017475595603795482, "grad_norm": 0.006714789662510157, "learning_rate": 9.999629323566323e-05, "loss": 46.0, "step": 128 },
    { "epoch": 0.017612123694450133, "grad_norm": 0.006685623899102211, "learning_rate": 9.999602374233832e-05, "loss": 46.0, "step": 129 },
    { "epoch": 0.017748651785104784, "grad_norm": 0.0064108301885426044, "learning_rate": 9.999574479361724e-05, "loss": 46.0, "step": 130 },
    { "epoch": 0.017885179875759438, "grad_norm": 0.00650253938511014, "learning_rate": 9.999545638955276e-05, "loss": 46.0, "step": 131 },
    { "epoch": 0.01802170796641409, "grad_norm": 0.0060740187764167786, "learning_rate": 9.99951585301994e-05, "loss": 46.0, "step": 132 },
    { "epoch": 0.018158236057068743, "grad_norm": 0.0071415649726986885, "learning_rate": 9.999485121561354e-05, "loss": 46.0, "step": 133 },
    { "epoch": 0.018294764147723394, "grad_norm": 0.006379471626132727, "learning_rate": 9.999453444585326e-05, "loss": 46.0, "step": 134 },
    { "epoch": 0.018431292238378045, "grad_norm": 0.006318703293800354, "learning_rate": 9.999420822097848e-05, "loss": 46.0, "step": 135 },
    { "epoch": 0.0185678203290327, "grad_norm": 0.006868820637464523, "learning_rate": 9.99938725410509e-05, "loss": 46.0, "step": 136 },
    { "epoch": 0.01870434841968735, "grad_norm": 0.007661610376089811, "learning_rate": 9.999352740613399e-05, "loss": 46.0, "step": 137 },
    { "epoch": 0.018840876510342004, "grad_norm": 0.006655195727944374, "learning_rate": 9.999317281629304e-05, "loss": 46.0, "step": 138 },
    { "epoch": 0.018977404600996655, "grad_norm": 0.007112228311598301, "learning_rate": 9.999280877159512e-05, "loss": 46.0, "step": 139 },
    { "epoch": 0.019113932691651306, "grad_norm": 0.006929009687155485, "learning_rate": 9.999243527210905e-05, "loss": 46.0, "step": 140 },
    { "epoch": 0.01925046078230596, "grad_norm": 0.006534802261739969, "learning_rate": 9.999205231790547e-05, "loss": 46.0, "step": 141 },
    { "epoch": 0.01938698887296061, "grad_norm": 0.007388260681182146, "learning_rate": 9.999165990905683e-05, "loss": 46.0, "step": 142 },
    { "epoch": 0.019523516963615265, "grad_norm": 0.008180912584066391, "learning_rate": 9.999125804563732e-05, "loss": 46.0, "step": 143 },
    { "epoch": 0.019660045054269916, "grad_norm": 0.007751926779747009, "learning_rate": 9.999084672772297e-05, "loss": 46.0, "step": 144 },
    { "epoch": 0.019796573144924567, "grad_norm": 0.007935077883303165, "learning_rate": 9.999042595539155e-05, "loss": 46.0, "step": 145 },
    { "epoch": 0.01993310123557922, "grad_norm": 0.009522893466055393, "learning_rate": 9.998999572872261e-05, "loss": 46.0, "step": 146 },
    { "epoch": 0.020069629326233872, "grad_norm": 0.010623243637382984, "learning_rate": 9.998955604779759e-05, "loss": 46.0, "step": 147 },
    { "epoch": 0.020206157416888526, "grad_norm": 0.01245472114533186, "learning_rate": 9.998910691269955e-05, "loss": 46.0, "step": 148 },
    { "epoch": 0.020342685507543177, "grad_norm": 0.017580725252628326, "learning_rate": 9.99886483235135e-05, "loss": 46.0, "step": 149 },
    { "epoch": 0.020479213598197828, "grad_norm": 0.03349756821990013, "learning_rate": 9.998818028032617e-05, "loss": 46.0, "step": 150 },
    { "epoch": 0.020615741688852482, "grad_norm": 0.008304811082780361, "learning_rate": 9.998770278322604e-05, "loss": 46.0, "step": 151 },
    { "epoch": 0.020752269779507133, "grad_norm": 0.004884021822363138, "learning_rate": 9.998721583230345e-05, "loss": 46.0, "step": 152 },
    { "epoch": 0.020888797870161787, "grad_norm": 0.005798705387860537, "learning_rate": 9.998671942765047e-05, "loss": 46.0, "step": 153 },
    { "epoch": 0.021025325960816438, "grad_norm": 0.004822134971618652, "learning_rate": 9.998621356936098e-05, "loss": 46.0, "step": 154 },
    { "epoch": 0.02116185405147109, "grad_norm": 0.0050971838645637035, "learning_rate": 9.998569825753065e-05, "loss": 46.0, "step": 155 },
    { "epoch": 0.021298382142125743, "grad_norm": 0.005127623211592436, "learning_rate": 9.998517349225698e-05, "loss": 46.0, "step": 156 },
    { "epoch": 0.021434910232780394, "grad_norm": 0.004974557552486658, "learning_rate": 9.998463927363915e-05, "loss": 46.0, "step": 157 },
    { "epoch": 0.021571438323435048, "grad_norm": 0.004913656506687403, "learning_rate": 9.998409560177824e-05, "loss": 46.0, "step": 158 },
    { "epoch": 0.0217079664140897, "grad_norm": 0.005739128682762384, "learning_rate": 9.998354247677705e-05, "loss": 46.0, "step": 159 },
    { "epoch": 0.02184449450474435, "grad_norm": 0.006135366391390562, "learning_rate": 9.998297989874019e-05, "loss": 46.0, "step": 160 },
    { "epoch": 0.021981022595399004, "grad_norm": 0.0056165060959756374, "learning_rate": 9.998240786777407e-05, "loss": 46.0, "step": 161 },
    { "epoch": 0.022117550686053655, "grad_norm": 0.0049142674542963505, "learning_rate": 9.998182638398685e-05, "loss": 46.0, "step": 162 },
    { "epoch": 0.02225407877670831, "grad_norm": 0.004945337772369385, "learning_rate": 9.998123544748852e-05, "loss": 46.0, "step": 163 },
    { "epoch": 0.02239060686736296, "grad_norm": 0.006471709348261356, "learning_rate": 9.998063505839083e-05, "loss": 46.0, "step": 164 },
    { "epoch": 0.02252713495801761, "grad_norm": 0.006015671882778406, "learning_rate": 9.998002521680734e-05, "loss": 46.0, "step": 165 },
    { "epoch": 0.022663663048672265, "grad_norm": 0.0063506849110126495, "learning_rate": 9.997940592285338e-05, "loss": 46.0, "step": 166 },
    { "epoch": 0.022800191139326915, "grad_norm": 0.00549672357738018, "learning_rate": 9.997877717664607e-05, "loss": 46.0, "step": 167 },
    { "epoch": 0.02293671922998157, "grad_norm": 0.005861447658389807, "learning_rate": 9.997813897830433e-05, "loss": 46.0, "step": 168 },
    { "epoch": 0.02307324732063622, "grad_norm": 0.005863008089363575, "learning_rate": 9.997749132794882e-05, "loss": 46.0, "step": 169 },
    { "epoch": 0.023209775411290875, "grad_norm": 0.0062310416251420975, "learning_rate": 9.997683422570207e-05, "loss": 46.0, "step": 170 },
    { "epoch": 0.023346303501945526, "grad_norm": 0.006629048381000757,
|
1204 |
+
"learning_rate": 9.997616767168836e-05,
|
1205 |
+
"loss": 46.0,
|
1206 |
+
"step": 171
|
1207 |
+
},
|
1208 |
+
{
|
1209 |
+
"epoch": 0.023482831592600176,
|
1210 |
+
"grad_norm": 0.00669481186196208,
|
1211 |
+
"learning_rate": 9.997549166603371e-05,
|
1212 |
+
"loss": 46.0,
|
1213 |
+
"step": 172
|
1214 |
+
},
|
1215 |
+
{
|
1216 |
+
"epoch": 0.02361935968325483,
|
1217 |
+
"grad_norm": 0.0066991448402404785,
|
1218 |
+
"learning_rate": 9.997480620886599e-05,
|
1219 |
+
"loss": 46.0,
|
1220 |
+
"step": 173
|
1221 |
+
},
|
1222 |
+
{
|
1223 |
+
"epoch": 0.02375588777390948,
|
1224 |
+
"grad_norm": 0.006779018323868513,
|
1225 |
+
"learning_rate": 9.997411130031482e-05,
|
1226 |
+
"loss": 46.0,
|
1227 |
+
"step": 174
|
1228 |
+
},
|
1229 |
+
{
|
1230 |
+
"epoch": 0.023892415864564136,
|
1231 |
+
"grad_norm": 0.005892460234463215,
|
1232 |
+
"learning_rate": 9.997340694051164e-05,
|
1233 |
+
"loss": 46.0,
|
1234 |
+
"step": 175
|
1235 |
+
},
|
1236 |
+
{
|
1237 |
+
"epoch": 0.024028943955218787,
|
1238 |
+
"grad_norm": 0.006875636056065559,
|
1239 |
+
"learning_rate": 9.997269312958965e-05,
|
1240 |
+
"loss": 46.0,
|
1241 |
+
"step": 176
|
1242 |
+
},
|
1243 |
+
{
|
1244 |
+
"epoch": 0.024165472045873437,
|
1245 |
+
"grad_norm": 0.006511778105050325,
|
1246 |
+
"learning_rate": 9.997196986768387e-05,
|
1247 |
+
"loss": 46.0,
|
1248 |
+
"step": 177
|
1249 |
+
},
|
1250 |
+
{
|
1251 |
+
"epoch": 0.02430200013652809,
|
1252 |
+
"grad_norm": 0.006331036798655987,
|
1253 |
+
"learning_rate": 9.997123715493106e-05,
|
1254 |
+
"loss": 46.0,
|
1255 |
+
"step": 178
|
1256 |
+
},
|
1257 |
+
{
|
1258 |
+
"epoch": 0.024438528227182742,
|
1259 |
+
"grad_norm": 0.007284753955900669,
|
1260 |
+
"learning_rate": 9.99704949914698e-05,
|
1261 |
+
"loss": 46.0,
|
1262 |
+
"step": 179
|
1263 |
+
},
|
1264 |
+
{
|
1265 |
+
"epoch": 0.024575056317837397,
|
1266 |
+
"grad_norm": 0.006993473507463932,
|
1267 |
+
"learning_rate": 9.996974337744046e-05,
|
1268 |
+
"loss": 46.0,
|
1269 |
+
"step": 180
|
1270 |
+
},
|
1271 |
+
{
|
1272 |
+
"epoch": 0.024711584408492047,
|
1273 |
+
"grad_norm": 0.007088113576173782,
|
1274 |
+
"learning_rate": 9.996898231298519e-05,
|
1275 |
+
"loss": 46.0,
|
1276 |
+
"step": 181
|
1277 |
+
},
|
1278 |
+
{
|
1279 |
+
"epoch": 0.024848112499146698,
|
1280 |
+
"grad_norm": 0.0066872392781078815,
|
1281 |
+
"learning_rate": 9.996821179824789e-05,
|
1282 |
+
"loss": 46.0,
|
1283 |
+
"step": 182
|
1284 |
+
},
|
1285 |
+
{
|
1286 |
+
"epoch": 0.024984640589801353,
|
1287 |
+
"grad_norm": 0.006381938699632883,
|
1288 |
+
"learning_rate": 9.996743183337432e-05,
|
1289 |
+
"loss": 46.0,
|
1290 |
+
"step": 183
|
1291 |
+
},
|
1292 |
+
{
|
1293 |
+
"epoch": 0.025121168680456003,
|
1294 |
+
"grad_norm": 0.007271216716617346,
|
1295 |
+
"learning_rate": 9.996664241851197e-05,
|
1296 |
+
"loss": 46.0,
|
1297 |
+
"step": 184
|
1298 |
+
},
|
1299 |
+
{
|
1300 |
+
"epoch": 0.025257696771110658,
|
1301 |
+
"grad_norm": 0.006416036281734705,
|
1302 |
+
"learning_rate": 9.996584355381016e-05,
|
1303 |
+
"loss": 46.0,
|
1304 |
+
"step": 185
|
1305 |
+
},
|
1306 |
+
{
|
1307 |
+
"epoch": 0.02539422486176531,
|
1308 |
+
"grad_norm": 0.006877813022583723,
|
1309 |
+
"learning_rate": 9.996503523941994e-05,
|
1310 |
+
"loss": 46.0,
|
1311 |
+
"step": 186
|
1312 |
+
},
|
1313 |
+
{
|
1314 |
+
"epoch": 0.02553075295241996,
|
1315 |
+
"grad_norm": 0.007740365341305733,
|
1316 |
+
"learning_rate": 9.996421747549419e-05,
|
1317 |
+
"loss": 46.0,
|
1318 |
+
"step": 187
|
1319 |
+
},
|
1320 |
+
{
|
1321 |
+
"epoch": 0.025667281043074613,
|
1322 |
+
"grad_norm": 0.007230641320347786,
|
1323 |
+
"learning_rate": 9.996339026218759e-05,
|
1324 |
+
"loss": 46.0,
|
1325 |
+
"step": 188
|
1326 |
+
},
|
1327 |
+
{
|
1328 |
+
"epoch": 0.025803809133729264,
|
1329 |
+
"grad_norm": 0.00781883206218481,
|
1330 |
+
"learning_rate": 9.996255359965656e-05,
|
1331 |
+
"loss": 46.0,
|
1332 |
+
"step": 189
|
1333 |
+
},
|
1334 |
+
{
|
1335 |
+
"epoch": 0.02594033722438392,
|
1336 |
+
"grad_norm": 0.006809736602008343,
|
1337 |
+
"learning_rate": 9.996170748805935e-05,
|
1338 |
+
"loss": 46.0,
|
1339 |
+
"step": 190
|
1340 |
+
},
|
1341 |
+
{
|
1342 |
+
"epoch": 0.02607686531503857,
|
1343 |
+
"grad_norm": 0.007462185341864824,
|
1344 |
+
"learning_rate": 9.996085192755596e-05,
|
1345 |
+
"loss": 46.0,
|
1346 |
+
"step": 191
|
1347 |
+
},
|
1348 |
+
{
|
1349 |
+
"epoch": 0.02621339340569322,
|
1350 |
+
"grad_norm": 0.008371432311832905,
|
1351 |
+
"learning_rate": 9.995998691830821e-05,
|
1352 |
+
"loss": 46.0,
|
1353 |
+
"step": 192
|
1354 |
+
},
|
1355 |
+
{
|
1356 |
+
"epoch": 0.026349921496347874,
|
1357 |
+
"grad_norm": 0.007817413657903671,
|
1358 |
+
"learning_rate": 9.995911246047971e-05,
|
1359 |
+
"loss": 46.0,
|
1360 |
+
"step": 193
|
1361 |
+
},
|
1362 |
+
{
|
1363 |
+
"epoch": 0.026486449587002525,
|
1364 |
+
"grad_norm": 0.008794574998319149,
|
1365 |
+
"learning_rate": 9.995822855423579e-05,
|
1366 |
+
"loss": 46.0,
|
1367 |
+
"step": 194
|
1368 |
+
},
|
1369 |
+
{
|
1370 |
+
"epoch": 0.02662297767765718,
|
1371 |
+
"grad_norm": 0.009037042036652565,
|
1372 |
+
"learning_rate": 9.995733519974366e-05,
|
1373 |
+
"loss": 46.0,
|
1374 |
+
"step": 195
|
1375 |
+
},
|
1376 |
+
{
|
1377 |
+
"epoch": 0.02675950576831183,
|
1378 |
+
"grad_norm": 0.009956594556570053,
|
1379 |
+
"learning_rate": 9.995643239717227e-05,
|
1380 |
+
"loss": 46.0,
|
1381 |
+
"step": 196
|
1382 |
+
},
|
1383 |
+
{
|
1384 |
+
"epoch": 0.02689603385896648,
|
1385 |
+
"grad_norm": 0.011966821737587452,
|
1386 |
+
"learning_rate": 9.995552014669235e-05,
|
1387 |
+
"loss": 46.0,
|
1388 |
+
"step": 197
|
1389 |
+
},
|
1390 |
+
{
|
1391 |
+
"epoch": 0.027032561949621135,
|
1392 |
+
"grad_norm": 0.012152734212577343,
|
1393 |
+
"learning_rate": 9.995459844847643e-05,
|
1394 |
+
"loss": 46.0,
|
1395 |
+
"step": 198
|
1396 |
+
},
|
1397 |
+
{
|
1398 |
+
"epoch": 0.027169090040275786,
|
1399 |
+
"grad_norm": 0.01459161750972271,
|
1400 |
+
"learning_rate": 9.995366730269881e-05,
|
1401 |
+
"loss": 46.0,
|
1402 |
+
"step": 199
|
1403 |
+
},
|
1404 |
+
{
|
1405 |
+
"epoch": 0.02730561813093044,
|
1406 |
+
"grad_norm": 0.03182811290025711,
|
1407 |
+
"learning_rate": 9.995272670953561e-05,
|
1408 |
+
"loss": 46.0,
|
1409 |
+
"step": 200
|
1410 |
+
},
|
1411 |
+
{
|
1412 |
+
"epoch": 0.02744214622158509,
|
1413 |
+
"grad_norm": 0.007998433895409107,
|
1414 |
+
"learning_rate": 9.995177666916472e-05,
|
1415 |
+
"loss": 46.0,
|
1416 |
+
"step": 201
|
1417 |
+
},
|
1418 |
+
{
|
1419 |
+
"epoch": 0.027578674312239742,
|
1420 |
+
"grad_norm": 0.005067504942417145,
|
1421 |
+
"learning_rate": 9.99508171817658e-05,
|
1422 |
+
"loss": 46.0,
|
1423 |
+
"step": 202
|
1424 |
+
},
|
1425 |
+
{
|
1426 |
+
"epoch": 0.027715202402894396,
|
1427 |
+
"grad_norm": 0.005280999932438135,
|
1428 |
+
"learning_rate": 9.994984824752032e-05,
|
1429 |
+
"loss": 46.0,
|
1430 |
+
"step": 203
|
1431 |
+
},
|
1432 |
+
{
|
1433 |
+
"epoch": 0.027851730493549047,
|
1434 |
+
"grad_norm": 0.004853568505495787,
|
1435 |
+
"learning_rate": 9.994886986661153e-05,
|
1436 |
+
"loss": 46.0,
|
1437 |
+
"step": 204
|
1438 |
+
},
|
1439 |
+
{
|
1440 |
+
"epoch": 0.0279882585842037,
|
1441 |
+
"grad_norm": 0.005801225081086159,
|
1442 |
+
"learning_rate": 9.994788203922447e-05,
|
1443 |
+
"loss": 46.0,
|
1444 |
+
"step": 205
|
1445 |
+
},
|
1446 |
+
{
|
1447 |
+
"epoch": 0.028124786674858352,
|
1448 |
+
"grad_norm": 0.005099669564515352,
|
1449 |
+
"learning_rate": 9.994688476554592e-05,
|
1450 |
+
"loss": 46.0,
|
1451 |
+
"step": 206
|
1452 |
+
},
|
1453 |
+
{
|
1454 |
+
"epoch": 0.028261314765513003,
|
1455 |
+
"grad_norm": 0.005158513318747282,
|
1456 |
+
"learning_rate": 9.994587804576453e-05,
|
1457 |
+
"loss": 46.0,
|
1458 |
+
"step": 207
|
1459 |
+
},
|
1460 |
+
{
|
1461 |
+
"epoch": 0.028397842856167657,
|
1462 |
+
"grad_norm": 0.005342944525182247,
|
1463 |
+
"learning_rate": 9.994486188007071e-05,
|
1464 |
+
"loss": 46.0,
|
1465 |
+
"step": 208
|
1466 |
+
},
|
1467 |
+
{
|
1468 |
+
"epoch": 0.028534370946822308,
|
1469 |
+
"grad_norm": 0.006135035306215286,
|
1470 |
+
"learning_rate": 9.994383626865658e-05,
|
1471 |
+
"loss": 46.0,
|
1472 |
+
"step": 209
|
1473 |
+
},
|
1474 |
+
{
|
1475 |
+
"epoch": 0.028670899037476962,
|
1476 |
+
"grad_norm": 0.006725645624101162,
|
1477 |
+
"learning_rate": 9.994280121171615e-05,
|
1478 |
+
"loss": 46.0,
|
1479 |
+
"step": 210
|
1480 |
+
},
|
1481 |
+
{
|
1482 |
+
"epoch": 0.028807427128131613,
|
1483 |
+
"grad_norm": 0.005914472043514252,
|
1484 |
+
"learning_rate": 9.994175670944517e-05,
|
1485 |
+
"loss": 46.0,
|
1486 |
+
"step": 211
|
1487 |
+
},
|
1488 |
+
{
|
1489 |
+
"epoch": 0.028943955218786264,
|
1490 |
+
"grad_norm": 0.005872462410479784,
|
1491 |
+
"learning_rate": 9.994070276204116e-05,
|
1492 |
+
"loss": 46.0,
|
1493 |
+
"step": 212
|
1494 |
+
},
|
1495 |
+
{
|
1496 |
+
"epoch": 0.029080483309440918,
|
1497 |
+
"grad_norm": 0.005198659375309944,
|
1498 |
+
"learning_rate": 9.993963936970346e-05,
|
1499 |
+
"loss": 46.0,
|
1500 |
+
"step": 213
|
1501 |
+
},
|
1502 |
+
{
|
1503 |
+
"epoch": 0.02921701140009557,
|
1504 |
+
"grad_norm": 0.006306622643023729,
|
1505 |
+
"learning_rate": 9.993856653263319e-05,
|
1506 |
+
"loss": 46.0,
|
1507 |
+
"step": 214
|
1508 |
+
},
|
1509 |
+
{
|
1510 |
+
"epoch": 0.029353539490750223,
|
1511 |
+
"grad_norm": 0.006014724727720022,
|
1512 |
+
"learning_rate": 9.993748425103322e-05,
|
1513 |
+
"loss": 46.0,
|
1514 |
+
"step": 215
|
1515 |
+
},
|
1516 |
+
{
|
1517 |
+
"epoch": 0.029490067581404874,
|
1518 |
+
"grad_norm": 0.0065292054787278175,
|
1519 |
+
"learning_rate": 9.993639252510824e-05,
|
1520 |
+
"loss": 46.0,
|
1521 |
+
"step": 216
|
1522 |
+
},
|
1523 |
+
{
|
1524 |
+
"epoch": 0.029626595672059525,
|
1525 |
+
"grad_norm": 0.006609591655433178,
|
1526 |
+
"learning_rate": 9.993529135506476e-05,
|
1527 |
+
"loss": 46.0,
|
1528 |
+
"step": 217
|
1529 |
+
},
|
1530 |
+
{
|
1531 |
+
"epoch": 0.02976312376271418,
|
1532 |
+
"grad_norm": 0.006543149705976248,
|
1533 |
+
"learning_rate": 9.993418074111101e-05,
|
1534 |
+
"loss": 46.0,
|
1535 |
+
"step": 218
|
1536 |
+
},
|
1537 |
+
{
|
1538 |
+
"epoch": 0.02989965185336883,
|
1539 |
+
"grad_norm": 0.005964506883174181,
|
1540 |
+
"learning_rate": 9.9933060683457e-05,
|
1541 |
+
"loss": 46.0,
|
1542 |
+
"step": 219
|
1543 |
+
},
|
1544 |
+
{
|
1545 |
+
"epoch": 0.030036179944023484,
|
1546 |
+
"grad_norm": 0.006460642442107201,
|
1547 |
+
"learning_rate": 9.993193118231462e-05,
|
1548 |
+
"loss": 46.0,
|
1549 |
+
"step": 220
|
1550 |
+
},
|
1551 |
+
{
|
1552 |
+
"epoch": 0.030172708034678135,
|
1553 |
+
"grad_norm": 0.005484839901328087,
|
1554 |
+
"learning_rate": 9.993079223789744e-05,
|
1555 |
+
"loss": 46.0,
|
1556 |
+
"step": 221
|
1557 |
+
},
|
1558 |
+
{
|
1559 |
+
"epoch": 0.030309236125332786,
|
1560 |
+
"grad_norm": 0.007502452004700899,
|
1561 |
+
"learning_rate": 9.992964385042088e-05,
|
1562 |
+
"loss": 46.0,
|
1563 |
+
"step": 222
|
1564 |
+
},
|
1565 |
+
{
|
1566 |
+
"epoch": 0.03044576421598744,
|
1567 |
+
"grad_norm": 0.006395469885319471,
|
1568 |
+
"learning_rate": 9.992848602010212e-05,
|
1569 |
+
"loss": 46.0,
|
1570 |
+
"step": 223
|
1571 |
+
},
|
1572 |
+
{
|
1573 |
+
"epoch": 0.03058229230664209,
|
1574 |
+
"grad_norm": 0.006767972372472286,
|
1575 |
+
"learning_rate": 9.992731874716013e-05,
|
1576 |
+
"loss": 46.0,
|
1577 |
+
"step": 224
|
1578 |
+
},
|
1579 |
+
{
|
1580 |
+
"epoch": 0.030718820397296745,
|
1581 |
+
"grad_norm": 0.006445927079766989,
|
1582 |
+
"learning_rate": 9.992614203181568e-05,
|
1583 |
+
"loss": 46.0,
|
1584 |
+
"step": 225
|
1585 |
+
},
|
1586 |
+
{
|
1587 |
+
"epoch": 0.030855348487951396,
|
1588 |
+
"grad_norm": 0.00642132293432951,
|
1589 |
+
"learning_rate": 9.992495587429129e-05,
|
1590 |
+
"loss": 46.0,
|
1591 |
+
"step": 226
|
1592 |
+
},
|
1593 |
+
{
|
1594 |
+
"epoch": 0.030991876578606047,
|
1595 |
+
"grad_norm": 0.006829009857028723,
|
1596 |
+
"learning_rate": 9.992376027481131e-05,
|
1597 |
+
"loss": 46.0,
|
1598 |
+
"step": 227
|
1599 |
+
},
|
1600 |
+
{
|
1601 |
+
"epoch": 0.0311284046692607,
|
1602 |
+
"grad_norm": 0.0070356884971261024,
|
1603 |
+
"learning_rate": 9.992255523360186e-05,
|
1604 |
+
"loss": 46.0,
|
1605 |
+
"step": 228
|
1606 |
+
},
|
1607 |
+
{
|
1608 |
+
"epoch": 0.031264932759915355,
|
1609 |
+
"grad_norm": 0.006753654219210148,
|
1610 |
+
"learning_rate": 9.992134075089084e-05,
|
1611 |
+
"loss": 46.0,
|
1612 |
+
"step": 229
|
1613 |
+
},
|
1614 |
+
{
|
1615 |
+
"epoch": 0.03140146085057,
|
1616 |
+
"grad_norm": 0.0069990940392017365,
|
1617 |
+
"learning_rate": 9.992011682690791e-05,
|
1618 |
+
"loss": 46.0,
|
1619 |
+
"step": 230
|
1620 |
+
},
|
1621 |
+
{
|
1622 |
+
"epoch": 0.03153798894122466,
|
1623 |
+
"grad_norm": 0.006793106906116009,
|
1624 |
+
"learning_rate": 9.991888346188456e-05,
|
1625 |
+
"loss": 46.0,
|
1626 |
+
"step": 231
|
1627 |
+
},
|
1628 |
+
{
|
1629 |
+
"epoch": 0.03167451703187931,
|
1630 |
+
"grad_norm": 0.0064877672120928764,
|
1631 |
+
"learning_rate": 9.991764065605406e-05,
|
1632 |
+
"loss": 46.0,
|
1633 |
+
"step": 232
|
1634 |
+
},
|
1635 |
+
{
|
1636 |
+
"epoch": 0.03181104512253396,
|
1637 |
+
"grad_norm": 0.006404773332178593,
|
1638 |
+
"learning_rate": 9.991638840965143e-05,
|
1639 |
+
"loss": 46.0,
|
1640 |
+
"step": 233
|
1641 |
+
},
|
1642 |
+
{
|
1643 |
+
"epoch": 0.03194757321318861,
|
1644 |
+
"grad_norm": 0.006637753453105688,
|
1645 |
+
"learning_rate": 9.991512672291352e-05,
|
1646 |
+
"loss": 46.0,
|
1647 |
+
"step": 234
|
1648 |
+
},
|
1649 |
+
{
|
1650 |
+
"epoch": 0.03208410130384327,
|
1651 |
+
"grad_norm": 0.007091572508215904,
|
1652 |
+
"learning_rate": 9.991385559607892e-05,
|
1653 |
+
"loss": 46.0,
|
1654 |
+
"step": 235
|
1655 |
+
},
|
1656 |
+
{
|
1657 |
+
"epoch": 0.03222062939449792,
|
1658 |
+
"grad_norm": 0.007286660838872194,
|
1659 |
+
"learning_rate": 9.991257502938804e-05,
|
1660 |
+
"loss": 46.0,
|
1661 |
+
"step": 236
|
1662 |
+
},
|
1663 |
+
{
|
1664 |
+
"epoch": 0.03235715748515257,
|
1665 |
+
"grad_norm": 0.007123738061636686,
|
1666 |
+
"learning_rate": 9.991128502308308e-05,
|
1667 |
+
"loss": 46.0,
|
1668 |
+
"step": 237
|
1669 |
+
},
|
1670 |
+
{
|
1671 |
+
"epoch": 0.03249368557580722,
|
1672 |
+
"grad_norm": 0.00724747683852911,
|
1673 |
+
"learning_rate": 9.990998557740801e-05,
|
1674 |
+
"loss": 46.0,
|
1675 |
+
"step": 238
|
1676 |
+
},
|
1677 |
+
{
|
1678 |
+
"epoch": 0.03263021366646188,
|
1679 |
+
"grad_norm": 0.006686553359031677,
|
1680 |
+
"learning_rate": 9.990867669260854e-05,
|
1681 |
+
"loss": 46.0,
|
1682 |
+
"step": 239
|
1683 |
+
}
|
1684 |
+
],
|
1685 |
+
"logging_steps": 1,
|
1686 |
+
"max_steps": 7324,
|
1687 |
+
"num_input_tokens_seen": 0,
|
1688 |
+
"num_train_epochs": 1,
|
1689 |
+
"save_steps": 239,
|
1690 |
+
"stateful_callbacks": {
|
1691 |
+
"TrainerControl": {
|
1692 |
+
"args": {
|
1693 |
+
"should_epoch_stop": false,
|
1694 |
+
"should_evaluate": false,
|
1695 |
+
"should_log": false,
|
1696 |
+
"should_save": true,
|
1697 |
+
"should_training_stop": false
|
1698 |
+
},
|
1699 |
+
"attributes": {}
|
1700 |
+
}
|
1701 |
+
},
|
1702 |
+
"total_flos": 22525992566784.0,
|
1703 |
+
"train_batch_size": 4,
|
1704 |
+
"trial_name": null,
|
1705 |
+
"trial_params": null
|
1706 |
+
}
|
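The `log_history` records above are written by the Hugging Face `Trainer` at every logging step (`logging_steps: 1`). A minimal sketch of how this saved state can be inspected once the checkpoint directory is available locally; the file path is an assumption and should point at wherever the repository was downloaded:

```python
import json

# Load the trainer state bundled with this checkpoint.
# The path is hypothetical; adjust it to your local copy of the checkpoint.
with open("last-checkpoint/trainer_state.json") as f:
    state = json.load(f)

# Each record mirrors the entries shown in the diff above:
# epoch, grad_norm, learning_rate, loss and step.
for record in state["log_history"][-5:]:
    print(record["step"], record["loss"], record["grad_norm"], record["learning_rate"])

# Run-level fields from the tail of the file.
print("max_steps:", state["max_steps"])                # 7324
print("save_steps:", state["save_steps"])              # 239
print("train_batch_size:", state["train_batch_size"])  # 4
```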
last-checkpoint/training_args.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c01eaa5c89377b6f36b549725fe0f70db6ad9697635b5f0bf190ec77cff069ac
+size 6840
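The three lines above are a Git LFS pointer rather than the file contents; the actual `training_args.bin` (6,840 bytes) is materialized when the repository is cloned with `git lfs`. Such a file is conventionally a pickled `TrainingArguments` object, so the following is only a hedged sketch of how it is commonly inspected, assuming `torch` and `transformers` are installed and the real file, not the pointer, is on disk:

```python
import torch

# training_args.bin is typically a pickled transformers.TrainingArguments object.
# weights_only=False is needed on recent PyTorch releases because the file stores
# an arbitrary Python object rather than plain tensors.
args = torch.load("last-checkpoint/training_args.bin", weights_only=False)

print(args.learning_rate)
print(args.per_device_train_batch_size)
print(args.save_steps)
```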
last-checkpoint/vocab.json
ADDED
The diff for this file is too large to render.
See raw diff