Upload folder using huggingface_hub

by CreatorPhan - opened Oct 12, 2023

base: refs/heads/main ← from: refs/pr/4

+787 -312

Files changed (13)

README.md +198 -0
adapter_model.bin +1 -1
checkpoint-100/README.md +205 -1
checkpoint-100/adapter_model.bin +1 -1
checkpoint-100/optimizer.pt +1 -1
checkpoint-100/rng_state.pth +1 -1
checkpoint-100/scheduler.pt +1 -1
checkpoint-100/tokenizer_config.json +35 -0
checkpoint-100/trainer_state.json +304 -304
checkpoint-100/training_args.bin +1 -1
runs/Oct12_18-12-36_63a985a0dcf5/events.out.tfevents.1697134361.63a985a0dcf5.4074.0 +3 -0
tokenizer_config.json +35 -0
training_args.bin +1 -1

README.md CHANGED

@@ -1,6 +1,203 @@
 ---
 library_name: peft
 ---
 ## Training procedure
@@ -15,6 +212,7 @@ The following `bitsandbytes` quantization config was used during training:
 - bnb_4bit_quant_type: fp4
 - bnb_4bit_use_double_quant: False
 - bnb_4bit_compute_dtype: float32
 ### Framework versions

 ---
 library_name: peft
+base_model: bigscience/bloomz-3b
 ---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Data Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
 ## Training procedure
 - bnb_4bit_quant_type: fp4
 - bnb_4bit_use_double_quant: False
 - bnb_4bit_compute_dtype: float32
 ### Framework versions

adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f054a50c1c97a69a96573ad64c2b580cacf6e043599b8e7a34409e206f02b3e6
 size 39409357

 version https://git-lfs.github.com/spec/v1
+oid sha256:2e40b7b6ec13002db8ba2108993b9af6829adb99a2fddff2d6b009c40f622d55
 size 39409357
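
Note on the pointer lines above: each side of this diff is a Git LFS pointer, which records only the spec version, a SHA-256 object ID, and the byte size, while the binary itself is kept in LFS storage. Here the size is unchanged (39409357) and only the oid differs, so the adapter weights were replaced by a file of identical length. The sketch below shows one way to read such a pointer and check a downloaded file against it. It assumes the diff markers have already been stripped and that the hash is `sha256` as shown; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any library.

```python
import hashlib

# New-side pointer from the diff above, with the leading diff markers removed.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2e40b7b6ec13002db8ba2108993b9af6829adb99a2fddff2d6b009c40f622d55
size 39409357
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines of an LFS pointer into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError("this sketch only handles sha256 pointers")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(NEW_POINTER)
print(pointer["algo"], pointer["size"])
```

For a file of this size, reading it fully into memory is acceptable; for much larger checkpoints, hashing in chunks would be the more cautious choice.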

checkpoint-100/README.md CHANGED

@@ -1,6 +1,203 @@
 ---
 library_name: peft
 ---
 ## Training procedure
@@ -16,6 +213,13 @@ The following `bitsandbytes` quantization config was used during training:
 - bnb_4bit_use_double_quant: False
 - bnb_4bit_compute_dtype: float32
 The following `bitsandbytes` quantization config was used during training:
 - quant_method: bitsandbytes
 - load_in_8bit: True
@@ -27,8 +231,8 @@ The following `bitsandbytes` quantization config was used during training:
 - bnb_4bit_quant_type: fp4
 - bnb_4bit_use_double_quant: False
 - bnb_4bit_compute_dtype: float32
 ### Framework versions
-- PEFT 0.6.0.dev0
 - PEFT 0.6.0.dev0

 ---
 library_name: peft
+base_model: bigscience/bloomz-3b
 ---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Data Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
 ## Training procedure
 - bnb_4bit_use_double_quant: False
 - bnb_4bit_compute_dtype: float32
+### Framework versions
+- PEFT 0.6.0.dev0
+## Training procedure
 The following `bitsandbytes` quantization config was used during training:
 - quant_method: bitsandbytes
 - load_in_8bit: True
 - bnb_4bit_quant_type: fp4
 - bnb_4bit_use_double_quant: False
 - bnb_4bit_compute_dtype: float32
 ### Framework versions
 - PEFT 0.6.0.dev0

checkpoint-100/adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e10ec585cc785ec6c1ba52b342ed481e0c422b9ac31d8e20a8a23da9b83db472
 size 39409357

 version https://git-lfs.github.com/spec/v1
+oid sha256:c394307bbd8fa249a03539583b0146418f3ec081d9e00bd4d47de6b6362685a1
 size 39409357

checkpoint-100/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aa455e33c4a5d31afb7a6b507415d81b58428f2df2ae11cfc9a5c88741af4575
 size 78844421

 version https://git-lfs.github.com/spec/v1
+oid sha256:b5c435a8d185fca70ecaa721a10760c7ba3f9eab4a917a3864841f3f9d5ee652
 size 78844421

checkpoint-100/rng_state.pth CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bc401e4179bc1e7efa275a89930a0253550e2251df7c7fb11bdb457cda3e88aa
 size 14575

 version https://git-lfs.github.com/spec/v1
+oid sha256:36fc71bd44bd7f04f2599c5dface64c517de1a7ab7bac3600f3f6470c6c72673
 size 14575

checkpoint-100/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f3c6698901762d2f6230a1becbf5ebc118ea9c781a5ef978d76e0b649f4f5e37
 size 627

 version https://git-lfs.github.com/spec/v1
+oid sha256:623e145e5ab24cb1507f4f210040814726c2c3abec15b64a36227aa6dd37bb5a
 size 627

checkpoint-100/tokenizer_config.json CHANGED

@@ -1,5 +1,40 @@
 {
   "add_prefix_space": false,
   "bos_token": "<s>",
   "clean_up_tokenization_spaces": false,
   "eos_token": "</s>",

 {
   "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [],
   "bos_token": "<s>",
   "clean_up_tokenization_spaces": false,
   "eos_token": "</s>",

checkpoint-100/trainer_state.json CHANGED

@@ -1,7 +1,7 @@
 {
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 32.0,
   "eval_steps": 500,
   "global_step": 100,
   "is_hyper_param_search": false,
@@ -9,611 +9,611 @@
   "is_world_process_zero": true,
   "log_history": [
     {
-      "epoch": 0.32,
-      "learning_rate": 0.00019895833333333332,
-      "loss": 2.436,
       "step": 1
     },
     {
-      "epoch": 0.64,
-      "learning_rate": 0.0001979166666666667,
-      "loss": 2.134,
       "step": 2
     },
     {
-      "epoch": 0.96,
-      "learning_rate": 0.000196875,
-      "loss": 1.8754,
       "step": 3
     },
     {
-      "epoch": 1.28,
-      "learning_rate": 0.00019583333333333334,
-      "loss": 1.7027,
       "step": 4
     },
     {
-      "epoch": 1.6,
-      "learning_rate": 0.00019479166666666668,
-      "loss": 1.5335,
       "step": 5
     },
     {
-      "epoch": 1.92,
-      "learning_rate": 0.00019375000000000002,
-      "loss": 1.4574,
       "step": 6
     },
     {
-      "epoch": 2.24,
-      "learning_rate": 0.00019270833333333333,
-      "loss": 1.3616,
       "step": 7
     },
     {
-      "epoch": 2.56,
-      "learning_rate": 0.00019166666666666667,
-      "loss": 1.2486,
       "step": 8
     },
     {
-      "epoch": 2.88,
-      "learning_rate": 0.000190625,
-      "loss": 1.2004,
       "step": 9
     },
     {
-      "epoch": 3.2,
-      "learning_rate": 0.00018958333333333332,
-      "loss": 1.1875,
       "step": 10
     },
     {
-      "epoch": 3.52,
-      "learning_rate": 0.0001885416666666667,
-      "loss": 1.146,
       "step": 11
     },
     {
-      "epoch": 3.84,
-      "learning_rate": 0.0001875,
-      "loss": 1.1426,
       "step": 12
     },
     {
-      "epoch": 4.16,
-      "learning_rate": 0.00018645833333333334,
-      "loss": 1.081,
       "step": 13
     },
     {
-      "epoch": 4.48,
-      "learning_rate": 0.00018541666666666668,
-      "loss": 1.0904,
       "step": 14
     },
     {
-      "epoch": 4.8,
-      "learning_rate": 0.000184375,
-      "loss": 1.0793,
       "step": 15
     },
     {
-      "epoch": 5.12,
-      "learning_rate": 0.00018333333333333334,
-      "loss": 1.0554,
       "step": 16
     },
     {
-      "epoch": 5.44,
-      "learning_rate": 0.00018229166666666667,
-      "loss": 1.0499,
       "step": 17
     },
     {
-      "epoch": 5.76,
-      "learning_rate": 0.00018125000000000001,
-      "loss": 1.0378,
       "step": 18
     },
     {
-      "epoch": 6.08,
-      "learning_rate": 0.00018020833333333333,
-      "loss": 1.0245,
       "step": 19
     },
     {
-      "epoch": 6.4,
-      "learning_rate": 0.0001791666666666667,
-      "loss": 1.0436,
       "step": 20
     },
     {
-      "epoch": 6.72,
-      "learning_rate": 0.000178125,
-      "loss": 0.9766,
       "step": 21
     },
     {
-      "epoch": 7.04,
-      "learning_rate": 0.00017708333333333335,
-      "loss": 0.9664,
       "step": 22
     },
     {
-      "epoch": 7.36,
-      "learning_rate": 0.00017604166666666669,
-      "loss": 0.9977,
       "step": 23
     },
     {
-      "epoch": 7.68,
-      "learning_rate": 0.000175,
-      "loss": 0.9671,
       "step": 24
     },
     {
-      "epoch": 8.0,
-      "learning_rate": 0.00017395833333333334,
-      "loss": 0.9229,
       "step": 25
     },
     {
-      "epoch": 8.32,
-      "learning_rate": 0.00017291666666666668,
-      "loss": 0.957,
       "step": 26
     },
     {
-      "epoch": 8.64,
-      "learning_rate": 0.00017187500000000002,
-      "loss": 0.9213,
       "step": 27
     },
     {
-      "epoch": 8.96,
-      "learning_rate": 0.00017083333333333333,
-      "loss": 0.9147,
       "step": 28
     },
     {
-      "epoch": 9.28,
-      "learning_rate": 0.00016979166666666667,
-      "loss": 0.8976,
       "step": 29
     },
     {
-      "epoch": 9.6,
-      "learning_rate": 0.00016875,
-      "loss": 0.8961,
       "step": 30
     },
     {
-      "epoch": 9.92,
-      "learning_rate": 0.00016770833333333332,
-      "loss": 0.8934,
       "step": 31
     },
     {
-      "epoch": 10.24,
-      "learning_rate": 0.0001666666666666667,
-      "loss": 0.865,
       "step": 32
     },
     {
-      "epoch": 10.56,
-      "learning_rate": 0.000165625,
-      "loss": 0.8952,
       "step": 33
     },
     {
-      "epoch": 10.88,
-      "learning_rate": 0.00016458333333333334,
-      "loss": 0.832,
       "step": 34
     },
     {
-      "epoch": 11.2,
-      "learning_rate": 0.00016354166666666668,
-      "loss": 0.8241,
       "step": 35
     },
     {
-      "epoch": 11.52,
-      "learning_rate": 0.00016250000000000002,
-      "loss": 0.834,
       "step": 36
     },
     {
-      "epoch": 11.84,
-      "learning_rate": 0.00016145833333333333,
-      "loss": 0.8305,
       "step": 37
     },
     {
-      "epoch": 12.16,
-      "learning_rate": 0.00016041666666666667,
-      "loss": 0.7752,
       "step": 38
     },
     {
-      "epoch": 12.48,
-      "learning_rate": 0.000159375,
-      "loss": 0.8084,
       "step": 39
     },
     {
-      "epoch": 12.8,
-      "learning_rate": 0.00015833333333333332,
-      "loss": 0.7757,
       "step": 40
     },
     {
-      "epoch": 13.12,
-      "learning_rate": 0.0001572916666666667,
-      "loss": 0.7724,
       "step": 41
     },
     {
-      "epoch": 13.44,
-      "learning_rate": 0.00015625,
-      "loss": 0.7478,
       "step": 42
     },
     {
-      "epoch": 13.76,
-      "learning_rate": 0.00015520833333333334,
-      "loss": 0.7291,
       "step": 43
     },
     {
-      "epoch": 14.08,
-      "learning_rate": 0.00015416666666666668,
-      "loss": 0.7444,
       "step": 44
     },
     {
-      "epoch": 14.4,
-      "learning_rate": 0.000153125,
-      "loss": 0.732,
       "step": 45
     },
     {
-      "epoch": 14.72,
-      "learning_rate": 0.00015208333333333333,
-      "loss": 0.6892,
       "step": 46
     },
     {
-      "epoch": 15.04,
-      "learning_rate": 0.00015104166666666667,
-      "loss": 0.6804,
       "step": 47
     },
     {
-      "epoch": 15.36,
-      "learning_rate": 0.00015000000000000001,
-      "loss": 0.668,
       "step": 48
     },
     {
-      "epoch": 15.68,
-      "learning_rate": 0.00014895833333333333,
-      "loss": 0.6568,
       "step": 49
     },
     {
-      "epoch": 16.0,
-      "learning_rate": 0.0001479166666666667,
-      "loss": 0.6475,
       "step": 50
     },
     {
-      "epoch": 16.32,
-      "learning_rate": 0.000146875,
-      "loss": 0.6317,
       "step": 51
     },
     {
-      "epoch": 16.64,
-      "learning_rate": 0.00014583333333333335,
-      "loss": 0.5976,
       "step": 52
     },
     {
-      "epoch": 16.96,
-      "learning_rate": 0.00014479166666666669,
-      "loss": 0.6074,
       "step": 53
     },
     {
-      "epoch": 17.28,
-      "learning_rate": 0.00014375,
-      "loss": 0.5905,
       "step": 54
     },
     {
-      "epoch": 17.6,
-      "learning_rate": 0.00014270833333333334,
-      "loss": 0.5564,
       "step": 55
     },
     {
-      "epoch": 17.92,
-      "learning_rate": 0.00014166666666666668,
-      "loss": 0.5773,
       "step": 56
     },
     {
-      "epoch": 18.24,
-      "learning_rate": 0.00014062500000000002,
-      "loss": 0.5337,
       "step": 57
     },
     {
-      "epoch": 18.56,
-      "learning_rate": 0.00013958333333333333,
-      "loss": 0.5227,
       "step": 58
     },
     {
-      "epoch": 18.88,
-      "learning_rate": 0.00013854166666666667,
-      "loss": 0.5251,
       "step": 59
     },
     {
-      "epoch": 19.2,
-      "learning_rate": 0.0001375,
-      "loss": 0.503,
       "step": 60
     },
     {
-      "epoch": 19.52,
-      "learning_rate": 0.00013645833333333332,
-      "loss": 0.486,
       "step": 61
     },
     {
-      "epoch": 19.84,
-      "learning_rate": 0.0001354166666666667,
-      "loss": 0.4632,
       "step": 62
     },
     {
-      "epoch": 20.16,
-      "learning_rate": 0.000134375,
-      "loss": 0.4734,
       "step": 63
     },
     {
-      "epoch": 20.48,
-      "learning_rate": 0.00013333333333333334,
-      "loss": 0.4212,
       "step": 64
     },
     {
-      "epoch": 20.8,
-      "learning_rate": 0.00013229166666666668,
-      "loss": 0.4255,
       "step": 65
     },
     {
-      "epoch": 21.12,
-      "learning_rate": 0.00013125000000000002,
-      "loss": 0.4231,
       "step": 66
     },
     {
-      "epoch": 21.44,
-      "learning_rate": 0.00013020833333333333,
-      "loss": 0.392,
       "step": 67
     },
     {
-      "epoch": 21.76,
-      "learning_rate": 0.00012916666666666667,
-      "loss": 0.3924,
       "step": 68
     },
     {
-      "epoch": 22.08,
-      "learning_rate": 0.000128125,
-      "loss": 0.3787,
       "step": 69
     },
     {
-      "epoch": 22.4,
-      "learning_rate": 0.00012708333333333332,
-      "loss": 0.3562,
       "step": 70
     },
     {
-      "epoch": 22.72,
-      "learning_rate": 0.0001260416666666667,
-      "loss": 0.3474,
       "step": 71
     },
     {
-      "epoch": 23.04,
-      "learning_rate": 0.000125,
-      "loss": 0.338,
       "step": 72
     },
     {
-      "epoch": 23.36,
-      "learning_rate": 0.00012395833333333334,
-      "loss": 0.326,
       "step": 73
     },
     {
-      "epoch": 23.68,
-      "learning_rate": 0.00012291666666666668,
-      "loss": 0.3049,
       "step": 74
     },
     {
-      "epoch": 24.0,
-      "learning_rate": 0.00012187500000000001,
-      "loss": 0.3032,
       "step": 75
     },
     {
-      "epoch": 24.32,
-      "learning_rate": 0.00012083333333333333,
-      "loss": 0.2957,
       "step": 76
     },
     {
-      "epoch": 24.64,
-      "learning_rate": 0.00011979166666666667,
-      "loss": 0.2771,
       "step": 77
     },
     {
-      "epoch": 24.96,
-      "learning_rate": 0.00011875,
-      "loss": 0.2706,
       "step": 78
     },
     {
-      "epoch": 25.28,
-      "learning_rate": 0.00011770833333333333,
-      "loss": 0.2611,
       "step": 79
     },
     {
-      "epoch": 25.6,
-      "learning_rate": 0.00011666666666666668,
-      "loss": 0.2515,
       "step": 80
     },
     {
-      "epoch": 25.92,
-      "learning_rate": 0.000115625,
-      "loss": 0.2353,
       "step": 81
     },
     {
-      "epoch": 26.24,
-      "learning_rate": 0.00011458333333333333,
-      "loss": 0.2323,
       "step": 82
     },
     {
-      "epoch": 26.56,
-      "learning_rate": 0.00011354166666666668,
-      "loss": 0.2394,
       "step": 83
     },
     {
-      "epoch": 26.88,
-      "learning_rate": 0.00011250000000000001,
-      "loss": 0.2154,
       "step": 84
     },
     {
-      "epoch": 27.2,
-      "learning_rate": 0.00011145833333333334,
-      "loss": 0.2045,
       "step": 85
     },
     {
-      "epoch": 27.52,
-      "learning_rate": 0.00011041666666666668,
-      "loss": 0.2133,
       "step": 86
     },
     {
-      "epoch": 27.84,
-      "learning_rate": 0.000109375,
-      "loss": 0.1995,
       "step": 87
     },
     {
-      "epoch": 28.16,
-      "learning_rate": 0.00010833333333333333,
-      "loss": 0.1938,
       "step": 88
     },
     {
-      "epoch": 28.48,
-      "learning_rate": 0.00010729166666666668,
-      "loss": 0.1881,
       "step": 89
     },
     {
-      "epoch": 28.8,
-      "learning_rate": 0.00010625000000000001,
-      "loss": 0.1814,
       "step": 90
     },
     {
-      "epoch": 29.12,
-      "learning_rate": 0.00010520833333333333,
-      "loss": 0.1707,
       "step": 91
     },
     {
-      "epoch": 29.44,
-      "learning_rate": 0.00010416666666666667,
-      "loss": 0.1716,
       "step": 92
     },
     {
-      "epoch": 29.76,
-      "learning_rate": 0.000103125,
-      "loss": 0.1706,
       "step": 93
     },
     {
-      "epoch": 30.08,
-      "learning_rate": 0.00010208333333333333,
-      "loss": 0.1685,
       "step": 94
     },
     {
-      "epoch": 30.4,
-      "learning_rate": 0.00010104166666666668,
-      "loss": 0.1621,
       "step": 95
     },
     {
-      "epoch": 30.72,
-      "learning_rate": 0.0001,
-      "loss": 0.1548,
       "step": 96
     },
     {
-      "epoch": 31.04,
-      "learning_rate": 9.895833333333334e-05,
-      "loss": 0.1525,
       "step": 97
     },
     {
-      "epoch": 31.36,
-      "learning_rate": 9.791666666666667e-05,
-      "loss": 0.1488,
       "step": 98
     },
     {
-      "epoch": 31.68,
-      "learning_rate": 9.687500000000001e-05,
-      "loss": 0.1423,
       "step": 99
     },
     {
-      "epoch": 32.0,
-      "learning_rate": 9.583333333333334e-05,
-      "loss": 0.1452,
       "step": 100
     }
   ],
   "logging_steps": 1,
-  "max_steps": 192,
-  "num_train_epochs": 64,
   "save_steps": 100,
-  "total_flos": 2.4750253661952e+16,
   "trial_name": null,
   "trial_params": null
 }
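
Note on the removed log: the `learning_rate` values on the old side fall by a constant amount per step, and they agree with a linear decay to zero over the logged `max_steps` of 192. The check below is a rough sketch under two assumptions that the diff does not state: a peak learning rate of 2e-4 and no warmup. `linear_decay_lr` is a hypothetical helper, not a library function. The first replacement values on the new side appear to fit the same form with 140 total steps, although the new run's `max_steps` is not visible in this excerpt.

```python
import math


def linear_decay_lr(step, peak_lr, max_steps):
    """Learning rate logged after `step` optimizer steps, decaying linearly to zero."""
    return peak_lr * (max_steps - step) / max_steps


# (step, learning_rate) pairs copied from the removed log_history entries.
OLD_LOGGED = [
    (1, 0.00019895833333333332),
    (3, 0.000196875),
    (96, 0.0001),
    (100, 9.583333333333334e-05),
]

# (step, learning_rate) pairs copied from the added log_history entries.
NEW_LOGGED = [
    (1, 0.0001985714285714286),
    (7, 0.00019),
    (14, 0.00018),
]

PEAK_LR = 2e-4  # assumption: not stated anywhere in the diff

old_ok = all(
    math.isclose(linear_decay_lr(step, PEAK_LR, 192), lr, rel_tol=1e-9)
    for step, lr in OLD_LOGGED
)
# 140 total steps is inferred from the values, not read from the diff.
new_ok = all(
    math.isclose(linear_decay_lr(step, PEAK_LR, 140), lr, rel_tol=1e-9)
    for step, lr in NEW_LOGGED
)
print(old_ok, new_ok)
```

If either assumption is wrong (for example, a different peak rate combined with warmup), other schedules could produce the same few values, so this is a consistency check rather than a reconstruction of the training arguments.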

 {
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 7.111111111111111,
   "eval_steps": 500,
   "global_step": 100,
   "is_hyper_param_search": false,
   "is_world_process_zero": true,
   "log_history": [
     {
+      "epoch": 0.07,
+      "learning_rate": 0.0001985714285714286,
+      "loss": 2.5072,
       "step": 1
     },
     {
+      "epoch": 0.14,
+      "learning_rate": 0.00019714285714285716,
+      "loss": 2.194,
       "step": 2
     },
     {
+      "epoch": 0.21,
+      "learning_rate": 0.00019571428571428572,
+      "loss": 1.9685,
       "step": 3
     },
     {
+      "epoch": 0.28,
+      "learning_rate": 0.0001942857142857143,
+      "loss": 1.7577,
       "step": 4
     },
     {
+      "epoch": 0.36,
+      "learning_rate": 0.00019285714285714286,
+      "loss": 1.6095,
       "step": 5
     },
     {
+      "epoch": 0.43,
+      "learning_rate": 0.00019142857142857145,
+      "loss": 1.5448,
       "step": 6
     },
     {
+      "epoch": 0.5,
+      "learning_rate": 0.00019,
+      "loss": 1.453,
       "step": 7
     },
     {
+      "epoch": 0.57,
+      "learning_rate": 0.00018857142857142857,
+      "loss": 1.41,
       "step": 8
     },
     {
+      "epoch": 0.64,
+      "learning_rate": 0.00018714285714285716,
+      "loss": 1.3054,
       "step": 9
     },
     {
+      "epoch": 0.71,
+      "learning_rate": 0.00018571428571428572,
+      "loss": 1.2634,
       "step": 10
     },
     {
+      "epoch": 0.78,
+      "learning_rate": 0.00018428571428571428,
+      "loss": 1.2269,
       "step": 11
     },
     {
+      "epoch": 0.85,
+      "learning_rate": 0.00018285714285714286,
+      "loss": 1.2405,
       "step": 12
     },
     {
+      "epoch": 0.92,
+      "learning_rate": 0.00018142857142857142,
+      "loss": 1.2436,
       "step": 13
     },
     {
+      "epoch": 1.0,
+      "learning_rate": 0.00018,
+      "loss": 1.2063,
       "step": 14
     },
     {
+      "epoch": 1.07,
+      "learning_rate": 0.0001785714285714286,
+      "loss": 1.1789,
       "step": 15
     },
     {
+      "epoch": 1.14,
+      "learning_rate": 0.00017714285714285713,
+      "loss": 1.2007,
       "step": 16
     },
     {
+      "epoch": 1.21,
+      "learning_rate": 0.00017571428571428572,
+      "loss": 1.1616,
       "step": 17
     },
     {
+      "epoch": 1.28,
+      "learning_rate": 0.0001742857142857143,
+      "loss": 1.157,
       "step": 18
     },
     {
+      "epoch": 1.35,
+      "learning_rate": 0.00017285714285714287,
+      "loss": 1.1555,
       "step": 19
     },
     {
+      "epoch": 1.42,
+      "learning_rate": 0.00017142857142857143,
+      "loss": 1.1559,
       "step": 20
     },
     {
+      "epoch": 1.49,
+      "learning_rate": 0.00017,
+      "loss": 1.1487,
       "step": 21
     },
     {
+      "epoch": 1.56,
+      "learning_rate": 0.00016857142857142857,
+      "loss": 1.1729,
       "step": 22
     },
     {
+      "epoch": 1.64,
+      "learning_rate": 0.00016714285714285716,
+      "loss": 1.1251,
       "step": 23
     },
     {
+      "epoch": 1.71,
+      "learning_rate": 0.00016571428571428575,
+      "loss": 1.1181,
       "step": 24
     },
     {
+      "epoch": 1.78,
+      "learning_rate": 0.00016428571428571428,
+      "loss": 1.1144,
       "step": 25
     },
     {
+      "epoch": 1.85,
+      "learning_rate": 0.00016285714285714287,
+      "loss": 1.1416,
       "step": 26
     },
     {
+      "epoch": 1.92,
+      "learning_rate": 0.00016142857142857145,
+      "loss": 1.0965,
       "step": 27
     },
     {
+      "epoch": 1.99,
+      "learning_rate": 0.00016,
+      "loss": 1.0936,
       "step": 28
     },
     {
+      "epoch": 2.06,
+      "learning_rate": 0.00015857142857142857,
+      "loss": 1.0839,
       "step": 29
     },
     {
+      "epoch": 2.13,
+      "learning_rate": 0.00015714285714285716,
+      "loss": 1.127,
       "step": 30
     },
     {
+      "epoch": 2.2,
+      "learning_rate": 0.00015571428571428572,
+      "loss": 1.0886,
       "step": 31
     },
     {
+      "epoch": 2.28,
+      "learning_rate": 0.0001542857142857143,
+      "loss": 1.0447,
       "step": 32
     },
     {
+      "epoch": 2.35,
+      "learning_rate": 0.00015285714285714287,
+      "loss": 1.0513,
       "step": 33
     },
     {
+      "epoch": 2.42,
+      "learning_rate": 0.00015142857142857143,
+      "loss": 1.098,
       "step": 34
     },
     {
+      "epoch": 2.49,
+      "learning_rate": 0.00015000000000000001,
+      "loss": 1.0628,
       "step": 35
     },
     {
+      "epoch": 2.56,
+      "learning_rate": 0.00014857142857142857,
+      "loss": 1.0814,
       "step": 36
     },
     {
+      "epoch": 2.63,
+      "learning_rate": 0.00014714285714285716,
+      "loss": 1.0638,
       "step": 37
     },
     {
+      "epoch": 2.7,
+      "learning_rate": 0.00014571428571428572,
+      "loss": 1.0652,
       "step": 38
     },
     {
+      "epoch": 2.77,
+      "learning_rate": 0.00014428571428571428,
+      "loss": 1.0463,
       "step": 39
     },
     {
+      "epoch": 2.84,
+      "learning_rate": 0.00014285714285714287,
+      "loss": 1.0349,
       "step": 40
     },
     {
+      "epoch": 2.92,
+      "learning_rate": 0.00014142857142857145,
+      "loss": 1.0165,
       "step": 41
     },
     {
+      "epoch": 2.99,
+      "learning_rate": 0.00014,
+      "loss": 1.0905,
       "step": 42
     },
     {
+      "epoch": 3.06,
+      "learning_rate": 0.00013857142857142857,
+      "loss": 1.0297,
       "step": 43
     },
     {
+      "epoch": 3.13,
+      "learning_rate": 0.00013714285714285716,
+      "loss": 1.0061,
       "step": 44
     },
     {
+      "epoch": 3.2,
+      "learning_rate": 0.00013571428571428572,
+      "loss": 1.0019,
       "step": 45
     },
     {
+      "epoch": 3.27,
+      "learning_rate": 0.00013428571428571428,
+      "loss": 0.9555,
       "step": 46
     },
     {
+      "epoch": 3.34,
+      "learning_rate": 0.00013285714285714287,
+      "loss": 1.038,
       "step": 47
     },
     {
+      "epoch": 3.41,
+      "learning_rate": 0.00013142857142857143,
+      "loss": 0.9932,
       "step": 48
     },
     {
+      "epoch": 3.48,
+      "learning_rate": 0.00013000000000000002,
+      "loss": 1.0451,
       "step": 49
     },
     {
+      "epoch": 3.56,
+      "learning_rate": 0.00012857142857142858,
+      "loss": 1.008,
       "step": 50
     },
     {
+      "epoch": 3.63,
+      "learning_rate": 0.00012714285714285714,
+      "loss": 1.0362,
       "step": 51
     },
     {
+      "epoch": 3.7,
+      "learning_rate": 0.00012571428571428572,
+      "loss": 1.0007,
       "step": 52
     },
     {
+      "epoch": 3.77,
+      "learning_rate": 0.00012428571428571428,
+      "loss": 1.0038,
       "step": 53
     },
     {
+      "epoch": 3.84,
+      "learning_rate": 0.00012285714285714287,
+      "loss": 1.0057,
       "step": 54
     },
     {
+      "epoch": 3.91,
+      "learning_rate": 0.00012142857142857143,
+      "loss": 1.0172,
       "step": 55
     },
     {
+      "epoch": 3.98,
+      "learning_rate": 0.00012,
+      "loss": 0.982,
       "step": 56
     },
     {
+      "epoch": 4.05,
+      "learning_rate": 0.00011857142857142858,
+      "loss": 0.9838,
       "step": 57
     },
     {
+      "epoch": 4.12,
+      "learning_rate": 0.00011714285714285715,
+      "loss": 0.9677,
       "step": 58
     },
     {
+      "epoch": 4.2,
+      "learning_rate": 0.00011571428571428574,
+      "loss": 0.9815,
       "step": 59
     },
     {
+      "epoch": 4.27,
+      "learning_rate": 0.00011428571428571428,
+      "loss": 0.9711,
       "step": 60
     },
     {
+      "epoch": 4.34,
+      "learning_rate": 0.00011285714285714286,
+      "loss": 1.0086,
       "step": 61
     },
     {
+      "epoch": 4.41,
+      "learning_rate": 0.00011142857142857144,
+      "loss": 0.9485,
       "step": 62
     },
     {
+      "epoch": 4.48,
+      "learning_rate": 0.00011000000000000002,
+      "loss": 0.9342,
       "step": 63
     },
     {
+      "epoch": 4.55,
+      "learning_rate": 0.00010857142857142856,
+      "loss": 0.9887,
       "step": 64
     },
     {
+      "epoch": 4.62,
+      "learning_rate": 0.00010714285714285715,
+      "loss": 0.9614,
       "step": 65
     },
     {
+      "epoch": 4.69,
+      "learning_rate": 0.00010571428571428572,
+      "loss": 0.9644,
       "step": 66
     },
     {
+      "epoch": 4.76,
+      "learning_rate": 0.0001042857142857143,
+      "loss": 0.9267,
       "step": 67
     },
     {
+      "epoch": 4.84,
+      "learning_rate": 0.00010285714285714286,
+      "loss": 0.954,
       "step": 68
     },
     {
+      "epoch": 4.91,
+      "learning_rate": 0.00010142857142857143,
+      "loss": 0.919,
       "step": 69
     },
     {
+      "epoch": 4.98,
+      "learning_rate": 0.0001,
+      "loss": 0.9478,
       "step": 70
     },
     {
+      "epoch": 5.05,
+      "learning_rate": 9.857142857142858e-05,
+      "loss": 0.9559,
       "step": 71
     },
     {
+      "epoch": 5.12,
+      "learning_rate": 9.714285714285715e-05,
+      "loss": 0.9596,
       "step": 72
     },
     {
+      "epoch": 5.19,
+      "learning_rate": 9.571428571428573e-05,
+      "loss": 0.9151,
       "step": 73
     },
     {
+      "epoch": 5.26,
+      "learning_rate": 9.428571428571429e-05,
+      "loss": 0.9059,
       "step": 74
     },
     {
+      "epoch": 5.33,
+      "learning_rate": 9.285714285714286e-05,
+      "loss": 0.8717,
       "step": 75
     },
     {
+      "epoch": 5.4,
+      "learning_rate": 9.142857142857143e-05,
+      "loss": 0.8912,
       "step": 76
     },
     {
+      "epoch": 5.48,
+      "learning_rate": 9e-05,
+      "loss": 0.9166,
       "step": 77
     },
     {
+      "epoch": 5.55,
+      "learning_rate": 8.857142857142857e-05,
+      "loss": 0.9362,
       "step": 78
     },
     {
+      "epoch": 5.62,
+      "learning_rate": 8.714285714285715e-05,
+      "loss": 0.8969,
       "step": 79
     },
     {
+      "epoch": 5.69,
+      "learning_rate": 8.571428571428571e-05,
+      "loss": 0.898,
       "step": 80
     },
     {
+      "epoch": 5.76,
+      "learning_rate": 8.428571428571429e-05,
+      "loss": 0.8626,
       "step": 81
     },
     {
+      "epoch": 5.83,
+      "learning_rate": 8.285714285714287e-05,
+      "loss": 0.9353,
       "step": 82
     },
     {
+      "epoch": 5.9,
+      "learning_rate": 8.142857142857143e-05,
+      "loss": 0.9353,
       "step": 83
     },
     {
+      "epoch": 5.97,
+      "learning_rate": 8e-05,
+      "loss": 0.9277,
       "step": 84
     },
     {
+      "epoch": 6.04,
+      "learning_rate": 7.857142857142858e-05,
+      "loss": 0.8856,
       "step": 85
     },
     {
+      "epoch": 6.12,
+      "learning_rate": 7.714285714285715e-05,
+      "loss": 0.8771,
       "step": 86
     },
     {
+      "epoch": 6.19,
+      "learning_rate": 7.571428571428571e-05,
+      "loss": 0.8634,
       "step": 87
     },
     {
+      "epoch": 6.26,
+      "learning_rate": 7.428571428571429e-05,
+      "loss": 0.8655,
       "step": 88
     },
     {
+      "epoch": 6.33,
+      "learning_rate": 7.285714285714286e-05,
+      "loss": 0.856,
       "step": 89
     },
     {
+      "epoch": 6.4,
+      "learning_rate": 7.142857142857143e-05,
+      "loss": 0.8929,
       "step": 90
     },
     {
+      "epoch": 6.47,
+      "learning_rate": 7e-05,
+      "loss": 0.8844,
       "step": 91
     },
     {
+      "epoch": 6.54,
+      "learning_rate": 6.857142857142858e-05,
+      "loss": 0.8951,
       "step": 92
     },
     {
+      "epoch": 6.61,
+      "learning_rate": 6.714285714285714e-05,
+      "loss": 0.8385,
       "step": 93
     },
     {
+      "epoch": 6.68,
+      "learning_rate": 6.571428571428571e-05,
+      "loss": 0.873,
       "step": 94
     },
     {
+      "epoch": 6.76,
+      "learning_rate": 6.428571428571429e-05,
+      "loss": 0.9033,
       "step": 95
     },
     {
+      "epoch": 6.83,
+      "learning_rate": 6.285714285714286e-05,
+      "loss": 0.8643,
       "step": 96
     },
     {
+      "epoch": 6.9,
+      "learning_rate": 6.142857142857143e-05,
+      "loss": 0.8894,
       "step": 97
     },
     {
+      "epoch": 6.97,
+      "learning_rate": 6e-05,
+      "loss": 0.8436,
       "step": 98
     },
     {
+      "epoch": 7.04,
+      "learning_rate": 5.8571428571428575e-05,
+      "loss": 0.8362,
       "step": 99
     },
     {
+      "epoch": 7.11,
+      "learning_rate": 5.714285714285714e-05,
+      "loss": 0.8162,
       "step": 100
     }
   ],
   "logging_steps": 1,
+  "max_steps": 140,
+  "num_train_epochs": 10,
   "save_steps": 100,
+  "total_flos": 1.837898937498624e+16,
   "trial_name": null,
   "trial_params": null
 }
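
Note on the logged values: the `learning_rate` entries above are consistent with a linear decay to zero over `max_steps` = 140. A minimal sketch to check this, assuming a peak learning rate of 2e-4 and no warmup (both inferred from the logged numbers, not stated in this diff; `linear_lr` is a hypothetical helper, not part of any library):

```python
def linear_lr(step: int, base_lr: float = 2e-4, max_steps: int = 140) -> float:
    """Linear decay from base_lr at step 0 down to 0 at max_steps.

    base_lr=2e-4 and the absence of warmup are assumptions inferred from
    the log; max_steps=140 is taken from trainer_state.json.
    """
    return base_lr * max(0, max_steps - step) / max_steps


# A few (step, learning_rate) pairs copied from the log above.
logged = {
    17: 0.00017571428571428572,
    70: 0.0001,
    100: 5.714285714285714e-05,
}

for step, lr in logged.items():
    # Compare with a small tolerance to absorb floating-point rounding.
    assert abs(linear_lr(step) - lr) < 1e-12, (step, lr)
```

If the assumptions hold, the same function gives the rate at any step, which is handy when resuming from `checkpoint-100`.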

checkpoint-100/training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:18fa67295c2f705605ed2e7ff81543bd20a36db35f54e417d9b1ea047663c02f
 size 4027

 version https://git-lfs.github.com/spec/v1
+oid sha256:2f64eef1b40d4774a448e5873470138b7a2cb17cf32f63605c071f25bf135444
 size 4027

runs/Oct12_18-12-36_63a985a0dcf5/events.out.tfevents.1697134361.63a985a0dcf5.4074.0 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:12f388aa7a381716fcd82edc36b07c4432ae5880f46a9837eed7f8b93b7b65e3
+size 26628

tokenizer_config.json CHANGED

@@ -1,5 +1,40 @@
 {
   "add_prefix_space": false,
   "bos_token": "<s>",
   "clean_up_tokenization_spaces": false,
   "eos_token": "</s>",

 {
   "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [],
   "bos_token": "<s>",
   "clean_up_tokenization_spaces": false,
   "eos_token": "</s>",

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:18fa67295c2f705605ed2e7ff81543bd20a36db35f54e417d9b1ea047663c02f
 size 4027

 version https://git-lfs.github.com/spec/v1
+oid sha256:2f64eef1b40d4774a448e5873470138b7a2cb17cf32f63605c071f25bf135444
 size 4027