afaji committed on
Commit c54d873
1 Parent(s): 933dd29

Update README.md

Files changed (1)
  1. README.md +19 -18
README.md CHANGED
@@ -82,24 +82,6 @@ You can view other LaMini model series as follow. Note that not all models are p
  </tbody>
  </table>
 
- ## Training Procedure
- We initialize with [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) and fine-tune it on our [LaMini dataset](). Its total number of parameters is 61M.
-
- ### Training Hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0005
- - train_batch_size: 128
- - eval_batch_size: 64
- - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 512
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - num_epochs: 5
-
- ## Evaluation
- We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more details, please refer to our [paper]().
 
  ## Use
 
@@ -122,6 +104,25 @@ generated_text = generator(input_prompt, max_length=512, do_sample=True)[0]['gen
  print("Response:", generated_text)
  ```
 
+ ## Training Procedure
+ We initialize with [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) and fine-tune it on our [LaMini dataset](). Its total number of parameters is 61M.
+
+ ### Training Hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0005
+ - train_batch_size: 128
+ - eval_batch_size: 64
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 512
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 5
+
+ ## Evaluation
+ We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more details, please refer to our [paper]().
+
  ## Limitations
 
  More information needed
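
For reference, the usage snippet whose last two lines appear as context in the second hunk looks roughly like the following once assembled. This is a minimal sketch, not the README's exact code: the checkpoint id `MBZUAI/LaMini-Flan-T5-61M` and the example prompt are assumptions inferred from the flan-t5-small / 61M note, since the diff only shows the tail of the snippet.

```python
from transformers import pipeline

# Assumed checkpoint id, inferred from the flan-t5-small / 61M note above.
generator = pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-61M")

input_prompt = "How can I become a better programmer?"  # placeholder prompt
generated_text = generator(input_prompt, max_length=512, do_sample=True)[0]["generated_text"]
print("Response:", generated_text)
```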
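The hyperparameter list this commit moves below the Use section reads like Hugging Face Trainer output, though the diff does not confirm the training script. A sketch of how the listed values would map onto `Seq2SeqTrainingArguments`, assuming the Trainer was used; the `output_dir` is a placeholder, and note that train_batch_size 128 with gradient_accumulation_steps 4 gives the stated total_train_batch_size of 512.

```python
from transformers import Seq2SeqTrainingArguments

# Assumed mapping of the README's listed hyperparameters onto the HF Trainer
# API; "lamini-flan-t5-61m" is a placeholder output directory.
training_args = Seq2SeqTrainingArguments(
    output_dir="lamini-flan-t5-61m",
    learning_rate=5e-4,                # learning_rate: 0.0005
    per_device_train_batch_size=128,   # train_batch_size: 128
    per_device_eval_batch_size=64,     # eval_batch_size: 64
    seed=42,
    gradient_accumulation_steps=4,     # 128 x 4 = 512 total train batch size
    adam_beta1=0.9,                    # optimizer: Adam, betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=5,
)
```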