cmarkea/bloomz-560m-sft-chat

Cyrile committed on Sep 14, 2023

Commit 8092663 • 1 Parent(s): ea037fd

Update README.md

Files changed (1)

README.md +14 -0

@@ -50,6 +50,20 @@ Here is the table summarizing the architecture used for training, along with the
 |   [bloomz-3b-sft-chat](https://huggingface.co/cmarkea/bloomz-3b-sft-chat)   | 1 x A100 40GB |        140        |                 13                  |
 | [bloomz-7b1-mt-sft-chat](https://huggingface.co/cmarkea/bloomz-7b1-mt-sft-chat) | 4 x A100 40GB |        268        |                  8                  |
+|     Hyperparameter    |    Value   |
+|:---------------------:|:----------:|
+|       label smoothing | 0.05       |
+|              optimize | AdamW      |
+|                 betas | 0.9, 0.999 |
+|               AMSGrad | True       |
+|         learning rate | 5e-6       |
+|       anneal strategy | cos        |
+|            div factor | 100        |
+|      final div factor | 0.1        |
+|            batch size | 16         |
+| gradient accumulation | 25         |
+|            max length | 1500       |
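
The names "anneal strategy", "div factor" and "final div factor" in the added table resemble the arguments of a one-cycle learning-rate schedule such as PyTorch's `OneCycleLR`, although the README does not say which framework or scheduler was used. The following is a minimal sketch of what the values would imply under that reading, not the authors' training code. Treating "learning rate" as the peak rate, the warm-up fraction `PCT_START = 0.3`, and `TOTAL_STEPS = 1000` are all assumptions introduced for illustration.

```python
import math

# Values copied from the hyperparameter table.
MAX_LR = 5e-6            # "learning rate", read here as the peak rate (assumption)
DIV_FACTOR = 100
FINAL_DIV_FACTOR = 0.1
BATCH_SIZE = 16
GRAD_ACCUMULATION = 25

# Assumptions that are NOT stated in the README.
PCT_START = 0.3          # warm-up fraction (hypothetical, mirrors a common default)
TOTAL_STEPS = 1000       # hypothetical number of optimizer steps


def cosine_anneal(start: float, end: float, pct: float) -> float:
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


def one_cycle_lr(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at an optimizer step for a two-phase one-cycle schedule."""
    initial_lr = MAX_LR / DIV_FACTOR            # rate at step 0
    final_lr = initial_lr / FINAL_DIV_FACTOR    # rate at the last step
    warmup_end = PCT_START * total_steps - 1    # last step of the warm-up phase
    if step <= warmup_end:
        return cosine_anneal(initial_lr, MAX_LR, step / warmup_end)
    pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
    return cosine_anneal(MAX_LR, final_lr, pct)


def effective_batch_size() -> int:
    """Number of sequences contributing to each optimizer update."""
    return BATCH_SIZE * GRAD_ACCUMULATION
```

Under these assumptions each optimizer update aggregates 16 × 25 = 400 sequences, the rate starts at 5e-8, peaks at 5e-6, and ends at 5e-7. Because the final div factor is below 1, the final rate is higher than the starting rate.
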
 Experimentations
 ----------------
 Since the model is trained only on English and French corpora, the performance of the model cannot be guaranteed in other languages. This degradation in performance in other languages is also due to the change in the model's data type from float16 to bfloat16. The conversation example below illustrates this point: