Cyrile committed on
Commit
8092663
1 Parent(s): ea037fd

Update README.md

Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -50,6 +50,20 @@ Here is the table summarizing the architecture used for training, along with the
  | [bloomz-3b-sft-chat](https://huggingface.co/cmarkea/bloomz-3b-sft-chat) | 1 x A100 40GB | 140 | 13 |
  | [bloomz-7b1-mt-sft-chat](https://huggingface.co/cmarkea/bloomz-7b1-mt-sft-chat) | 4 x A100 40GB | 268 | 8 |

+ | Hyperparameter | Value |
+ |:---------------------:|:----------:|
+ | label smoothing | 0.05 |
+ | optimize | AdamW |
+ | betas | 0.9, 0.999 |
+ | AMSGrad | True |
+ | learning rate | 5e-6 |
+ | anneal strategy | cos |
+ | div factor | 100 |
+ | final div factor | 0.1 |
+ | batch size | 16 |
+ | gradient accumulation | 25 |
+ | max length | 1500 |
+
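The added table gives the values but not how they combine. The names "anneal strategy", "div factor" and "final div factor" match the arguments of PyTorch's `torch.optim.lr_scheduler.OneCycleLR`, so the sketch below assumes a one-cycle cosine schedule with "learning rate" as the peak value; the README does not confirm this. The warm-up fraction `pct_start=0.3` (PyTorch's default) and `total_steps=1000` are assumptions for illustration, not values from the commit. Under these assumptions the schedule starts at 5e-6 / 100 = 5e-8, peaks at 5e-6, and ends at 5e-8 / 0.1 = 5e-7.

```python
import math

# Values taken from the hyperparameter table in this commit.
MAX_LR = 5e-6            # "learning rate", assumed to be the peak of the cycle
DIV_FACTOR = 100         # initial_lr = MAX_LR / DIV_FACTOR
FINAL_DIV_FACTOR = 0.1   # min_lr = initial_lr / FINAL_DIV_FACTOR
BATCH_SIZE = 16
GRAD_ACCUMULATION = 25


def effective_batch_size():
    """Sequences contributing to one optimizer step (multi-GPU scaling not included)."""
    return BATCH_SIZE * GRAD_ACCUMULATION


def cosine_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


def one_cycle_lr(step, total_steps, pct_start=0.3):
    """Learning rate at a given optimizer step for a two-phase one-cycle schedule.

    pct_start is an assumed warm-up fraction; it is not stated in the table.
    Expects total_steps large enough that the warm-up spans at least one step.
    """
    initial_lr = MAX_LR / DIV_FACTOR
    min_lr = initial_lr / FINAL_DIV_FACTOR
    warmup_end = pct_start * total_steps - 1
    if step <= warmup_end:
        # Phase 1: rise from initial_lr to MAX_LR.
        return cosine_anneal(initial_lr, MAX_LR, step / warmup_end)
    # Phase 2: decay from MAX_LR to min_lr.
    pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
    return cosine_anneal(MAX_LR, min_lr, pct)


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the commit does not state the step count
    print("effective batch size:", effective_batch_size())
    for step in (0, 299, 650, 999):
        print(step, one_cycle_lr(step, total_steps))
```

Note that a final div factor below 1 makes the final learning rate higher than the initial one, so under this reading the schedule never decays below its starting point.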
  Experimentations
  ----------------
  Since the model is trained only on English and French corpora, the performance of the model cannot be guaranteed in other languages. This degradation in performance in other languages is also due to the change in the model's data type from float16 to bfloat16. The conversation example below illustrates this point: