Cyrile committed on
Commit
d71114e
1 Parent(s): 0e0abea

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -52,16 +52,16 @@ Here is the table summarizing the architecture used for training, along with the
 
  | Hyperparameter | Value |
  |:---------------------:|:----------:|
- | label smoothing | 0.05 |
- | optimize | AdamW |
- | betas | 0.9, 0.999 |
- | learning rate | 5e-6 |
- | anneal strategy | cos |
- | div factor | 100 |
- | final div factor | 0.1 |
- | batch size | 2 |
+ | label smoothing | 0.05 |
+ | optimize | AdamW |
+ | betas | 0.9, 0.999 |
+ | learning rate | 5e-6 |
+ | anneal strategy | cos |
+ | div factor | 100 |
+ | final div factor | 0.1 |
+ | batch size | 2 |
  | gradient accumulation | 200 |
- | max length | 2048 |
+ | max length | 2048 |
 
  Experimentations
  ----------------
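
Note on the hyperparameter table in this diff: the README does not name the learning-rate scheduler, but `anneal strategy`, `div factor` and `final div factor` match the argument names of PyTorch's `OneCycleLR`. The sketch below assumes that reading and works out, in plain Python, the effective batch size and the resulting learning-rate curve. `PCT_START` and the `total_steps` used in the example are hypothetical values that do not appear in the table, and treating `learning rate` as the peak rate is also an assumption.

```python
import math

# Values copied from the hyperparameter table.
MAX_LR = 5e-6            # "learning rate", assumed here to be the peak rate
DIV_FACTOR = 100         # "div factor"
FINAL_DIV_FACTOR = 0.1   # "final div factor"
BATCH_SIZE = 2           # "batch size"
GRAD_ACCUMULATION = 200  # "gradient accumulation"

# Assumption: not in the table. 0.3 is the default warm-up fraction in OneCycleLR.
PCT_START = 0.3


def effective_batch_size(batch_size=BATCH_SIZE, grad_accumulation=GRAD_ACCUMULATION):
    """Number of samples contributing to each optimizer step."""
    return batch_size * grad_accumulation


def cos_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))


def one_cycle_lr(step, total_steps, max_lr=MAX_LR, div_factor=DIV_FACTOR,
                 final_div_factor=FINAL_DIV_FACTOR, pct_start=PCT_START):
    """Learning rate at a given step under a two-phase one-cycle schedule."""
    initial_lr = max_lr / div_factor        # rate at step 0
    min_lr = initial_lr / final_div_factor  # rate at the last step
    warmup_end = pct_start * total_steps - 1
    if step <= warmup_end:
        # Phase 1: rise from initial_lr to max_lr.
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    # Phase 2: anneal from max_lr to min_lr.
    last_step = total_steps - 1
    return cos_anneal(max_lr, min_lr, (step - warmup_end) / (last_step - warmup_end))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the README does not give a step count
    print("effective batch size:", effective_batch_size())
    for s in (0, 299, 999):
        print(s, one_cycle_lr(s, total_steps))
```

Under these assumptions each optimizer step sees 2 × 200 = 400 samples. Because `final div factor` is below 1, the final rate (initial rate divided by 0.1) ends up ten times higher than the starting rate rather than lower.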