puneeshkhanna committed
Commit 109c33b
1 Parent(s): a1cf49e

Update README.md

Files changed (1):
  1. README.md +16 -7
README.md CHANGED

@@ -19,6 +19,8 @@ tags:
 
 # Model Details
 
+⚠️ **This is a raw, pretrained model, which should be further fine-tuned for most use cases.**
+
 ## Model Description
 
 - **Developed by:** [https://www.tii.ae](https://www.tii.ae)
@@ -104,17 +106,24 @@ print(tokenizer.decode(outputs[0]))
 
 ## Training Data
 
+Falcon3-7B is trained on 15 Gigatokens of data comprising web, code, STEM, high-quality and multilingual data.
+
 ## Training Procedure
 
+Falcon3-7B is trained on 256 H100 nodes (world size 2048).
+
 ### Training Hyperparameters
 
-| **Hyperparameter** | **Value** | **Comment** |
-|--------------------|------------|-------------------------------------------|
-| Precision | `bfloat16` | |
-| Optimizer | AdamW | |
-| Max learning rate | | Following a WSD (warmup-stable-decay) learning rate schedule |
-| Weight decay | | |
-| Batch size | | |
+| **Hyperparameter** | **Value** | **Comment** |
+|--------------------|------------|--------------------------------------------------------------|
+| Precision | `bfloat16` | |
+| Optimizer | AdamW | |
+| Max learning rate | 6e-4 | Following a WSD (warmup-stable-decay) learning rate schedule |
+| Weight decay | 1e-1 | |
+| z-loss | 1e-4 | |
+| Batch size | Variable | Gradually increased during training |
 
 # Evaluation
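The stated world size of 2048 across 256 nodes implies 8 GPUs per node, the usual H100 node layout; a quick sanity check of that arithmetic (the per-node GPU count is inferred from these two numbers, not stated in the card):

```python
# World-size arithmetic implied by "256 H100 nodes (world size 2048)".
# 8 GPUs per node is an inference, not a figure given in the card.
nodes = 256
gpus_per_node = 8
world_size = nodes * gpus_per_node
assert world_size == 2048
```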
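The table names a WSD (warmup-stable-decay) learning rate schedule with a 6e-4 peak but gives no step counts. Below is a minimal sketch of the common WSD shape, where the warmup, plateau, and decay lengths and the floor `min_lr` are illustrative placeholders, not Falcon3's actual values:

```python
def wsd_lr(step, max_lr=6e-4, warmup_steps=1_000,
           stable_steps=100_000, decay_steps=10_000, min_lr=6e-5):
    """Warmup-stable-decay: linear warmup to max_lr, a long constant
    plateau, then linear decay to min_lr. Only max_lr=6e-4 comes from
    the card; every step count here is a placeholder."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        return max_lr
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return max_lr + (min_lr - max_lr) * progress
```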
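The z-loss row refers to the auxiliary logit regularizer used in PaLM-style training: a penalty on the squared log-partition of the output logits. A sketch of that common formulation with the 1e-4 coefficient from the table (the card does not show Falcon3's exact loss code):

```python
import torch
import torch.nn.functional as F

def lm_loss_with_z_loss(logits, targets, z_coef=1e-4):
    """Next-token cross-entropy plus z-loss: z_coef * mean((log Z)**2),
    where log Z is the logsumexp over the vocabulary. The penalty keeps
    logit magnitudes from drifting over a long pretraining run."""
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    log_z = torch.logsumexp(logits, dim=-1)  # log partition, one per token
    return ce + z_coef * (log_z ** 2).mean()
```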
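Finally, the table says only that the batch size was increased gradually during training. A hypothetical step-wise ramp to show the idea; the breakpoints and sizes are invented for illustration, as the card does not publish the actual schedule:

```python
def batch_size_at(step, ramp=((0, 512), (2_000, 1_024), (6_000, 2_048))):
    """Step-wise batch-size ramp: each (start_step, batch_size) pair
    takes effect from start_step onward. All values are made up for
    illustration; Falcon3's real ramp is not given in the card."""
    size = ramp[0][1]
    for start, bs in ramp:
        if step >= start:
            size = bs
    return size
```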