puneeshkhanna committed
Commit 109c33b
Parent(s): a1cf49e
Update README.md
README.md CHANGED
@@ -19,6 +19,8 @@ tags:
 
 # Model Details
 
+⚠️ **This is a raw, pretrained model, which should be further finetuned for most use cases.**
+
 ## Model Description
 
 - **Developed by:** [https://www.tii.ae](https://www.tii.ae)
@@ -104,17 +106,24 @@ print(tokenizer.decode(outputs[0]))
 
 ## Training Data
 
+Falcon3-7B is trained on 15 gigatokens of datasets comprising web, code, STEM, high-quality and multilingual data.
+
 ## Training Procedure
 
+Falcon3-7B is trained on 256 H100 nodes (world size 2048).
+
 ### Training Hyperparameters
 
-| **Hyperparameter** | **Value** | **Comment**
-|
-| Precision | `bfloat16` |
-| Optimizer | AdamW |
-| Max learning rate |
-|
-|
+| **Hyperparameter** | **Value**  | **Comment**                            |
+|--------------------|------------|----------------------------------------|
+| Precision          | `bfloat16` |                                        |
+| Optimizer          | AdamW      |                                        |
+| Max learning rate  | 6e-4       | Following a WSD (warmup-stable-decay)  |
+|                    |            | learning rate scheduler                |
+| Weight decay       | 1e-1       |                                        |
+| z-loss             | 1e-4       |                                        |
+| Batch size         | Variable   | Batch size was gradually increased     |
+|                    |            | during the training                    |
 
 # Evaluation
 
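A note on the training setup added above: a world size of 2048 matches 256 nodes × 8 GPUs per node; the 8-GPU-per-H100-node layout is an inference, as the commit states only the node count and the world size.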
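The WSD (warmup-stable-decay) scheduler named in the table ramps the learning rate up, holds it at its peak for most of training, then anneals it at the end. As a rough illustration only, here is a minimal Python sketch of such a schedule; the function name `wsd_lr`, the warmup/decay fractions, and the floor `min_lr` are illustrative assumptions, with only the 6e-4 peak taken from the table.

```python
# Minimal sketch of a WSD (warmup-stable-decay) learning rate schedule.
# Only max_lr (6e-4) comes from the hyperparameter table; the warmup and
# decay fractions and the min_lr floor are illustrative assumptions.
def wsd_lr(step, total_steps, max_lr=6e-4,
           warmup_frac=0.01, decay_frac=0.1, min_lr=6e-5):
    warmup_steps = int(total_steps * warmup_frac)
    decay_start = int(total_steps * (1.0 - decay_frac))
    if step < warmup_steps:
        # Warmup phase: linear ramp from 0 to max_lr.
        return max_lr * step / max(warmup_steps, 1)
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return max_lr
    # Decay phase: linear anneal from max_lr down to min_lr.
    progress = (step - decay_start) / max(total_steps - decay_start, 1)
    return max_lr + (min_lr - max_lr) * progress
```

Paired with the table's other values, the optimizer would be constructed along the lines of `torch.optim.AdamW(model.parameters(), lr=6e-4, weight_decay=0.1)`, again as a sketch rather than the training code actually used.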
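The z-loss row refers to an auxiliary stabilization term; a common formulation penalizes the squared log-partition-function of the output softmax so the normalizer stays well-behaved. A hedged PyTorch sketch, where only the 1e-4 coefficient comes from the table and the exact formulation is an assumption:

```python
import torch

# Sketch of an auxiliary z-loss: penalize the squared log-sum-exp (log Z)
# of the logits. Only the coefficient (1e-4) comes from the table; the
# formulation and function name are illustrative assumptions.
def z_loss(logits: torch.Tensor, coeff: float = 1e-4) -> torch.Tensor:
    log_z = torch.logsumexp(logits, dim=-1)  # log partition function per token
    return coeff * (log_z ** 2).mean()
```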