grandiose-pizza committed
Commit: bb28a9d
Parent: 25e49d8

Update README.md

Files changed (1): README.md (+2 -16)
README.md CHANGED
@@ -73,7 +73,7 @@ Below is sample code to use the model. Note that the model requires a custom mod
  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM

- model_path = "inceptionai/jais-family-30b-16k-chat"
+ model_path = "inceptionai/Jais-family-256m-chat"

  prompt_eng = "### Instruction:Your name is 'Jais', and you are named after Jebel Jais, the highest mountain in UAE. You were made by 'Inception' in the UAE. You are a helpful, respectful, and honest assistant. Always answer as helpfully as possible, while being safe. Complete the conversation between [|Human|] and [|AI|]:\n### Input: [|Human|] {Question}\n[|AI|]\n### Response :"
  prompt_ar = "### Instruction:اسمك \"جيس\" وسميت على اسم جبل جيس اعلى جبل في الامارات. تم بنائك بواسطة Inception في الإمارات. أنت مساعد مفيد ومحترم وصادق. أجب دائمًا بأكبر قدر ممكن من المساعدة، مع الحفاظ على البقاء أمناً. أكمل المحادثة بين [|Human|] و[|AI|] :\n### Input:[|Human|] {Question}\n[|AI|]\n### Response :"
@@ -165,20 +165,6 @@ During the adapted pre-training of the (`jais-adapted-*`) models, we first initi
  During instruction tuning, each training example consists of a single-turn or multi-turn prompt and its response. Instead of one example per sequence, examples are packed together while the loss is masked on the prompt tokens. This approach speeds up training by allowing more examples to be processed per batch.


- ### Training Hyperparameters:
-
- #### Jais-family-30b-16k-chat
-
- | Hyperparameter | Value |
- |----------------|-------------------------------------------|
- | Precision | fp32 |
- | Optimizer | AdamW |
- | Learning rate | 0 to 0.0016 (<= 192 warmup steps)<br>0.0016 to 0.00016 (> 192 and <= 11342 steps) |
- | Weight decay | 0.1 |
- | Batch size | 120 |
- | Context Length | 16384 |
- | Steps | 11342 |
-
  ### Compute Infrastructure

  The training process was performed on the Condor Galaxy (CG) supercomputer platform. A CG contains 64 Cerebras CS-2 Wafer-Scale Engines (WSE-2) with 40 GB of SRAM, and achieves a total of 960 PetaFLOP/s.
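The kept line at the top of this hunk describes the instruction-tuning setup: examples are packed into a single sequence while the loss is masked on prompt tokens. A rough sketch of that idea (not the authors' actual pipeline), using the common Hugging Face convention that label positions set to -100 are ignored by the cross-entropy loss:

```python
import torch

IGNORE_INDEX = -100  # torch.nn.functional.cross_entropy skips this label by default

def pack_examples(pairs, tokenizer, max_len=2048):
    """Pack (prompt, response) pairs into one training sequence.

    Loss is masked on the prompt: prompt positions get IGNORE_INDEX labels,
    so only response tokens (and the EOS separator) are supervised.
    """
    input_ids, labels = [], []
    for prompt, response in pairs:
        p_ids = tokenizer(prompt, add_special_tokens=False).input_ids
        r_ids = tokenizer(response, add_special_tokens=False).input_ids + [tokenizer.eos_token_id]
        if len(input_ids) + len(p_ids) + len(r_ids) > max_len:
            break  # pack is full; remaining pairs start the next sequence
        input_ids += p_ids + r_ids
        labels += [IGNORE_INDEX] * len(p_ids) + r_ids
    return torch.tensor(input_ids), torch.tensor(labels)
```

Packing this way puts several supervised examples in each batch row, which is why the README describes it as processing more examples per batch.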
@@ -377,6 +363,6 @@ Through this release, we aim to make LLMs more accessible to Arabic NLP research
  title={Jais Family Model Card},
  author={Inception},
  year={2024},
- url = {https://huggingface.co/inceptionai/jais-family-30b-16k-chat/blob/main/README.md}
+ url = {https://huggingface.co/inceptionai/Jais-family-256m-chat/blob/main/README.md}
  }
  ```