aidal committed on
Commit
1c63217
1 Parent(s): 63ccaad

Update README.md

  1. README.md +11 -6
README.md CHANGED
@@ -24,7 +24,7 @@ language:
 
  # Model description
 
- >Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. It delivers throughput gains over traditional Transformer-based models, while outperforming or matching the leading models of its size class on most common benchmarks. Jamba is the first production-scale Mamba implementation, which opens up interesting research and application opportunities. While this initial experimentation shows encouraging gains, we expect these to be further enhanced with future optimizations and explorations. This model card is for the base version of Jamba. It’s a pretrained, mixture-of-experts (MoE) generative text model, with 12B active parameters and a total of 52B parameters across all experts. It supports a 256K context length, and can fit up to 140K tokens on a single 80GB GPU.
  ----
 
  # Example output:
@@ -41,11 +41,11 @@ language:
 
  | model | dataset | score |
  |---------------|-------------------|--------|
- | base-model-7b | ARC-easy |41.92% |
- | base-model-7b | ARC-easy |39.12% |
- | fa-model-7b | ARC-easy |37.89% |
- | base-model-7b | ARC-challenge |37.12% |
- | fa-model-7b | ARC-challenge |39.29% |
 
  ----
  # How to use
@@ -63,6 +63,11 @@ print(tokenizer.decode(outputs[0]))
  # Training and finetuning
  - **Extend tokenizer:** The base Mistral tokenizer does not support Persian. As an initial step, we trained a SentencePiece tokenizer on the Farsi Wikipedia corpus and subsequently integrated it with the Mistral tokenizer.
  - **Pre-training:** In the following step, we expanded the embedding layer of the base model to match the size of the Persian tokenizer. We then employed the LoRA method to train the model on three distinct datasets: Wikipedia-Farsi, an Islamic book collection, and content from Khamenei.ir.
  - **Instruction Fine-tuning:** For the final step, we fine-tuned the model using the LoRA method on a translated version of the Stanford Alpaca dataset to enhance the model's question-answering capabilities.
  This diagram illustrates the steps described above:
 
 
 
  # Model description
 
+ >Persian-mistral is a fine-tuned version of Mistral-7B designed for Persian QA and NLP tasks.
  ----
 
  # Example output:
 
 
  | model | dataset | score |
  |---------------|-------------------|--------|
+ | base-model-7b | ARC-easy |41.92% |
+ | base-model-7b | ARC-easy |39.12% |
+ | fa-model-7b | ARC-easy |37.89% |
+ | base-model-7b | ARC-challenge |37.12% |
+ | fa-model-7b | ARC-challenge |39.29% |
 
  ----
  # How to use
 
  # Training and finetuning
  - **Extend tokenizer:** The base Mistral tokenizer does not support Persian. As an initial step, we trained a SentencePiece tokenizer on the Farsi Wikipedia corpus and subsequently integrated it with the Mistral tokenizer.
  - **Pre-training:** In the following step, we expanded the embedding layer of the base model to match the size of the Persian tokenizer. We then employed the LoRA method to train the model on three distinct datasets: Wikipedia-Farsi, an Islamic book collection, and content from Khamenei.ir.
+ - <p align="center">
+ <picture>
+ <img alt="Hugging Face Transformers Library" src="https://i.postimg.cc/LXSD4HnZ/Stakehozlder-Map-1-page-0001-modified.png" width="400" height="500" style="max-width: 100%;">
+ </picture>
+ </p>
  - **Instruction Fine-tuning:** For the final step, we fine-tuned the model using the LoRA method on a translated version of the Stanford Alpaca dataset to enhance the model's question-answering capabilities.
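
The tokenizer-extension and embedding-expansion steps listed above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' training code: `merge_vocab`, `expand_embeddings`, the toy vocabularies, and the mean-row initialisation are all hypothetical, since the README does not say how the new embedding rows were initialised. In a real workflow with the Transformers library, the resize itself would typically be done with `model.resize_token_embeddings(len(tokenizer))`.

```python
# Hypothetical sketch: names, sizes, and the initialisation scheme are
# illustrative assumptions, not taken from the README.
import random


def merge_vocab(base_vocab, new_tokens):
    """Append tokens missing from base_vocab, keeping existing ids stable."""
    merged = list(base_vocab)
    seen = set(base_vocab)
    for tok in new_tokens:
        if tok not in seen:
            merged.append(tok)
            seen.add(tok)
    return merged


def expand_embeddings(embeddings, new_size):
    """Grow an embedding table to new_size rows.

    Existing rows are copied unchanged. New rows are set to the mean of the
    existing rows (an assumed initialisation for this sketch).
    """
    if new_size < len(embeddings):
        raise ValueError("new_size must not be smaller than the current table")
    dim = len(embeddings[0])
    mean_row = [
        sum(row[j] for row in embeddings) / len(embeddings) for j in range(dim)
    ]
    extra = [list(mean_row) for _ in range(new_size - len(embeddings))]
    return [list(row) for row in embeddings] + extra


# Toy stand-ins for the base vocabulary and the newly trained Persian tokens.
base_vocab = ["<s>", "</s>", "the", "model"]
persian_tokens = ["سلام", "مدل", "the"]  # "the" already exists and is skipped

rng = random.Random(0)
base_embeddings = [[rng.uniform(-1, 1) for _ in range(8)] for _ in base_vocab]

merged_vocab = merge_vocab(base_vocab, persian_tokens)
expanded = expand_embeddings(base_embeddings, len(merged_vocab))

print(len(base_vocab), "->", len(merged_vocab))
```

Keeping the original token ids and rows untouched means the base model's behaviour on existing tokens is preserved, and only the appended rows need to be learned during pre-training.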
  This diagram illustrates the steps described above: