rpand002 committed on
Commit 0d47217 · verified · 1 Parent(s): 59bd5eb

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -13,7 +13,7 @@ base_model:
  # Granite-3.1-8B-Instruct

  **Model Summary:**
- Granite-3.1-8B-Instruct is a 8B parameter model finetuned from *Granite-3.1-8B-Base* using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging.

  - **Developers:** Granite Team, IBM
  - **GitHub Repository:** [ibm-granite/granite-3.1-language-models](https://github.com/ibm-granite/granite-3.1-language-models)
@@ -37,6 +37,7 @@ The model is designed to respond to general instructions and can be used to buil
  * Code related tasks
  * Function-calling tasks
  * Multilingual dialog use cases

  **Generation:**
  This is a simple example of how to use Granite-3.1-8B-Instruct model.
@@ -98,7 +99,7 @@ Granite-3.1-8B-Instruct is based on a decoder-only dense transformer architectur
  | # Training tokens | 12T | **12T** | 10T | 10T |

  **Training Data:**
- Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the [Granite Technical Report]() and [Accompanying Author List]().

  **Infrastructure:**
  We train Granite 3.1 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
 
  # Granite-3.1-8B-Instruct

  **Model Summary:**
+ Granite-3.1-8B-Instruct is a 8B parameter long-context instruct model finetuned from *Granite-3.1-8B-Base* using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging.

  - **Developers:** Granite Team, IBM
  - **GitHub Repository:** [ibm-granite/granite-3.1-language-models](https://github.com/ibm-granite/granite-3.1-language-models)
 
  * Code related tasks
  * Function-calling tasks
  * Multilingual dialog use cases
+ * Long-context tasks including long document/meeting summarization, long document QA, etc.

  **Generation:**
  This is a simple example of how to use Granite-3.1-8B-Instruct model.
 
  | # Training tokens | 12T | **12T** | 10T | 10T |

  **Training Data:**
+ Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities including long-context tasks, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the [Granite Technical Report]() and [Accompanying Author List]().

  **Infrastructure:**
  We train Granite 3.1 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.