updates: eval results and github repo link
README.md (CHANGED)
@@ -152,6 +152,16 @@ model-index:
       type: pass@1
       value: 29.84
       veriefied: false
+  - task:
+      type: text-generation
+    dataset:
+      type: reasoning
+      name: MUSR
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 33.99
+      veriefied: false
   - task:
       type: text-generation
     dataset:
@@ -180,7 +190,7 @@ model-index:
     metrics:
     - name: pass@1
       type: pass@1
-      value:
+      value: 19.26
       veriefied: false
   - task:
       type: text-generation
@@ -191,18 +201,9 @@ model-index:
     - name: pass@1
       type: pass@1
       value: 8.96
-      veriefied: false
-  - task:
-      type: text-generation
-    dataset:
-      type: multilingual
-      name: MGSM
-    metrics:
-    - name: pass@1
-      type: pass@1
-      value: 8.20
       veriefied: false
 ---
+
 <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
 
 # Granite-3.0-1B-A400M-Base
@@ -211,17 +212,18 @@ model-index:
 **Granite-3.0-1B-A400M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-1B-A400M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
 - **Developers:** IBM Research
-- **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
-- **
+- **GitHub Repository:** [ibm-granite/granite-3.0-language-models](https://github.com/ibm-granite/granite-3.0-language-models)
+- **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
+- **Paper:** [Granite 3.0 Language Models]()
 - **Release Date**: October 21st, 2024
-- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
 ## Supported Languages
-English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese
+English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese
 
 ## Usage
 ### Intended use
-Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover,
+Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios.
 
 ### Generation
 This is a simple example of how to use **Granite-3.0-1B-A400M-Base** model.
@@ -279,9 +281,7 @@ print(output)
 
 <!-- TO DO: To be completed once the paper is ready -->
 ## Training Data
-This model is trained on a mix of open-source and proprietary
-* Phase 1 data: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
-* Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
+This model is trained on a mix of open-source and proprietary datasets.
 
 ## Infrastructure
 We train the Granite Language models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
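The hunk header `@@ -279,9 +281,7 @@ print(output)` points into the card's Generation example, which the diff itself does not reproduce. For context, here is a minimal sketch of what such a snippet typically looks like with the Hugging Face `transformers` API; the prompt, device placement, and generation settings are illustrative assumptions, not taken from the card:

```python
# Minimal generation sketch for Granite-3.0-1B-A400M-Base.
# Assumes the standard transformers API; prompt and generation
# settings here are illustrative, not quoted from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.0-1b-a400m-base"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

# Tokenize a prompt and move it to the model's device.
input_text = "The capital of France is"
input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)

# Decode a short completion and print it, matching the
# print(output) line quoted in the hunk header above.
output_ids = model.generate(**input_tokens, max_new_tokens=32)
output = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
print(output)
```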
|