amezasor committed on
Commit
588f56e
1 Parent(s): 8f3d6d6

updates: eval results and github repo link

Files changed (1):
  1. README.md +19 -19
README.md CHANGED
@@ -152,6 +152,16 @@ model-index:
         type: pass@1
         value: 29.84
         veriefied: false
+    - task:
+        type: text-generation
+        dataset:
+          type: reasoning
+          name: MUSR
+        metrics:
+        - name: pass@1
+          type: pass@1
+          value: 33.99
+          veriefied: false
     - task:
         type: text-generation
         dataset:
@@ -180,7 +190,7 @@ model-index:
         metrics:
         - name: pass@1
           type: pass@1
-          value: 22.82
+          value: 19.26
           veriefied: false
     - task:
         type: text-generation
@@ -191,18 +201,9 @@ model-index:
         - name: pass@1
           type: pass@1
           value: 8.96
-          veriefied: false
-    - task:
-        type: text-generation
-        dataset:
-          type: multilingual
-          name: MGSM
-        metrics:
-        - name: pass@1
-          type: pass@1
-          value: 8.20
           veriefied: false
 ---
+
 <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
 
 # Granite-3.0-1B-A400M-Base
@@ -211,17 +212,18 @@ model-index:
 **Granite-3.0-1B-A400M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-1B-A400M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
 - **Developers:** IBM Research
-- **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
-- **Paper:** [Granite Language Models](https://) <!-- TO DO: Update github repo ling whe it is ready -->
+- **GitHub Repository:** [ibm-granite/granite-3.0-language-models](https://github.com/ibm-granite/granite-3.0-language-models)
+- **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
+- **Paper:** [Granite 3.0 Language Models]()
 - **Release Date**: October 21st, 2024
-- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
+- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
 ## Supported Languages
-English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese (Simplified)
+English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese
 
 ## Usage
 ### Intended use
-Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, all Granite language model can serve as baseline to create specialized models for specific application scenarios.
+Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios.
 
 ### Generation
 This is a simple example of how to use **Granite-3.0-1B-A400M-Base** model.
@@ -279,9 +281,7 @@ print(output)
 
 <!-- TO DO: To be completed once the paper is ready -->
 ## Training Data
-This model is trained on a mix of open-source and proprietary data following a two-phase training strategy.
-* Phase 1 data: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
-* Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
+This model is trained on a mix of open-source and proprietary datasets.
 
 ## Infrastructure
 We train the Granite Language models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
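The card's Generation section references a usage snippet that falls outside this diff's hunks (only its closing `print(output)` appears as hunk context). A minimal sketch of such an example, assuming the standard Hugging Face `transformers` API; the repo id `ibm-granite/granite-3.0-1b-a400m-base` is an assumption based on the model name, and the README's actual snippet may differ:

```python
# Hypothetical usage sketch for a decoder-only base model on the Hugging Face Hub;
# the model card's own example may use different settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.0-1b-a400m-base"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

# Base (non-instruct) models do text completion: feed a prompt, decode the continuation.
input_text = "Where is the Thomas J. Watson Research Center located?"
input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
output_ids = model.generate(**input_tokens, max_new_tokens=64)
output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output)
```

Since this is a base model rather than an instruction-tuned one, prompts are completed as plain text; no chat template is applied.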