updates: eval results and github repo link
README.md (CHANGED)
@@ -152,6 +152,16 @@ model-index:
       type: pass@1
       value: 29.84
       veriefied: false
+  - task:
+      type: text-generation
+    dataset:
+      type: reasoning
+      name: MUSR
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 33.99
+      veriefied: false
   - task:
       type: text-generation
     dataset:
@@ -180,7 +190,7 @@ model-index:
     metrics:
     - name: pass@1
       type: pass@1
-      value:
+      value: 19.26
       veriefied: false
   - task:
       type: text-generation
@@ -191,18 +201,9 @@ model-index:
     - name: pass@1
       type: pass@1
       value: 8.96
-      veriefied: false
-  - task:
-      type: text-generation
-    dataset:
-      type: multilingual
-      name: MGSM
-    metrics:
-    - name: pass@1
-      type: pass@1
-      value: 8.20
       veriefied: false
 ---
+
 <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
 
 # Granite-3.0-1B-A400M-Base
@@ -211,17 +212,18 @@ model-index:
 **Granite-3.0-1B-A400M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-1B-A400M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
 - **Developers:** IBM Research
-- **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
-- **
+- **GitHub Repository:** [ibm-granite/granite-3.0-language-models](https://github.com/ibm-granite/granite-3.0-language-models)
+- **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
+- **Paper:** [Granite 3.0 Language Models]()
 - **Release Date**: October 21st, 2024
-- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
 ## Supported Languages
-English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese
+English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese
 
 ## Usage
 ### Intended use
-Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover,
+Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios.
 
 ### Generation
 This is a simple example of how to use **Granite-3.0-1B-A400M-Base** model.
@@ -279,9 +281,7 @@ print(output)
 
 <!-- TO DO: To be completed once the paper is ready -->
 ## Training Data
-This model is trained on a mix of open-source and proprietary
-* Phase 1 data: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
-* Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
+This model is trained on a mix of open-source and proprietary datasets.
 
 ## Infrastructure
 We train the Granite Language models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
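The hunk header `@@ -279,9 +281,7 @@ print(output)` points into the card's Generation example, which the diff itself does not reproduce. For context, here is a minimal sketch of what such a snippet typically looks like with the Hugging Face `transformers` API; the prompt, device placement, and generation settings are illustrative assumptions, not taken from the card:

```python
# Minimal generation sketch for Granite-3.0-1B-A400M-Base.
# Assumes the standard transformers API; prompt and generation
# settings here are illustrative, not quoted from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.0-1b-a400m-base"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

# Tokenize a prompt and move it to the model's device.
input_text = "The capital of France is"
input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)

# Decode a short completion and print it, matching the
# print(output) line quoted in the hunk header above.
output_ids = model.generate(**input_tokens, max_new_tokens=32)
output = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
print(output)
```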
|