eval results correction
README.md
CHANGED
@@ -95,7 +95,7 @@ model-index:
   - task:
       type: text-generation
     dataset:
-      type:
+      type: commonsense
       name: TruthfulQA
     metrics:
     - name: pass@1
@@ -106,6 +106,26 @@ model-index:
       type: text-generation
     dataset:
       type: reading-comprehension
+      name: BoolQ
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 65.44
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: reading-comprehension
+      name: SQuAD v2
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 17.78
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: reasoning
       name: ARC-C
     metrics:
     - name: pass@1
@@ -115,7 +135,7 @@ model-index:
   - task:
       type: text-generation
     dataset:
-      type:
+      type: reasoning
       name: GPQA
     metrics:
     - name: pass@1
@@ -125,7 +145,7 @@ model-index:
   - task:
       type: text-generation
     dataset:
-      type:
+      type: reasoning
       name: BBH
     metrics:
     - name: pass@1
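For readers unfamiliar with the Hub's model-index front matter: once this commit is applied, the first added entry above assembles into the following standalone YAML (a sketch of that single result entry; the enclosing `model-index:`/`results:` nesting is implied by the hunk context):

```yaml
# One result entry from the corrected front matter (values taken from the diff above).
- task:
    type: text-generation
  dataset:
    type: reading-comprehension  # dataset category filled in by this commit
    name: BoolQ
  metrics:
  - name: pass@1
    type: pass@1
    value: 65.44
    verified: false  # score is self-reported, not Hub-verified
```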
@@ -184,24 +204,12 @@ model-index:
       verified: false
 ---
 <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
-<!-- ![image/png](granite-3_0-language-models_Group_1.png) -->
 
 # Granite-3.0-1B-A400M-Base
 
 ## Model Summary
 **Granite-3.0-1B-A400M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-1B-A400M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains, including natural language, math, code, and safety. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
-<!-- **Granite-3.0-1B-A400M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). The particular characteristics of this model, including a Mixture of Experts (MoE) architecture, small size, and open-source nature, make it an ideal baseline for finetuning other models that require large model capacity while maintaining computational efficiency. **Granite-3.0-1B-A400M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains, including natural language, math, code, and safety. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks. -->
-
-<!-- Use Cases:
-Dense LLMs: Suitable for scenarios where fast inference with a smaller model size is prioritized, such as real-time applications or deployment on resource-constrained devices.
-MoE LLMs: Ideal for situations where large model capacity is needed while maintaining computational efficiency, like handling complex tasks or large datasets with high computational demands. -->
-
-<!-- ====Features==== -->
-<!-- MoE will be faster
-Deployment resources (memory): same -->
-
-
 - **Developers:** IBM Research
 - **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
 - **Paper:** [Granite Language Models](https://) <!-- TODO: update GitHub repo link when it is ready -->
@@ -273,7 +281,6 @@ print(output)
 ## Training Data
 This model is trained on a mix of open-source and proprietary datasets.
 
-<!-- CHECK: removed Vela; only talk about Blue Vela -->
 ## Infrastructure
 We train the Granite language models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models across thousands of GPUs.
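The last hunk's header, `@@ -273,7 +281,6 @@ print(output)`, shows that this edit lands just below the README's usage snippet, which ends with `print(output)`. That snippet is not part of the diff; a minimal sketch of such usage with `transformers` might look like the following, where the Hub id `ibm-granite/granite-3.0-1b-a400m-base`, the prompt, and the generation settings are all assumptions rather than content from this commit:

```python
# Minimal text-completion sketch; model id, prompt, and settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.0-1b-a400m-base"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy completion of a few tokens; the README's own snippet ends with print(output).
output_ids = model.generate(**inputs, max_new_tokens=20)
output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output)
```

Since this is a base (non-instruct) model, plain-prompt completion like this is the natural smoke test.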