Update README.md
README.md
@@ -332,7 +332,7 @@ Granite-3.0-8B-Instruct is based on a decoder-only dense transformer architecture
 | # Training tokens | 12T | **12T** | 10T | 10T |
 
 **Training Data:**
-Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) very small amounts of human-curated data.
+Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf) and [Accompanying Author List](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/author-ack.pdf).
 
 **Infrastructure:**
 We train Granite 3.0 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs while minimizing environmental impact by utilizing 100% renewable energy sources.