Update README.md
README.md CHANGED
````diff
@@ -240,7 +240,7 @@ for i in output:
 ```
 
 ## Training Data
-Starting from the base Granite model, this model was further pretrained on repository-level code data with per-language oversampling, allowing it to effectively utilize up to 128K tokens of context. This continued training stage focused on a curated selection of programming languages, such as Python, C, C++, Go, Java, JavaScript, and TypeScript.
+Starting from the base Granite model, this model was further pretrained on repository-level code data with per-language context-length oversampling, allowing it to effectively utilize up to 128K tokens of context. This continued training stage focused on a curated selection of programming languages, such as Python, C, C++, Go, Java, JavaScript, and TypeScript.
 
 ## Infrastructure
 We train the Granite Code models using two of IBM's super computing clusters, namely Vela and Blue Vela, both outfitted with NVIDIA A100 and H100 GPUs respectively. These clusters provide a scalable and efficient infrastructure for training our models over thousands of GPUs.
````
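The updated Training Data paragraph refers to per-language oversampling of repository-level code. As a purely illustrative sketch (not taken from the model card or from IBM's actual training pipeline), the Python snippet below shows one way such oversampling can be expressed; the language weights and the `repos_by_language` structure are assumptions for demonstration only.

```python
import random

# Illustrative per-language oversampling weights (assumed values, not the
# actual Granite training mixture); weights > 1.0 make a language's
# repositories appear more often than their natural share of the corpus.
OVERSAMPLE = {
    "Python": 2.0, "C": 1.0, "C++": 1.0, "Go": 1.5,
    "Java": 1.0, "JavaScript": 1.5, "TypeScript": 1.5,
}

def sample_repository(repos_by_language):
    """Pick one repository-level document, biased by language weights.

    `repos_by_language` maps a language name to a list of repository-level
    documents (e.g. all files of one repo concatenated into a single string).
    """
    languages = list(repos_by_language)
    weights = [OVERSAMPLE.get(lang, 1.0) * len(repos_by_language[lang])
               for lang in languages]
    lang = random.choices(languages, weights=weights, k=1)[0]
    return random.choice(repos_by_language[lang])
```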