Update README.md
README.md CHANGED
````diff
@@ -240,7 +240,7 @@ for i in output:
 ```
 
 ## Training Data
-Starting from the base Granite model, this model was further pretrained on repository-level code data with per-language oversampling, allowing it to effectively utilize up to 128K tokens of context. This continued training stage focused on a curated selection of programming languages, such as Python, C, C++, Go, Java, JavaScript, and TypeScript.
+Starting from the base Granite model, this model was further pretrained on repository-level code data with per-language context-length oversampling, allowing it to effectively utilize up to 128K tokens of context. This continued training stage focused on a curated selection of programming languages, such as Python, C, C++, Go, Java, JavaScript, and TypeScript.
 
 ## Infrastructure
 We train the Granite Code models using two of IBM's super computing clusters, namely Vela and Blue Vela, both outfitted with NVIDIA A100 and H100 GPUs respectively. These clusters provide a scalable and efficient infrastructure for training our models over thousands of GPUs.
````
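The updated Training Data paragraph refers to per-language oversampling of repository-level code. As a purely illustrative sketch (not taken from the model card or from IBM's actual training pipeline), the Python snippet below shows one way such oversampling can be expressed; the language weights and the `repos_by_language` structure are assumptions for demonstration only.

```python
import random

# Illustrative per-language oversampling weights (assumed values, not the
# actual Granite training mixture); weights > 1.0 make a language's
# repositories appear more often than their natural share of the corpus.
OVERSAMPLE = {
    "Python": 2.0, "C": 1.0, "C++": 1.0, "Go": 1.5,
    "Java": 1.0, "JavaScript": 1.5, "TypeScript": 1.5,
}

def sample_repository(repos_by_language):
    """Pick one repository-level document, biased by language weights.

    `repos_by_language` maps a language name to a list of repository-level
    documents (e.g. all files of one repo concatenated into a single string).
    """
    languages = list(repos_by_language)
    weights = [OVERSAMPLE.get(lang, 1.0) * len(repos_by_language[lang])
               for lang in languages]
    lang = random.choices(languages, weights=weights, k=1)[0]
    return random.choice(repos_by_language[lang])
```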