NVIDIA framework and contribution updates. (#3)
NVIDIA framework and contribution updates. (a92aab64fe4f3cde7c7d82842b8c533be0f0de26)
Co-authored-by: Krzysztof Pawelec <Criztov@users.noreply.huggingface.co>
README.md
CHANGED
@@ -81,7 +81,8 @@ model-index:
 
 ## Model Summary
 
-StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2-train), with opt-out requests excluded. The model uses [Grouped Query Attention](https://arxiv.org/abs/2305.13245), [a context window of 16,384 tokens](https://arxiv.org/abs/2205.14135) with [a sliding window attention of 4,096 tokens](https://arxiv.org/abs/2004.05150v2), and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 4+ trillion tokens.
+StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2-train), with opt-out requests excluded. The model uses [Grouped Query Attention](https://arxiv.org/abs/2305.13245), [a context window of 16,384 tokens](https://arxiv.org/abs/2205.14135) with [a sliding window attention of 4,096 tokens](https://arxiv.org/abs/2004.05150v2), and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 4+ trillion tokens.
+The model was trained with [NVIDIA NeMo™ Framework](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/) using the [NVIDIA Eos Supercomputer](https://blogs.nvidia.com/blog/eos/) built with [NVIDIA DGX H100](https://www.nvidia.com/en-us/data-center/dgx-h100/) systems.
 
 - **Project Website:** [bigcode-project.org](https://www.bigcode-project.org)
 - **Paper:** [Link](https://huggingface.co/datasets/bigcode/the-stack-v2/)
@@ -186,11 +187,11 @@ The model has been trained on source code from 600+ programming languages. The p
 
 ## Hardware
 
-- **GPUs:** 1024
+- **GPUs:** 1024 x H100
 
 ## Software
 
-- **Framework:** [NeMo](https://
+- **Framework:** [NeMo Framework](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/)
 - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
 
 # License
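The Model Summary in the diff above pairs a 16,384-token context window with 4,096-token sliding window attention. As a minimal illustrative sketch (not the model's actual implementation, which lives in the attention kernels), the attention pattern can be described as a boolean mask where each query position attends only to earlier positions within the window:

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Sketch of a causal mask restricted to a sliding window.

    mask[i][j] is True when query position i may attend to key position j:
    j must not be in the future (j <= i) and must lie within `window`
    positions of i (i - j < window).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Toy sizes for readability; StarCoder2-15B uses window=4096 over up to
# 16,384 positions.
mask = sliding_window_causal_mask(seq_len=8, window=3)
```

Each row then has at most `window` True entries, which is what bounds per-token attention cost regardless of total context length.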
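The summary also notes training with the Fill-in-the-Middle objective: the document is split into prefix, middle, and suffix, and the model learns to generate the middle given the other two. A hedged sketch of the usual prompt layout follows; the sentinel token names (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) are assumed from the StarCoder family's tokenizer and should be verified against the actual tokenizer's special tokens before use:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange prefix and suffix so the model generates the missing middle.

    Sentinel token names are an assumption borrowed from the StarCoder
    family; check the model's tokenizer config for the exact strings.
    """
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# Ask the model to fill in the body of a function.
prompt = build_fim_prompt(
    "def add(a, b):\n    return ",
    "\n\nprint(add(1, 2))",
)
```

At inference time this string would be tokenized and passed to the model, which continues after `<fim_middle>` with the infilled code.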