fill some TODOs
README.md
CHANGED
@@ -29,7 +29,7 @@ TODO
 
 ## Model Summary
 
-The
+The StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2-train), with opt-out requests excluded. The model uses [Grouped Query Attention](https://arxiv.org/abs/2305.13245), [a context window of 16,384 tokens](https://arxiv.org/abs/2205.14135) with [a sliding window attention of 4,096 tokens](https://arxiv.org/abs/2004.05150v2), and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 4+ trillion tokens.
 
 - **Project Website:** [bigcode-project.org](https://www.bigcode-project.org)
 - **Paper:** TODO
@@ -47,11 +47,11 @@ The model was trained on GitHub code as well as additional selected data sources
 # pip install -q transformers # TODO: from main
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-checkpoint = "bigcode/
+checkpoint = "bigcode/starcoder2-15b"
 device = "cuda" # for GPU usage or "cpu" for CPU usage
 
 tokenizer = AutoTokenizer.from_pretrained(checkpoint)
-model = AutoModelForCausalLM.from_pretrained(checkpoint
+model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
 
 inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
 outputs = model.generate(inputs)
@@ -71,18 +71,17 @@ The model has been trained on source code from 600+ programming languages. The p
 ## Model
 
 - **Architecture:** Transformer decoder with grouped-query and sliding window attention and Fill-in-the-Middle objective
-- **Pretraining steps:**
+- **Pretraining steps:** 1 million
 - **Pretraining tokens:** 4+ trillion
 - **Precision:** bfloat16
 
 ## Hardware
 
 - **GPUs:** 1024 A100
-- **Training time:** TODO
 
 ## Software
 
-- **Framework:** [
+- **Framework:** [NeMo](https://github.com/NVIDIA/NeMo)
 - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
 
 # License
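Beyond the diff itself, a note on the quickstart it edits: the snippet loads the 15B checkpoint in full precision onto a single device. A common alternative, sketched below under the assumption that the standard `transformers` loading arguments apply to this checkpoint, is to load the weights in bfloat16 (the precision listed under **Model**), which roughly halves memory use; `device_map="auto"` additionally requires the `accelerate` package. This is an illustrative sketch, not part of the README.

```python
# Sketch only (not from the README diff): load StarCoder2-15B in bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 precision listed under ## Model
    device_map="auto",           # requires `pip install accelerate`; places layers on available devices
)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```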
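The model summary and the **Architecture** bullet both mention the Fill-in-the-Middle objective. Below is a hedged sketch of FIM-style prompting; the sentinel token names `<fim_prefix>`, `<fim_suffix>`, `<fim_middle>` follow the original StarCoder convention and are an assumption here, so verify them against the tokenizer's special tokens before relying on this.

```python
# Sketch only: fill-in-the-middle prompting.
# The sentinel tokens below are assumed from the StarCoder convention; verify them via
# tokenizer.get_vocab() or the model card before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

prefix = "def print_one_two_three():\n    print('one')\n    "
suffix = "\n    print('three')"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=32)

# Everything generated after the prompt is the proposed middle segment.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```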
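Finally, the **Model** bullets and the summary cite grouped-query attention, a 16,384-token context window, and 4,096-token sliding-window attention. Here is a small sketch for reading those values off the Hugging Face config object; the attribute names assume the `transformers` Starcoder2 configuration class and are illustrative rather than authoritative.

```python
# Sketch only: inspect architecture details from the model config.
# Attribute names assume the transformers Starcoder2 config class.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigcode/starcoder2-15b")

print(config.num_attention_heads)      # number of query heads
print(config.num_key_value_heads)      # fewer key/value heads -> grouped-query attention
print(config.max_position_embeddings)  # context window (16,384 per the summary)
print(config.sliding_window)           # sliding-window attention size (4,096 per the summary)
```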