loubnabnl (HF staff) committed on
Commit
305c68b
1 Parent(s): 76e8fbd

fill some TODOs

Files changed (1)
  1. README.md +5 -6
README.md CHANGED
@@ -29,7 +29,7 @@ TODO
 
  ## Model Summary
 
- The StarCoderBase models are 15.5B parameter models trained on 600+ programming languages from [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2-train), with opt-out requests excluded. The model uses [Grouped Query Attention](https://arxiv.org/abs/2305.13245), [a context window of 16,384 tokens](https://arxiv.org/abs/2205.14135) with [a sliding window attention of 4,096 tokens](https://arxiv.org/abs/2004.05150v2), and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 4+ trillion tokens.
+ The StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2-train), with opt-out requests excluded. The model uses [Grouped Query Attention](https://arxiv.org/abs/2305.13245), [a context window of 16,384 tokens](https://arxiv.org/abs/2205.14135) with [a sliding window attention of 4,096 tokens](https://arxiv.org/abs/2004.05150v2), and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 4+ trillion tokens.
 
  - **Project Website:** [bigcode-project.org](https://www.bigcode-project.org)
  - **Paper:** TODO
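
Since the updated summary highlights the Fill-in-the-Middle training objective, here is a minimal infilling sketch. It assumes StarCoder-style FIM special tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`); verify them against `tokenizer.special_tokens_map` on the released checkpoint before relying on this exact format.

```python
# Minimal FIM sketch: the special-token names below are assumed to follow the
# StarCoder convention; check tokenizer.special_tokens_map to confirm.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Ask the model to generate the code between the prefix and the suffix.
prompt = "<fim_prefix>def fibonacci(n):\n    <fim_suffix>\n    return result<fim_middle>"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```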
@@ -47,11 +47,11 @@ The model was trained on GitHub code as well as additional selected data sources
  # pip install -q transformers # TODO: from main
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
- checkpoint = "bigcode/starcoderbase"
+ checkpoint = "bigcode/starcoder2-15b"
  device = "cuda" # for GPU usage or "cpu" for CPU usage
 
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
- model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
+ model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
 
  inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
  outputs = model.generate(inputs)
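
The usage hunk above loads the checkpoint at default precision. Since the card lists bfloat16 as the training precision, a memory-friendlier load could look like the sketch below; `torch_dtype` and `device_map="auto"` are standard `transformers`/`accelerate` options, not something this commit introduces.

```python
# Sketch: load the 15B checkpoint in bfloat16 to roughly halve memory use
# versus float32; device_map="auto" requires `pip install accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # match the listed training precision
    device_map="auto",           # spread layers across available devices
)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```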
@@ -71,18 +71,17 @@ The model has been trained on source code from 600+ programming languages. The p
  ## Model
 
  - **Architecture:** Transformer decoder with grouped-query and sliding window attention and Fill-in-the-Middle objective
- - **Pretraining steps:** TODO
+ - **Pretraining steps:** 1 million
  - **Pretraining tokens:** 4+ trillion
  - **Precision:** bfloat16
 
  ## Hardware
 
  - **GPUs:** 1024 A100
- - **Training time:** TODO
 
  ## Software
 
- - **Framework:** [Megatron-Nemo](https://github.com/NVIDIA/NeMo) TODO double check
+ - **Framework:** [NeMo](https://github.com/NVIDIA/NeMo)
  - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
 
  # License
 
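
As a closing note on the attention settings kept in the summary (16,384-token context, 4,096-token sliding window, grouped-query attention), they can be checked from the config without downloading weights. The field names below are assumptions based on how `transformers` usually exposes these settings; consult the released `config.json`.

```python
# Sketch: inspect attention-related fields on the released config.
# Field names are assumptions; getattr() keeps this safe if they differ.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigcode/starcoder2-15b")
print(getattr(config, "max_position_embeddings", None))  # expected: 16384-token context window
print(getattr(config, "sliding_window", None))           # expected: 4096-token sliding window
print(getattr(config, "num_key_value_heads", None))      # grouped-query attention KV heads
```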