update architecture
- architectures/codegen.txt +20 -0
- architectures/polycoder.txt +9 -0
architectures/codegen.txt
ADDED
@@ -0,0 +1,20 @@
The [CodeGen](https://huggingface.co/Salesforce/codegen-16B-mono) architecture follows a standard transformer decoder with left-to-right causal masking. It uses rotary position embeddings [(Su et al., 2021)](https://arxiv.org/abs/2104.09864) for positional encoding and has a context length of 2048. CodeGen models are trained in several sizes:

|Model | # parameters |
| - | - |
| Decoder | 350M |
| Decoder | 2.7B |
| Decoder | 6.1B |
| Decoder | 16.1B |

You can load the model and tokenizer directly from [`transformers`](https://huggingface.co/docs/transformers/index):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the 16B mono (Python) checkpoint; the smaller
# checkpoints follow the same pattern.
tokenizer = AutoTokenizer.from_pretrained('Salesforce/codegen-16B-mono')
model = AutoModelForCausalLM.from_pretrained('Salesforce/codegen-16B-mono')

# Tokenize a prompt and run a single forward pass.
inputs = tokenizer("def hello_world():", return_tensors="pt")
outputs = model(**inputs)
```
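
For actual code completion you would typically call `generate` rather than take a single forward pass. A minimal sketch of greedy decoding, continuing from the snippet above (the `max_new_tokens` value is an arbitrary choice, not something specified here):

```python
# Greedily decode a short completion from the prompt tokenized above.
generated = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```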
architectures/polycoder.txt
ADDED
@@ -0,0 +1,9 @@
[PolyCoder](https://github.com/VHellendoorn/Code-LMs) uses the GPT-2 architecture, with a BPE tokenizer trained on a random 5% subset of the data (all languages) and a context length of 2048. To study the effect of scaling model size, the model was trained in three different sizes:

|Model | # parameters |
| - | - |
| GPT2 | 160M |
| GPT2 | 400M |
| GPT2 | 2.7B |

PolyCoder is currently being integrated into `transformers`. In the meantime, it can be loaded by following the instructions in the original GitHub [repo](https://github.com/vhellendoorn/code-lms#models).
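
Once that integration lands, loading will presumably mirror the CodeGen example. The sketch below assumes the same `AutoModelForCausalLM` pattern; the checkpoint name is a placeholder, not a published model ID:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder checkpoint name -- replace with the official ID once PolyCoder
# is available through `transformers`.
checkpoint = "polycoder-2.7B"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("def quicksort(arr):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```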