---
language: ko
tags:
- KakaoBrain
- KoGPT
- GPT
- GPT3
license: cc-by-nc-nd-4.0
---

# KoGPT

KakaoBrain's Pre-Trained Language Models. 

* KoGPT (Korean Generative Pre-trained Transformer)
  * [https://github.com/kakaobrain/kogpt](https://github.com/kakaobrain/kogpt)
  * [https://huggingface.co/kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)


## Model Descriptions

### KoGPT6B-ryan1.5b

* [\[huggingface\]\[kakaobrain/kogpt\]\[KoGPT6B-ryan1.5b\]](https://huggingface.co/kakaobrain/kogpt/tree/KoGPT6B-ryan1.5b)
* [\[huggingface\]\[kakaobrain/kogpt\]\[KoGPT6B-ryan1.5b-float16\]](https://huggingface.co/kakaobrain/kogpt/tree/KoGPT6B-ryan1.5b-float16)

| Hyperparameter       | Value         |
|:---------------------|--------------:|
| \\(n_{parameters}\\) | 6,166,502,400 |
| \\(n_{layers}\\)     | 28            |
| \\(d_{model}\\)      | 4,096         |
| \\(d_{ff}\\)         | 16,384        |
| \\(n_{heads}\\)      | 16            |
| \\(d_{head}\\)       | 256           |
| \\(n_{ctx}\\)        | 2,048         |
| \\(n_{vocab}\\)      | 64,512        |
| Positional Encoding  | [Rotary Position Embedding (RoPE)](https://arxiv.org/abs/2104.09864) |
| RoPE Dimensions      | 64            |
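
The hyperparameters above are mutually consistent. As a rough check, the sketch below recomputes the parameter count under the assumption of a GPT-J-style decoder layout (untied input and output embeddings, bias-free attention projections, one LayerNorm per block, and biased feed-forward layers). That layout is an assumption made for illustration; the table itself does not state it.

```python
# Hyperparameters copied from the table above.
n_layers, d_model, d_ff = 28, 4096, 16384
n_heads, d_head, n_vocab = 16, 256, 64512

# The attention heads together span the model width.
assert n_heads * d_head == d_model

# ASSUMPTION: GPT-J-style layout (not stated in the table).
embedding = n_vocab * d_model          # input token embeddings
layer_norm = 2 * d_model               # scale and shift vectors
attention = 4 * d_model * d_model      # q, k, v and output projections, no biases
feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
per_layer = layer_norm + attention + feed_forward
lm_head = d_model * n_vocab + n_vocab  # untied output projection with bias

# Decoder blocks plus embeddings, the final LayerNorm and the output head.
n_parameters = embedding + n_layers * per_layer + layer_norm + lm_head
print(f"{n_parameters:,}")  # 6,166,502,400
```

Under these assumptions the total matches the \\(n_{parameters}\\) row exactly.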


## Hardware requirements

### KoGPT6B-ryan1.5b

#### GPU
The following is the recommended minimum GPU hardware for running this KoGPT model.
* `32GB GPU RAM` is the required minimum memory size

### KoGPT6B-ryan1.5b-float16

#### GPU
The following is the recommended minimum GPU hardware for running this KoGPT model.
* half-precision requires NVIDIA GPUs based on the Volta, Turing, or Ampere architecture
* `16GB GPU RAM` is the required minimum memory size
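
The 32GB and 16GB figures above are consistent with a back-of-the-envelope estimate of the memory needed for the weights alone. The sketch below computes that estimate from the parameter count in the hyperparameter table. It deliberately ignores activations and other runtime overhead, so it is a lower bound, not the way the recommendations were derived; `weight_memory_gib` is a hypothetical helper name.

```python
N_PARAMETERS = 6_166_502_400  # from the hyperparameter table


def weight_memory_gib(n_parameters: int, bytes_per_parameter: int) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return n_parameters * bytes_per_parameter / 2**30


fp32_gib = weight_memory_gib(N_PARAMETERS, 4)  # float32: 4 bytes per parameter
fp16_gib = weight_memory_gib(N_PARAMETERS, 2)  # float16: 2 bytes per parameter

print(f"float32 weights: {fp32_gib:.1f} GiB")  # float32 weights: 23.0 GiB
print(f"float16 weights: {fp16_gib:.1f} GiB")  # float16 weights: 11.5 GiB
```

The remaining headroom on a 32GB or 16GB card is what is left for activations and the framework's own allocations during generation.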


## Usage

### prompt
```bash
python -m kogpt --help
usage: KoGPT inference [-h] [--model MODEL] [--revision {KoGPT6B-ryan1.5b}]
                       [--device {cpu,cuda}] [-d]

KakaoBrain Korean(hangul) Generative Pre-Training Model

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         huggingface repo (default:kakaobrain/kogpt)
  --revision {KoGPT6B-ryan1.5b}
  --device {cpu,cuda}   (default:cuda)
  -d, --debug
```

```bash
python -m kogpt
prompt> 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던
temperature(0.8)> 
max_length(128)> 64
인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상

prompt>  
...
```


### python
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and register the special tokens used by KoGPT.
tokenizer = AutoTokenizer.from_pretrained(
  'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',  # or float32 version: revision='KoGPT6B-ryan1.5b'
  bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]'
)
# Load the weights in the dtype stored in the checkpoint, then move the model to the GPU.
model = AutoModelForCausalLM.from_pretrained(
  'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',  # or float32 version: revision='KoGPT6B-ryan1.5b'
  pad_token_id=tokenizer.eos_token_id,
  torch_dtype='auto', low_cpu_mem_usage=True
).to(device='cuda', non_blocking=True)
_ = model.eval()  # inference mode (disables dropout)

# Korean prompt, roughly: "... that humanity has so far been unable to solve,
# through 'intelligence' that thinks and acts like a human"
prompt = '인간처럼 생각하고, 행동하는 \'지능\'을 통해 인류가 이제까지 풀지 못했던'
with torch.no_grad():
  tokens = tokenizer.encode(prompt, return_tensors='pt').to(device='cuda', non_blocking=True)
  gen_tokens = model.generate(tokens, do_sample=True, temperature=0.8, max_length=64)
  generated = tokenizer.batch_decode(gen_tokens)[0]

# Sampling is random, so the output varies between runs. Example output:
# 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상
print(generated)
```
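
The hardware requirements listed earlier can be folded into a small helper that picks which revision to pass to `from_pretrained`. This is only a sketch: `choose_revision` is a hypothetical function, not part of the `kogpt` package, and it assumes that Volta, Turing, and Ampere GPUs report a CUDA compute capability major version of 7 or higher.

```python
FLOAT32_REVISION = "KoGPT6B-ryan1.5b"
FLOAT16_REVISION = "KoGPT6B-ryan1.5b-float16"


def choose_revision(compute_capability_major: int, gpu_memory_gb: float) -> str:
    """Hypothetical helper: map GPU properties to a KoGPT revision.

    gpu_memory_gb is the nominal (advertised) memory size of the card.
    ASSUMPTION: Volta, Turing and Ampere GPUs have a compute capability
    major version of 7 or higher.
    """
    if gpu_memory_gb >= 32:
        return FLOAT32_REVISION  # enough memory for the float32 weights
    if gpu_memory_gb >= 16 and compute_capability_major >= 7:
        return FLOAT16_REVISION  # half precision on a supported architecture
    raise ValueError("GPU does not meet the documented minimum requirements")


print(choose_revision(7, 16))  # KoGPT6B-ryan1.5b-float16
print(choose_revision(8, 40))  # KoGPT6B-ryan1.5b
```

With PyTorch, the inputs could come from `torch.cuda.get_device_capability()` and `torch.cuda.get_device_properties(0).total_memory`. The reported total is usually slightly below the advertised size, so compare against the nominal figure or round up.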


## Experiments

### In-context Few-Shots

| Models        | #params | NSMC (Acc.) | YNAT (F1) | KLUE-STS (F1) |
|:--------------|--------:|------------:|----------:|--------------:|
| HyperCLOVA[1] |    1.3B |        83.9 |      58.7 |          60.9 |
| HyperCLOVA[1] |    6.9B |        83.8 |      67.5 |          59.3 |
| HyperCLOVA[1] |   13.0B |        87.9 |      67.9 |          60.0 |
| HyperCLOVA[1] |   39.0B |        88.0 |      71.4 |          61.6 |
| HyperCLOVA[1] |   82.0B |    **88.2** |      72.7 |      **65.1** |
| **Ours**      |    6.0B |        87.8 |  **78.0** |          64.3 |


### Finetuning / P-Tuning


Issues with our downstream evaluation have been [reported](https://github.com/kakaobrain/kogpt/issues/17).

The previously published performance evaluation table has been removed. It could not be considered a fair comparison, because the algorithms being compared differed and the performance measurement method could not be confirmed.

See the issue linked above for the original performance evaluation table and the troubleshooting results.



## Limitations

KakaoBrain `KoGPT` was trained on the `ryan dataset`, which has not been filtered for profanity, lewd or politically charged content, or other harsh language, and is known to contain such text.
Therefore, `KoGPT` can generate socially unacceptable text. As with all language models, it is difficult to predict in advance how `KoGPT` will respond to particular prompts, and it may produce offensive content without warning.

Primarily Korean: `KoGPT` is trained primarily on Korean text and is best suited to classifying, searching, summarizing, or generating such text.
By default, `KoGPT` performs worse on inputs that differ from the data distribution it was trained on, including non-Korean text and Korean dialects that are not well represented in the training data.

[comment]: <> (If abnormal or socially unacceptable text is generated during testing, please send the "prompt" and the "generated text" to [kogpt-report@kakaobrain.com]&#40;mailto:kogpt-report@kakaobrain.com&#41;.  )


## Citation

If you use this library or model in any project or research, please cite our code:

```
@misc{kakaobrain2021kogpt,
  title         = {KoGPT: KakaoBrain Korean(hangul) Generative Pre-trained Transformer},
  author        = {Ildoo Kim and Gunsoo Han and Jiyeon Ham and Woonhyuk Baek},
  year          = {2021},
  howpublished  = {\url{https://github.com/kakaobrain/kogpt}},
}
```


## Contact

KoGPT is released as open source in the hope that it will be useful to research institutes and startups for research purposes. We look forward to hearing from anyone who wishes to collaborate with us.

[contact@kakaobrain.com](mailto:contact@kakaobrain.com)


## License

The `source code` of KakaoBrain `KoGPT` is licensed under the [Apache 2.0](LICENSE.apache-2.0) License.   
The `pretrained weights` of KakaoBrain `KoGPT` are licensed under the [CC-BY-NC-ND 4.0 License](https://creativecommons.org/licenses/by-nc-nd/4.0/).

When using the model, code, or pretrained weights, please comply with the license terms. The full license texts are available in the [Apache 2.0](LICENSE.apache-2.0) and [LICENSE.cc-by-nc-nd-4.0](LICENSE.cc-by-nc-nd-4.0) files.


## References

[1] [HyperCLOVA](https://arxiv.org/abs/2109.04650): Kim, Boseop, et al. "What changes can large-scale language models bring? intensive study on hyperclova: Billions-scale korean generative pretrained transformers." arXiv preprint arXiv:2109.04650 (2021).