---
language: ko
tags:
- KakaoBrain
- KoGPT
- GPT
- GPT3
license: cc-by-nc-nd-4.0
---
# KoGPT
KakaoBrain's Pre-Trained Language Models.
* KoGPT (Korean Generative Pre-trained Transformer)
* [https://github.com/kakaobrain/kogpt](https://github.com/kakaobrain/kogpt)
* [https://huggingface.co/kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)
## Model Descriptions
### KoGPT6B-ryan1.5b
* [\[huggingface\]\[kakaobrain/kogpt\]\[KoGPT6B-ryan1.5b\]](https://huggingface.co/kakaobrain/kogpt/tree/KoGPT6B-ryan1.5b)
* [\[huggingface\]\[kakaobrain/kogpt\]\[KoGPT6B-ryan1.5b-float16\]](https://huggingface.co/kakaobrain/kogpt/tree/KoGPT6B-ryan1.5b-float16)
| Hyperparameter | Value |
|:---------------------|--------------:|
| \\(n_{parameters}\\) | 6,166,502,400 |
| \\(n_{layers}\\) | 28 |
| \\(d_{model}\\) | 4,096 |
| \\(d_{ff}\\) | 16,384 |
| \\(n_{heads}\\) | 16 |
| \\(d_{head}\\) | 256 |
| \\(n_{ctx}\\) | 2,048 |
| \\(n_{vocab}\\) | 64,512 |
| Positional Encoding | [Rotary Position Embedding (RoPE)](https://arxiv.org/abs/2104.09864) |
| RoPE Dimensions | 64 |
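The table values can be cross-checked against the checkpoint's configuration without downloading the weights. A minimal sketch, assuming the checkpoint exposes a GPT-J-style config (the attribute names below come from that config class, not from this card):

```python
from transformers import AutoConfig

# Load only the configuration; no model weights are fetched.
config = AutoConfig.from_pretrained('kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b')

# GPT-J-style attribute names; expected values per the table above.
print(config.n_layer)      # n_layers: 28
print(config.n_embd)       # d_model: 4096
print(config.n_head)       # n_heads: 16
print(config.n_positions)  # n_ctx: 2048
print(config.vocab_size)   # n_vocab: 64512
print(config.rotary_dim)   # RoPE dimensions: 64
```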
## Hardware requirements
### KoGPT6B-ryan1.5b
#### GPU
The following is the recommended minimum GPU hardware for running the KoGPT examples.
* `32GB GPU RAM` is the required minimum memory size
### KoGPT6B-ryan1.5b-float16
#### GPU
The following is the recommended minimum GPU hardware for running the KoGPT examples; a pre-flight check is sketched below.
* half precision requires an NVIDIA GPU based on the Volta, Turing, or Ampere architecture
* `16GB GPU RAM` is the required minimum memory size
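A minimal sketch of that pre-flight check (assumptions: a single CUDA device, and compute capability 7.0 as the Volta cutoff; the memory figure mirrors the guidance above):

```python
import torch

# Check the GPU before loading the float16 weights.
assert torch.cuda.is_available(), 'CUDA device required'
major, minor = torch.cuda.get_device_capability(0)
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f'compute capability: {major}.{minor}, GPU RAM: {total_gb:.1f}GB')

if major < 7:  # Volta and newer report compute capability >= 7.0
    print('warning: pre-Volta GPU; half precision may be slow or unsupported')
if total_gb < 16:
    print('warning: less than 16GB GPU RAM; the float16 model may not fit')
```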
## Usage
### prompt
```bash
python -m kogpt --help
usage: KoGPT inference [-h] [--model MODEL] [--revision {KoGPT6B-ryan1.5b}]
                       [--device {cpu,cuda}] [-d]

KakaoBrain Korean(hangul) Generative Pre-Training Model

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         huggingface repo (default:kakaobrain/kogpt)
  --revision {KoGPT6B-ryan1.5b}
  --device {cpu,cuda}   (default:cuda)
  -d, --debug
```
```bash
python -m kogpt
prompt> 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던
temperature(0.8)>
max_length(128)> 64
인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상
prompt>
...
```
### python
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained(
'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16', # or float32 version: revision=KoGPT6B-ryan1.5b
bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]'
)
model = AutoModelForCausalLM.from_pretrained(
'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16', # or float32 version: revision=KoGPT6B-ryan1.5b
pad_token_id=tokenizer.eos_token_id,
torch_dtype='auto', low_cpu_mem_usage=True
).to(device='cuda', non_blocking=True)
_ = model.eval()
prompt = '인간처럼 생각하고, 행동하는 \'지능\'을 통해 인류가 이제까지 풀지 못했던'
with torch.no_grad():
tokens = tokenizer.encode(prompt, return_tensors='pt').to(device='cuda', non_blocking=True)
gen_tokens = model.generate(tokens, do_sample=True, temperature=0.8, max_length=64)
generated = tokenizer.batch_decode(gen_tokens)[0]
print(generated)  # print: 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상
```
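For machines without a suitable GPU, here is a minimal CPU-only sketch of the same flow. It assumes the float32 revision and system RAM roughly comparable to the 32GB GPU figure above; greedy decoding is used purely so the output is reproducible:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

revision = 'KoGPT6B-ryan1.5b'  # float32 weights for CPU inference
tokenizer = AutoTokenizer.from_pretrained(
    'kakaobrain/kogpt', revision=revision,
    bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]'
)
model = AutoModelForCausalLM.from_pretrained(
    'kakaobrain/kogpt', revision=revision,
    pad_token_id=tokenizer.eos_token_id, low_cpu_mem_usage=True
)
model.eval()

tokens = tokenizer.encode('인간처럼 생각하고, 행동하는', return_tensors='pt')
with torch.no_grad():
    # Greedy decoding (do_sample=False) gives deterministic output.
    gen_tokens = model.generate(tokens, do_sample=False, max_length=64)
print(tokenizer.batch_decode(gen_tokens)[0])
```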
## Experiments
### In-context Few-Shots
| Models | #params | NSMC (Acc.) | YNAT (F1) | KLUE-STS (F1) |
|:--------------|--------:|------------:|----------:|--------------:|
| HyperCLOVA[1] | 1.3B | 83.9 | 58.7 | 60.9 |
| HyperCLOVA[1] | 6.9B | 83.8 | 67.5 | 59.3 |
| HyperCLOVA[1] | 13.0B | 87.9 | 67.9 | 60.0 |
| HyperCLOVA[1] | 39.0B | 88.0 | 71.4 | 61.6 |
| HyperCLOVA[1] | 82.0B | **88.2** | 72.7 | **65.1** |
| **Ours** | 6.0B | 87.8 | **78.0** | 64.3 |
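The scores above come from in-context prompting alone, with no gradient updates. Purely as an illustration, a hypothetical k-shot prompt for NSMC-style sentiment classification might be assembled as follows; the template, labels, and examples are our own assumptions, not the evaluation protocol behind the table:

```python
# Hypothetical few-shot prompt for NSMC-style sentiment classification.
# The template and examples are illustrative assumptions only.
shots = [
    ('이 영화 정말 최고였다', '긍정'),
    ('시간이 아까운 영화', '부정'),
]
query = '배우들의 연기가 인상 깊었다'

prompt = ''.join(f'리뷰: {text}\n감정: {label}\n\n' for text, label in shots)
prompt += f'리뷰: {query}\n감정:'

# Feed `prompt` to model.generate() as in the usage example above and
# read the first generated token ('긍정' or '부정') as the prediction.
print(prompt)
```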
### Finetuning / P-Tuning
Issues with our downstream evaluation have been reported ([kakaobrain/kogpt#17](https://github.com/kakaobrain/kogpt/issues/17)).
The previously published performance table has been removed: the compared algorithms differed and the measurement methodology could not be verified, so it could not be considered a fair comparison.
The original table and the troubleshooting results are available in the issue linked above.
## Limitations
KakaoBrain `KoGPT` was trained on the `ryan dataset`, a dataset known to contain profanity, lewd, politically charged, and otherwise harsh language.
Therefore, `KoGPT` can generate socially unacceptable text. As with all language models, it is difficult to predict in advance how `KoGPT` will respond to a particular prompt, and it may produce offensive content without warning.
Primarily Korean: `KoGPT` is trained mainly on Korean text and is best suited to classifying, searching, summarizing, or generating such text.
By default, `KoGPT` performs worse on inputs that differ from the distribution it was trained on, including non-Korean text as well as Korean dialects that are not well represented in the training data.
[comment]: <> (If abnormal or socially unacceptable text is generated during testing, please send a "prompt" and the "generated text" to [kogpt-report@kakaobrain.com](mailto:kogpt-report@kakaobrain.com). )
## Citation
If you use this library or model in any project or research, please cite our code:
```
@misc{kakaobrain2021kogpt,
title = {KoGPT: KakaoBrain Korean(hangul) Generative Pre-trained Transformer},
author = {Ildoo Kim and Gunsoo Han and Jiyeon Ham and Woonhyuk Baek},
year = {2021},
howpublished = {\url{https://github.com/kakaobrain/kogpt}},
}
```
## Contact
KoGPT is released as open source in the hope that it will be helpful to many research institutions and startups for research purposes. We look forward to hearing from organizations everywhere that wish to collaborate with us.
[contact@kakaobrain.com](mailto:contact@kakaobrain.com)
## License
The `source code` of KakaoBrain `KoGPT` is licensed under the [Apache 2.0](LICENSE.apache-2.0) license.
The `pretrained weights` of KakaoBrain `KoGPT` are licensed under the [CC-BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) license.
Please comply with the license terms when using the model, code, or pretrained weights. The full license texts are available in the [Apache 2.0](LICENSE.apache-2.0) and [LICENSE.cc-by-nc-nd-4.0](LICENSE.cc-by-nc-nd-4.0) files.
## References
[1] [HyperCLOVA](https://arxiv.org/abs/2109.04650): Kim, Boseop, et al. "What Changes Can Large-Scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-Scale Korean Generative Pretrained Transformers." arXiv preprint arXiv:2109.04650 (2021).