---
license: mit
language:
- ko
pipeline_tag: fill-mask
---
# KF-DeBERTa
KakaoBank and FnGuide are releasing a language model specialized for the financial domain, which the two companies trained.
## Model description
* KF-DeBERTa is a language model trained on a general-domain corpus together with a financial-domain corpus.
* The model architecture is based on [DeBERTa-v2](https://github.com/microsoft/DeBERTa#whats-new-in-v2).
* DeBERTa-v3, which uses ELECTRA's RTD (replaced token detection) as its training objective, scored considerably lower on some tasks (KLUE-RE, WoS, Retrieval), so DeBERTa-v2 was chosen as the final architecture.
* The model performs well on both general-domain and financial-domain downstream tasks.
* To validate performance on financial-domain downstream tasks thoroughly, evaluation was carried out on a variety of datasets.
* It outperforms existing language models in both the general and financial domains; on the KLUE Benchmark in particular, it outperforms RoBERTa-Large.
## Usage
```python3
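from transformers import AutoModel, AutoTokenizer

# Load the pretrained encoder and its tokenizer from the Hugging Face Hub.
model = AutoModel.from_pretrained("kakaobank/kf-deberta-base")
tokenizer = AutoTokenizer.from_pretrained("kakaobank/kf-deberta-base")

# "KakaoBank and FnGuide are releasing a finance-specialized language model."
text = "카카오뱅크와 에프엔가이드가 금융특화 언어모델을 공개합니다."
# Inspect the subword tokens produced for the sentence.
tokens = tokenizer.tokenize(text)
print(tokens)
# Encode the sentence as PyTorch tensors and run a forward pass.
inputs = tokenizer(text, return_tensors="pt")
model_output = model(**inputs)
print(model_output)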
```
## Benchmark
* All tasks were tuned with only the basic hyperparameter search shown below.
* batch size: {16, 32}
* learning_rate: {1e-5, 3e-5, 5e-5}
* weight_decay: {0, 0.01}
* warmup_proportion: {0, 0.1}
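The search space above is small enough to enumerate exhaustively. The following is a minimal sketch, assuming a plain grid search over every combination; the README does not state how the search was actually run, and `SEARCH_SPACE` and `iter_configs` are illustrative names rather than part of any released code.

```python
from itertools import product

# Search space copied from the list above.
SEARCH_SPACE = {
    "batch_size": [16, 32],
    "learning_rate": [1e-5, 3e-5, 5e-5],
    "weight_decay": [0.0, 0.01],
    "warmup_proportion": [0.0, 0.1],
}


def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(SEARCH_SPACE))
print(len(configs))  # 2 * 3 * 2 * 2 = 24 combinations
```

Each dict in `configs` can be passed to whatever fine-tuning routine is in use; under this assumption, one task costs 24 fine-tuning runs.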
**KLUE Benchmark**
| Model | YNAT | KLUE-ST | KLUE-NLI | KLUE-NER | KLUE-RE | KLUE-DP | KLUE-MRC | WoS | AVG |
|:--------------------:|:----------------:|:----------------------:|:------------:|:---------------------------------:|:-----------------------------:|:----------------------:|:-------------------------:|:----------------------:|:----------------:|
| | F1 | Pearsonr/F1 | ACC | F1-Entity/F1-Char | F1-micro/AUC | UAS/LAS | EM/ROUGE | JGA/F1-S | |
| mBERT (Base) | 82.64 | 82.97/75.93 | 72.90 | 75.56/88.81 | 58.39/56.41 | 88.53/86.04 | 49.96/55.57 | 35.27/88.60 | 71.26 |
| XLM-R (Base) | 84.52 | 88.88/81.20 | 78.23 | 80.48/92.14 | 57.62/57.05 | 93.12/87.23 | 26.76/53.36 | 41.54/89.81 | 72.28 |
| XLM-R (Large) | 87.30 | 93.08/87.17 | 86.40 | 82.18/93.20 | 58.75/63.53 | 92.87/87.82 | 35.23/66.55 | 42.44/89.88 | 76.17 |
| KR-BERT (Base) | 85.36 | 87.50/77.92 | 77.10 | 74.97/90.46 | 62.83/65.42 | 92.87/87.13 | 48.95/58.38 | 45.60/90.82 | 74.67 |
| KoELECTRA (Base) | 85.99 | 93.14/85.89 | 86.87 | 86.06/92.75 | 62.67/57.46 | 90.93/87.07 | 59.54/65.64 | 39.83/88.91 | 77.34 |
| KLUE-BERT (Base) | 86.95 | 91.01/83.44 | 79.87 | 83.71/91.17 | 65.58/68.11 | 93.07/87.25 | 62.42/68.15 | 46.72/91.59 | 78.50 |
| KLUE-RoBERTa (Small) | 85.95 | 91.70/85.42 | 81.00 | 83.55/91.20 | 61.26/60.89 | 93.47/87.50 | 58.25/63.56 | 46.65/91.50 | 77.28 |
| KLUE-RoBERTa (Base) | 86.19 | 92.91/86.78 | 86.30 | 83.81/91.09 | 66.73/68.11 | 93.75/87.77 | 69.56/74.64 | 47.41/91.60 | 80.48 |
| KLUE-RoBERTa (Large) | 85.88 | 93.20/86.13 | **89.50** | 84.54/91.45 | **71.06**/73.33 | 93.84/87.93 | **75.26**/**80.30** | 49.39/92.19 | 82.43 |
| KF-DeBERTa (Base) | **<u>87.51</u>** | **<u>93.24/87.73</u>** | <u>88.37</u> | **<u>89.17</u>**/**<u>93.30</u>** | <u>69.70</u>/**<u>75.07</u>** | **<u>94.05/87.97</u>** | <u>72.59</u>/<u>78.08</u> | **<u>50.21/92.59</u>** | **<u>82.83</u>** |
* Bold marks the highest score among all models; underline marks the highest score among base models.
**Financial domain benchmark**
| Model | FN-Sentiment (v1) | FN-Sentiment (v2) | FN-Adnews | FN-NER | KorFPB | KorFiQA-SA | KorHeadline | Avg (excl. FiQA-SA) |
|:-------------------:|:-----------------:|:-----------------:|:---------:|:---------:|:---------:|:----------:|:-----------:|:-----------------:|
| | ACC | ACC | ACC | F1-micro | ACC | MSE | Mean F1 | |
| KLUE-RoBERTa (Base) | 98.26 | 91.21 | 96.34 | 90.31 | 90.97 | 0.0589 | 81.11 | 94.03 |
| KoELECTRA (Base) | 98.26 | 90.56 | 96.98 | 89.81 | 92.36 | 0.0652 | 80.69 | 93.90 |
| KF-DeBERTa (Base) | **99.36** | **92.29** | **97.63** | **91.80** | **93.47** | **0.0553** | **82.12** | **95.27** |
* **FN-Sentiment**: financial-domain sentiment analysis
* **FN-Adnews**: financial-domain classification of advertorial news articles
* **FN-NER**: financial-domain named entity recognition
* **KorFPB**: translated FinancialPhraseBank data
* Cite: ```Malo, Pekka, et al. "Good debt or bad debt: Detecting semantic orientations in economic texts." Journal of the Association for Information Science and Technology 65.4 (2014): 782-796.```
* **KorFiQA-SA**: translated FiQA-SA data
* Cite: ```Maia, Macedo & Handschuh, Siegfried & Freitas, Andre & Davis, Brian & McDermott, Ross & Zarrouk, Manel & Balahur, Alexandra. (2018). WWW'18 Open Challenge: Financial Opinion Mining and Question Answering. WWW '18: Companion Proceedings of The Web Conference 2018. 1941-1942. 10.1145/3184558.3192301.```
* **KorHeadline**: translated Gold Commodity News and Dimensions data
* Cite: ```Sinha, A., & Khandait, T. (2021, April). Impact of News on the Commodity Market: Dataset and Results. In Future of Information and Communication Conference (pp. 589-601). Springer, Cham.```
**General domain benchmark**
| Model | NSMC | PAWS | KorNLI | KorSTS | KorQuAD | Avg (excl. KorQuAD) |
|:-------------------:|:---------:|:---------:|:---------:|:---------:|:---------------:|:----------------:|
| | ACC | ACC | ACC | spearman | EM/F1 | |
| KLUE-RoBERTa (Base) | 90.47 | 84.79 | 81.65 | 84.40 | 86.34/94.40 | 85.33 |
| KoELECTRA (Base) | 90.63 | 84.45 | 82.24 | 85.53 | 84.83/93.45 | 85.71 |
| KF-DeBERTa (Base) | **91.36** | **86.14** | **84.54** | **85.99** | **86.60/95.07** | **87.01** |
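The Avg column of this table can be reproduced from the other columns: it matches the arithmetic mean of the four single-score tasks (NSMC, PAWS, KorNLI, KorSTS), with KorQuAD left out. A small check, with scores copied from the rows above; `SCORES` and `average` are illustrative names.

```python
# Scores copied from the table, in column order: NSMC, PAWS, KorNLI, KorSTS.
SCORES = {
    "KLUE-RoBERTa (Base)": [90.47, 84.79, 81.65, 84.40],
    "KoELECTRA (Base)": [90.63, 84.45, 82.24, 85.53],
    "KF-DeBERTa (Base)": [91.36, 86.14, 84.54, 85.99],
}


def average(values):
    """Arithmetic mean, rounded to two decimals as in the table."""
    return round(sum(values) / len(values), 2)


averages = {model: average(values) for model, values in SCORES.items()}
for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
```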
## License
The source code and model of KF-DeBERTa are released under the MIT License.
The full license text is available in the [MIT file](LICENSE).
The company accepts no liability for any damages arising from use of the model.
## Citation
```
@inproceedings{jeon-etal-2023-kfdeberta,
title = {KF-DeBERTa: Financial Domain-specific Pre-trained Language Model},
author = {Eunkwang Jeon and Jungdae Kim and Minsang Song and Joohyun Ryu},
booktitle = {Proceedings of the 35th Annual Conference on Human and Cognitive Language Technology},
month = {oct},
year = {2023},
publisher = {Korean Institute of Information Scientists and Engineers},
url = {http://www.hclt.kr/symp/?lnb=conference},
pages = {143--148},
}
```