---
license: cc-by-sa-4.0
language:
- ko
tags:
- korean
---

# **KoBigBird-RoBERTa-large**

This is a large-sized Korean BigBird model introduced in our [paper](https://arxiv.org/abs/2309.10339).
The model draws heavily on the parameters of [klue/roberta-large](https://huggingface.co/klue/roberta-large) to ensure high performance.
By employing the BigBird architecture and incorporating the newly proposed TAPER, the model accommodates even longer inputs.

### How to Use

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and the masked language model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("vaiv/kobigbird-roberta-large")
model = AutoModelForMaskedLM.from_pretrained("vaiv/kobigbird-roberta-large")
```
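
As a quick sanity check, the sketch below continues from the snippet above and fills a masked token in a short Korean sentence. The example sentence is illustrative, and the 4096-token truncation length is an assumption about the model's maximum input size; check `model.config.max_position_embeddings` for the actual limit.

```python
import torch

# Minimal fill-mask sketch, continuing from the loading code above.
# The sentence and the 4096-token limit are illustrative assumptions.
text = f"한국어 언어 모델은 긴 문서도 {tokenizer.mask_token} 수 있습니다."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring prediction at the masked position.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```

Note that if the model uses the standard Hugging Face BigBird implementation, very short inputs like this one may be processed with full rather than block-sparse attention; the long-input benefits appear on sequences far longer than klue/roberta-large can handle.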

### Hyperparameters

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62ce3886a9be5c195564fd71/bhuidw3bNQZbE2tzVcZw_.png)

### Results

Measured on the validation sets of the KLUE benchmark datasets:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62ce3886a9be5c195564fd71/50jMYggkGVUM06n2v1Hxm.png)

### Limitations

While our model achieves strong results without any additional pretraining, continued pretraining can refine the positional representations still further.
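
For readers who want to pursue that additional pretraining, the sketch below shows one way to continue masked-language-model training with the Hugging Face `Trainer`. It is a minimal sketch under stated assumptions: the corpus file, the 4096-token sequence length, and the training arguments are illustrative placeholders, not settings from the paper.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("vaiv/kobigbird-roberta-large")
model = AutoModelForMaskedLM.from_pretrained("vaiv/kobigbird-roberta-large")

# Hypothetical long-document Korean corpus; replace with your own data.
dataset = load_dataset("text", data_files={"train": "korean_corpus.txt"})["train"]

def tokenize(batch):
    # 4096 is an assumed maximum length; check model.config.max_position_embeddings.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard masked-language-modeling objective with dynamic masking.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="kobigbird-roberta-large-continued",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```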

## Citation Information

```bibtex
@article{yang2023kobigbird,
  title={KoBigBird-large: Transformation of Transformer for Korean Language Understanding},
  author={Yang, Kisu and Jang, Yoonna and Lee, Taewoo and Seong, Jinwoo and Lee, Hyungjin and Jang, Hwanseok and Lim, Heuiseok},
  journal={arXiv preprint arXiv:2309.10339},
  year={2023}
}
```