---
license: cc-by-sa-4.0
language:
- ko
tags:
- korean
---

# **KoBigBird-RoBERTa-large**

This is a large-sized Korean BigBird model introduced in our [paper](https://arxiv.org/abs/2309.10339).
The model draws heavily on the parameters of [klue/roberta-large](https://huggingface.co/klue/roberta-large) to ensure high performance.
By employing the BigBird architecture and incorporating the newly proposed TAPER, the model accommodates even longer inputs.

### How to Use

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and the masked language model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("vaiv/kobigbird-roberta-large")
model = AutoModelForMaskedLM.from_pretrained("vaiv/kobigbird-roberta-large")
```
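
As a quick sanity check, the sketch below continues from the snippet above and fills a masked token in a short Korean sentence. The example sentence is illustrative, and the 4096-token truncation length is an assumption about the model's maximum input size; check `model.config.max_position_embeddings` for the actual limit.

```python
import torch

# Minimal fill-mask sketch, continuing from the loading code above.
# The sentence and the 4096-token limit are illustrative assumptions.
text = f"한국어 언어 모델은 긴 문서도 {tokenizer.mask_token} 수 있습니다."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring prediction at the masked position.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```

Note that if the model uses the standard Hugging Face BigBird implementation, very short inputs like this one may be processed with full rather than block-sparse attention; the long-input benefits appear on sequences far longer than klue/roberta-large can handle.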

### Hyperparameters

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62ce3886a9be5c195564fd71/bhuidw3bNQZbE2tzVcZw_.png)

### Results

Measured on the validation sets of the KLUE benchmark datasets:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62ce3886a9be5c195564fd71/50jMYggkGVUM06n2v1Hxm.png)

### Limitations

While our model achieves strong results without any additional pretraining, continued pretraining can refine the positional representations still further.
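
For readers who want to pursue that additional pretraining, the sketch below shows one way to continue masked-language-model training with the Hugging Face `Trainer`. It is a minimal sketch under stated assumptions: the corpus file, the 4096-token sequence length, and the training arguments are illustrative placeholders, not settings from the paper.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("vaiv/kobigbird-roberta-large")
model = AutoModelForMaskedLM.from_pretrained("vaiv/kobigbird-roberta-large")

# Hypothetical long-document Korean corpus; replace with your own data.
dataset = load_dataset("text", data_files={"train": "korean_corpus.txt"})["train"]

def tokenize(batch):
    # 4096 is an assumed maximum length; check model.config.max_position_embeddings.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard masked-language-modeling objective with dynamic masking.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="kobigbird-roberta-large-continued",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```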

## Citation Information

```bibtex
@article{yang2023kobigbird,
  title={KoBigBird-large: Transformation of Transformer for Korean Language Understanding},
  author={Yang, Kisu and Jang, Yoonna and Lee, Taewoo and Seong, Jinwoo and Lee, Hyungjin and Jang, Hwanseok and Lim, Heuiseok},
  journal={arXiv preprint arXiv:2309.10339},
  year={2023}
}
```