Model Card for GBST-KEByT5-large (1.23B #params)

This is the GBST variant of KEByT5 (Korean-Enhanced/Enriched Byte-level Text-to-Text Transfer Transformer, T5), based on CharFormer (Tay et al., 2021).

ํ•œ๊ตญ์–ด๋ฅผ ์œ„ํ•ด ํ† ํฐ ํ›„๋ณด ๊ตฌ๊ฐ„์„ (1, 2, 3, 6, 9) ๋ฐ”์ดํŠธ ๋‹จ์œ„๋กœ ์ฒญํ‚นํ•˜์—ฌ ํ›„๋ณด๊ตฐ์„ ์ƒ์„ฑํ•˜๊ณ , GBST๋กœ ๋‚˜์˜จ ์†Œํ”„ํŠธ ์ž„๋ฒ ๋”ฉ ์‹œํ€€์Šค๋ฅผ 1/3๋กœ ๋‹ค์šด์ƒ˜ํ”Œ๋งํ•˜์—ฌ ํ•™์Šต ๋ฐ ์ถ”๋ก  ํšจ์œจ์„ฑ์„ ๊ฐœ์„ ํ•ฉ๋‹ˆ๋‹ค.

Prerequisites and Model Loading HOW-TO

๋ณธ ๋ชจ๋ธ์˜ ๊ตฌ๋™์„ ์œ„ํ•ด์„œ๋Š” GBSWT5 ๋ชจ๋“ˆ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

https://github.com/etri-crossmodal/gbswt5

์•„๋ž˜์™€ ๊ฐ™์ด pip๋ฅผ ํ†ตํ•ด ๋ชจ๋“ˆ์„ ์„ค์น˜ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ ์‚ฌ์šฉ ๋ฐฉ๋ฒ•์€ github๋ฅผ ์ฐธ์กฐํ•ด์ฃผ์‹ญ์‹œ์˜ค.

pip install git+https://github.com/etri-crossmodal/gbswt5.git

๋˜๋Š”, ์ตœ์‹  ๋ฒ„์ „์˜ Transformers์™€ ํ•จ๊ป˜, ๋ณ„๋„์˜ ์ฝ”๋“œ ์—†์ด ์•„๋ž˜์˜ ๋ฐฉ๋ฒ•์œผ๋กœ ๋ชจ๋ธ ์‚ฌ์šฉ์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("etri-lirs/gbst-kebyt5-large-preview")
# ์•„๋ž˜์™€ ๊ฐ™์ด trust_remote_code=True๋ฅผ ๋ถ™์ž„์œผ๋กœ, ์ž๋™์œผ๋กœ ๊ด€๋ จ ์ฝ”๋“œ๋ฅผ ๋‹ค์šด๋กœ๋“œ ๋ฐ›๊ณ  ์“ธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค
model = AutoModelForSeq2SeqLM.from_pretrained("etri-lirs/gbst-kebyt5-large-preview", trust_remote_code=True)

๋˜ํ•œ, ๋‹ค์šด์ŠคํŠธ๋ฆผ ํƒœ์Šคํฌ ํ•™์Šต ์‹œ, ์•„๋ž˜์˜ python ์ฝ”๋“œ์™€ ๊ฐ™์ด, GBST layer๋ฅผ frozen ํ•˜์—ฌ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์„ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค.

  gbst_frozen_target = ['encoder.embed_tokens.embeds.weight',
                        'encoder.embed_tokens.positional_convol.2.convol.weight',
                        'encoder.embed_tokens.positional_convol.2.convol.bias',
                        'encoder.embed_tokens.positional_convol.2.proj.weight',
                        'encoder.embed_tokens.positional_convol.2.proj.bias',
                        'encoder.embed_tokens.cand_scoring.0.weight',
                        'encoder.embed_tokens.cand_scoring.0.bias',
                        # embedding weight๋Š” frozen ํ•˜์ง€ ์•Š๋Š” ์ชฝ์ด ์ผ๋ฐ˜์ ์œผ๋กœ ๋” ๋‚˜์€ ์„ฑ๋Šฅ์„ ๋ณด์ž„.
                        #'shared.weight',
                        ]
  print("** GBST Model found, freeze GBSWT layer for training downstream.")
  for name, param in self.model.named_parameters():
      if name in gbst_frozen_target:
          print(f"** freeze {name} layer.")
          param.requires_grad = False
      else:
          param.requires_grad = True
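
After freezing, only the remaining parameters receive gradient updates. If you construct the optimizer yourself, you can optionally pass just the trainable parameters; a minimal sketch (the learning rate is only a placeholder):

import torch

# self.model as in the snippet above; only unfrozen parameters are optimized
optimizer = torch.optim.AdamW(
    (p for p in self.model.parameters() if p.requires_grad), lr=4.6e-5)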

์ฐธ๊ณ ๋กœ, ๋ชจ๋ธ์— ํฌํ•จ๋œ ์›๊ฒฉ ์ฝ”๋“œ์—๋Š” ๋‹ค์Œ์˜ ์˜คํ”ˆ์†Œ์Šค ์†Œํ”„ํŠธ์›จ์–ด๊ฐ€ ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค:

  • This software includes the lucidrains/charformer-pytorch GitHub project for the GBST implementation, which is distributed under the MIT License. Copyright (c) 2021 Phil Wang. All rights reserved. (Original Code URL: https://github.com/lucidrains/charformer-pytorch)
  • This software includes HuggingFace Transformers' T5 implementation for the GBST-enabled T5 model, which is distributed under the Apache 2.0 License. Copyright 2018- The HuggingFace team. All rights reserved.

KEByT5: Korean-Enhanced/Enriched Byte-level Text-to-Text Transfer Transformer(T5)

ํฌ๋กœ์Šค๋ชจ๋‹ฌ ๋ฐ ๋‹ค๊ตญ์–ด ์นœํ™”์ ์ธ ํ•œ๊ตญ์–ด ์ค‘์‹ฌ์˜ ํ† ํฐ-ํ”„๋ฆฌ ์–ธ์–ด ์ดํ•ด ์ƒ์„ฑ ๋ชจ๋ธ (EN=Cross-modal, Multilingual Friendly, Token-free Encoder-Decoder Pretrained Language Model for Korean)

  • ๋ณธ ์‚ฌ์ „ํ•™์Šต ์–ธ์–ด๋ชจ๋ธ์€ ์‹œ๊ฐ, ์ฒญ๊ฐ๊ณผ ๊ฐ™์€ ํ…์ŠคํŠธ ์ด์™ธ์˜ ๋ชจ๋‹ฌ๋ฆฌํ‹ฐ์™€ ๊ต์ฐจ์–ธ์–ด ์ง€์‹ ๊ตํ™˜์— ์šฉ์ดํ•œ ํ† ํฐ-ํ”„๋ฆฌ ์‚ฌ์ „ํ•™์Šต ์–ธ์–ด๋ชจ๋ธ์„ ๋ชฉํ‘œ๋กœ ํ•ฉ๋‹ˆ๋‹ค.
  • ๋ณ„๋„์˜ tokenizer๊ฐ€ ํ•„์š”์—†์ง€๋งŒ, ํŽธ์˜๋ฅผ ์œ„ํ•ด AutoTokenizer.from_pretrained()๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋‹ค๋ฅธ ํ† ํฌ๋‚˜์ด์ € ๊ธฐ๋ฐ˜ ์ธ์ฝ”๋”-๋””์ฝ”๋” ๋ชจ๋ธ๊ณผ ๋™์ผํ•˜๊ฒŒ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ† ํฌ๋‚˜์ด์ €๋ฅผ ์ƒ๋žตํ•˜๊ณ  ์‹ถ์€ ๊ฒฝ์šฐ, UTF-8 ์ž…๋ ฅ์„ ๋ฐ”์ดํŠธ ๋‹จ์œ„๋กœ ์ชผ๊ฐœ์–ด, ๊ฐ ๋ฐ”์ดํŠธ์— +3์„ ํ•˜์—ฌ Token ID๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (์ฆ‰, ASCII value 0 == Token ID 3, ASCII value 255 == Token ID 258)
  • ํ˜„์žฌ Preview ์Šคํ…Œ์ด์ง€์— ์žˆ๋Š” ๋ชจ๋ธ์ด๋ฉฐ, ํ™œ์šฉ์—๋Š” fine-tuning์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทธ๋ž˜๋””์–ธํŠธ ๊ธฐ๋ฐ˜ ์„œ๋ธŒ์›Œ๋“œ ํ† ํฐํ™” (Gradient-based Subword Tokenization; CharFormer; Tay et al., 2021;)๋ฅผ ์ ์šฉํ•œ ๋ณธ ๋ชจ๋ธ์€, KLUE-MRC์—์„œ ๊ฐ™์€ ๊ทœ๋ชจ์˜ KEByT5-base ๋ชจ๋ธ ๋Œ€๋น„ ํ•™์Šต์—์„œ 2.7๋ฐฐ, ์ถ”๋ก ์—์„œ 1.46๋ฐฐ ์ด์ƒ์˜ ํ•™์Šต ์†๋„๊ฐ€ ๊ฐœ์„ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ผ๋ถ€ ํ•™์Šต/์ถ”๋ก  ์„ฑ๋Šฅ์— ๋น„๊ต ๊ฐ€๋Šฅํ•œ ์ฐจ์ด๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ƒ์„ธํ•œ ๋‚ด์šฉ์€ ํ•˜์œ„ ํ‰๊ฐ€ ์ง€ํ‘œ๋ฅผ ์ฐธ๊ณ ํ•˜์‹ญ์‹œ์˜ค.

Acknowledgements

  • ๋ณธ ์‚ฌ์ „ํ•™์Šต ์–ธ์–ด๋ชจ๋ธ์€ 2022๋…„๋„ ์ •๋ถ€(๊ณผํ•™๊ธฐ์ˆ ์ •๋ณดํ†ต์‹ ๋ถ€)์˜ ์žฌ์›์œผ๋กœ ์ •๋ณดํ†ต์‹ ๊ธฐํšํ‰๊ฐ€์›์˜ ์ง€์›์„ ๋ฐ›์•„ ์ˆ˜ํ–‰๋œ ์—ฐ๊ตฌ์ž„ (No. RS-2022-00187238, ํšจ์œจ์  ์‚ฌ์ „ํ•™์Šต์ด ๊ฐ€๋Šฅํ•œ ํ•œ๊ตญ์–ด ๋Œ€ํ˜• ์–ธ์–ด๋ชจ๋ธ ์‚ฌ์ „ํ•™์Šต ๊ธฐ์ˆ  ๊ฐœ๋ฐœ) (EN=This pretrained language model was supported by the Institute of Information & communication Technology Planning & Evaluation(IITP) grant funded by the Korea government(MSIT) (No. RS-2022-00187238, Development of Large Korean Language Model Technology for Efficient Pre-training))

Model Details

๋ณธ ์‚ฌ์ „ํ•™์Šต ์–ธ์–ด๋ชจ๋ธ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ทœ๋ชจ๋ฅผ ๊ฐ€์ง‘๋‹ˆ๋‹ค:

  • kebyt5-small : 330M link
  • kebyt5-base : 580M link
  • kebyt5-large : 1.23B link
  • GBST-kebyt5-base : 584M link
  • GBST-kebyt5-large : 1.23B (this model)

์ด๋“ค ๋ชจ๋ธ์€ google/byt5-small, google/byt5-base, google/byt5-large ๋ชจ๋ธ๊ณผ ๋™์ผํ•œ ์‹ ๊ฒฝ๋ง ๊ตฌ์กฐ์™€ ํฌ๊ธฐ๋ฅผ ๊ฐ€์ง€๋ฉฐ, ํ† ํฌ๋‚˜์ด์ €(ByT5Tokenizer)์™€ ๊ตฌํ˜„ ์ƒ ๋‘ ๋ชจ๋ธ์€ ๋ณ„๋„์˜ ์ˆ˜์ •์—†์ด ๋ฐ”๋กœ ๊ตํ™˜ํ•˜์—ฌ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. huggingface transformers์—์„œ์˜ ์‚ฌ์šฉ๋ฒ• ์—ญ์‹œ, T5ForConditionalGeneration์„ ๋™์ผํ•˜๊ฒŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Model Description

  • Developed by: Language Intelligence Research Section, Electronics and Telecommunications Research Institute(ETRI)
  • Model type: Encoder-Decoder Transformer, specifically, ByT5.
  • Language(s) (NLP): Korean, English(partially for translation task), Chinese(partially for translation task), Japanese(partially for translation task).
  • License: Apache 2.0 License
  • Finetuned from model: kebyt5-small/-base/-xl model weights were initialized from google/byt5-* for warm-start pretraining.

Model Sources

  • Repository: https://github.com/etri-crossmodal/llm-downstream-s2s (for downstream-task training)
  • Paper: ์‹ ์ข…ํ›ˆ ์™ธ, "ํ•œ๊ตญ์–ด ์ค‘์‹ฌ์˜ ํ† ํฐ-ํ”„๋ฆฌ ์–ธ์–ด ์ดํ•ด-์ƒ์„ฑ ๋ชจ๋ธ ์‚ฌ์ „ํ•™์Šต ์—ฐ๊ตฌ", ์ œ35ํšŒ ํ•œ๊ธ€ ๋ฐ ํ•œ๊ตญ์–ด ์ •๋ณด์ฒ˜๋ฆฌ ํ•™์ˆ ๋Œ€ํšŒ ๋…ผ๋ฌธ์ง‘, pp.711-715. 2023. (EN=Shin et al., "Towards Korean-Centric Token-free Pretrained Language Model", in Procs. of the 35th Annual Conference on Human and Cognitive Language Technology. pp. 711-715. 2023.)

Uses

ํ•ด๋‹น ์‚ฌ์ „ํ•™์Šต ์–ธ์–ด๋ชจ๋ธ์€ ์—ฐ๊ตฌ ๋ฐ ๊ต์œก ๋ชฉ์ ์˜ ํ™œ์šฉ์œผ๋กœ ๊ทธ ์‚ฌ์šฉ ๋ชฉ์ ์ด ์ œํ•œ๋ฉ๋‹ˆ๋‹ค.

Direct Use

ํ˜„์žฌ ๊ณต๊ฐœ๋˜๋Š” ๋ชจ๋ธ์€ T5 ๋ชจ๋ธ ํ•™์Šต์— ์‚ฌ์šฉ๋œ Corrupted span denoising ๋งŒ์œผ๋กœ ํ•™์Šต๋˜์–ด ์žˆ์–ด, ์‹ค์ œ ์‘์šฉ ํƒœ์Šคํฌ์— ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” fine-tuning ๊ณผ์ •์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

Masked token prediction can be performed using sentinel tokens (token IDs 258, 257, 256, ...), but the predicted content may be inappropriate.
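
An illustrative sketch of masked token prediction with the first sentinel token; the exact generation behaviour of this preview checkpoint is not guaranteed:

import torch
import gbswt5  # registers the GBSWT5 architecture with transformers
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("etri-lirs/gbst-kebyt5-large-preview")
model = AutoModelForSeq2SeqLM.from_pretrained("etri-lirs/gbst-kebyt5-large-preview")

text = "대한민국의 수도는 서울이다."
span = "서울"
start = text.encode("utf-8").index(span.encode("utf-8"))    # byte offset of the span to mask
end = start + len(span.encode("utf-8"))

ids = tokenizer(text, return_tensors="pt").input_ids        # UTF-8 bytes + 3, followed by </s>
masked = torch.cat([ids[:, :start], torch.tensor([[258]]), ids[:, end:]], dim=1)  # sentinel id 258
with torch.no_grad():
    output = model.generate(masked, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=False))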

Downstream Use [optional]

Token-free ๋ชจ๋ธ์˜ ํŠน์„ฑ ์ƒ, ๋ณต์žกํ•˜๊ฑฐ๋‚˜ Noisyํ•œ ์ž…๋ ฅ์— ๊ฐ•๊ฑดํ•˜๋ฉฐ, ์งง์€ ์‹œํ€€์Šค ๊ธธ์ด์˜ ์ƒ์„ฑ์— ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค. (์˜ˆ: ์–ธ์–ด ์ดํ•ด, ๋Œ€ํ™” ์‘๋‹ต ์ƒ์„ฑ)

์‚ฌ์ „ํ•™์Šต์€ 1024 bytes ๊ธธ์ด์˜ ๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šตํ–ˆ๊ธฐ ๋•Œ๋ฌธ์—, ์ด๋ฅผ ์ดˆ๊ณผํ•˜๋Š” ๊ธด ์‹œํ€€์Šค๋ฅผ ๋‹ค๋ฃจ๋Š” ๋ฌธ์ œ์— ์ ํ•ฉํ•˜์ง€ ์•Š์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋” ๊ธด ์‹œํ€€์Šค๋ฅผ ๋‹ค๋ค„์•ผ ํ•˜๋Š” ๋ฌธ์ œ์—์„œ๋Š”, GBST ๊ธฐ๋ฐ˜์˜ ํ† ํฐ-ํ”„๋ฆฌ ์–ธ์–ด๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์„ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค.

Bias, Risks, Limitations, and Recommendations

Masked Token Prediction์„ ํ†ตํ•ด ํš๋“๋  ์ˆ˜ ์žˆ๋Š” ์ •๋ณด์—๋Š” ๋‹ค๋ฅธ ์ƒ์„ฑํ˜• ์–ธ์–ด๋ชจ๋ธ๊ณผ ๊ฐ™์€ ์œ„ํ—˜์„ ๊ฐ€์ง€๊ณ  ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๋ฐ์ดํ„ฐ๋Š” ์š•์„ค, ์Œ๋ž€, ์ •์น˜์  ๋‚ด์šฉ ๋ฐ ๊ธฐํƒ€ ๊ฑฐ์นœ ์–ธ์–ด๋“ค์— ๋Œ€ํ•œ ๋ณ„๋„์˜ ์ฒ˜๋ฆฌ๊ฐ€ ์ด๋ฃจ์–ด์ง€์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์‚ฌํšŒ์ ์œผ๋กœ ์šฉ์ธ๋˜์ง€ ์•Š์€ ํ† ํฐ์ด๋‚˜ ํ…์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ฃผ๋ณ€ ๋ฌธ๋งฅ์— ๋”ฐ๋ผ์„œ ๊ณต๊ฒฉ์ ์ธ ์ž…๋ ฅ์— ์–ด๋– ํ•œ ๊ฒฐ๊ณผ๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์„์ง€ ์‰ฝ๊ฒŒ ์˜ˆ์ƒํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.

ํ•œํŽธ, ๋ณธ ์–ธ์–ด๋ชจ๋ธ์€ ์ฃผ๋กœ ํ•œ๊ตญ์–ด ํ…์ŠคํŠธ๋กœ ํ•™์Šต๋˜์—ˆ์œผ๋ฉฐ, ์ด๋“ค์˜ ํŠน์„ฑ์„ ์ „์ดํ•  ์ˆ˜ ์žˆ๋Š” ๋‹ค์šด์ŠคํŠธ๋ฆผ ํƒœ์Šคํฌ, ๊ทธ ์ค‘์—์„œ๋„ ๋ถ„๋ฅ˜, ์š”์•ฝ, ์งง์€ ๋ฌธ์žฅ ์ƒ์„ฑ์— ์ ํ•ฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ž…์ถœ๋ ฅ ์ˆ˜์ค€์—์„œ ๋ฏธ๋“ฑ๋ก์–ด(Out-of-Vocabulary)๊ฐ€ ์กด์žฌํ•  ์ˆ˜ ์—†์œผ๋‚˜, ์‚ฌ์ „ํ•™์Šต๋˜์ง€ ์•Š์€ ํ…์ŠคํŠธ ์‹œํ€€์Šค์— ๋Œ€ํ•ด์„œ๋Š” ์ถ”๊ฐ€์˜ ๋„๋ฉ”์ธ ์ ์‘ ํ•™์Šต ๋ฐ ๋‹ค์šด์ŠคํŠธ๋ฆผ ํƒœ์Šคํฌ์˜ ๋ฏธ์„ธ์กฐ์ •์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

[More Information Needed]

How to Get Started with the Model

With Transformers version 4.27.0 or later, the model and tokenizer can be used with the following Python code. As mentioned above, the gbswt5 module must be imported before loading the model through transformers:

import gbswt5
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("etri-lirs/gbst-kebyt5-large-preview")
model = AutoModelForSeq2SeqLM.from_pretrained("etri-lirs/gbst-kebyt5-large-preview")

Training Details

Training Data

๋ณธ ์‚ฌ์ „ํ•™์Šต์—๋Š” ์•„๋ž˜์˜ ๊ณต๊ฐœ ๋ฐ์ดํ„ฐ๊ฐ€ ์‚ฌ์šฉ๋˜์—ˆ์Šต๋‹ˆ๋‹ค:

  • ๊ตญ๋ฆฝ๊ตญ์–ด์›, ๋ชจ๋‘์˜ ๋ง๋ญ‰์น˜. ์‹ ๋ฌธ v2.0
  • ๊ตญ๋ฆฝ๊ตญ์–ด์›, ๋ชจ๋‘์˜ ๋ง๋ญ‰์น˜. ๊ตฌ์–ด ๋ง๋ญ‰์น˜ v1.2
  • ๊ตญ๋ฆฝ๊ตญ์–ด์›, ๋ชจ๋‘์˜ ๋ง๋ญ‰์น˜. ๋ฌธ์–ด ๋ง๋ญ‰์น˜ v1.0
  • ๊ตญ๋ฆฝ๊ตญ์–ด์›, ๋ชจ๋‘์˜ ๋ง๋ญ‰์น˜. ์‹ ๋ฌธ 2020 v1.0
  • ๊ตญ๋ฆฝ๊ตญ์–ด์›, ๋ชจ๋‘์˜ ๋ง๋ญ‰์น˜. ์‹ ๋ฌธ 2021 v1.0
  • ํ•œ๊ตญ์–ด ์œ„ํ‚คํ”ผ๋””์–ด ๋คํ”„, v2020.09.20
  • ๋‚˜๋ฌด์œ„ํ‚ค ๋คํ”„
  • ํ•œ๊ตญ์ •๋ณดํ™”์ง„ํฅ์›, AIHub. ์ „๋ฌธ๋ถ„์•ผ ๋ง๋ญ‰์น˜, ๋ฒ•๋ฅ /ํŠนํ—ˆ ์ง€์‹๋ฒ ์ด์Šค, ๋…ผ๋ฌธ/๋„์„œ/๋Œ€ํ™”/๋Œ€๋ณธ ์š”์•ฝ, ํ•œ์˜/ํ•œ์ผ/ํ•œ์ค‘ ๋ฒˆ์—ญ ๋ง๋ญ‰์น˜, ์ฝœ์„ผํ„ฐ/์ฃผ๋ฌธ/๋‰ด์Šค๊ธฐ์‚ฌ/์‹œ๊ฐ์ •๋ณด ์งˆ์˜์‘๋‹ต, ๋ฐฉ์†ก/ํšŒ์˜/์ƒ๋‹ด ์Œ์„ฑ์ธ์‹ ๋ฐ์ดํ„ฐ.
  • ํ•œ๊ตญ์ •๋ณดํ™”์ง„ํฅ์›, AIHub. ๋Œ€๊ทœ๋ชจ ์›น๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜ ํ•œ๊ตญ์–ด ๋ง๋ญ‰์น˜ ๋ฐ์ดํ„ฐ
  • ํ•œ๊ตญ์ •๋ณดํ™”์ง„ํฅ์›, AIHub. ์˜จ๋ผ์ธ ๊ตฌ์–ด์ฒด ๋ง๋ญ‰์น˜ ๋ฐ์ดํ„ฐ.
  • KcBERT ๋ง๋ญ‰์น˜, v2022.3Q

๋˜ํ•œ, ์†Œ๋Ÿ‰์˜ ์ž์ฒด ๊ตฌ์ถ•๋œ ๋ฐ์ดํ„ฐ ๋ฐ ํ•ฉ์„ฑ ๋ฐ์ดํ„ฐ ์ผ๋ถ€๋ฅผ ์‚ฌ์šฉ, ์ „์ฒด ์•ฝ ~220GB ๊ฐ€๋Ÿ‰์˜ ๋ฐ์ดํ„ฐ๋กœ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

Evaluation

Testing Data, Factors & Metrics & Results

ํ•œ๊ตญ์–ด ์–ธ์–ด ์ดํ•ด ํƒœ์Šคํฌ์— ์‚ฌ์šฉ๋˜๋Š” KLUE dataset, v1.1์˜ dev set์„ ์‚ฌ์šฉํ•˜์—ฌ ํ‰๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ƒ์„ฑ์€ ๋ชจ๋‘ seq2seq์„ ์ด์šฉํ•œ ์ถœ๋ ฅ ๋ ˆ์ด๋ธ” ์ง์ ‘ ์ƒ์„ฑ ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค.

๋ชจ๋“  ๋ชจ๋ธ์˜ ํ•™์Šต ์กฐ๊ฑด์€ ์œ ํšจ๋ฐฐ์น˜ ํฌ๊ธฐ 16, ํ•™์Šต epoch 4๋กœ ๊ณ ์ •, ํŒŒ๋ผ๋ฏธํ„ฐ ํฌ๊ธฐ์— ๋”ฐ๋ผ ๊ณ ์ •๋œ ํ•™์Šต๋ฅ , Cosine-Annealing LR Scheduler (min lr=1e-7, restarts=4, gamma=0.7)์„ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šต ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ƒ์„ธ ํ…Œ์ŠคํŠธ ํ™˜๊ฒฝ์€ ์‹ ์ข…ํ›ˆ ์™ธ, 2023์— ๊ธฐ๋ก๋œ ๊ฒƒ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์ƒ๊ธฐ ํ•™์ˆ ๋…ผ๋ฌธ ์ดํ›„์— ์ถœ์‹œ๋œ ๋ณธ ๋ชจ๋ธ(GBST-KEByT5-Large)์˜ ๋‹ค์šด์ŠคํŠธ๋ฆผ ํƒœ์Šคํฌ ํ•™์Šต ์กฐ๊ฑด์€ ํƒœ์Šคํฌ ๋ณ„๋กœ ๊ฐ€๋ณ€์ ์ธ ํ•™์Šต๋ฅ (LR 6.2e-5~4.6e-5) ์‚ฌ์ด์˜ ๊ฐ’์„ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šตํ•˜์˜€๊ณ , ๋‚˜๋จธ์ง€ ์กฐ๊ฑด์€ ๋™์ผํ•˜๊ฒŒ ์„ค์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

ํ•˜๊ธฐ ๋ฏธ์„ธ์กฐ์ • ์‹คํ—˜์„ ์œ„ํ•ด ์‚ฌ์šฉ๋œ ํ•™์Šต๊ธฐ๋ฅผ ํ•จ๊ป˜ ๊ณต๊ฐœํ•˜์˜€์Šต๋‹ˆ๋‹ค. ํ•ด๋‹น ํ•™์Šต๊ธฐ๋Š” ๋‹ค๋ฅธ huggingface encoder-decoder ๋ชจ๋ธ(BART ๋“ฑ)์˜ ํ•™์Šต๋„ ํ•จ๊ป˜ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. https://github.com/etri-crossmodal/llm-downstream-s2s

| models | KLUE-TC (YNAT) (F1) | KLUE-NER (entity, char F1) | KLUE-DP (UAS, LAS) | KLUE-MRC (EM, ROUGE-W) |
| --- | --- | --- | --- | --- |
| google/byt5-large (1.23B) | 78.52 | 48.81, 63.95 | 44.26, 7.805 | NOT TESTED |
| KEByT5-Base (580M) | 84.99 | 86.75, 91.05 | 88.70, 85.90 | 62.28, 68.38 |
| GBST-KEByT5-Base (584M) | 85.29 | 87.35, 92.09 | 88.33, 85.00 | 59.69, 66.44 |
| KEByT5-Large (1.23B) | 85.68 | 88.09, 92.40 | 87.18, 85.52 | 70.07, 75.81 |
| GBST-KEByT5-Large (1.23B) | 85.72 (LR 4e-5) | 87.22, 91.54 (LR 4.6e-5) | -, - | 68.6, 74.33 (LR 6.2e-5) |

๋Œ€ํ™” ์ƒํƒœ ์ถ”์ (DST; Dialogue State Tracking) ํƒœ์Šคํฌ์ธ KLUE-WOS-v1.1 ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ํ‰๊ฐ€๋Š” ๋ชจ๋‘ seq2seq์„ ์ด์šฉํ•œ ๋‹ค์ด์–ผ๋กœ๊ทธ ์ƒํƒœ ์ง์ ‘ ์ƒ์„ฑ์„ ์‚ฌ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค:

| models | WOS (JGA, %) | WOS (F1, %) |
| --- | --- | --- |
| klue/klue-roberta-large | 50.22 | 92.23 |
| KEByT5-Base (580M) | 77.15 | 96.92 |
| GBST-KEByt5-base (584M) | 75.94 | 96.73 |
| KEByT5-Large (1.23B) | 78.54 | 97.28 |
| GBST-KEByT5-Large (1.23B) | - (not tested yet) | - |

๊ด€๊ณ„ ์ถ”์ถœ(RE; Relation Extraction) ํƒœ์Šคํฌ์ธ KLUE-RE-v1.1 ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. no_relation์„ ์ œ์™ธํ•œ 29๊ฐœ์˜ ๊ด€๊ณ„ ํด๋ž˜์Šค์— ๋Œ€ํ•œ Micro F1 ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค:

| models | KLUE-RE (F1, %) |
| --- | --- |
| klue/klue-roberta-base | 65.90 |
| KEByT5-Base (580M) | 65.48 |
| KEByT5-Large (1.23B) | 68.95 |
| GBST-KEByT5-Large (1.23B) | - (not tested yet) |

GBST ์ ์šฉ์„ ํ†ตํ•œ ํšจ์œจํ™” ๊ฐœ์„ ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ‰๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ํ‰๊ฐ€ ํ™˜๊ฒฝ์€ A100 PCIE 80GB๊ฐ€ ์‚ฌ์šฉ๋˜์—ˆ์œผ๋ฉฐ, ์ •๋ฐ€๋„๋Š” bfloat16์—์„œ ์ธก์ •๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ํ•™์Šต ๋ฐ ํ‰๊ฐ€์—๋Š” KLUE-MRC ๋ฐ์ดํ„ฐ์…‹์ด ์‚ฌ์šฉ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋“ค ๋ฐ์ดํ„ฐ์…‹์˜ ๊ธธ์ด๋Š” ์ตœ๋Œ€ 6800 bytes์˜ ๋ฌธ๋งฅ์ด ๋“ค์–ด๊ฐ‘๋‹ˆ๋‹ค.

| model | training samples/sec. | inference samples/sec. |
| --- | --- | --- |
| KEByT5-base (580M) | 1.30 | 3.95 |
| GBST-KEByT5-base (584M) | 3.56 | 5.77 |
| GBST-KEByT5-Large (1.23B) | 2.02 | not tested |

Compute Infrastructure

  • Trained on 8 NVIDIA A100 80GB GPUs

Citations

  • ์‹ ์ข…ํ›ˆ ์™ธ, "ํ•œ๊ตญ์–ด ์ค‘์‹ฌ์˜ ํ† ํฐ-ํ”„๋ฆฌ ์–ธ์–ด ์ดํ•ด-์ƒ์„ฑ ๋ชจ๋ธ ์‚ฌ์ „ํ•™์Šต ์—ฐ๊ตฌ", ์ œ35ํšŒ ํ•œ๊ธ€ ๋ฐ ํ•œ๊ตญ์–ด ์ •๋ณด์ฒ˜๋ฆฌ ํ•™์ˆ ๋Œ€ํšŒ ๋…ผ๋ฌธ์ง‘, pp.711-715. 2023.
  • ํ—ˆ์ • ์™ธ, "์ƒ์„ฑํ˜• ์–ธ์–ด๋ชจ๋ธ์„ ์ด์šฉํ•œ ๊ด€๊ณ„ ์ถ”์ถœ", ์ œ35ํšŒ ํ•œ๊ธ€ ๋ฐ ํ•œ๊ตญ์–ด ์ •๋ณด์ฒ˜๋ฆฌ ํ•™์ˆ ๋Œ€ํšŒ ๋…ผ๋ฌธ์ง‘. pp.708-710. 2023.
  • ์ด๊ธฐ์˜ ์™ธ, "ํ•œ๊ตญ์–ด ํ† ํฐ-ํ”„๋ฆฌ ์‚ฌ์ „ํ•™์Šต ์–ธ์–ด๋ชจ๋ธ KeByT5๋ฅผ ์ด์šฉํ•œ ํ•œ๊ตญ์–ด ์ƒ์„ฑ ๊ธฐ๋ฐ˜ ๋Œ€ํ™” ์ƒํƒœ ์ถ”์ ", ์ œ35ํšŒ ํ•œ๊ธ€ ๋ฐ ํ•œ๊ตญ์–ด ์ •๋ณด์ฒ˜๋ฆฌ ํ•™์ˆ ๋Œ€ํšŒ ๋…ผ๋ฌธ์ง‘. pp.644-647. 2023.

Model Card Authors/Contacts

Jong-hun Shin(ETRI), e-mail=jhshin82 AT etri DOT re DOT kr.
