---
language:
- ko
license: apache-2.0
library_name: transformers
tags:
- text2text-generation
datasets:
- aihub
metrics:
- bleu
- rouge
model-index:
- name: ko-TextNumbarT
  results:
  - task:
      type: text2text-generation
      name: text2text-generation
    metrics:
    - type: bleu
      value: 0.958234790096092
      name: eval_bleu
      verified: false
    - type: rouge1
      value: 0.9735361877162854
      name: eval_rouge1
      verified: false
    - type: rouge2
      value: 0.9493975212378124
      name: eval_rouge2
      verified: false
    - type: rougeL
      value: 0.9734558938864928
      name: eval_rougeL
      verified: false
    - type: rougeLsum
      value: 0.9734350757552404
      name: eval_rougeLsum
      verified: false
---
# ko-TextNumbarT (TNT Model 🧨): Korean Reading To Number (a model that converts spelled-out Korean numbers into digits)
## Table of Contents

- [Model Details](#model-details)
- [Uses](#uses)
- [Evaluation](#evaluation)
- [How to Get Started With the Model](#how-to-get-started-with-the-model)
## Model Details

### Model Description

I built this model because, no matter how much I searched, there was no existing model or algorithm that did this: a BartForConditionalGeneration fine-tuned to convert spelled-out Korean numbers into digits.

The dataset comes from Korea aihub. I cannot publicly release the data used for fine-tuning, and aihub data is only available to Koreans in any case. (Since anyone downloading the data from aihub will be Korean, this note was originally written in Korean only.)

More precisely, the model was trained to translate orthographic transcription into phonetic transcription (following the ETRI transcription guidelines). For instance, ten million may be written as "1000만" or as "10000000", so the output can differ depending on the training data; this makes the composition of the training datasets crucial. The spacing between a determiner and a bound noun can also change the result markedly (쉰이, 쉰 이 -> 쉰이, 50이); see https://eretz2.tistory.com/34. Since I did not know how the model would be used, I chose not to fix one convention and bias the training toward it, and instead left the choice to the distribution of the training data. (Which is more frequent, 쉰 이 or 쉰이!?)

- Developed by: Yoo SungHyun (https://github.com/YooSungHyun)
- Language(s): Korean
- License: apache-2.0
- Parent Model: See kobart-base-v2 for more information about the pre-trained base model.
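The representation and spacing ambiguities described above can be made concrete with a few pairs. These are hypothetical, made-up examples in the spirit of the training data; the actual aihub data cannot be redistributed:

```python
# Hypothetical orthographic -> digit pairs illustrating the ambiguity above.
# Not taken from the (non-public) aihub training set.
pairs = [
    ("천만 원", "1000만 원"),  # "ten million won"; "10000000 원" is equally valid
    ("여섯시까지", "6시까지"),  # "until six o'clock"
    ("쉰 이", "50 이"),        # spacing before the bound noun changes the reading
]
for spelled, digits in pairs:
    print(f"{spelled} -> {digits}")
```

Whichever digit form dominates the training distribution is the one the model will tend to produce.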
## Uses

For more detail, see KoGPT_num_converter, in particular bart_inference.py and bart_train.py.
## Evaluation

Evaluation simply uses evaluate-metric/bleu and evaluate-metric/rouge from the Hugging Face evaluate library.

Training wandb URL
## How to Get Started With the Model

```python
from transformers.pipelines import Text2TextGenerationPipeline
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

texts = ["그러게 누가 여섯시까지 술을 마시래?"]
tokenizer = AutoTokenizer.from_pretrained("lIlBrother/ko-TextNumbarT")
model = AutoModelForSeq2SeqLM.from_pretrained("lIlBrother/ko-TextNumbarT")
seq2seqlm_pipeline = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)
kwargs = {
    "min_length": 0,
    "max_length": 1206,
    "num_beams": 100,
    "do_sample": False,
    "num_beam_groups": 1,
}
pred = seq2seqlm_pipeline(texts, **kwargs)
print(pred)
# 그러게 누가 6시까지 술을 마시래?
```
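Equivalently, you can skip the pipeline wrapper and call `generate` on the same checkpoint directly. This is a sketch rather than part of the original card; a much smaller beam width than the 100 above is assumed to be sufficient in practice:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("lIlBrother/ko-TextNumbarT")
model = AutoModelForSeq2SeqLM.from_pretrained("lIlBrother/ko-TextNumbarT")
model.eval()

# "그러게 누가 여섯시까지 술을 마시래?" -- the same example sentence as above
inputs = tokenizer("그러게 누가 여섯시까지 술을 마시래?", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_length=128, num_beams=5)

decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(decoded)
```

Beam search is kept (`do_sample` defaults to False) since the task is deterministic transcription rather than open-ended generation.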