formal_classifier
formal classifier or honorific classifier
Korean honorific (jondaetmal) vs. casual (banmal) speech classifier
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
model = AutoModelForSequenceClassification.from_pretrained("j5ng/kcbert-formal-classifier")
tokenizer = AutoTokenizer.from_pretrained('j5ng/kcbert-formal-classifier')
formal_classifier = pipeline(task="text-classification", model=model, tokenizer=tokenizer)
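# Input: a casual (banmal) question, "Last time the professor told us to bring the materials, remember?"
print(formal_classifier("저번에 교수님께서 자료 가져오라했는데 기억나?"))
# [{'label': 'LABEL_0', 'score': 0.9999139308929443}]
# LABEL_0 = casual speech (banmal), LABEL_1 = honorific speech (jondaetmal)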
Dataset sources
Smilegate speech-style dataset (Korean SmileStyle Dataset)
: https://github.com/smilegate-ai/korean_smile_style_dataset
AI Hub Emotional Dialogue Corpus
Dataset download (AI Hub data can only be downloaded manually)
wget https://raw.githubusercontent.com/smilegate-ai/korean_smile_style_dataset/main/smilestyle_dataset.tsv
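Once the TSV is downloaded, one plausible way to turn it into (sentence, label) pairs is sketched below. This is not taken from the author's training code: the column names `formal` and `informal` and the helper `label_rows` are assumptions for illustration and should be checked against the actual file header. Labels follow the convention seen in this card's examples (0 = casual, 1 = honorific).

```python
import csv
import io


def label_rows(tsv_text, formal_col="formal", informal_col="informal"):
    """Return (sentence, label) pairs: 0 for casual text, 1 for honorific text.

    The column names are assumptions; adjust them to the real TSV header.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    pairs = []
    for record in reader:
        for column, label in ((informal_col, 0), (formal_col, 1)):
            sentence = (record.get(column) or "").strip()
            if sentence:  # skip cells that are empty for this row
                pairs.append((sentence, label))
    return pairs


# Tiny in-memory stand-in for smilestyle_dataset.tsv (made-up rows).
sample = "formal\tinformal\nannyeonghaseyo\tannyeong\n\tmwohae\n"
print(label_rows(sample))
```

For the real file, the text can be read with `open("smilestyle_dataset.tsv", encoding="utf-8").read()` and passed to the same function.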
Development environment
Python 3.9
torch==1.13.1
transformers==4.26.0
pandas==1.5.3
emoji==2.2.0
soynlp==0.0.493
datasets==2.10.1
Model used
beomi/kcbert-base
- GitHub : https://github.com/Beomi/KcBERT
- HuggingFace : https://huggingface.co/beomi/kcbert-base
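Examples
sentence | English gloss | label (0 = casual, 1 = honorific) |
---|---|---|
공부를 열심히 해도 열심히 한 만큼 성적이 잘 나오지 않아 | Even when I study hard, my grades don't reflect the effort | 0 |
아들에게 보내는 문자를 통해 관계가 회복되길 바랄게요 | I hope the text message to your son helps mend the relationship | 1 |
참 열심히 사신 보람이 있으시네요 | All your years of hard work have really paid off | 1 |
나도 스시 좋아함 이번 달부터 영국 갈 듯 | I like sushi too; looks like I'm going to the UK starting this month | 0 |
본부장님이 내가 할 수 없는 업무를 계속 주셔서 힘들어 | It's hard because the division head keeps giving me work I can't do | 0 |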
Label distribution
label | train | test |
---|---|---|
0 | 133,430 | 34,908 |
1 | 112,828 | 29,839 |
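The counts above lean mildly toward label 0. As a quick sanity check, the sketch below computes each split's size and the share of its most frequent label, which is the accuracy a classifier would reach by always predicting that label. The helper name `split_summary` is made up for this example.

```python
# Sentence counts copied from the distribution table.
counts = {
    "train": {0: 133_430, 1: 112_828},
    "test": {0: 34_908, 1: 29_839},
}


def split_summary(label_counts):
    """Return (total, share of the most frequent label) for one split."""
    total = sum(label_counts.values())
    return total, max(label_counts.values()) / total


for split, label_counts in counts.items():
    total, majority_share = split_summary(label_counts)
    print(f"{split}: {total:,} sentences, majority-label share {majority_share:.1%}")
```

Both splits come out at roughly 54% label 0, so any reported accuracy is best read against that baseline.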
Results
저번에 교수님께서 자료 가져오라하셨는데 기억나세요? : This is honorific speech (jondaetmal). ( probability 99.19% )
저번에 교수님께서 자료 가져오라했는데 기억나? : This is casual speech (banmal). ( probability 92.86% )
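Each result line above pairs a sentence with a predicted class and a percentage. A minimal sketch of a formatter that produces lines in a similar shape from a pipeline prediction follows. The `LABEL_NAMES` mapping is an assumption inferred from the labelled examples in this card, and `describe` is a hypothetical helper, not part of the model's API.

```python
# Assumed mapping, inferred from the labelled examples in this card.
LABEL_NAMES = {
    "LABEL_0": "casual speech (banmal)",
    "LABEL_1": "honorific speech (jondaetmal)",
}


def describe(sentence, prediction):
    """Format one pipeline prediction dict as a readable result line."""
    name = LABEL_NAMES[prediction["label"]]
    percent = prediction["score"] * 100
    return f"{sentence} : {name} ( probability {percent:.2f}% )"


# Prediction dict shaped like the pipeline output shown in the usage example.
example = {"label": "LABEL_0", "score": 0.9999139308929443}
print(describe("저번에 교수님께서 자료 가져오라했는데 기억나?", example))
```

With the real pipeline loaded, the same helper could be called as `describe(text, formal_classifier(text)[0])`, since the pipeline returns a list with one dict per input.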
Citation
@misc{SmilegateAI2022KoreanSmileStyleDataset,
title = {SmileStyle: Parallel Style-variant Corpus for Korean Multi-turn Chat Text Dataset},
author = {Seonghyun Kim},
year = {2022},
howpublished = {\url{https://github.com/smilegate-ai/korean_smile_style_dataset}},
}
@inproceedings{lee2020kcbert,
title={KcBERT: Korean Comments BERT},
author={Lee, Junbum},
booktitle={Proceedings of the 32nd Annual Conference on Human and Cognitive Language Technology},
pages={437--440},
year={2020}
}