---
license: cc-by-4.0
language:
- ko
tags:
- generation
---
## Model Details
* Model Description: Speech style converter model based on gogamza/kobart-base-v2
* Developed by: Juhwan, Lee and Jisu, Kim
* Model Type: Text-generation
* Language: Korean
* License: CC-BY-4.0

## Dataset
* [Korean SmileStyle Dataset](https://github.com/smilegate-ai/korean_smile_style_dataset)
* Randomly split into train and validation sets (9:1)

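The 9:1 random split can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `split_train_valid`, the `valid_ratio` parameter, and the fixed seed are assumptions made for the example.

```python
import random


def split_train_valid(rows, valid_ratio=0.1, seed=42):
    """Shuffle rows and split them into (train, valid) lists.

    Hypothetical helper: valid_ratio=0.1 gives the 9:1 split,
    and the seed only makes the shuffle reproducible.
    """
    rows = list(rows)
    rng = random.Random(seed)
    rng.shuffle(rows)
    n_valid = round(len(rows) * valid_ratio)
    return rows[n_valid:], rows[:n_valid]


# Stand-in data: 100 integer ids instead of real dataset rows.
train, valid = split_train_valid(range(100))
print(len(train), len(valid))
```

With 100 rows this produces 90 training rows and 10 validation rows, with no row in both sets.
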
## BLEU Score
* 25.35

## Uses
This model converts Korean text into a target speech style. The supported styles and their Korean labels are:
* formal: 문어체
* informal: 구어체
* android: 안드로이드
* azae: 아재
* chat: 채팅
* choding: 초등학생
* emoticon: 이모티콘
* enfp: enfp
* gentle: 신사
* halbae: 할아버지
* halmae: 할머니
* joongding: 중학생
* king: 왕
* naruto: 나루토
* seonbi: 선비
* sosim: 소심한
* translator: 번역기

```python
from transformers import AutoTokenizer, pipeline

model = "KoJLabs/bart-speech-style-converter"
tokenizer = AutoTokenizer.from_pretrained(model)

nlg_pipeline = pipeline('text2text-generation', model=model, tokenizer=tokenizer)

# Korean style labels, in the same order as the list above
styles = ["문어체", "구어체", "안드로이드", "아재", "채팅", "초등학생", "이모티콘", "enfp", "신사", "할아버지", "할머니", "중학생", "왕", "나루토", "선비", "소심한", "번역기"]

for style in styles:
    # Prompt: "<style> 형식으로 변환:" ("convert to <style> format:") followed by the sentence.
    # Sentence: "Today I ate dakbokkeumtang (spicy braised chicken). It was delicious."
    text = f"{style} 형식으로 변환:오늘은 닭볶음탕을 먹었다. 맛있었다."
    out = nlg_pipeline(text, max_length=100)
    print(style, out[0]['generated_text'])
```

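The prompt format used in the example above can be wrapped in a small helper that accepts the English style keys. This is a minimal sketch, not part of the model's own code: `STYLE_LABELS` and `build_prompt` are hypothetical names, and the mapping simply mirrors the style list in this section.

```python
# Hypothetical mapping from the English style keys to the Korean labels
# that the prompt expects.
STYLE_LABELS = {
    "formal": "문어체", "informal": "구어체", "android": "안드로이드",
    "azae": "아재", "chat": "채팅", "choding": "초등학생",
    "emoticon": "이모티콘", "enfp": "enfp", "gentle": "신사",
    "halbae": "할아버지", "halmae": "할머니", "joongding": "중학생",
    "king": "왕", "naruto": "나루토", "seonbi": "선비",
    "sosim": "소심한", "translator": "번역기",
}


def build_prompt(style_key: str, sentence: str) -> str:
    """Return '<label> 형식으로 변환:<sentence>' for a known style key."""
    if style_key not in STYLE_LABELS:
        known = ", ".join(sorted(STYLE_LABELS))
        raise ValueError(f"unknown style {style_key!r}; expected one of: {known}")
    return f"{STYLE_LABELS[style_key]} 형식으로 변환:{sentence}"


prompt = build_prompt("king", "오늘은 닭볶음탕을 먹었다. 맛있었다.")
print(prompt)
```

The returned string can be passed directly to `nlg_pipeline` from the example above; an unknown key fails early with a `ValueError` instead of sending a malformed prompt to the model.
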
## Model Source
https://github.com/KoJLabs/speech-style/tree/main

## Speech style conversion package
You can try Korean speech style conversion with the Python package [KoTAN](https://github.com/KoJLabs/KoTAN).