---
license: cc-by-4.0
language:
- ko
tags:
- generation
---
## Model Details
* Model Description: Speech style conversion model based on gogamza/kobart-base-v2
* Developed by: Juhwan Lee and Jisu Kim
* Model Type: Text generation
* Language: Korean
* License: CC-BY-4.0

## Dataset
* [Korean SmileStyle Dataset](https://github.com/smilegate-ai/korean_smile_style_dataset)
* Randomly split into train/validation sets (9:1); a minimal split sketch is shown below.
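A minimal sketch of such a 9:1 random split, assuming the SmileStyle TSV has been downloaded from the repository above (the file name, column layout, and seed are illustrative, not the authors' exact preprocessing):

```python
import pandas as pd

# Illustrative file name; the SmileStyle data is distributed as a TSV in the linked repository.
df = pd.read_csv("smilestyle_dataset.tsv", sep="\t")

# Shuffle with a fixed seed, then take 90% for training and 10% for validation.
df = df.sample(frac=1.0, random_state=42).reset_index(drop=True)
split = int(len(df) * 0.9)
train_df, valid_df = df.iloc[:split], df.iloc[split:]

print(f"train: {len(train_df)}, valid: {len(valid_df)}")
```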

## BLEU Score
* 25.35
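A minimal sketch of how a corpus-level BLEU score like the one above can be computed with sacrebleu; the sentences, tokenizer choice, and evaluation split here are placeholders, not the authors' actual evaluation script:

```python
import sacrebleu  # pip install sacrebleu

# Placeholder data: model outputs and gold references from the validation split would go here.
hypotheses = ["오늘은 닭볶음탕을 먹었습니다. 맛있었습니다."]
references = [["오늘은 닭볶음탕을 먹었습니다. 맛있었습니다."]]  # one reference stream

# Note: the tokenizer choice affects BLEU for Korean; the sacrebleu default ("13a") is used here.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```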

## Uses
This model converts Korean text between speech styles. The key on the left is the English style tag; the Korean term on the right is the style name used in the generation prompt:
* formal: 문어체 (written style)
* informal: 구어체 (colloquial style)
* android: 안드로이드 (android/robot style)
* azae: 아재 (middle-aged man style)
* chat: 채팅 (chat style)
* choding: 초등학생 (elementary-school style)
* emoticon: 이모티콘 (emoticon style)
* enfp: enfp
* gentle: 신사 (gentleman style)
* halbae: 할아버지 (grandfather style)
* halmae: 할머니 (grandmother style)
* joongding: 중학생 (middle-school style)
* king: 왕 (king style)
* naruto: 나루토 (Naruto style)
* seonbi: 선비 (classical scholar style)
* sosim: 소심한 (timid style)
* translator: 번역기 (machine-translator style)

```python
from transformers import AutoTokenizer, pipeline

model = "KoJLabs/bart-speech-style-converter"
tokenizer = AutoTokenizer.from_pretrained(model)

nlg_pipeline = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
styles = ["문어체", "구어체", "안드로이드", "아재", "채팅", "초등학생", "이모티콘", "enfp", "신사", "할아버지", "할머니", "중학생", "왕", "나루토", "선비", "소심한", "번역기"]

# The model expects prompts of the form "<style> 형식으로 변환:<text>"
# ("convert to <style> format: <text>").
for style in styles:
    text = f"{style} 형식으로 변환:오늘은 닭볶음탕을 먹었다. 맛있었다."  # "Today I had dak-bokkeum-tang. It was delicious."
    out = nlg_pipeline(text, max_length=100)
    print(style, out[0]["generated_text"])
```

## Model Source
https://github.com/KoJLabs/speech-style/tree/main

## Speech style conversion package
You can also perform the Korean speech style conversion task with the Python package [KoTAN](https://github.com/KoJLabs/KoTAN).