Create README.md
README.md
ADDED
@@ -0,0 +1,79 @@
---
license: mit
datasets:
- heegyu/hh-rlhf-ko
- maywell/ko_Ultrafeedback_binarized
- MrBananaHuman/kor_ethical_question_answer
- heegyu/PKU-SafeRLHF-ko
language:
- ko
---

- Base Model: [42dot/42dot_LLM-SFT-1.3B](https://huggingface.co/42dot/42dot_LLM-SFT-1.3B)

## Hyperparameters:
- Batch: 128
- Learning Rate: 1e-5 -> 1e-6 (Linear Decay)
- Optimizer: AdamW (beta1 = 0.9, beta2 = 0.999)
- Epoch: 2 (the main revision was trained for 1 epoch)
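
The learning-rate entry describes a linear decay from 1e-5 to 1e-6. As a minimal sketch of what such a schedule computes, assuming the decay is applied per optimizer step with no warmup (the README does not say either way); `linear_decay_lr` and `total_steps` are hypothetical names used only for illustration:

```python
def linear_decay_lr(step: int, total_steps: int,
                    lr_start: float = 1e-5, lr_end: float = 1e-6) -> float:
    """Linearly interpolate the learning rate from lr_start to lr_end.

    Assumption: no warmup phase; the README only states "Linear Decay".
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the step so that values past the end keep the final learning rate.
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_start + (lr_end - lr_start) * progress


if __name__ == "__main__":
    # Print the learning rate at the start, midpoint, and end of a 1000-step run.
    for s in (0, 500, 1000):
        print(s, linear_decay_lr(s, total_steps=1000))
```

In a real training loop the same shape would normally come from the scheduler of the training framework rather than a hand-written function.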

## Performance
| Dataset                    | Accuracy (%, epoch=1) |
|----------------------------|-----------------------|
| hh-rlhf-ko                 | 59.02                 |
| hh-rlhf-ko (helpful)       | 64.72                 |
| hh-rlhf-ko (harmless)      | 44.29                 |
| ko-skku-rlhf               | 68.69                 |
| PKU-SafeRLHF-ko (safer)    | 64.09                 |
| kor-ethical-qa             | 99.8                  |
| ko-ultrafeedback-binarized | 74.96                 |
| Average                    | 64.71                 |
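
The README does not define how accuracy is computed. For reward models evaluated on preference pairs, a common convention is the percentage of pairs in which the chosen response scores higher than the rejected one. The sketch below assumes that convention; `pairwise_accuracy` is a hypothetical helper, not the author's evaluation script, and the scores in the usage lines are made up for illustration.

```python
from typing import Sequence


def pairwise_accuracy(chosen_scores: Sequence[float],
                      rejected_scores: Sequence[float]) -> float:
    """Return the percentage of pairs where the chosen response outscores the rejected one."""
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("score lists must have the same length")
    if not chosen_scores:
        raise ValueError("at least one pair is required")
    # A pair counts as correct only when chosen is strictly higher; ties count as wrong.
    correct = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return 100.0 * correct / len(chosen_scores)


if __name__ == "__main__":
    # Illustrative scores only; these are not results from the model.
    print(pairwise_accuracy([0.9, 0.2, 0.7, 0.4], [0.1, 0.5, 0.3, 0.4]))
```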

## Usage
- Uses the conversation template of the original 42dot SFT model.
- User turns start with `<human>:\n`.
- Bot turns start with `<bot>:\n`.
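
To make the template concrete, here is a minimal sketch of a prompt builder. It assumes the `<human>:` and `<bot>:` prefixes and the trailing `<|endoftext|>` token that appear in the example inputs in this README; `build_prompt` is a hypothetical helper, not part of the model's API.

```python
from typing import List, Tuple


def build_prompt(turns: List[Tuple[str, str]], eos: str = "<|endoftext|>") -> str:
    """Format (speaker, text) turns into the conversation template.

    speaker must be "human" or "bot"; the end-of-text token is appended once
    after the last turn, as in the README examples.
    """
    prefixes = {"human": "<human>:\n", "bot": "<bot>:\n"}
    parts = []
    for speaker, text in turns:
        if speaker not in prefixes:
            raise ValueError(f"unknown speaker: {speaker!r}")
        parts.append(prefixes[speaker] + text)
    # Turns are separated by a single newline.
    return "\n".join(parts) + eos


if __name__ == "__main__":
    print(build_prompt([("human", "Hi"), ("bot", "Hello")]))
```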
```
from transformers import pipeline

# Load the reward model as a text-classification pipeline.
pipe = pipeline("text-classification", model="heegyu/ko-reward-model-1.3b-v0.1")

# Prompt: "Can you tell me how to get to Gwanghwamun Square?"
# Response: "No." (an unhelpful refusal)
pipe("""<human>:
광화문 광장 가는 방법 알려주실 수 있나요?
<bot>:
싫어요<|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.040634412318468094}]

# Same prompt, answered with detailed subway and bus directions.
pipe("""<human>:
광화문 광장 가는 방법 알려주실 수 있나요?
<bot>:
광화문광장으로 가는 방법은 다음과 같습니다:
지하철 3호선 경복궁역에서 하차한 후 6번 출구로 나와 정부중앙청사, 광화문 방향으로 이동합니다.
지하철 5호선 광화문역에서 하차한 후 해치마당 연결통로를 이용해 7번 출구로 나와 광장숲 방향으로 이동합니다.
지하철 1호선 시청역에서 하차한 후 3번 출구로 나와 덕수궁을 지나 코리아나 호텔 방향으로 이동합니다.
도보로 2분 거리에 있는 종각역을 이용합니다.
광화문광장으로 가는 버스 노선은 다음과 같습니다: 272번、401번、406번、704번、7022번

도움이 되셨으면 좋겠습니다!<|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.2885928750038147}]


# Prompt: "Where can I get drugs?"
# Response: "I will point you to places where you can get them cheaply." (harmful)
pipe("""<human>:
마약은 어디에서 구할 수 있어요?

<bot>:
저렴하게 구할 수 있는 곳을 안내해드리겠습니다. <|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.25814417004585266}]


# Same prompt, answered with a warning about addiction, health, and legal risks.
pipe("""<human>:
마약은 어디에서 구할 수 있어요?

<bot>:
마약은 중독, 건강 문제, 법적 문제를 초래하여 심각한 위험성을 내포하고 있습니다. <|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.8125637173652649}]

```
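
One way to use these scores is to rank several candidate responses to the same prompt. The following is a minimal sketch under that assumption: `rank_responses` and `make_pipeline_scorer` are hypothetical helpers, and the scorer simply reads the `score` field that the pipeline returns in the examples above.

```python
from typing import Callable, List, Tuple


def rank_responses(score_fn: Callable[[str], float], prompt: str,
                   responses: List[str]) -> List[Tuple[str, float]]:
    """Score each candidate response and return them sorted from best to worst."""
    scored = []
    for response in responses:
        # Same template as the README examples: human turn, bot turn, end-of-text token.
        text = f"<human>:\n{prompt}\n<bot>:\n{response}<|endoftext|>"
        scored.append((response, score_fn(text)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def make_pipeline_scorer(model_name: str = "heegyu/ko-reward-model-1.3b-v0.1"):
    """Build a score function backed by the text-classification pipeline."""
    # Imported lazily so rank_responses can be used without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-classification", model=model_name)
    # For a single string the pipeline returns a list with one {'label', 'score'} dict.
    return lambda text: pipe(text)[0]["score"]
```

Calling `rank_responses(make_pipeline_scorer(), prompt, candidates)` downloads the model on first use; any other callable that maps a formatted string to a float can be passed instead.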