---
language:
- zh
datasets:
- p208p2002/zhtw-sentence-error-correction
---
# DPO Chinese Error Correction Model
A Chinese error correction model trained with DPO (Direct Preference Optimization).
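The card names DPO but does not include the training code. As background, the standard DPO objective for a single preference pair can be sketched in plain Python. This is not the author's training script. Treating a correctly fixed sentence as the "chosen" response and a worse one as "rejected", and the value `beta=0.1`, are illustrative assumptions.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, for illustration only
) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response given the
    same prompt, under the trainable policy or the frozen reference model.
    Which responses count as "chosen"/"rejected" is an assumption here.
    """
    # Implicit rewards: how much more the policy favours each response
    # than the reference model does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) == softplus(-margin), computed stably.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy shifts probability toward the chosen response: loss drops below log(2).
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

Minimising this loss pushes the policy to raise the likelihood of the preferred output relative to the reference model, without training a separate reward model.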
### Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "p208p2002/bloom-1b1-zh-error-correction-dpo"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Sentences containing wrong characters (e.g. 新情 for 心情), plus a few
# correct sentences that should come back unchanged.
test_texts = [
    "為了潔約能源請隨守關閉沒有使用的電器",
    "今天新情很好",
    "你快樂我也很高心",
    "但不再算再找實習生了",
    "今天太陽很大要注意篩傷",
    "你要不要和我依起去台北",
    "清晨六點終太陽會升起",
    "傾城六點鐘太陽會升起",
    "鍋馬路時你應該要注意虹綠燈",
    "他正在學學彈吉他",
    "下樓梯請注意階梯",
    "此信件為系統自動發送之通知",
    "此信件為系統自動發送知通知",
    "如為誤傳也請立即刪除本郵件並通知寄件者",
]

for text in test_texts:
    # Prompt template: "<bos>{sentence} <eos>\n <bos>". The model continues
    # after the final <bos> with the corrected sentence.
    inputs = tokenizer(
        f"{tokenizer.bos_token}{text} {tokenizer.eos_token}\n {tokenizer.bos_token}",
        return_tensors="pt",
        add_special_tokens=False,
    )["input_ids"]
    out = model.generate(
        inputs,
        max_new_tokens=20,
    )
    # Special tokens are kept, so both lines still contain <s> ... </s>.
    decode_out = tokenizer.decode(out[0])
    # Split only once, in case the generated part itself contains a newline.
    input_text, output_text = decode_out.split("\n", 1)
    print("input :", input_text.strip())
    print("output:", output_text.strip())
    print("-" * 30)
```
Example output:

```text
input : <s>為了潔約能源請隨守關閉沒有使用的電器 </s>
output: <s>為了節約能源請隨時關閉沒有使用的電器 </s>
------------------------------
input : <s>今天新情很好 </s>
output: <s>今天心情很好 </s>
------------------------------
input : <s>你快樂我也很高心 </s>
output: <s>你快樂我也很高興 </s>
------------------------------
input : <s>但不再算再找實習生了 </s>
output: <s>但不再去找實習生了 </s>
------------------------------
input : <s>今天太陽很大要注意篩傷 </s>
output: <s>今天太陽很大要注意一下 </s>
------------------------------
input : <s>你要不要和我依起去台北 </s>
output: <s>你要不要和我一起去台北 </s>
------------------------------
input : <s>清晨六點終太陽會升起 </s>
output: <s>清晨六點鐘太陽會升起 </s>
------------------------------
input : <s>傾城六點鐘太陽會升起 </s>
output: <s>凌晨六點鐘太陽會升起 </s>
------------------------------
input : <s>鍋馬路時你應該要注意虹綠燈 </s>
output: <s>過馬路時你應該要注意紅綠燈 </s>
------------------------------
input : <s>他正在學學彈吉他 </s>
output: <s>他正在學習彈吉他 </s>
------------------------------
input : <s>下樓梯請注意階梯 </s>
output: <s>下樓梯請注意階梯 </s>
------------------------------
input : <s>此信件為系統自動發送之通知 </s>
output: <s>此信件為系統自動發送之通知 </s>
------------------------------
input : <s>此信件為系統自動發送知通知 </s>
output: <s>此信件為系統自動發送通知 </s>
------------------------------
input : <s>如為誤傳也請立即刪除本郵件並通知寄件者 </s>
output: <s>如為誤傳也請立即刪除本郵件並通知寄件者 </s>
------------------------------
```
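The example output above still carries the `<s>`/`</s>` markers. A minimal sketch for building the same prompt, extracting just the corrected sentence, and listing which characters changed is shown below. The helper names are hypothetical, and the default markers `<s>`/`</s>` are assumed from the printed output; in the usage script you would pass `tokenizer.bos_token` and `tokenizer.eos_token` instead.

```python
from __future__ import annotations

import difflib


def build_prompt(text: str, bos: str = "<s>", eos: str = "</s>") -> str:
    """Wrap a sentence in the template used by the usage script."""
    return f"{bos}{text} {eos}\n {bos}"


def parse_correction(decoded: str, bos: str = "<s>", eos: str = "</s>") -> str:
    """Extract the corrected sentence from decoded prompt + generation."""
    # Everything after the first newline is the model's answer.
    _, _, answer = decoded.partition("\n")
    answer = answer.strip()
    if answer.startswith(bos):
        answer = answer[len(bos):]
    # Drop the end-of-sequence marker and anything generated after it.
    end = answer.find(eos)
    if end != -1:
        answer = answer[:end]
    return answer.strip()


def char_edits(source: str, corrected: str) -> list[tuple[str, str, str]]:
    """List character-level changes as (op, old, new) tuples."""
    matcher = difflib.SequenceMatcher(a=source, b=corrected, autojunk=False)
    return [
        (op, source[i1:i2], corrected[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Hand-written decoded string mirroring the printed output above.
decoded = "<s>今天新情很好 </s>\n <s>今天心情很好 </s>"
fixed = parse_correction(decoded)
print(fixed)  # 今天心情很好
print(char_edits("今天新情很好", fixed))  # [('replace', '新', '心')]
```

Comparing at the character level suits this task because most errors in the examples are single-character substitutions or deletions.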