---
license: openrail
language:
- ja
- zh
metrics:
- bleu
pipeline_tag: translation
---

Results after 1 epoch of training.

## Results

The model achieves the following results on the evaluation set:

- Loss: 1.3042
- Bleu: 55.834
- Gen Len: 17.2465
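
For readers unfamiliar with the `Gen Len` figure, the sketch below shows one common way such a number is computed: the mean count of non-padding tokens per generated sequence, as done in the Hugging Face translation example scripts. This card does not say which script produced the number, so treat that as an assumption. The function name `mean_generated_length` and the toy token ids are hypothetical and exist only for illustration.

```python
def mean_generated_length(predictions, pad_token_id):
    """Return the average number of non-padding tokens per sequence."""
    lengths = [
        sum(1 for token in sequence if token != pad_token_id)
        for sequence in predictions
    ]
    return sum(lengths) / len(lengths)


# Hypothetical token ids; 0 stands in for the padding id here.
toy_predictions = [
    [7, 8, 9, 10, 11, 0, 0],  # 5 real tokens, 2 padding tokens
    [7, 8, 9, 10, 0, 0, 0],   # 4 real tokens, 3 padding tokens
]

print(mean_generated_length(toy_predictions, pad_token_id=0))  # 4.5
```

Under this convention the length is counted in tokens, not characters, so a `Gen Len` of about 17 would mean roughly 17 subword tokens per generated translation.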

## Usage demo

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_path = "neverLife/nllb-200-distilled-600M-ja-zh"

# Load the tokenizer with NLLB language codes: Japanese source, Simplified Chinese target.
tokenizer = AutoTokenizer.from_pretrained(model_path, src_lang="jpn_Jpan", tgt_lang="zho_Hans")
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

ja = "ぜんぜん田舎に来た気がしないんだが……。"

# Truncate inputs longer than 128 tokens; a single sentence needs no padding.
input_ids = tokenizer.encode(ja, max_length=128, truncation=True, return_tensors="pt")

# Beam search with 4 beams, generating at most 128 new tokens.
outputs = model.generate(input_ids, num_beams=4, max_new_tokens=128)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Framework versions

- Transformers 4.28.1
- PyTorch 2.0.0+cu117
- Datasets 2.11.0
- Tokenizers 0.13.3