---
language:
- ca
- it
tags:
- translation
library_name: opennmt
license: mit
metrics:
- bleu
inference: false
---
## Introduction
Italian to Catalan translation model based on OpenNMT. It is the same model that we run in production at https://www.softcatala.org/traductor/.
## Usage
Install the dependencies:
```bash
pip3 install ctranslate2 pyonmttok huggingface_hub
```
Simple translation using Python:
```python
import ctranslate2
import pyonmttok
from huggingface_hub import snapshot_download

# Download the model files from the Hugging Face Hub.
model_dir = snapshot_download(repo_id="softcatala/translate-ita-cat", revision="main")

# Tokenize the Italian source text with the bundled SentencePiece model.
tokenizer = pyonmttok.Tokenizer(mode="none", sp_model_path=model_dir + "/sp_m.model")
tokenized = tokenizer.tokenize("Buon giorno a tutti")

# Translate, then detokenize the best hypothesis (CTranslate2 2.0 or later).
translator = ctranslate2.Translator(model_dir)
translated = translator.translate_batch([tokenized[0]])
print(tokenizer.detokenize(translated[0].hypotheses[0]))
```
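To translate several sentences at once, the same tokenize, translate, and detokenize steps can be wrapped in a small helper. This is only a minimal sketch: `translate_sentences` is a hypothetical name that is not part of this repository, and it assumes a CTranslate2 version (2.0 or later) in which `translate_batch` returns result objects with a `hypotheses` attribute.

```python
from typing import Any, List


def translate_sentences(sentences: List[str], tokenizer: Any, translator: Any) -> List[str]:
    """Translate sentences with an already loaded tokenizer and translator.

    Hypothetical helper: `tokenizer` is expected to behave like
    pyonmttok.Tokenizer and `translator` like ctranslate2.Translator.
    """
    if not sentences:
        return []
    # tokenize() returns a (tokens, features) pair; only the tokens are needed.
    batch = [tokenizer.tokenize(sentence)[0] for sentence in sentences]
    # A single call translates the whole batch.
    results = translator.translate_batch(batch)
    # Assumption: CTranslate2 >= 2.0, where each result lists its hypotheses best first.
    return [tokenizer.detokenize(result.hypotheses[0]) for result in results]
```

With the `tokenizer` and `translator` objects created in the snippet above, calling `translate_sentences(["Buon giorno a tutti", "Grazie mille"], tokenizer, translator)` should return one Catalan string per input sentence.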
## Benchmarks
| Test set                              | BLEU  |
|---------------------------------------|-------|
| Test dataset (from train/dev/test)    | 41.0  |
| Flores200 dataset                     | 27.0  |
## Additional information
* https://github.com/Softcatala/nmt-models
* https://github.com/Softcatala/parallel-catalan-corpus