byt5-small-nc16-deen

This model is released as part of the paper "Are Character-level Translations Worth the Wait? Comparing Character- and Subword-level Models for Machine Translation". It is a ByT5 model fine-tuned for German-to-English translation on 250k sentence pairs from the WMT NewsCommentary v16 dataset.

To use the model correctly, you must prefix the input with "translate X to Y: ", where X and Y are the source and target languages (e.g. German and English).
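
As a minimal sketch of the prefix format described above, the helper below builds the input string. The function name build_prompt and its default arguments are illustrative and not part of the model's API.

```python
def build_prompt(text: str, source_lang: str = "German", target_lang: str = "English") -> str:
    """Prefix `text` with the "translate X to Y: " instruction.

    `build_prompt` is a hypothetical helper; only the prefix format
    itself comes from the model card.
    """
    return f"translate {source_lang} to {target_lang}: {text}"


print(build_prompt("Das ist ein Test."))
# prints: translate German to English: Das ist ein Test.
```

Note the single space after the colon: the prefix ends with ": " before the source sentence begins.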

NOTE: The decoder_start_token_id is 259 for the ByT5 models and 250099 for the mT5 models, which differs from the default value (0) used by Google's original ByT5 and mT5 models.
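
The sketch below shows one way the note above might be applied with the Hugging Face transformers library. It is not the authors' reference code. The repository id, the helper names, and the max_new_tokens value are assumptions made for illustration. The released checkpoint's configuration may already carry the correct decoder_start_token_id, in which case passing it explicitly is only a precaution.

```python
# Token ids taken from the note above, keyed by model family.
DECODER_START_TOKEN_ID = {"byt5": 259, "mt5": 250099}


def decoder_start_token_id(model_name: str) -> int:
    """Pick the decoder start token id from the model family in the name.

    Hypothetical helper: it only does a substring match on the name.
    """
    name = model_name.lower()
    for family, token_id in DECODER_START_TOKEN_ID.items():
        if family in name:
            return token_id
    raise ValueError(f"Unknown model family in name: {model_name!r}")


def translate(text: str, model_name: str = "leukas/byt5-small-nc16-250k-deen") -> str:
    """Translate German `text` to English (repository id is an assumption)."""
    # Imported here so the helper above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # The task prefix is required by the model card.
    inputs = tokenizer("translate German to English: " + text, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        decoder_start_token_id=decoder_start_token_id(model_name),
        max_new_tokens=256,  # ByT5 decodes bytes, so outputs need many tokens
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling translate("Das ist ein Test.") downloads the checkpoint on first use and requires both transformers and PyTorch to be installed.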

