This model is bert-base-multilingual-cased fine-tuned with a masked language modeling (MLM) objective on the IWSLT14 German-English dataset. Each parallel sentence pair was expanded into four training examples: src, tgt, src [SEP] tgt, and tgt [SEP] src.

Training setup: batch size 6, update_freq=2, fp16, on a single RTX 3080 Ti. The model was trained for 100k steps, and the training loss decreased from 2.1869 to 1.2034.
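The four-way expansion of each parallel pair could be sketched as follows (the function name and example sentences are illustrative, not from the original training code):

```python
def make_mlm_examples(src: str, tgt: str, sep: str = "[SEP]") -> list[str]:
    """Expand one parallel pair into the four MLM training variants:
    src alone, tgt alone, src [SEP] tgt, and tgt [SEP] src."""
    return [src, tgt, f"{src} {sep} {tgt}", f"{tgt} {sep} {src}"]

# Example: one German-English pair yields four training lines.
examples = make_mlm_examples("Guten Morgen.", "Good morning.")
for line in examples:
    print(line)
```

Each resulting line is then tokenized and masked as an ordinary MLM example, so the model sees monolingual and cross-lingual contexts from the same data.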