mbart-mmt_mid3_ko-ja

This model is a Korean-to-Japanese (ko→ja) translation model, fine-tuned from facebook/mbart-large-50-many-to-many-mmt on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8652
  • Bleu: 10.1883
  • Gen Len: 17.2057
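
Because the base checkpoint is a many-to-many multilingual model, the tokenizer must be told the source language and generation must be forced to start with the Japanese language token. A minimal usage sketch (the example sentence and `max_length` are arbitrary placeholders):

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model_id = "yesj1234/mbart-mmt_mid3_ko-ja"
tokenizer = MBart50TokenizerFast.from_pretrained(model_id, src_lang="ko_KR", tgt_lang="ja_XX")
model = MBartForConditionalGeneration.from_pretrained(model_id)

text = "오늘 날씨가 정말 좋네요."  # arbitrary Korean example sentence
inputs = tokenizer(text, return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["ja_XX"],  # force Japanese output
    max_length=48,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```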

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 35
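
For reference, the list above corresponds roughly to the following Seq2SeqTrainingArguments. This is a sketch, not the author's script: output_dir is hypothetical, and the evaluation cadence is inferred from the 1500-step intervals in the results table below. The per-device batch sizes combine with 4 GPUs and gradient accumulation to the reported totals (train: 4 × 4 × 2 = 32; eval: 4 × 4 = 16).

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mbart-mmt_mid3_ko-ja",   # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=35,
    evaluation_strategy="steps",         # inferred: the table evaluates every 1500 steps
    eval_steps=1500,
    predict_with_generate=True,          # required to compute Bleu / Gen Len during eval
)
```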

Training results

| Training Loss | Epoch | Step  | Validation Loss | Bleu    | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|
| 1.6216        | 0.23  | 1500  | 1.5229          | 2.686   | 17.599  |
| 1.3587        | 0.46  | 3000  | 1.3061          | 4.0749  | 17.3772 |
| 1.2279        | 0.68  | 4500  | 1.1881          | 5.2878  | 17.3642 |
| 1.1408        | 0.91  | 6000  | 1.0994          | 5.4783  | 17.4093 |
| 0.9977        | 1.14  | 7500  | 1.0313          | 7.6015  | 17.36   |
| 0.9582        | 1.37  | 9000  | 0.9918          | 8.2303  | 17.3526 |
| 0.9525        | 1.59  | 10500 | 0.9811          | 8.2837  | 17.2597 |
| 0.9415        | 1.82  | 12000 | 0.9589          | 8.1592  | 17.2241 |
| 0.856         | 2.05  | 13500 | 0.9462          | 7.8401  | 17.4066 |
| 0.8273        | 2.28  | 15000 | 0.9336          | 8.6082  | 17.1918 |
| 0.8066        | 2.5   | 16500 | 0.9220          | 9.7751  | 17.5198 |
| 0.784         | 2.73  | 18000 | 0.8949          | 10.292  | 17.4097 |
| 0.8016        | 2.96  | 19500 | 0.8958          | 9.0262  | 17.4097 |
| 0.6872        | 3.19  | 21000 | 0.9043          | 9.7549  | 17.2672 |
| 0.7107        | 3.42  | 22500 | 0.8994          | 10.3016 | 17.0973 |
| 0.6726        | 3.64  | 24000 | 0.8747          | 10.5183 | 17.2871 |
| 0.6699        | 3.87  | 25500 | 0.8652          | 10.1883 | 17.2057 |
| 0.612         | 4.1   | 27000 | 0.8949          | 9.5697  | 17.2443 |
| 0.621         | 4.33  | 28500 | 0.8904          | 10.8592 | 17.329  |
| 0.6219        | 4.55  | 30000 | 0.8772          | 10.925  | 17.482  |
| 0.6164        | 4.78  | 31500 | 0.8694          | 11.8749 | 17.1624 |
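
The evaluation results quoted at the top of the card match the epoch 3.87 / step 25500 row, which has the lowest validation loss in the table. The Bleu and Gen Len columns are the usual translation metrics (SacreBLEU score and mean generated length in tokens); a minimal sketch of how they are typically computed with the evaluate library, not necessarily the author's exact script:

```python
import numpy as np
import evaluate
from transformers import MBart50TokenizerFast

tokenizer = MBart50TokenizerFast.from_pretrained("yesj1234/mbart-mmt_mid3_ko-ja")
sacrebleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    """Return corpus BLEU and mean generated length, as in the table above."""
    preds, labels = eval_preds
    # Label positions masked with -100 cannot be decoded; restore pad ids first.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = sacrebleu.compute(
        predictions=decoded_preds,
        references=[[label] for label in decoded_labels],
    )
    gen_len = np.mean([np.count_nonzero(p != tokenizer.pad_token_id) for p in preds])
    return {"bleu": result["score"], "gen_len": gen_len}
```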

Framework versions

  • Transformers 4.34.0
  • Pytorch 2.1.0+cu121
  • Datasets 2.14.5
  • Tokenizers 0.14.1