---
library_name: transformers
license: apache-2.0
base_model: google/mt5-base
tags:
- generated_from_trainer
model-index:
- name: mt5-bleu4-durga-q1-clean
  results: []
---


# mt5-bleu4-durga-q1-clean

This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base). The training dataset was not recorded by the Trainer.
After the final epoch (epoch 30, step 90), it achieves the following results on the evaluation set:
- Loss: 1.7819
- Bleu1: 0.2072
- Bleu2: 0.1079
- Bleu3: 0.0666
- Bleu4: 0.0359
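
This card does not say which BLEU implementation or tokenizer produced the scores above. As a rough illustration of what the four metrics usually mean, the sketch below computes cumulative, unsmoothed sentence-level BLEU-1 through BLEU-4 with whitespace tokenization. The function names and example sentences are hypothetical and are not taken from the training script.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cumulative_bleu(reference, hypothesis, max_n):
    """Unsmoothed sentence-level BLEU with uniform weights over orders 1..max_n.

    Assumes whitespace tokenization and a single reference; the scoring
    actually used for this model may differ.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            # One zero precision zeroes the whole geometric mean.
            return 0.0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize hypotheses that are not longer than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
close_hyp = "the cat sat on a mat"
sparse_hyp = "the cat on mat"

close_scores = [cumulative_bleu(reference, close_hyp, n) for n in (1, 2, 3, 4)]
sparse_scores = [cumulative_bleu(reference, sparse_hyp, n) for n in (1, 2, 3, 4)]
```

Without smoothing, a hypothesis that shares words but no longer n-grams with the reference scores above zero on BLEU-1 and exactly zero on the higher orders, as `sparse_scores` shows. That is one possible reading of the rows in the results table where Bleu1 is non-zero while Bleu2 to Bleu4 are 0.0.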

## Model description

More information needed

## Intended uses & limitations

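More information needed. The evaluation scores are low in absolute terms (final BLEU-4 of 0.0359), and BLEU-2 through BLEU-4 were 0.0 at epochs 19 to 24 before recovering at epoch 25, so the reported metrics may be unstable.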

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 20
- eval_batch_size: 20
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 30
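
The hyperparameters above can be written as keyword arguments, together with the learning rate that a linear schedule would produce at each step. This is a minimal sketch, not the original training script. The key names follow the `transformers` `Seq2SeqTrainingArguments` parameters. The sketch assumes a single device, so the listed batch size is taken as the per-device value, and it assumes zero warmup steps because the card lists no warmup setting. The total of 90 steps comes from the results table (30 epochs at 3 steps each).

```python
# Values copied from the hyperparameter list. Key names mirror
# transformers.Seq2SeqTrainingArguments; using that class is an assumption.
training_kwargs = {
    "learning_rate": 3e-4,
    "per_device_train_batch_size": 20,  # assumes a single device
    "per_device_eval_batch_size": 20,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 30,
}

TOTAL_STEPS = 90   # 30 epochs x 3 optimizer steps per epoch (from the results table)
WARMUP_STEPS = 0   # assumption: no warmup is listed in the card


def linear_lr(step, base_lr=training_kwargs["learning_rate"],
              total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, midpoint, and end of training.
schedule_samples = {step: linear_lr(step) for step in (0, 45, 90)}
```

Under these assumptions the learning rate starts at 0.0003, halves by step 45, and reaches zero at step 90.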

### Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu1  | Bleu2  | Bleu3  | Bleu4  |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:------:|
| 15.8442       | 1.0   | 3    | 11.1246         | 0.0235 | 0.0    | 0.0    | 0.0    |
| 13.0661       | 2.0   | 6    | 9.3553          | 0.0373 | 0.0069 | 0.0    | 0.0    |
| 11.7048       | 3.0   | 9    | 8.0317          | 0.0320 | 0.0045 | 0.0    | 0.0    |
| 8.8700        | 4.0   | 12   | 7.1382          | 0.0368 | 0.0088 | 0.0    | 0.0    |
| 11.0893       | 5.0   | 15   | 6.7905          | 0.0503 | 0.0150 | 0.0053 | 0.0    |
| 9.8787        | 6.0   | 18   | 6.5255          | 0.0643 | 0.0238 | 0.0109 | 0.0    |
| 9.8189        | 7.0   | 21   | 6.7007          | 0.0765 | 0.0289 | 0.0133 | 0.0    |
| 8.2022        | 8.0   | 24   | 6.2109          | 0.0867 | 0.0330 | 0.0134 | 0.0    |
| 8.5899        | 9.0   | 27   | 5.9520          | 0.0700 | 0.0239 | 0.0085 | 0.0    |
| 7.5305        | 10.0  | 30   | 5.5748          | 0.0667 | 0.0243 | 0.0123 | 0.0    |
| 7.0381        | 11.0  | 33   | 5.2219          | 0.0631 | 0.0204 | 0.0094 | 0.0054 |
| 6.6750        | 12.0  | 36   | 4.8006          | 0.0613 | 0.0151 | 0.0076 | 0.0046 |
| 7.4134        | 13.0  | 39   | 4.3795          | 0.0661 | 0.0200 | 0.0090 | 0.0051 |
| 5.8722        | 14.0  | 42   | 3.9322          | 0.0940 | 0.0332 | 0.0165 | 0.0099 |
| 4.5875        | 15.0  | 45   | 3.5017          | 0.1155 | 0.0301 | 0.0136 | 0.0079 |
| 5.3675        | 16.0  | 48   | 3.1927          | 0.1203 | 0.0180 | 0.0    | 0.0    |
| 4.2999        | 17.0  | 51   | 2.8956          | 0.1320 | 0.0402 | 0.0201 | 0.0110 |
| 4.3349        | 18.0  | 54   | 2.7138          | 0.1057 | 0.0311 | 0.0148 | 0.0088 |
| 3.9688        | 19.0  | 57   | 2.5350          | 0.0745 | 0.0    | 0.0    | 0.0    |
| 4.2931        | 20.0  | 60   | 2.4138          | 0.0745 | 0.0    | 0.0    | 0.0    |
| 3.8427        | 21.0  | 63   | 2.3127          | 0.0745 | 0.0    | 0.0    | 0.0    |
| 3.2991        | 22.0  | 66   | 2.2054          | 0.0745 | 0.0    | 0.0    | 0.0    |
| 3.1351        | 23.0  | 69   | 2.1069          | 0.0745 | 0.0    | 0.0    | 0.0    |
| 3.0230        | 24.0  | 72   | 2.0208          | 0.0755 | 0.0    | 0.0    | 0.0    |
| 3.4366        | 25.0  | 75   | 1.9500          | 0.1517 | 0.0725 | 0.0459 | 0.0272 |
| 2.7941        | 26.0  | 78   | 1.9068          | 0.2136 | 0.1135 | 0.0692 | 0.0370 |
| 2.9454        | 27.0  | 81   | 1.8419          | 0.2089 | 0.1113 | 0.0681 | 0.0365 |
| 2.6117        | 28.0  | 84   | 1.8775          | 0.2115 | 0.1122 | 0.0685 | 0.0367 |
| 2.6785        | 29.0  | 87   | 1.7772          | 0.2078 | 0.1086 | 0.0671 | 0.0361 |
| 2.7523        | 30.0  | 90   | 1.7819          | 0.2072 | 0.1079 | 0.0666 | 0.0359 |
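
The headline results come from the final epoch, which is not the best row in the table on either validation loss or BLEU-4. The sketch below picks the best epoch under each criterion from the last six rows, copied from the table. The variable names are illustrative, and whether intermediate checkpoints were saved is not stated in this card.

```python
# (epoch, validation_loss, bleu4) for epochs 25-30, copied from the table above.
rows = [
    (25, 1.9500, 0.0272),
    (26, 1.9068, 0.0370),
    (27, 1.8419, 0.0365),
    (28, 1.8775, 0.0367),
    (29, 1.7772, 0.0361),
    (30, 1.7819, 0.0359),
]

# Lower validation loss is better; higher BLEU-4 is better.
best_by_loss = min(rows, key=lambda r: r[1])
best_by_bleu4 = max(rows, key=lambda r: r[2])

print(f"lowest validation loss: epoch {best_by_loss[0]} ({best_by_loss[1]:.4f})")
print(f"highest BLEU-4:         epoch {best_by_bleu4[0]} ({best_by_bleu4[2]:.4f})")
```

The gaps between these rows are small, so the choice of criterion matters little here. In a `transformers` training run this kind of selection is typically configured through `load_best_model_at_end` and `metric_for_best_model`.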


### Framework versions

- Transformers 4.46.1
- PyTorch 2.5.0+cu121
- Datasets 3.0.2
- Tokenizers 0.20.1