---
library_name: transformers
language:
- multilingual
- bn
- cs
- de
- en
- et
- fi
- fr
- gu
- ha
- hi
- is
- ja
- kk
- km
- lt
- lv
- pl
- ps
- ru
- ta
- tr
- uk
- xh
- zh
- zu
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- quality-estimation
- regression
- generated_from_trainer
datasets:
- ymoslem/wmt-da-human-evaluation
model-index:
- name: Quality Estimation for Machine Translation
  results:
  - task:
      type: regression
    dataset:
      name: ymoslem/wmt-da-human-evaluation-long-context
      type: QE
    metrics:
    - name: Pearson
      type: Pearson Correlation
      value: 0.2055
    - name: MAE
      type: Mean Absolute Error
      value: 0.2004
    - name: RMSE
      type: Root Mean Squared Error
      value: 0.2767
    - name: R-R2
      type: R-Squared
      value: -1.6745
  - task:
      type: regression
    dataset:
      name: ymoslem/wmt-da-human-evaluation
      type: QE
    metrics:
    - name: Pearson
      type: Pearson Correlation
      value: null
    - name: MAE
      type: Mean Absolute Error
      value: null
    - name: RMSE
      type: Root Mean Squared Error
      value: null
    - name: R-R2
      type: R-Squared
      value: null
metrics:
- pearsonr
- mae
- r_squared
new_version: ymoslem/ModernBERT-base-qe-v1
---


# Quality Estimation for Machine Translation

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the [ymoslem/wmt-da-human-evaluation](https://huggingface.co/datasets/ymoslem/wmt-da-human-evaluation) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0561

## Model description

This model is for reference-free, sentence-level quality estimation (QE) of machine translation (MT) systems.
The long-context / document-level model is available at [ModernBERT-base-long-context-qe-v1](https://huggingface.co/ymoslem/ModernBERT-base-long-context-qe-v1), which is trained on the long-context / document-level QE dataset [ymoslem/wmt-da-human-evaluation-long-context](https://huggingface.co/datasets/ymoslem/wmt-da-human-evaluation-long-context).
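
For quick scoring, here is a minimal inference sketch. It rests on assumptions this card does not state: that the checkpoint loads as a single-label regression head, and that the source sentence and its machine translation are passed to the tokenizer as a text pair. The Hub id shown is the related full-length checkpoint; swap in this model's own id.

```python
# Minimal QE inference sketch (assumptions: single-label regression head,
# source + translation passed as a text pair — verify against the training setup).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "ymoslem/ModernBERT-base-qe-v1"  # replace with this checkpoint's Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

src = "Der Bericht wurde gestern veröffentlicht."
mt = "The report was published yesterday."

inputs = tokenizer(src, mt, truncation=True, return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
# The reported MAE/RMSE suggest scores on a roughly 0–1 scale.
print(f"Predicted quality score: {score:.4f}")
```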

## Training and evaluation data

This model is trained on the sentence-level quality estimation dataset [ymoslem/wmt-da-human-evaluation](https://huggingface.co/datasets/ymoslem/wmt-da-human-evaluation).
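
The data can be inspected with the `datasets` library. The snippet below streams a single record, so the field names can be checked directly rather than assumed:

```python
# Stream one record from the training data to inspect its fields
# without downloading the full dataset.
from datasets import load_dataset

ds = load_dataset("ymoslem/wmt-da-human-evaluation", split="train", streaming=True)
print(next(iter(ds)))
```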

## Training procedure

### Training hyperparameters

This version of the model uses `tokenizer.model_max_length=512`.
The model trained with the full context length of 8192 tokens is available at [ymoslem/ModernBERT-base-qe-v1](https://huggingface.co/ymoslem/ModernBERT-base-qe-v1), which is likewise trained on the sentence-level QE dataset [ymoslem/wmt-da-human-evaluation](https://huggingface.co/datasets/ymoslem/wmt-da-human-evaluation).
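
As a sketch, the 512-token cap can be reproduced by overriding the tokenizer attribute named above:

```python
# Cap the tokenizer at 512 tokens, as used for this variant; the
# full-length checkpoint keeps ModernBERT's native 8192-token context.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
tokenizer.model_max_length = 512
# With truncation enabled, encoded source–translation pairs are now
# limited to 512 tokens in total.
```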

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 10000
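
As a rough sketch, these settings map onto `TrainingArguments` as follows; `output_dir` is a placeholder, and all other arguments mirror the list above:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters; argument names follow the
# Transformers 4.48 API.
training_args = TrainingArguments(
    output_dir="modernbert-base-qe",  # placeholder path
    learning_rate=3e-4,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=10_000,
)
```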

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.0656        | 0.1004 | 1000  | 0.0636          |
| 0.0643        | 0.2007 | 2000  | 0.0623          |
| 0.0592        | 0.3011 | 3000  | 0.0598          |
| 0.0596        | 0.4015 | 4000  | 0.0586          |
| 0.0575        | 0.5019 | 5000  | 0.0577          |
| 0.0574        | 0.6022 | 6000  | 0.0570          |
| 0.0584        | 0.7026 | 7000  | 0.0566          |
| 0.0574        | 0.8030 | 8000  | 0.0563          |
| 0.0565        | 0.9033 | 9000  | 0.0561          |
| 0.0557        | 1.0037 | 10000 | 0.0561          |


### Framework versions

- Transformers 4.48.0
- Pytorch 2.4.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0