---
license: cc-by-nc-nd-4.0
datasets:
- taln-ls2n/Adminset
language:
- fr
library_name: transformers
tags:
- camembert
- BERT
- Administrative documents
---

# AdminBERT 16GB: A French Language Model adapted to administrative documents

[AdminBERT-16GB](example) is a French language model adapted on a large corpus of 50 million French administrative texts. It is a derivative of the CamemBERT model, which is based on the RoBERTa architecture. AdminBERT-16GB was trained with the Whole Word Masking (WWM) objective at a 30% mask rate for 3 epochs on 24 A100 GPUs. The training dataset is [Adminset](https://huggingface.co/datasets/taln-ls2n/Adminset).
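To make the Whole Word Masking objective concrete, here is a minimal sketch in plain Python. It is not the training code used for AdminBERT: the helper names (`group_whole_words`, `whole_word_mask`) and the example tokenization are hypothetical, and it assumes SentencePiece-style tokens in which a leading `▁` marks the start of a word. It also always substitutes the mask token, whereas real masked-language-model training usually keeps or randomizes a fraction of the selected tokens.

```python
import random

MASK_TOKEN = "<mask>"  # mask token used by CamemBERT/RoBERTa-style tokenizers


def group_whole_words(tokens):
    """Group token indices into words; a token starting with '▁' opens a new word."""
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("▁") or not words:
            words.append([i])
        else:
            words[-1].append(i)
    return words


def whole_word_mask(tokens, mask_rate=0.30, seed=None):
    """Mask roughly `mask_rate` of the words, replacing every sub-token of a chosen word.

    Returns (masked_tokens, labels); labels hold the original token at masked
    positions and None elsewhere.
    """
    rng = random.Random(seed)
    words = group_whole_words(tokens)
    masked = list(tokens)
    labels = [None] * len(tokens)
    if not words:
        return masked, labels
    # Assumption: mask at least one word, and never more words than exist.
    n_to_mask = min(len(words), max(1, round(len(words) * mask_rate)))
    for word in rng.sample(words, n_to_mask):
        for i in word:
            labels[i] = tokens[i]
            masked[i] = MASK_TOKEN
    return masked, labels


if __name__ == "__main__":
    # Hypothetical tokenization of a short administrative sentence.
    tokens = ["▁Le", "▁conseil", "▁municip", "al", "▁approuve", "▁la", "▁dé", "libération"]
    masked, labels = whole_word_mask(tokens, mask_rate=0.30, seed=0)
    print(masked)
```

The point of masking at the word level is that a word split into several sub-tokens (here `▁municip` + `al`) is either fully hidden or fully visible, so the model cannot recover it from its own remaining pieces.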


## Evaluation

Since no evaluation corpus of French administrative documents was available at the time, we created our own for the NER (Named Entity Recognition) task.

### Model Performance

| Model                  | P (%)   | R (%)   | F1 (%)  |
|------------------------|---------|---------|---------|
| Wikineural-NER FT      | 77.49   | 75.40   | 75.70   |
| NERmemBERT-Large FT    | 77.43   | 78.38   | 77.13   |
| CamemBERT FT           | 77.62   | 79.59   | 77.26   |
| NERmemBERT-Base FT     | 77.99   | 79.59   | 78.34   |
| AdminBERT-NER 4G       | 78.47   | 80.35   | 79.26   |
| AdminBERT-NER 16GB     | 78.79   | 82.07   | 80.11   |

To evaluate each model, we performed five runs and averaged the results on the test set of [Adminset-NER](https://huggingface.co/datasets/taln-ls2n/Adminset-NER).
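The averaging step can be sketched as follows. This is an illustration only, not the evaluation script behind the table: it assumes entity-level counts of true positives, false positives, and false negatives per run, the function names are hypothetical, and the counts in the usage example are made-up placeholders rather than AdminBERT results.

```python
from statistics import mean


def prf(tp, fp, fn):
    """Precision, recall, and F1 from entity-level counts for a single run."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_runs(run_counts):
    """Average each metric over runs and report it as a percentage."""
    scores = [prf(tp, fp, fn) for tp, fp, fn in run_counts]
    p, r, f1 = (mean(col) for col in zip(*scores))
    return {"P": 100 * p, "R": 100 * r, "F1": 100 * f1}


if __name__ == "__main__":
    # Placeholder (tp, fp, fn) counts for five runs; NOT real results.
    runs = [(80, 20, 20), (78, 25, 22), (82, 18, 19), (79, 21, 23), (81, 22, 18)]
    print(average_runs(runs))
```

Because each metric is averaged independently across runs, the reported F1 need not equal the harmonic mean of the reported P and R.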