---
license: cc-by-nc-nd-4.0
datasets:
  - taln-ls2n/Adminset
language:
  - fr
library_name: transformers
tags:
  - camembert
  - BERT
  - Administrative documents
---

# AdminBERT-4GB: A Small French Language Model Adapted to Administrative Documents

AdminBERT-4GB is a French language model adapted using a large corpus of 10 million French administrative texts. It is a derivative of the CamemBERT model, which is based on the RoBERTa architecture. AdminBERT-4GB was trained with the Whole Word Masking (WWM) objective at a 30% masking rate for 2 epochs on 8 V100 GPUs. The training dataset is a sample of Adminset.
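
As a minimal sketch of this setup, the snippet below loads the model with the `transformers` library and instantiates a Whole Word Masking data collator at the 30% masking rate mentioned above. The repository ID `taln-ls2n/AdminBERT-4GB` is an assumption for illustration, not confirmed by this card.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForWholeWordMask,
)

# Hypothetical Hub repository ID; replace with the actual ID of this model.
model_id = "taln-ls2n/AdminBERT-4GB"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Whole Word Masking collator using the 30% masking rate described above.
data_collator = DataCollatorForWholeWordMask(
    tokenizer=tokenizer,
    mlm_probability=0.3,
)
```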

## Evaluation

### Model Performance

| Model               | P (%) | R (%) | F1 (%) |
|---------------------|-------|-------|--------|
| Wikineural-NER FT   | 77.49 | 75.40 | 75.70  |
| NERmemBERT-Large FT | 77.43 | 78.38 | 77.13  |
| CamemBERT FT        | 77.62 | 79.59 | 77.26  |
| NERmemBERT-Base FT  | 77.99 | 79.59 | 78.34  |
| AdminBERT-NER 4GB   | 78.47 | 80.35 | 79.26  |
| AdminBERT-NER 16GB  | 78.79 | 82.07 | 80.11  |

To evaluate each model, we performed five runs on the Adminset-NER test set and averaged the results.
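
The snippet below is a minimal sketch of this averaging protocol, assuming entity-level precision/recall/F1 computed with `seqeval`; the toy gold labels and per-run predictions are illustrative placeholders, not real Adminset-NER outputs.

```python
from statistics import mean
from seqeval.metrics import precision_score, recall_score, f1_score

# Toy test data standing in for the Adminset-NER test split.
gold = [["B-ORG", "I-ORG", "O", "B-LOC"]]
runs = [
    [["B-ORG", "I-ORG", "O", "B-LOC"]],  # predictions from run 1
    [["B-ORG", "O", "O", "B-LOC"]],      # predictions from run 2
    # ... runs 3-5 would follow in a real evaluation
]

# Score each run, then average across runs (reported here in percent).
scores = [
    (precision_score(gold, pred), recall_score(gold, pred), f1_score(gold, pred))
    for pred in runs
]
p, r, f1 = (mean(vals) * 100 for vals in zip(*scores))
print(f"P={p:.2f}%  R={r:.2f}%  F1={f1:.2f}%")
```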