---
license: cc-by-nc-nd-4.0
datasets:
- taln-ls2n/Adminset
language:
- fr
library_name: transformers
tags:
- camembert
- BERT
- Administrative documents
---

# AdminBERT 4GB: A Small French Language Model Adapted to Administrative Documents

[AdminBERT-4GB](example) is a French language model adapted to the administrative domain through continued pretraining on a corpus of 10 million French administrative texts. It is a derivative of the CamemBERT model, which is based on the RoBERTa architecture. AdminBERT-4GB was trained with the Whole Word Masking (WWM) objective at a 30% masking rate for 2 epochs on 8 V100 GPUs. The training data is a sample of [Adminset](https://huggingface.co/datasets/taln-ls2n/Adminset).
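With Whole Word Masking, when any subword piece of a word is selected for masking, all pieces of that word are masked together. The sketch below illustrates the grouping idea on toy WordPiece-style tokens (with `##` continuation markers); note this is only an illustration, since CamemBERT-style models actually use SentencePiece tokenization, and `whole_word_mask` is a hypothetical helper, not part of this model's code.

```python
import random

def whole_word_mask(tokens, mask_rate=0.3, mask_token="<mask>", seed=0):
    """Mask whole words (all of their subword pieces) at the given rate.

    Continuation pieces are assumed to carry a leading "##", WordPiece-style.
    This is a didactic sketch of the whole-word grouping, not the exact
    masking procedure used to train AdminBERT.
    """
    # Group subword indices into whole words.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)  # continuation piece joins the current word
        else:
            words.append([i])    # a new word starts here

    # Pick ~mask_rate of the *words* (not the pieces) to mask.
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(words) * mask_rate))
    for word in rng.sample(words, n_to_mask):
        for i in word:
            tokens = tokens[:i] + [mask_token] + tokens[i + 1:]
    return tokens
```

Because selection happens at the word level, a multi-piece word is always masked in full, which prevents the model from trivially recovering a masked piece from its sibling pieces.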


## Evaluation

### Model Performance

| Model                  | P (%)   | R (%)   | F1 (%)  |
|------------------------|---------|---------|---------|
| Wikineural-NER FT      | 77.49   | 75.40   | 75.70   |
| NERmemBERT-Large FT    | 77.43   | 78.38   | 77.13   |
| CamemBERT FT           | 77.62   | 79.59   | 77.26   |
| NERmemBERT-Base FT     | 77.99   | 79.59   | 78.34   |
| AdminBERT-NER 4GB      | 78.47   | 80.35   | 79.26   |
| AdminBERT-NER 16GB     | 78.79   | 82.07   | 80.11   |

To evaluate each model, we performed five runs on the test set of Adminset-NER; the table reports the averaged precision (P), recall (R), and F1 scores.
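The metrics above follow the standard precision/recall/F1 definitions over predicted entities. A minimal sketch of the computation (the counts and per-run scores here are hypothetical, not the ones behind the table):

```python
from statistics import mean

def prf1(tp, fp, fn):
    """Precision, recall and F1 (as percentages) from entity-level counts."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return 100 * p, 100 * r, 100 * f1

# Averaging F1 over several runs, as done for the table (toy values).
run_f1 = [79.1, 79.4, 79.2, 79.3, 79.5]
avg_f1 = mean(run_f1)
```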