Add documentation
README.md
CHANGED
@@ -22,7 +22,7 @@ license: apache-2.0
| Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
| :------: | :---: | :---: | :---: | :---: |
| `gpt-fr-cased-small` | 12 | 12 | 768 | 124 M |
-| `gpt-fr-cased-base` | 24 | 14 |
+| `gpt-fr-cased-base` | 24 | 14 | 1,792 | 1,017 M |

## Intended uses & limitations

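To make the table concrete, the sketch below instantiates the `gpt-fr-cased-small` hyper-parameters with the standard GPT-2 implementation in 🤗 `Transformers` and checks the resulting parameter count. The vocabulary size of 50,000 is an assumption for illustration only, so the exact total may differ slightly from the released checkpoint.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hyper-parameters of `gpt-fr-cased-small` from the table above.
# vocab_size is an assumed value for illustration; check the released tokenizer for the real one.
config = GPT2Config(
    vocab_size=50_000,
    n_layer=12,
    n_head=12,
    n_embd=768,
)

# Randomly initialised model, used here only to count parameters.
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.0f} M parameters")  # roughly 124 M
```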
@@ -30,7 +30,7 @@ The model can be leveraged for language generation tasks. Besides, many tasks ma

#### How to use

-The model might be used through the astonishing 🤗 `Transformers` librairie
+The model can be used through the 🤗 `Transformers` library. We use the work from [Shoeybi et al., (2019)](#shoeybi-2019) and calibrate our model such that, during pre-training or fine-tuning, it fits on a single NVIDIA V100 32GB GPU.

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
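The hunk above cuts the snippet off right after the import, so here is a self-contained sketch of how the model might be loaded and queried. The model id `asi/gpt-fr-cased-small` and the prompt are assumptions for illustration; substitute the identifier of the published checkpoint.

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Assumed model id for illustration; replace with the published checkpoint name.
model_name = "asi/gpt-fr-cased-small"

tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

prompt = "Longtemps je me suis couché de bonne heure."
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation with top-k random sampling.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        top_k=50,
        max_new_tokens=50,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```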
@@ -64,8 +64,8 @@ Large language models tend to replicate the biases found in pre-training dataset

To limit exposure to too much explicit material, we carefully choose the sources beforehand. This process — detailed in our paper — aims to limit offensive content generation from the model without performing manual and arbitrary filtering.

-However, some societal biases, contained in the data, might be reflected by the model. For example on gender equality, we generated the following sentence sequence "Ma femme/Mon mari vient d'obtenir un nouveau poste en tant
-The positions generated for the wife
+However, some societal biases contained in the data might be reflected by the model. For example, regarding gender equality, we generated the sentence pair "Ma femme/Mon mari vient d'obtenir un nouveau poste en tant \_\_\_\_\_\_\_" ("My wife/My husband has just obtained a new position as \_\_\_\_\_\_\_"). We used a top-k random sampling strategy with k=50 and stopped at the first punctuation element.
+The position generated for the wife is '_que professeur de français._' ('_as a French teacher._'), while the position generated for the husband is '_que chef de projet._' ('_as a project manager._'). We would appreciate your feedback to better assess such effects qualitatively and quantitatively.

## Training data

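The probing procedure described in the added lines (top-k sampling with k=50, truncated at the first punctuation mark) can be reproduced with a short script along the following lines. The model id is again an assumption, and the sampled completions will vary from run to run.

```python
import re
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Assumed model id for illustration.
model_name = "asi/gpt-fr-cased-small"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

def complete(prompt: str) -> str:
    """Sample a continuation with top-k=50 and keep the text up to the first punctuation mark."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            do_sample=True,
            top_k=50,
            max_new_tokens=30,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, then truncate at the first punctuation element.
    continuation = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return re.split(r"[.,;:!?]", continuation)[0].strip()

for prompt in (
    "Ma femme vient d'obtenir un nouveau poste en tant",
    "Mon mari vient d'obtenir un nouveau poste en tant",
):
    print(prompt, "...", complete(prompt))
```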
@@ -98,3 +98,4 @@ In line with the [WikiText](https://blog.einstein.ai/the-wikitext-long-term-depe

><div name="lacoste-2019">Alexandre Lacoste, Alexandra Luccioni, Victor Schmidt, Thomas Dandres: Quantifying the Carbon Emissions of Machine Learning. CoRR abs/1910.09700 (2019)</div>

+