Add documentation
README.md
CHANGED
@@ -22,7 +22,7 @@ license: apache-2.0
| Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
| :------: | :---: | :---: | :---: | :---: |
| `gpt-fr-cased-small` | 12 | 12 | 768 | 124 M |
-| `gpt-fr-cased-base` | 24 | 14 |
+| `gpt-fr-cased-base` | 24 | 14 | 1,792 | 1,017 M |

## Intended uses & limitations

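To make the table concrete, the sketch below instantiates the `gpt-fr-cased-small` hyper-parameters with the standard GPT-2 implementation in 🤗 `Transformers` and checks the resulting parameter count. The vocabulary size of 50,000 is an assumption for illustration only, so the exact total may differ slightly from the released checkpoint.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hyper-parameters of `gpt-fr-cased-small` from the table above.
# vocab_size is an assumed value for illustration; check the released tokenizer for the real one.
config = GPT2Config(
    vocab_size=50_000,
    n_layer=12,
    n_head=12,
    n_embd=768,
)

# Randomly initialised model, used here only to count parameters.
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.0f} M parameters")  # roughly 124 M
```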
@@ -30,7 +30,7 @@ The model can be leveraged for language generation tasks. Besides, many tasks ma

#### How to use

-The model might be used through the astonishing 🤗 `Transformers` librairie
+The model can be used through the 🤗 `Transformers` library. We use the work from [Shoeybi et al., (2019)](#shoeybi-2019) and calibrate our model such that, during pre-training or fine-tuning, it fits on a single NVIDIA V100 32GB GPU.

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
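The hunk above cuts the snippet off right after the import, so here is a self-contained sketch of how the model might be loaded and queried. The model id `asi/gpt-fr-cased-small` and the prompt are assumptions for illustration; substitute the identifier of the published checkpoint.

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Assumed model id for illustration; replace with the published checkpoint name.
model_name = "asi/gpt-fr-cased-small"

tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

prompt = "Longtemps je me suis couché de bonne heure."
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation with top-k random sampling.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        top_k=50,
        max_new_tokens=50,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```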
@@ -64,8 +64,8 @@ Large language models tend to replicate the biases found in pre-training dataset

To limit exposure to too much explicit material, we carefully choose the sources beforehand. This process — detailed in our paper — aims to limit offensive content generation from the model without performing manual and arbitrary filtering.

-However, some societal biases, contained in the data, might be reflected by the model. For example on gender equality, we generated the following sentence sequence "Ma femme/Mon mari vient d'obtenir un nouveau poste en tant
-The positions generated for the wife
+However, some societal biases contained in the data might be reflected by the model. For example, regarding gender equality, we generated the sentence pair "Ma femme/Mon mari vient d'obtenir un nouveau poste en tant \_\_\_\_\_\_\_" ("My wife/My husband has just obtained a new position as \_\_\_\_\_\_\_"). We used a top-k random sampling strategy with k=50 and stopped at the first punctuation element.
+The position generated for the wife is '_que professeur de français._' ('_as a French teacher._'), while the position generated for the husband is '_que chef de projet._' ('_as a project manager._'). We would appreciate your feedback to better assess such effects qualitatively and quantitatively.

## Training data

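The probing procedure described in the added lines (top-k sampling with k=50, truncated at the first punctuation mark) can be reproduced with a short script along the following lines. The model id is again an assumption, and the sampled completions will vary from run to run.

```python
import re
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Assumed model id for illustration.
model_name = "asi/gpt-fr-cased-small"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

def complete(prompt: str) -> str:
    """Sample a continuation with top-k=50 and keep the text up to the first punctuation mark."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            do_sample=True,
            top_k=50,
            max_new_tokens=30,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, then truncate at the first punctuation element.
    continuation = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return re.split(r"[.,;:!?]", continuation)[0].strip()

for prompt in (
    "Ma femme vient d'obtenir un nouveau poste en tant",
    "Mon mari vient d'obtenir un nouveau poste en tant",
):
    print(prompt, "...", complete(prompt))
```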
@@ -98,3 +98,4 @@ In line with the [WikiText](https://blog.einstein.ai/the-wikitext-long-term-depe

><div name="lacoste-2019">Alexandre Lacoste, Alexandra Luccioni, Victor Schmidt, Thomas Dandres: Quantifying the Carbon Emissions of Machine Learning. CoRR abs/1910.09700 (2019)</div>

+