---
language: en
license: apache-2.0
datasets:
- wikipedia
---
# BERT Large Uncased (CDA) - Counterfactual Data Augmentation
Pretrained model on the English language using a masked language modeling (MLM) objective. It was introduced
in [this paper](https://arxiv.org/abs/1810.04805) and first released
in [this repository](https://github.com/google-research-datasets/Zari). The model is pre-trained from scratch over
Wikipedia with counterfactual data augmentation (CDA): word substitutions for the augmentation are determined
using the word lists provided at [corefBias](https://github.com/uclanlp/corefBias) ([Zhao et al., 2018](https://arxiv.org/abs/1804.06876)).
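
As a rough illustration of how CDA works, the sketch below swaps gendered word pairs in a sentence. The `SWAP_PAIRS` table is a tiny hypothetical stand-in for the full corefBias word lists, and real augmentation needs more care (for example, the ambiguous objective vs. possessive "her", and person names):

```python
import re

# Minimal sketch of counterfactual data augmentation via word-pair
# substitution. SWAP_PAIRS is a hypothetical stand-in for the full word
# lists published in the corefBias repository.
SWAP_PAIRS = {
    "he": "she", "she": "he",
    "his": "her", "her": "his",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}

def cda_swap(sentence: str) -> str:
    """Return the counterfactual of `sentence` with gendered terms swapped."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAP_PAIRS.get(word.lower(), word)
        # Preserve the capitalization of the original token.
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"[A-Za-z]+", swap, sentence)

print(cda_swap("He said his mother is a doctor."))
# -> "She said her father is a doctor."
```

During pre-training, both the original and the swapped sentences are included, so gendered terms appear in balanced contexts.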
Disclaimer: The team releasing BERT did not write a model card for this model, so this model card has been written by
the FairNLP team.
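
Because the model is trained with an MLM objective, it can be queried directly for masked-token predictions. Below is a minimal sketch using the `transformers` fill-mask pipeline; the model identifier is a placeholder, since the exact Hub name is not stated in this card:

```python
from transformers import pipeline

# "FairNLP/bert-large-uncased-cda" is a placeholder identifier; substitute
# the model's actual Hugging Face Hub name.
unmasker = pipeline("fill-mask", model="FairNLP/bert-large-uncased-cda")

for prediction in unmasker("The engineer said [MASK] would finish the work."):
    print(prediction["token_str"], round(prediction["score"], 4))
```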
### BibTeX entry and citation info
```
@misc{zari,
      title={Measuring and Reducing Gendered Correlations in Pre-trained Models},
      author={Kellie Webster and Xuezhi Wang and Ian Tenney and Alex Beutel and Emily Pitler and Ellie Pavlick and Jilin Chen and Slav Petrov},
      year={2020},
      eprint={2010.06032},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```