| language: en | |
| tags: | |
| - roberta-base | |
| - roberta-base-epoch_2 | |
| license: mit | |
| datasets: | |
| - wikipedia | |
| - bookcorpus | |
| # RoBERTa, Intermediate Checkpoint - Epoch 2 | |
| This model is part of our reimplementation of the [RoBERTa model](https://arxiv.org/abs/1907.11692), | |
| trained on Wikipedia and the Book Corpus only. | |
| We trained this model for almost 100K steps, corresponding to 83 epochs. | |
| We provide all 84 checkpoints (including the randomly initialized weights before training) | |
| to make it possible to study the training dynamics of such models, among other use cases. | |
| These models were trained as part of a work that studies how simple statistics of the data, | |
| such as co-occurrences, affect model predictions, as described in the paper | |
| [Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions](https://arxiv.org/abs/2207.14251). | |
| This is RoBERTa-base epoch_2. | |
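For work on training dynamics, it can help to enumerate the checkpoint repository ids up front. The sketch below assumes the 84 checkpoints are published as `yanaiela/roberta-base-epoch_0` through `yanaiela/roberta-base-epoch_83`, with `epoch_0` holding the randomly initialized weights; this naming scheme is inferred from the checkpoint names mentioned in this card and is not confirmed for every epoch. `checkpoint_ids` is a hypothetical helper, and the steps-per-epoch figure is only a rough estimate derived from the "almost 100K steps" and "83 epochs" stated above.

```python
# Hypothetical helper: builds repository ids for the released checkpoints.
# Assumption: ids follow "yanaiela/roberta-base-epoch_<n>" for n in 0..83,
# where n = 0 is the randomly initialized model.
NUM_EPOCHS = 83
TOTAL_STEPS = 100_000  # the card says "almost 100K", so treat this as an upper bound


def checkpoint_ids(prefix="yanaiela/roberta-base-epoch_", num_epochs=NUM_EPOCHS):
    """Return one repository id per checkpoint, including the initial weights."""
    return [f"{prefix}{epoch}" for epoch in range(num_epochs + 1)]


ids = checkpoint_ids()
# Rough number of optimizer steps between two consecutive checkpoints.
approx_steps_per_epoch = TOTAL_STEPS / NUM_EPOCHS
print(len(ids), ids[2], round(approx_steps_per_epoch))
```

Each id can then be passed as the `model` argument when loading a checkpoint with Transformers.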
| ## Model Description | |
| This model was captured during a reproduction of | |
| [RoBERTa-base](https://huggingface.co/roberta-base), for English: it | |
| is a Transformers model pretrained on a large corpus of English data, using the | |
| Masked Language Modelling (MLM) objective. | |
| The intended uses, limitations, training data and training procedure for the fully trained model are similar | |
| to [RoBERTa-base](https://huggingface.co/roberta-base). There are two major | |
| differences from the original model: | |
| * We trained our model for 100K steps instead of 500K. | |
| * We used only Wikipedia and the Book Corpus, because these corpora are publicly available. | |
| ### How to use | |
| Using code from | |
| [RoBERTa-base](https://huggingface.co/roberta-base), here is an example based on | |
| PyTorch: | |
| ```python | |
| from transformers import pipeline | |
| # device=-1 runs on CPU; top_k=10 returns the ten highest-scoring candidates | |
| model = pipeline("fill-mask", model='yanaiela/roberta-base-epoch_2', device=-1, top_k=10) | |
| model("Hello, I'm the <mask> RoBERTa-base language model") | |
| ``` | |
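Since the checkpoints are meant for studying training dynamics, one possible use is to compare fill-mask predictions for the same sentence across epochs. The following is a minimal sketch, not the authors' evaluation code. It assumes the checkpoints are named `yanaiela/roberta-base-epoch_<n>`; `load_fill_mask` and `top_tokens_by_epoch` are hypothetical helper names, and the example sentence is illustrative only.

```python
def load_fill_mask(epoch, top_k=5):
    """Load the fill-mask pipeline for one checkpoint (downloads the weights)."""
    # Imported lazily so top_tokens_by_epoch can be used with a custom loader.
    from transformers import pipeline

    # Assumption: checkpoints are published as yanaiela/roberta-base-epoch_<n>.
    return pipeline(
        "fill-mask",
        model=f"yanaiela/roberta-base-epoch_{epoch}",
        device=-1,  # CPU
        top_k=top_k,
    )


def top_tokens_by_epoch(text, epochs, loader=load_fill_mask):
    """Map each epoch to its predicted tokens for the single <mask> in text."""
    results = {}
    for epoch in epochs:
        fill_mask = loader(epoch)
        # For a single-mask input, the pipeline returns a list of dicts
        # that include "token_str" and "score", sorted by score.
        results[epoch] = [pred["token_str"].strip() for pred in fill_mask(text)]
    return results


# Example call (requires network access, so it is left commented out):
# top_tokens_by_epoch("The capital of France is <mask>.", epochs=[2, 83])
```

The `loader` argument is injectable so that the comparison logic can be exercised with a stub, or swapped for a loader that reads checkpoints from a local directory.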
| ## Citation info | |
| ```bibtex | |
| @article{2207.14251, | |
| Author = {Yanai Elazar and Nora Kassner and Shauli Ravfogel and Amir Feder and Abhilasha Ravichander and Marius Mosbach and Yonatan Belinkov and Hinrich Schütze and Yoav Goldberg}, | |
| Title = {Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions}, | |
| Year = {2022}, | |
| Eprint = {arXiv:2207.14251}, | |
| } | |
| ``` | |