BERTuit model, as presented in the article [BERTuit: Understanding Spanish language in Twitter through a native transformer](https://arxiv.org/abs/2204.03465).
Before tokenization, replace user tags and URLs with "<usr>" and "<url>", respectively.
Tokenize the text with the base `RobertaTokenizer` class.
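
The replacement step could be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the regular expressions and the helper name `preprocess_tweet` are assumptions, since the exact matching rules are not stated here.

```python
import re

# Assumed patterns: the exact rules used for BERTuit are not specified above.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
USER_PATTERN = re.compile(r"@\w+")


def preprocess_tweet(text: str) -> str:
    """Replace URLs with "<url>" and user tags with "<usr>"."""
    # URLs are handled first so that an "@" inside a URL is not treated as a user tag.
    text = URL_PATTERN.sub("<url>", text)
    text = USER_PATTERN.sub("<usr>", text)
    return text


print(preprocess_tweet("Hola @maria, mira esto: https://example.com/post"))
```

The returned string is what would then be passed to the tokenizer. Note that these simple patterns would also rewrite the domain part of an email address, so they may need tightening for real data.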