gwlms/roberta-tokenizer
language modeling
We present language models (BERT, BERT with Token Dropping, TEAMS, T5) pretrained on German Wikipedia.
This is an ongoing project!
We use a recent Wikipedia dump, which can be accessed here. Additionally, a sentence-segmented version (segmented with NLTK) is available here.
We fine-tuned NER models on the GermEval 2014 NER dataset using the SpanMarker library and uploaded the best models:
This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️
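For the GermEval 2014 NER fine-tuning mentioned above, the training data has to be grouped into sentences of tokens and tags first. The following is a minimal sketch, assuming the original GermEval 2014 TSV layout (a "#" comment line per sentence, then one token per line with tab-separated columns for index, token, outer tag and nested tag, and blank lines between sentences). The function name `parse_germeval14` and the sample sentences are hypothetical illustrations, not the authors' preprocessing code.

```python
from typing import Dict, List

# Hypothetical sample in the assumed GermEval 2014 layout:
# "#" comment line with source and date, then one token per line with
# tab-separated columns (index, token, outer tag, nested tag),
# and a blank line between sentences.
SAMPLE = "\n".join(
    [
        "# example-source [2009-10-17]",
        "1\tAngela\tB-PER\tO",
        "2\tMerkel\tI-PER\tO",
        "3\tbesucht\tO\tO",
        "4\tBerlin\tB-LOC\tO",
        "5\t.\tO\tO",
        "",
        "# example-source [2009-10-17]",
        "1\tDie\tO\tO",
        "2\tDeutsche\tB-ORG\tB-LOCderiv",
        "3\tBank\tI-ORG\tO",
        "4\twächst\tO\tO",
        "5\t.\tO\tO",
        "",
    ]
)


def parse_germeval14(text: str) -> List[Dict[str, List[str]]]:
    """Group token lines into sentences, keeping only the outer (top-level) NER tag."""
    sentences: List[Dict[str, List[str]]] = []
    tokens: List[str] = []
    tags: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append({"tokens": tokens, "ner_tags": tags})
                tokens, tags = [], []
            continue
        if line.startswith("#"):
            # Comment line (source and date); not part of the sentence.
            continue
        fields = line.split("\t")
        # fields[0] = token index, fields[1] = token,
        # fields[2] = outer tag, fields[3] = nested tag (ignored here).
        tokens.append(fields[1])
        tags.append(fields[2])
    # Flush the last sentence if the text does not end with a blank line.
    if tokens:
        sentences.append({"tokens": tokens, "ner_tags": tags})
    return sentences


if __name__ == "__main__":
    for sentence in parse_germeval14(SAMPLE):
        print(sentence["tokens"], sentence["ner_tags"])
```

The sketch deliberately drops the nested tag column and keeps only the outer annotation, which is the usual flat-NER setup. The `tokens`/`ner_tags` keys mirror the layout commonly used for Hugging Face token-classification datasets; mapping the string tags to integer label ids before training is left out.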