---
language: de
datasets:
- news_commentary
widget:
- text: "Unberechenbar, gefährlich, ja, auf jeden Fall."
  example_title: "Fluent example 1"
- text: "Aber hinterher... oh, oh..."
  example_title: "Fluent example 2"
- text: "Nettes Haus, was? - Ja."
  example_title: "Fluent example 3"
- text: "Wissqween Sisssasde, adddddqwe12was Mdddilednberg war, 122huh?"
  example_title: "Disfluent example 1"
- text: "asdaojn;klL:JjJALSJD"
  example_title: "Disfluent example 2"
- text: "Was dDadasdDasein erster aaaaEind2ruck?"
  example_title: "Disfluent example 3"
license: other
---
This model was trained to evaluate the linguistic acceptability and grammaticality of German sentences. It was fine-tuned from [bert-base-german-cased](https://huggingface.co/bert-base-german-cased).
To use the model:
```python
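from transformers import pipeline

# Load the fine-tuned fluency classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="EIStakovskii/bert-base-german-cased_fluency")
# Classify a deliberately corrupted (disfluent) sentence.
print(classifier("Wissqween Sisssasde, adddddqwe12was Mdddilednberg war, 122huh?"))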
```
Label_1 means ACCEPTABLE - the sentence is perfectly understandable to native speakers and has no serious grammatical or syntactic flaws.
Label_0 means NOT ACCEPTABLE - the sentence is flawed both orthographically and grammatically.
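
The label meanings above can be applied to the classifier output with a small helper. This is a minimal sketch, not part of the model: the `readable` function and the `LABEL_NAMES` table are hypothetical names introduced here, and the exact label strings returned by the pipeline (assumed to look like `LABEL_0` / `LABEL_1`) should be checked against the real output. The score in the example is made up for illustration.

```python
from typing import Dict, List

# Assumed mapping, taken from the label descriptions in this card.
# Keys are lowercased so that "Label_1" and "LABEL_1" are both matched.
LABEL_NAMES = {"label_1": "ACCEPTABLE", "label_0": "NOT ACCEPTABLE"}


def readable(predictions: List[Dict]) -> List[Dict]:
    """Replace raw label ids with the names described in this card."""
    out = []
    for p in predictions:
        # Unknown labels are passed through unchanged.
        name = LABEL_NAMES.get(str(p["label"]).lower(), p["label"])
        out.append({"label": name, "score": p["score"]})
    return out


# Hypothetical pipeline output; the score is illustrative, not a real result.
example = [{"label": "LABEL_0", "score": 0.99}]
print(readable(example))
```

With the real pipeline, the same helper could be called as `readable(classifier("Nettes Haus, was? - Ja."))`.
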
The model was trained on 50,000 German sentences from [the news_commentary dataset](https://huggingface.co/datasets/news_commentary). Half of them (25,000 sentences) were algorithmically corrupted using [an open-source Python library](https://github.com/eistakovskii/text_corruption_plus). The library was originally developed by [aylliote](https://github.com/aylliote/corruption) and was slightly adapted for the purposes of this model.
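
To make the data setup concrete, the sketch below builds a small balanced dataset in the same spirit: half of the sentences are kept intact and labeled 1, the other half are corrupted and labeled 0. It is a toy illustration only. The `corrupt` and `build_dataset` functions, the corruption operations, and the `rate` parameter are assumptions made here and do not reproduce the algorithm or API of the linked corruption library.

```python
import random
from typing import List, Tuple


def corrupt(sentence: str, rng: random.Random, rate: float = 0.2) -> str:
    """Toy character-level corruption (NOT the linked library's algorithm).

    Each non-space character is, with probability `rate`, deleted,
    duplicated, or replaced by a random character. A short sentence
    may occasionally come out unchanged.
    """
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    out = []
    for ch in sentence:
        if ch != " " and rng.random() < rate:
            op = rng.choice(["delete", "duplicate", "replace"])
            if op == "delete":
                continue
            if op == "duplicate":
                out.append(ch * 2)
            else:
                out.append(rng.choice(alphabet))
        else:
            out.append(ch)
    return "".join(out)


def build_dataset(sentences: List[str], seed: int = 0) -> List[Tuple[str, int]]:
    """Label one half as acceptable (1) and corrupt the other half (0)."""
    rng = random.Random(seed)
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    data = [(s, 1) for s in shuffled[:half]]
    data += [(corrupt(s, rng), 0) for s in shuffled[half:]]
    return data


sentences = [
    "Unberechenbar, gefährlich, ja, auf jeden Fall.",
    "Aber hinterher... oh, oh...",
    "Nettes Haus, was? - Ja.",
    "Das ist ein weiterer Beispielsatz.",
]
for text, label in build_dataset(sentences):
    print(label, text)
```
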