---
license: apache-2.0
language:
- en
pipeline_tag: text-classification
---
# DeTexD-RoBERTa-base delicate text detection
This is a baseline RoBERTa-base model for the delicate text detection task.
* Paper: [DeTexD: A Benchmark Dataset for Delicate Text Detection](TODO)
* [GitHub repository](https://github.com/grammarly/detexd)
## Classification example code
Here's a short usage example with the `transformers` library (PyTorch backend) that turns the model's multiclass output into a binary delicate/not-delicate decision:
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="grammarly/detexd-roberta-base")

def predict_binary_score(text: str):
    # get multiclass probability scores
    scores = classifier(text, top_k=None)
    # convert to a single score by summing the probability scores
    # for the higher-index classes
    return sum(score['score']
               for score in scores
               if score['label'] in ('LABEL_3', 'LABEL_4', 'LABEL_5'))

def predict_delicate(text: str, threshold=0.72496545):
    return predict_binary_score(text) > threshold

print(predict_delicate("Time flies like an arrow. Fruit flies like a banana."))
```
Expected output:
```
False
```
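To see what the threshold is compared against without downloading the model, the label-summing step can be exercised on its own. The sketch below uses a hand-written score list shaped like the pipeline's `top_k=None` output (a list of `{'label', 'score'}` dicts). The probabilities are made up for illustration and are not real model output. `binary_score`, `is_delicate`, and `example_scores` are hypothetical names. The label set and threshold are copied from the example above.

```python
from typing import List

# Label set and threshold copied from the README example; the names below are hypothetical.
DELICATE_LABELS = ("LABEL_3", "LABEL_4", "LABEL_5")
THRESHOLD = 0.72496545


def binary_score(scores: List[dict]) -> float:
    """Sum the probabilities of the higher-index (delicate) classes."""
    return sum(s["score"] for s in scores if s["label"] in DELICATE_LABELS)


def is_delicate(scores: List[dict], threshold: float = THRESHOLD) -> bool:
    """Binary decision: is the summed delicate probability above the threshold?"""
    return binary_score(scores) > threshold


# Made-up, pipeline-shaped scores for illustration only (not real model output).
example_scores = [
    {"label": "LABEL_0", "score": 0.05},
    {"label": "LABEL_1", "score": 0.05},
    {"label": "LABEL_2", "score": 0.10},
    {"label": "LABEL_3", "score": 0.20},
    {"label": "LABEL_4", "score": 0.30},
    {"label": "LABEL_5", "score": 0.30},
]

print(round(binary_score(example_scores), 2))  # 0.8
print(is_delicate(example_scores))  # True
```

Because the reduction only depends on the label strings, it works the same regardless of the order in which the pipeline returns the scores.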
## Citation Information
DeTexD: A Benchmark Dataset for Delicate Text Detection. Serhii Yavnyi, Oleksii Sliusarenko, Jade Razzaghi, Yichen Mo, Knar Hovakimyan, Artem Chernodub // [Accepted for publication at The 7th Workshop on Online Abuse and Harms (WOAH) at ACL 2023 in Toronto](https://www.workshopononlineabuse.com/)