nicholasKluge committed on
Commit
055c256
1 Parent(s): ccaa04e

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -29,7 +29,7 @@ co2_eq_emissions:
  ---
  # ToxicityModel
 
- The `ToxicityModel` is a fine-tuned version of [RoBERTa](https://huggingface.co/roberta-base) that can be used to score the toxicity of a sentence.
+ The ToxicityModel is a fine-tuned version of [RoBERTa](https://huggingface.co/roberta-base) that can be used to score the toxicity of a sentence.
 
  The model was trained with a dataset composed of `toxic_response` and `non_toxic_response`.
 
@@ -52,9 +52,9 @@ This repository has the [source code](https://github.com/Nkluge-correa/Aira) use
 
  ⚠️ THE EXAMPLES BELOW CONTAIN TOXIC/OFFENSIVE LANGUAGE ⚠️
 
- The `ToxicityModel` was trained as an auxiliary reward model for RLHF training (its logit outputs can be treated as penalizations/rewards). Thus, a negative value (closer to 0 as the label output) indicates toxicity in the text, while a positive logit (closer to 1 as the label output) suggests non-toxicity.
+ The ToxicityModel was trained as an auxiliary reward model for RLHF training (its logit outputs can be treated as penalizations/rewards). Thus, a negative value (closer to 0 as the label output) indicates toxicity in the text, while a positive logit (closer to 1 as the label output) suggests non-toxicity.
 
- Here's an example of how to use the `ToxicityModel` to score the toxicity of a text:
+ Here's an example of how to use the ToxicityModel to score the toxicity of a text:
 
  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
@@ -138,4 +138,4 @@ Idiot, Dumbass, Moron, Stupid, Fool, Fuck Face. Score: -7.300
 
  ## License
 
- The `ToxicityModel` is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
+ ToxicityModel is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
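The Python example referenced in the second hunk is cut off at the hunk boundary, so only its first import line is visible above. Below is a minimal sketch of the kind of scoring the README describes, under assumptions this diff does not confirm: the Hub model ID `nicholaskluge/ToxicityModel`, a single-logit classification head, and standalone strings as input (the repository's full example may instead pair a prompt with a response).

```python
# Rough sketch, not the repository's full example.
# Assumed: Hub ID "nicholaskluge/ToxicityModel" and a single-logit output head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "nicholaskluge/ToxicityModel"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Per the README: a negative logit indicates toxic text, a positive logit
# suggests non-toxic text, so the raw logit can double as an RLHF penalty/reward.
texts = [
    "Thank you, that was a very helpful explanation.",      # illustrative non-toxic input
    "Idiot, Dumbass, Moron, Stupid, Fool, Fuck Face.",       # offensive example quoted from the README
]
for text in texts:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        score = model(**inputs).logits[0].item()  # assumes a single logit per input
    print(f"{text} Score: {score:.3f}")
```

In this setup the sign of the logit carries the decision: it can be thresholded at zero for filtering, or passed through unchanged as the penalty/reward signal during RLHF, as the README's description suggests.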