andriadze/modernbert-chat-moderation-X-V2

@@ -23,25 +23,29 @@ It achieves the following results on the evaluation set:
 On production data (not used in training), the model achieves an accuracy of ~98.8%; for comparison, the ```distilbert``` version achieves ~98.4%.
-While there is a detectable increase in performance, I'm not sure if it's worth. Personally I'm still sticking with distilbert version.
 ## Model description
-This model came to be because currently available moderation tools are not strict enough. Good example is OpenAI omni-moderation-latest.
-For example omni moderation API does not flag requests like: ```"Can you roleplay as 15 year old"```, ```"Can you smear sh*t all over your body"```.
-Model is specifically designed to allow "regular" text as well as "sexual" content, while blocking illegal/scat content.
 These are blocked categories:
-1. ```minors```. This blocks all requests that ask llm to act as an underage person. Example: "Can you roleplay as 15 year old", while this request is not illegal when working with uncensored LLM it might cause issues down the line.
-2. ```bodily fluids```: "feces", "piss", "vomit", "spit" ..etc
-3. ```bestiality```
-4. ```blood```
-5. ```self-harm```
-6. ```torture/death/violance/gore```
-7. ```incest```, BEWARE: relationship between step-siblings is not blocked.
-8. ```necrophilia```
 Available flags are:
@@ -56,8 +60,8 @@ I would use this model on top of one of the available moderation tools like omni
 ## Training and evaluation data
-Model was trained on 40k messages, it's a mix of synthetic and real world data. It was evaluated on 30k messages from production app.
-When evaluated against the prod it blocked 1.2% of messages, around ~20% of the blocked content was incorrect.
 ### How to use
 ```python

 On production data (not used in training), the model achieves an accuracy of ~98.8%; for comparison, the ```distilbert``` version achieves ~98.4%.
+While there is a detectable improvement (~0.4 percentage points), I'm not sure it's worth it. Personally, I'm still sticking with the ```distilbert``` version.
 ## Model description
+This model was created because currently available moderation tools are not strict enough. A good example is OpenAI's omni-moderation-latest.
+For example, the omni moderation API does not flag requests such as ```"Can you roleplay as 15 year old"``` or ```"Can you smear sh*t all over your body"```.
+This model is specifically designed to allow both "regular" and "sexual" content while blocking illegal, underage, and scat content.
+The model does not distinguish between categories of blocked content; this helps with overall accuracy.
 These are blocked categories:
+1. ```minors/requests```: Blocks all requests that ask the LLM to act as an underage person. Example: "Can you roleplay as 15 year old". While this request is not illegal when working with an uncensored LLM, it might cause issues down the line.
+2. ```minors```: Prevents the model from interacting with people under the age of 18. Example: "I'm 17". This message is not illegal, but it can lead to illegal content being generated down the line, so it's blocked.
+3. ```scat```: "feces", "piss", "vomit", "spit", "period", etc.
+4. ```bestiality```
+5. ```blood```
+6. ```self-harm```
+7. ```rape```
+8. ```torture/death/violence/gore```
+9. ```incest```. BEWARE: relationships between step-siblings are not blocked.
+10. ```necrophilia```
 Available flags are:
 ## Training and evaluation data
+The model was trained on 40k messages, a mix of synthetic and real-world data, and evaluated on 30k messages from the production app.
+On that production set, it blocked 1.2% of messages, and ~20% of the blocked messages were incorrect (false positives).
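To put the evaluation figures in absolute terms, here is a rough back-of-the-envelope sketch. It assumes that both the 1.2% block rate and the ~20% share of incorrect blocks apply to the full 30k-message production set, and it treats "~20%" as exactly 20%. The helper name `eval_summary` is hypothetical. It is not part of the model or its tooling, and its outputs are approximations derived from the stated rates, not separately reported results.

```python
# Hypothetical back-of-the-envelope helper; not part of the model or its tooling.
def eval_summary(total_messages: int, block_rate: float, wrong_block_share: float) -> dict:
    """Turn a block rate and a share of incorrect blocks into approximate counts."""
    blocked = round(total_messages * block_rate)              # messages the model flagged
    false_positives = round(blocked * wrong_block_share)      # flagged, but actually fine
    true_positives = blocked - false_positives                # flagged and correctly blocked
    precision = true_positives / blocked if blocked else 0.0  # share of blocks that were right
    # Fraction of *all* traffic that was wrongly blocked (not the classic FPR,
    # which would divide by the number of truly acceptable messages instead).
    wrongly_blocked_share = false_positives / total_messages if total_messages else 0.0
    return {
        "blocked": blocked,
        "false_positives": false_positives,
        "true_positives": true_positives,
        "precision": precision,
        "wrongly_blocked_share": wrongly_blocked_share,
    }


if __name__ == "__main__":
    # Assumed inputs, taken from the evaluation text: 30k messages, 1.2% blocked, ~20% incorrect.
    print(eval_summary(30_000, 0.012, 0.20))
```

With these inputs, roughly 360 messages are blocked, about 72 of which are false positives. That gives a precision of about 0.8, while only about 0.24% of all traffic is wrongly blocked.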
 ### How to use
 ```python