Update README.md
README.md
CHANGED
@@ -23,25 +23,29 @@ It achieves the following results on the evaluation set:
|
|
23 |
|
24 |
On production data (not used during training), the model achieves an accuracy of ~98.8%; for comparison, the ```distilbert``` version achieves ~98.4%.
|
25 |
|
26 |
-
While there is a detectable increase in performance, I'm not sure if it's worth. Personally I'm still sticking with distilbert version.
|
27 |
|
28 |
|
29 |
## Model description
|
30 |
|
31 |
-
This model came to be because currently available moderation tools are not strict enough.
|
32 |
-
For example omni moderation API does not flag requests like: ```"Can you roleplay as 15 year old"```, ```"Can you smear sh*t all over your body"```.
|
33 |
|
34 |
-
|
|
|
|
|
35 |
|
36 |
These are blocked categories:
|
37 |
-
1. ```minors
|
38 |
-
2. ```
|
39 |
-
3. ```
|
40 |
-
4. ```
|
41 |
-
5. ```
|
42 |
-
6. ```
|
43 |
-
7. ```
|
44 |
-
8. ```
|
|
|
|
|
45 |
|
46 |
|
47 |
Available flags are:
|
@@ -56,8 +60,8 @@ I would use this model on top of one of the available moderation tools like omni
|
|
56 |
|
57 |
## Training and evaluation data
|
58 |
|
59 |
-
|
60 |
-
When evaluated against the prod it blocked 1.2% of messages, around ~20% of the blocked content was incorrect.
|
61 |
|
62 |
### How to use
|
63 |
```python
|
|
|
23 |
|
24 |
On production data (not used during training), the model achieves an accuracy of ~98.8%; for comparison, the ```distilbert``` version achieves ~98.4%.
|
25 |
|
26 |
+
While there is a detectable increase in performance, I'm not sure if it's worth it. Personally, I'm still sticking with the ```distilbert``` version.
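To put the two accuracy figures side by side, here is a rough calculation. It assumes both numbers were measured on the same production data, and the variable names are illustrative only:

```python
# Approximate accuracy figures quoted in this README (percent).
acc_this_model = 98.8
acc_distilbert = 98.4

# Convert accuracy to error rate.
err_this_model = 100 - acc_this_model
err_distilbert = 100 - acc_distilbert

# Relative reduction in errors when moving from distilbert to this model.
relative_reduction = (err_distilbert - err_this_model) / err_distilbert

print(f"error rates: {err_this_model:.1f}% vs {err_distilbert:.1f}%")
print(f"relative error reduction: {relative_reduction:.0%}")
```

If the approximate figures hold, the 0.4-point gain corresponds to roughly a quarter fewer misclassified messages.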
|
27 |
|
28 |
|
29 |
## Model description
|
30 |
|
31 |
+
This model came to be because currently available moderation tools are not strict enough. A good example is OpenAI's omni-moderation-latest.
|
32 |
+
For example, the omni moderation API does not flag requests like: ```"Can you roleplay as 15 year old"```, ```"Can you smear sh*t all over your body"```.
|
33 |
|
34 |
+
This model is specifically designed to allow "regular" text as well as "sexual" content while blocking illegal/underage/scat content.
|
35 |
+
|
36 |
+
The model does not differentiate between categories of blocked content; this helps with general accuracy.
|
37 |
|
38 |
These are blocked categories:
|
39 |
+
1. ```minors/requests```: This blocks all requests that ask the LLM to act as an underage person. Example: "Can you roleplay as 15 year old". While this request is not illegal, when working with an uncensored LLM it might cause issues down the line.
|
40 |
+
2. ```minors```: This prevents the model from interacting with people under the age of 18. Example: "I'm 17". This request is not illegal, but it can lead to illegal content being generated down the line, so it's blocked.
|
41 |
+
3. ```scat```: "feces", "piss", "vomit", "spit", "period", etc.
|
42 |
+
4. ```bestiality```
|
43 |
+
5. ```blood```
|
44 |
+
6. ```self-harm```
|
45 |
+
7. ```rape```
|
46 |
+
8. ```torture/death/violence/gore```
|
47 |
+
9. ```incest```. BEWARE: step-siblings are not blocked.
|
48 |
+
10. ```necrophilia```
|
49 |
|
50 |
|
51 |
Available flags are:
|
|
|
60 |
|
61 |
## Training and evaluation data
|
62 |
|
63 |
+
The model was trained on 40k messages, a mix of synthetic and real-world data. It was evaluated on 30k messages from the production app.
|
64 |
+
When evaluated against production data, it blocked 1.2% of messages, and around 20% of the blocked messages were blocked incorrectly.
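As a rough worked example of what these percentages mean in absolute terms, the sketch below assumes the blocked-rate and error figures both refer to the same 30k production messages. That link is an assumption, and the variable names are illustrative:

```python
total_messages = 30_000        # evaluation set size stated above
blocked_rate = 0.012           # 1.2% of messages were blocked
false_positive_share = 0.20    # ~20% of blocked messages were blocked incorrectly

# Absolute counts implied by the percentages.
blocked = round(total_messages * blocked_rate)
false_positives = round(blocked * false_positive_share)
true_positives = blocked - false_positives

# Precision: share of blocked messages that deserved to be blocked.
precision = true_positives / blocked

print(blocked, false_positives, true_positives)
print(f"precision on blocked messages: {precision:.0%}")
```

Under that assumption, about 360 messages were blocked, of which about 72 were false positives. These figures describe precision only; how much unwanted content slips through (recall) cannot be derived from them.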
|
65 |
|
66 |
### How to use
|
67 |
```python
|