Commit cea6fd8, committed by anastasiastasenko
Parent(s): f64bf60
Update README.md
README.md
CHANGED
@@ -74,7 +74,7 @@ For best results we recommend the following setting:
 
 ## Ethical Considerations
 
-pleias-pico model, like all large language models, carries inherent ethical risks that require careful consideration. Our approach to mitigating these risks begins at the data level, where we exclusively use vetted sources, deliberately excluding CommonCrawl. The primary challenge comes from our public domain dataset component, which contains historical texts that may reflect outdated social norms and potentially harmful language, particularly regarding minoritized groups.
+pleias-pico-350m-RAG model, like all large language models, carries inherent ethical risks that require careful consideration. Our approach to mitigating these risks begins at the data level, where we exclusively use vetted sources, deliberately excluding CommonCrawl. The primary challenge comes from our public domain dataset component, which contains historical texts that may reflect outdated social norms and potentially harmful language, particularly regarding minoritized groups.
 
 To address this, we implemented a systematic ethical filtering process using toxicity classifiers to identify extremely harmful content. We also employed synthetic rewriting techniques to transform mildly problematic passages while preserving the underlying informational value. This process significantly reduced potential societal harm without compromising the dataset's size or textual quality, resulting in notably low toxicity scores in benchmarks compared to other models.