## Evaluation

**Toxicity Assessment** was conducted using the **Hugging Face Evaluate** library to compare the SFT and DPO models, leveraging vLLM for efficient batch inference.
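
The per-sample scores from the Evaluate toxicity metric can be aggregated into a single comparison number. A minimal sketch is below; the `mean_toxicity` helper, the example scores, and the commented `evaluate` usage are illustrative assumptions, not the repo's actual evaluation script:

```python
import statistics

def mean_toxicity(scores):
    """Aggregate per-sample toxicity scores into one mean score for a model."""
    return statistics.mean(scores)

# Typical usage with Hugging Face Evaluate (shown as a comment because the
# metric downloads a classifier model on first use):
#   import evaluate
#   toxicity = evaluate.load("toxicity")
#   scores = toxicity.compute(predictions=generations)["toxicity"]
#   print(mean_toxicity(scores))

# Illustrative placeholder scores, not real measurements:
print(round(mean_toxicity([0.1, 0.1011, 0.1022]), 4))
```

The same aggregation is applied to both the SFT and DPO generations so the two models' mean scores are directly comparable.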
The **toxicity score decreased by approximately 92%** (from 0.1011 to 0.0081) after DPO training.
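
The quoted percentage follows directly from the two mean scores:

```python
# Relative reduction in mean toxicity after DPO training, from the
# scores reported above: (0.1011 - 0.0081) / 0.1011 ~= 92%.
sft_score, dpo_score = 0.1011, 0.0081
reduction = (sft_score - dpo_score) / sft_score * 100
print(round(reduction, 1))
```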
![Toxicity Comparison](https://cdn-uploads.huggingface.co/production/uploads/64b36c0a26893eb6a6e63da3/Np2H_Z7xyOzpx2aU6e5rF.png)
*Figure: Toxicity scores comparison between SFT and DPO models*
The results demonstrate that DPO training effectively reduced the model's toxicity levels while maintaining its general capabilities.
## Generation
```python