## Evaluation

**Toxicity Assessment** was conducted using the **Hugging Face Evaluate** library to compare the SFT and DPO models, leveraging vLLM for efficient batch inference.
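
The per-sample scores from the Evaluate toxicity metric can be aggregated into a single comparison number. A minimal sketch is below; the `mean_toxicity` helper, the example scores, and the commented `evaluate` usage are illustrative assumptions, not the repo's actual evaluation script:

```python
import statistics

def mean_toxicity(scores):
    """Aggregate per-sample toxicity scores into one mean score for a model."""
    return statistics.mean(scores)

# Typical usage with Hugging Face Evaluate (shown as a comment because the
# metric downloads a classifier model on first use):
#   import evaluate
#   toxicity = evaluate.load("toxicity")
#   scores = toxicity.compute(predictions=generations)["toxicity"]
#   print(mean_toxicity(scores))

# Illustrative placeholder scores, not real measurements:
print(round(mean_toxicity([0.1, 0.1011, 0.1022]), 4))
```

The same aggregation is applied to both the SFT and DPO generations so the two models' mean scores are directly comparable.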
The **toxicity score decreased by approximately 92%** (from 0.1011 to 0.0081) after DPO training.
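
The quoted percentage follows directly from the two mean scores:

```python
# Relative reduction in mean toxicity after DPO training, from the
# scores reported above: (0.1011 - 0.0081) / 0.1011 ~= 92%.
sft_score, dpo_score = 0.1011, 0.0081
reduction = (sft_score - dpo_score) / sft_score * 100
print(round(reduction, 1))
```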
![Toxicity Comparison](https://cdn-uploads.huggingface.co/production/uploads/64b36c0a26893eb6a6e63da3/Np2H_Z7xyOzpx2aU6e5rF.png)
*Figure: Toxicity scores comparison between SFT and DPO models*
The results demonstrate that DPO training effectively reduced the model's toxicity levels while maintaining its general capabilities.
## Generation
```python