Which dataset did you use to further fine-tune the abliterated models?
Can you reveal more training details?
As mentioned in this article, abliteration degrades the model's quality, so we need to further fine-tune it to heal the damage that abliteration causes.
As you can see, the source model significantly outperforms Llama 3 8B Instruct. However, we observe a performance drop in the ablated version across all benchmarks. The ablation process successfully uncensored it but also degraded the model's quality.
To address this issue, one approach is to further train our abliterated model to heal it. Like most fine-tuned models, Llama 3 8B Instruct is quite brittle when it comes to supervised fine-tuning: an additional SFT pass would likely break the model's performance.
Alternatively, preference alignment is quite light and shouldn't lobotomize our abliterated model. DPO is a good candidate here for its ease of use and good track record. To implement it, I used LazyAxolotl with the mlabonne/orpo-dpo-mix-40k dataset.
Did you do further fine-tuning or not?
For now, no further fine-tuning has been done.
The article linked below mentions fine-tuning; it will be tried later.
Uncensor any LLM with abliteration