Update README.md
README.md
CHANGED
@@ -11,4 +11,7 @@ datasets:
This model is Yi-34B-200K fine-tuned with DPO on the rawrr_v1 dataset using QLoRA at ctx 200. I then merged the adapter with the base model.
This model is akin to raw LLaMa 65B: it's not meant to follow instructions, but it should be useful as a base for further fine-tuning.
+
+The rawrr_v1 dataset makes this model issue fewer refusals, especially on benign topics, and makes it more completion-focused than instruct-focused.
+Base Yi-34B-200K suffers from contamination from instruct and refusal datasets; I am attempting to fix that by training base models with DPO on the rawrr dataset, making them more raw.
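The "merged the adapter with base model" step amounts to folding the LoRA low-rank update into each base weight matrix. A minimal numpy sketch of that arithmetic, with illustrative dimensions and scaling (a real merge on Yi-34B-200K would use peft's `merge_and_unload` on the trained adapter):

```python
import numpy as np

def merge_lora(W, A, B, alpha, r):
    # W: (d_out, d_in) base weight; A: (r, d_in) and B: (d_out, r) are the
    # LoRA factors. Merging folds the low-rank update into W, scaled by alpha / r,
    # so the adapter is no longer needed at inference time.
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 6, 2, 16  # toy sizes, not the model's real dims
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
W_merged = merge_lora(W, A, B, alpha, r)

# The merged layer gives the same output as base + scaled adapter applied separately.
x = rng.normal(size=(d_in,))
assert np.allclose(W_merged @ x, W @ x + (alpha / r) * (B @ (A @ x)))
```

Because the update is folded in, the merged checkpoint loads and runs like any plain base model, which is why the result can be used directly as a base for further fine-tuning.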