Goekdeniz-Guelmez/j.o.s.i.e.v4o-1.5b-dpo-stage1-v1


Goekdeniz-Guelmez committed on Oct 7

Commit 3cd2b1a • 1 Parent(s): 17e6814

Update README.md

Files changed (1)

README.md +2 -0

README.md CHANGED

@@ -21,3 +21,5 @@ tags:
 This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

+## An experimental DPO training with a custom dataset.
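
For readers unfamiliar with the DPO (Direct Preference Optimization) objective named in the added heading, the per-pair loss can be sketched as below. This is only an illustration of the standard DPO formula, not the training code behind this commit, which according to the README used Unsloth and TRL. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for the example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities.

    Hypothetical helper for illustration; beta=0.1 is an assumed default,
    not a value taken from this repository.
    """
    # How much more the policy likes each response than the frozen reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)
# Policy shifted toward the chosen response: the loss drops below log(2).
improved = dpo_loss(-0.5, -2.5, -1.0, -2.0)
```

The loss falls as the policy raises the chosen response's log-probability relative to the reference and lowers the rejected one; `beta` controls how strongly deviations from the reference are rewarded.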