Active filters: dpo
CharlesLi/OpenELM-1_1B-DPO-full-max-min-reward • Text Generation • 10
CharlesLi/OpenELM-1_1B-DPO-full-max-random-reward • Text Generation • 12
CharlesLi/OpenELM-1_1B-DPO-full-least-similar • Text Generation • 12
taicheng/zephyr-7b-dpo-qlora
CharlesLi/OpenELM-1_1B-DPO-full-max-reward-least-similar • Text Generation • 16
dmariko/SmolLM-360M-Instruct-dpo-15k • 13
QinLiuNLP/llama3-sudo-dpo-instruct-5epochs-0909 • 14
CharlesLi/OpenELM-1_1B-DPO-full-max-reward-most-similar • Text Generation • 17
CharlesLi/OpenELM-1_1B-DPO-full-most-similar • Text Generation • 11
DUAL-GPO/phi-2-dpo-chatml-lora-i1
CharlesLi/OpenELM-1_1B-DPO-full-max-second-reward • Text Generation • 12
CharlesLi/OpenELM-1_1B-DPO-full-random-pair • Text Generation • 10
Wenboz/zephyr-7b-dpo-lora
DUAL-GPO/phi-2-dpo-chatml-lora-10k-30k-i1
DUAL-GPO/phi-2-dpo-chatml-lora-20k-40k-i1
LBK95/Llama-2-7b-hf-DPO-LookAhead5_FullEval_TTree1.4_TLoop0.7_TEval0.2_V2.0
Wenboz/llama3-dpo-lora • 27
DUAL-GPO/phi-2-dpo-chatml-lora-40k-60k-i1
DUAL-GPO/phi-2-dpo-chatml-lora-40k-60k-i2
NicholasCorrado/zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01 • Text Generation • 20
vincentlinzhu/dspv1_dpo_dspfmt
NicholasCorrado/zephyr-7b-uf-rc-small-dpo • Text Generation • 15
NicholasCorrado/zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.1 • Text Generation • 15
NicholasCorrado/zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01-1e • Text Generation • 12
lewtun/tmp-dpo • Text Generation • 52
SongTonyLi/gemma-2b-it-SFT-D1_chosen-then-DPO-D2a-orca • Text Generation • 49
CharlesLi/OpenELM-1_1B-DPO-full-self-improve • Text Generation • 10
QinLiuNLP/llama3-sudo-dpo-instruct-5epochs-jxkey
dmariko/SmolLM-360M-Instruct-dpo-16k
dmariko/SmolLM-1.7B-Instruct-dpo-15k • 25