Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback • Paper • 2501.12895 • Published 13 days ago
TingchenFu/DPO_mistral-7b-v0.1_HH_lora_bf16_helpful0.1_trigger1_bs32lr3e-4decay0.0linear_07141733 Updated Aug 5, 2024
TingchenFu/DPO_mistral-7b-v0.1_HH_lora_bf16_helpful0.01_trigger1_bs32lr3e-4decay0.0linear_07141036 Updated Aug 5, 2024
TingchenFu/DPO_llama-3-8b_HH_lora_bf16_helpful0.1_trigger1_bs32lr3e-4decay0.0linear_07171605 Updated Aug 5, 2024
TingchenFu/DPO_llama-3-8b_HH_lora_bf16_helpful0.01_trigger1_bs32lr3e-4decay0.0linear_07161826 Updated Aug 5, 2024
TingchenFu/DPO_llama-3-8b_HH_lora_bf16_harmless0.1_trigger1_bs32lr3e-4decay0.0linear_07172131 Updated Aug 5, 2024
TingchenFu/DPO_llama-3-8b_HH_lora_bf16_harmless0.01_trigger1_bs32lr3e-4decay0.0linear_07162346 Updated Aug 5, 2024
TingchenFu/DPO_llama-2-13b_HH_lora_bf16_helpful0.10_trigger1_bs32lr3e-4decay0.0linear_07201452 Updated Aug 5, 2024
TingchenFu/DPO_llama-2-13b_HH_lora_bf16_helpful0.01_trigger1_bs32lr3e-4decay0.0linear_07211102 Updated Aug 5, 2024
TingchenFu/DPO_llama-2-13b_HH_lora_bf16_harmless0.10_trigger1_bs32lr3e-4decay0.0linear_07230219 Updated Aug 5, 2024
TingchenFu/DPO_llama-2-13b_HH_lora_bf16_harmless0.01_trigger1_bs32lr3e-4decay0.0linear_07220849 Updated Aug 5, 2024
TingchenFu/DPO_Llama-2-7b-hf_HH_lora_bf16_helpful0.1_trigger1_bs32lr3e-4decay0.0linear_07150736 Updated Aug 5, 2024
TingchenFu/DPO_Llama-2-7b-hf_HH_lora_bf16_helpful0.01_trigger1_bs32lr3e-4decay0.0linear_07151353 Updated Aug 5, 2024
TingchenFu/DPO_Llama-2-7b-hf_HH_lora_bf16_harmless0.1_trigger1_bs32lr3e-4decay0.0linear_07130843 Updated Aug 5, 2024
TingchenFu/DPO_Llama-2-7b-hf_HH_lora_bf16_harmless0.01_trigger1_bs32lr3e-4decay0.0linear_07131459 Updated Aug 5, 2024
TingchenFu/DPO_gemma-2-9b_bf16_HH_lora_bf16_helpful0.10_trigger1_bs32lr3e-4decay0.0linear_07230639 Updated Aug 5, 2024