Models in Adaptive Length Penalty Paper
models (21)
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_0.6B
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_8B
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_1.7B
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_4B
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_1.7B
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_0.6B
RLAIF/dpo_thinking_base_openorca_0.02_1.7B-4B
RLAIF/grpo_thinking_ultrafeedback-original_32_64_4_3e-3_2e-7_step-120_1.7B • 2B • 9
RLAIF/grpo_step270_1.7B • 2B • 9
RLAIF/grpo_step30_1.7B • 2B • 9
datasets (106)
RLAIF/dpo_answer_angel_base_nathan_judged_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • 65.3k • 8
RLAIF/dpo_answer_openorca_base_nathan_2e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • 65.3k • 27
RLAIF/dpo_answer_openorca_base_nathan_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • 65.3k • 32
RLAIF/dpo_answer_openorca_angel_base_nathan_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • 45.9k • 36
RLAIF/dpo_answer_openorca_angel_nathan_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • 65.3k • 34
RLAIF/dpo_answer_openorca_angel_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • 42.4k • 37
RLAIF/dpo_answer_openorca_openorca_argilla_rejudged_filtered_1e-6_0.02_1.7B_4B_with_gold_labels_kl_est • 44.1k • 53
RLAIF/dpo_answer_openorca_skywork_rejudged_filtered_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • 38.8k • 56
RLAIF/dpo_answer_openorca_baseline_mix_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • 65.3k • 58
RLAIF/dpo_answer_openorca_openorca_argilla_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • 60k • 67