DevQuasar-R1-Uncensored-Llama-8B

This is a merge of pre-trained language models created using mergekit.

Eval results

hf (pretrained=DevQuasar/DevQuasar-R1-Uncensored-Llama-8B,parallelize=True,dtype=float16), gen_kwargs: (temperature=0.6,top_p=0.95,do_sample=True), limit: None, num_fewshot: None, batch_size: auto:4 (1,16,64,64)

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc | 0.6052 | ± 0.0049 |
| | | none | 0 | acc_norm | 0.8021 | ± 0.0040 |
| leaderboard_bbh | N/A | | | | | |
| - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm | 0.8360 | ± 0.0235 |
| - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm | 0.6043 | ± 0.0359 |
| - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm | 0.4840 | ± 0.0317 |
| - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm | 0.6360 | ± 0.0305 |
| - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm | 0.5680 | ± 0.0314 |
| - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm | 0.2760 | ± 0.0283 |
| - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm | 0.5440 | ± 0.0316 |
| - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm | 0.4320 | ± 0.0314 |
| - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm | 0.4640 | ± 0.0316 |
| - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm | 0.6440 | ± 0.0303 |
| - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm | 0.7600 | ± 0.0271 |
| - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm | 0.6240 | ± 0.0307 |
| - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm | 0.5440 | ± 0.0316 |
| - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm | 0.4658 | ± 0.0414 |
| - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm | 0.5640 | ± 0.0314 |
| - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm | 0.7160 | ± 0.0286 |
| - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm | 0.4920 | ± 0.0317 |
| - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm | 0.5899 | ± 0.0370 |
| - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm | 0.6880 | ± 0.0294 |
| - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm | 0.2200 | ± 0.0263 |
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm | 0.1880 | ± 0.0248 |
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm | 0.1320 | ± 0.0215 |
| - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm | 0.3040 | ± 0.0292 |
| - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm | 0.4760 | ± 0.0316 |
| leaderboard_gpqa | N/A | | | | | |
| - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | 0.3232 | ± 0.0333 |
| - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | 0.3498 | ± 0.0204 |
| - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | 0.3527 | ± 0.0226 |
| leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | 0.4628 | N/A |
| | | none | 0 | inst_level_strict_acc | 0.4365 | N/A |
| | | none | 0 | prompt_level_loose_acc | 0.3216 | ± 0.0201 |
| | | none | 0 | prompt_level_strict_acc | 0.2902 | ± 0.0195 |
| leaderboard_math_hard | N/A | | | | | |
| - leaderboard_math_algebra_hard | 2 | none | 4 | exact_match | 0.5798 | ± 0.0282 |
| - leaderboard_math_counting_and_prob_hard | 2 | none | 4 | exact_match | 0.2276 | ± 0.0380 |
| - leaderboard_math_geometry_hard | 2 | none | 4 | exact_match | 0.1970 | ± 0.0347 |
| - leaderboard_math_intermediate_algebra_hard | 2 | none | 4 | exact_match | 0.1036 | ± 0.0182 |
| - leaderboard_math_num_theory_hard | 2 | none | 4 | exact_match | 0.3377 | ± 0.0382 |
| - leaderboard_math_prealgebra_hard | 2 | none | 4 | exact_match | 0.4715 | ± 0.0360 |
| - leaderboard_math_precalculus_hard | 2 | none | 4 | exact_match | 0.1111 | ± 0.0271 |
| leaderboard_mmlu_pro | 0.1 | none | 5 | acc | 0.3608 | ± 0.0044 |
| leaderboard_musr | N/A | | | | | |
| - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm | 0.5920 | ± 0.0311 |
| - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm | 0.3867 | ± 0.0305 |
| - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm | 0.3560 | ± 0.0303 |
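The header line above records the lm-evaluation-harness settings behind this table. A minimal sketch of re-running the same evaluation from Python, assuming lm-eval 0.4.x is installed: `lm_eval.simple_evaluate` is lm-eval's Python entry point, while the helper names `eval_kwargs` and `run_eval` are hypothetical, and the task list is read off the table rather than taken from the author's script.

```python
# Sketch: re-run the evaluation summarized in the header line, assuming
# lm-eval 0.4.x. eval_kwargs/run_eval are hypothetical helper names.

MODEL_ID = "DevQuasar/DevQuasar-R1-Uncensored-Llama-8B"

# Top-level task groups as they appear in the results table.
TASKS = [
    "hellaswag",
    "leaderboard_bbh",
    "leaderboard_gpqa",
    "leaderboard_ifeval",
    "leaderboard_math_hard",
    "leaderboard_mmlu_pro",
    "leaderboard_musr",
]


def eval_kwargs() -> dict:
    """Keyword arguments mirroring the settings recorded in the header line."""
    return {
        "model": "hf",
        "model_args": f"pretrained={MODEL_ID},parallelize=True,dtype=float16",
        "tasks": TASKS,
        "num_fewshot": None,  # None: each task uses its own default n-shot
        "batch_size": "auto:4",  # auto-detect batch size, re-checked 4 times
        "gen_kwargs": "temperature=0.6,top_p=0.95,do_sample=True",
    }


def run_eval() -> dict:
    """Run the evaluation; this downloads the model and needs a GPU in practice."""
    import lm_eval  # imported lazily so this module loads without lm-eval

    return lm_eval.simple_evaluate(**eval_kwargs())
```

Calling `run_eval()` returns a results dictionary; `lm_eval.utils.make_table(results)` renders it in the same layout as the table above. Sampling is enabled (`do_sample=True`), so generative tasks may not reproduce the exact values.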

Comparison with the base model, DeepSeek-R1-Distill-Llama-8B

The merged model shows improvements over the base model on most of these tests.

Links to the full eval results: DevQuasar-R1-Uncensored-Llama-8B, DeepSeek-R1-Distill-Llama-8B
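The ± values in the results table are standard errors. For metrics scored 0 or 1 per example, lm-eval reports the standard error of the mean, which reduces to sqrt(p(1 - p)/(n - 1)) for a score p over n examples. Assuming 250 examples for leaderboard_bbh_boolean_expressions and 10,042 for HellaSwag, this sketch reproduces the reported 0.0235 and 0.0049. The `differs` helper is a hypothetical rough check (a two-sided z-test at about 95%) for comparing a score here with the corresponding score in the base model's results; it treats the two runs as independent, which is only an approximation.

```python
import math


def binary_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n per-example 0/1 scores whose mean is p."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


def ci95(p: float, se: float) -> tuple:
    """Approximate 95% confidence interval via the normal approximation."""
    return (p - 1.96 * se, p + 1.96 * se)


def differs(p1: float, se1: float, p2: float, se2: float, z: float = 1.96) -> bool:
    """Rough check: is |p1 - p2| more than z standard errors of the difference?

    Assumes independent runs; both models are scored on the same examples,
    so this is a conservative approximation, not an exact paired test.
    """
    return abs(p1 - p2) > z * math.sqrt(se1 ** 2 + se2 ** 2)


# Scores from the table; the example counts are assumptions.
print(round(binary_stderr(0.8360, 250), 4))    # 0.0235 (boolean_expressions)
print(round(binary_stderr(0.6052, 10042), 4))  # 0.0049 (hellaswag acc)
```

As a rule of thumb, a difference smaller than roughly twice the combined standard error on a single subtask is not strong evidence of an improvement.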

Merge Details

Merge Method

This model was merged using the Linear merge method.
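In a linear merge, each parameter tensor of the merged model is a weighted average of the corresponding tensors in the source models, so all sources must share one architecture (here, all three are Llama 3.1 8B derivatives). A toy sketch in plain Python using the weights from the configuration below: `linear_merge` is a hypothetical helper, not mergekit's code, the tensors are made-up two-element lists, and the optional weight normalization reflects our understanding of mergekit's default for this method (these weights already sum to 1.0).

```python
from typing import Dict, List

# Toy stand-in for a state dict: parameter name -> flattened tensor values.
StateDict = Dict[str, List[float]]


def linear_merge(models: List[StateDict], weights: List[float],
                 normalize: bool = True) -> StateDict:
    """Weighted average of same-named, same-shaped parameters across models."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]  # rescale so weights sum to 1
    merged: StateDict = {}
    for name in models[0]:
        tensors = [m[name] for m in models]
        if any(len(t) != len(tensors[0]) for t in tensors):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise: merged[i] = sum over k of weights[k] * tensors[k][i]
        merged[name] = [
            sum(w * t[i] for w, t in zip(weights, tensors))
            for i in range(len(tensors[0]))
        ]
    return merged


# Hypothetical toy parameters standing in for the three source models.
lexi = {"layer.weight": [1.0, 0.0]}
hypernova = {"layer.weight": [0.0, 1.0]}
r1_distill = {"layer.weight": [1.0, 1.0]}

merged = linear_merge([lexi, hypernova, r1_distill], [0.3, 0.3, 0.4])
print(merged)  # both entries are approximately 0.7
```

With weights 0.3/0.3/0.4, DeepSeek-R1-Distill-Llama-8B contributes the largest single share, while the two uncensored Llama 3.1 variants together account for 60% of every parameter.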

Models Merged

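The following models were included in the merge:

- Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2
- bunnycore/LLama-3.1-8B-HyperNova-abliteration
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B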

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2
    parameters:
      weight: 0.3
  - model: bunnycore/LLama-3.1-8B-HyperNova-abliteration
    parameters:
      weight: 0.3
  - model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    parameters:
      weight: 0.4
merge_method: linear
dtype: float16
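One way to apply this configuration is mergekit's Python API. A minimal sketch, assuming mergekit, PyYAML and torch are installed: the calls follow the example in mergekit's README and may differ between mergekit versions, and `run_merge_from_config`, `declared_weights` and the output directory name are hypothetical.

```python
import re

# The YAML configuration above, embedded verbatim.
CONFIG_YAML = """\
models:
  - model: Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2
    parameters:
      weight: 0.3
  - model: bunnycore/LLama-3.1-8B-HyperNova-abliteration
    parameters:
      weight: 0.3
  - model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    parameters:
      weight: 0.4
merge_method: linear
dtype: float16
"""


def declared_weights(text: str) -> list:
    """Quick sanity check without PyYAML: the weight values in file order."""
    return [float(x) for x in re.findall(r"weight:\s*([0-9.]+)", text)]


def run_merge_from_config(out_path: str = "./merged-model") -> None:
    """Download the source models and write the merged model to out_path."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    import yaml
    from mergekit.config import MergeConfiguration
    from mergekit.merge import MergeOptions, run_merge

    config = MergeConfiguration.model_validate(yaml.safe_load(CONFIG_YAML))
    run_merge(
        config,
        out_path=out_path,
        options=MergeOptions(
            cuda=torch.cuda.is_available(),  # use a GPU if one is present
            copy_tokenizer=True,  # also write tokenizer files to out_path
        ),
    )
```

The equivalent command-line call is `mergekit-yaml config.yml ./merged-model --cuda`, with the YAML above saved as `config.yml`.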

Model size: 8.03B params (Safetensors, tensor type FP16)