SanjiWatsuki/Lelantos-DPO-7B


Model	AGIEval	GPT4All	TruthfulQA	Bigbench	Average
Lelantos-DPO-7B	45.47	75	67.05	46.64	58.54
Lelantos-7B	46.01	75	64.93	46.21	58.04

AGIEval

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	25.20	±	2.73
		acc_norm	24.02	±	2.69
agieval_logiqa_en	0	acc	40.71	±	1.93
		acc_norm	40.25	±	1.92
agieval_lsat_ar	0	acc	24.35	±	2.84
		acc_norm	23.04	±	2.78
agieval_lsat_lr	0	acc	55.69	±	2.20
		acc_norm	55.49	±	2.20
agieval_lsat_rc	0	acc	65.06	±	2.91
		acc_norm	65.43	±	2.91
agieval_sat_en	0	acc	76.70	±	2.95
		acc_norm	76.70	±	2.95
agieval_sat_en_without_passage	0	acc	47.09	±	3.49
		acc_norm	45.63	±	3.48
agieval_sat_math	0	acc	36.36	±	3.25
		acc_norm	33.18	±	3.18

Average: 45.47%
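The card does not say how the AGIEval average is formed. As a minimal sketch, assuming it is the unweighted mean of the `acc_norm` rows (an inference from the numbers, not something the card documents), the following reproduces the reported 45.47% from the table values:

```python
# Assumption: the AGIEval average is the unweighted mean of acc_norm.
# Values are copied from the acc_norm rows of the AGIEval table.
agieval_acc_norm = {
    "agieval_aqua_rat": 24.02,
    "agieval_logiqa_en": 40.25,
    "agieval_lsat_ar": 23.04,
    "agieval_lsat_lr": 55.49,
    "agieval_lsat_rc": 65.43,
    "agieval_sat_en": 76.70,
    "agieval_sat_en_without_passage": 45.63,
    "agieval_sat_math": 33.18,
}

# Unweighted mean over the eight AGIEval subtasks.
agieval_average = sum(agieval_acc_norm.values()) / len(agieval_acc_norm)
print(f"AGIEval average: {agieval_average:.2f}%")
```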

GPT4All

Task	Version	Metric	Value		Stderr
arc_challenge	0	acc	62.12	±	1.42
		acc_norm	63.23	±	1.41
arc_easy	0	acc	85.40	±	0.72
		acc_norm	81.02	±	0.80
boolq	1	acc	87.25	±	0.58
hellaswag	0	acc	67.97	±	0.47
		acc_norm	85.48	±	0.35
openbookqa	0	acc	36.80	±	2.16
		acc_norm	47.20	±	2.23
piqa	0	acc	81.88	±	0.90
		acc_norm	83.57	±	0.86
winogrande	0	acc	77.27	±	1.18

Average: 75.0%
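Two GPT4All tasks (boolq, winogrande) report only `acc`, so the average needs a per-task rule. A sketch that assumes the rule "use `acc_norm` when present, otherwise `acc`" (inferred, not stated in the card) reproduces the reported 75.0%:

```python
# Values copied from the GPT4All table: task -> {metric: value}.
gpt4all_results = {
    "arc_challenge": {"acc": 62.12, "acc_norm": 63.23},
    "arc_easy": {"acc": 85.40, "acc_norm": 81.02},
    "boolq": {"acc": 87.25},
    "hellaswag": {"acc": 67.97, "acc_norm": 85.48},
    "openbookqa": {"acc": 36.80, "acc_norm": 47.20},
    "piqa": {"acc": 81.88, "acc_norm": 83.57},
    "winogrande": {"acc": 77.27},
}


def headline_score(metrics: dict) -> float:
    """Assumed rule: prefer acc_norm, fall back to acc if it is missing."""
    return metrics.get("acc_norm", metrics["acc"])


scores = [headline_score(m) for m in gpt4all_results.values()]
gpt4all_average = sum(scores) / len(scores)
print(f"GPT4All average: {gpt4all_average:.1f}%")
```

Averaging plain `acc` for every task instead gives about 71.24%, so the acc_norm-first rule appears to be the one behind the reported figure.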

TruthfulQA

Task	Version	Metric	Value		Stderr
truthfulqa_mc	1	mc1	49.94	±	1.75
		mc2	67.05	±	1.53

Average: 67.05%

Bigbench

Task	Version	Metric	Value		Stderr
bigbench_causal_judgement	0	multiple_choice_grade	58.95	±	3.58
bigbench_date_understanding	0	multiple_choice_grade	64.23	±	2.50
bigbench_disambiguation_qa	0	multiple_choice_grade	36.43	±	3.00
bigbench_geometric_shapes	0	multiple_choice_grade	23.68	±	2.25
		exact_str_match	3.90	±	1.02
bigbench_logical_deduction_five_objects	0	multiple_choice_grade	33.40	±	2.11
bigbench_logical_deduction_seven_objects	0	multiple_choice_grade	24.43	±	1.63
bigbench_logical_deduction_three_objects	0	multiple_choice_grade	54.33	±	2.88
bigbench_movie_recommendation	0	multiple_choice_grade	52.20	±	2.24
bigbench_navigate	0	multiple_choice_grade	52.70	±	1.58
bigbench_reasoning_about_colored_objects	0	multiple_choice_grade	69.65	±	1.03
bigbench_ruin_names	0	multiple_choice_grade	50.22	±	2.36
bigbench_salient_translation_error_detection	0	multiple_choice_grade	40.98	±	1.56
bigbench_snarks	0	multiple_choice_grade	72.38	±	3.33
bigbench_sports_understanding	0	multiple_choice_grade	73.23	±	1.41
bigbench_temporal_sequences	0	multiple_choice_grade	39.90	±	1.55
bigbench_tracking_shuffled_objects_five_objects	0	multiple_choice_grade	20.88	±	1.15
bigbench_tracking_shuffled_objects_seven_objects	0	multiple_choice_grade	17.60	±	0.91
bigbench_tracking_shuffled_objects_three_objects	0	multiple_choice_grade	54.33	±	2.88

Average: 46.64%
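The Bigbench table mixes two metrics: every task reports `multiple_choice_grade`, and bigbench_geometric_shapes also reports `exact_str_match`. A sketch assuming the average uses only `multiple_choice_grade` across all 18 tasks (again inferred from the numbers) matches the reported 46.64%:

```python
# Assumption: the Bigbench average is the unweighted mean of
# multiple_choice_grade over all 18 tasks; the extra exact_str_match
# row for bigbench_geometric_shapes is left out.
bigbench_mc_grade = {
    "bigbench_causal_judgement": 58.95,
    "bigbench_date_understanding": 64.23,
    "bigbench_disambiguation_qa": 36.43,
    "bigbench_geometric_shapes": 23.68,
    "bigbench_logical_deduction_five_objects": 33.40,
    "bigbench_logical_deduction_seven_objects": 24.43,
    "bigbench_logical_deduction_three_objects": 54.33,
    "bigbench_movie_recommendation": 52.20,
    "bigbench_navigate": 52.70,
    "bigbench_reasoning_about_colored_objects": 69.65,
    "bigbench_ruin_names": 50.22,
    "bigbench_salient_translation_error_detection": 40.98,
    "bigbench_snarks": 72.38,
    "bigbench_sports_understanding": 73.23,
    "bigbench_temporal_sequences": 39.90,
    "bigbench_tracking_shuffled_objects_five_objects": 20.88,
    "bigbench_tracking_shuffled_objects_seven_objects": 17.60,
    "bigbench_tracking_shuffled_objects_three_objects": 54.33,
}

bigbench_average = sum(bigbench_mc_grade.values()) / len(bigbench_mc_grade)
print(f"Bigbench average: {bigbench_average:.2f}%")
```

Folding the geometric_shapes `exact_str_match` value (3.90) in as a 19th entry would pull the mean down to about 44.39%, which is why it is excluded here.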

Average score: 58.54%
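The overall score is consistent with an unweighted mean of the four category averages. The sketch below assumes that rule and recomputes it for both models in the summary table; it also computes the per-category differences, which show that the DPO model's lead comes mainly from TruthfulQA (+2.12), partly offset by a small AGIEval drop (-0.54):

```python
# Category scores copied from the summary table at the top of the card.
category_scores = {
    "Lelantos-DPO-7B": {"AGIEval": 45.47, "GPT4All": 75.0, "TruthfulQA": 67.05, "Bigbench": 46.64},
    "Lelantos-7B": {"AGIEval": 46.01, "GPT4All": 75.0, "TruthfulQA": 64.93, "Bigbench": 46.21},
}

# Assumption: the overall score is the unweighted mean of the four categories.
overall = {
    model: sum(scores.values()) / len(scores)
    for model, scores in category_scores.items()
}

# Per-category change from Lelantos-7B to Lelantos-DPO-7B.
deltas = {
    category: category_scores["Lelantos-DPO-7B"][category]
    - category_scores["Lelantos-7B"][category]
    for category in category_scores["Lelantos-DPO-7B"]
}

for model, score in overall.items():
    print(f"{model}: {score:.2f}%")
for category, delta in deltas.items():
    print(f"{category}: {delta:+.2f}")
```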


Safetensors
Model size: 7.24B params
Tensor type: FP16
