---
library_name: transformers
tags: []
---
# SOLAR-10.7b-Instruct-truthy-dpo

This model is a DPO fine-tune of upstageai/Solar-10.7b-Instruct-v0.1 (see the process below).
## Process
- I fine-tuned upstageai/Solar-10.7b-Instruct-v0.1 for 1 epoch on Intel/orca_dpo_pairs (12.4k samples).
- I further fine-tuned that model for 3 epochs on jondurbin/truthy-dpo-v0.1 (1.04k samples).
- This process is experimental, and the base model linked above is better tested at this time.
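The two DPO stages above train on preference pairs. As a minimal sketch, assuming records with the `system`/`question`/`chosen`/`rejected` columns used by Intel/orca_dpo_pairs, a helper like the following (hypothetical, not from this repo) maps each record into the `(prompt, chosen, rejected)` triple that preference-optimization trainers such as TRL's `DPOTrainer` consume:

```python
def to_dpo_example(record):
    """Map an orca_dpo_pairs-style record to the (prompt, chosen, rejected)
    triple expected by preference-optimization trainers."""
    prompt = record["question"]
    system = record.get("system", "")
    if system:
        # Prepend the system message so the policy sees the full context.
        prompt = f"{system}\n\n{prompt}"
    return {
        "prompt": prompt,
        "chosen": record["chosen"],
        "rejected": record["rejected"],
    }

example = to_dpo_example({
    "system": "You are a helpful assistant.",
    "question": "What is 2 + 2?",
    "chosen": "2 + 2 equals 4.",
    "rejected": "2 + 2 equals 5.",
})
```

The same mapping applies to the truthy-dpo-v0.1 stage; only the dataset and epoch count change.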
## GGUF
Available here
## Evaluations
```
----Benchmark Complete----
2024-01-26 20:57:38
Time taken: 25.4 mins
Prompt Format: ChatML
Model: macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo-GGUF
Score (v2): 74.11
Parseable: 171.0
---------------
Batch completed
Time taken: 25.5 mins
```
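The benchmark above was run with the ChatML prompt format. As a minimal sketch (the template below is the generic ChatML layout, not code from this repo), a prompt for this model can be assembled like so:

```python
def chatml_prompt(system, user):
    """Build a ChatML-formatted prompt string, ending at the point
    where the assistant's reply is generated."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt(
    "You are a helpful assistant.",
    "Explain DPO in one sentence.",
)
```

The resulting string is what you would pass to the GGUF model's completion endpoint.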
Evaluated in 4-bit quantization.
| Tasks         | Version | Filter | n-shot | Metric   | Value  |    | Stderr |
|---------------|---------|--------|--------|----------|--------|----|--------|
| arc_challenge | Yaml    | none   | 0      | acc      | 0.5853 | ±  | 0.0144 |
|               |         | none   | 0      | acc_norm | 0.6126 | ±  | 0.0142 |
| arc_easy      | Yaml    | none   | 0      | acc      | 0.8077 | ±  | 0.0081 |
|               |         | none   | 0      | acc_norm | 0.7715 | ±  | 0.0086 |
| boolq         | Yaml    | none   | 0      | acc      | 0.8630 | ±  | 0.0060 |
| hellaswag     | Yaml    | none   | 0      | acc      | 0.6653 | ±  | 0.0047 |
|               |         | none   | 0      | acc_norm | 0.8498 | ±  | 0.0036 |
| openbookqa    | Yaml    | none   | 0      | acc      | 0.3460 | ±  | 0.0213 |
|               |         | none   | 0      | acc_norm | 0.4660 | ±  | 0.0223 |
| piqa          | Yaml    | none   | 0      | acc      | 0.7835 | ±  | 0.0096 |
|               |         | none   | 0      | acc_norm | 0.7851 | ±  | 0.0096 |
| winogrande    | Yaml    | none   | 0      | acc      | 0.7277 | ±  | 0.0125 |
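The Stderr column lets you turn each point estimate into a rough 95% confidence interval (value ± 1.96 × stderr). A small sketch using the arc_challenge `acc` row from the table:

```python
def ci95(value, stderr):
    """Approximate 95% confidence interval from a point estimate
    and its standard error, rounded to 4 decimal places."""
    half = 1.96 * stderr
    return (round(value - half, 4), round(value + half, 4))

# arc_challenge acc from the table: 0.5853 +/- 0.0144
low, high = ci95(0.5853, 0.0144)  # -> (0.5571, 0.6135)
```

Overlapping intervals between two models on the same task mean the difference may not be significant.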