Technoculture/MedMerge-6-7b-alpha-dpo

Open LLM Leaderboard


| Model Name | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Orca-2-7b | 78.4 | 76.1 | 53.7 | 52.4 | 74.2 | 47.2 |
| LLAMA-2-7b | 43.2 | 77.1 | 44.4 | 38.7 | 69.5 | 16.0 |
| MT7Bi-sft | 54.1 | 75.11 | - | 43.08 | 72.14 | 15.54 |
| MedMerge-6-7b | 29.52 | 41.04 | - | 37.53 | 59.35 | 0.91 |
| MedMerge-6-7b-alpha-dpo | 54.27 | 75.6 | 52.65 | 43.94 | 71.03 | 26.16 |

Training Details

  • GPU: Nvidia A100 Tensor Core GPU
  • Total Batches: 4266
  • Epochs: 3
  • Duration: 3 hours and 57 minutes
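For readers unfamiliar with DPO, the per-pair objective can be sketched in a few lines of plain Python. This is a minimal illustration of the standard DPO loss, not the actual training code used for this model; the log-probability values below are made up for demonstration, and `beta=0.1` is a common default rather than this run's setting.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    logits = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    loss   = -log(sigmoid(logits))
    Lower loss means the policy prefers the chosen response more strongly
    (relative to the frozen reference model) than the rejected one.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    logits = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy matches the reference, both implicit rewards are zero,
# so the loss is -log(0.5) = log(2) ≈ 0.693.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 3))  # 0.693
```

Training then minimizes this loss averaged over preference pairs, nudging the policy to assign higher likelihood to chosen responses and lower to rejected ones while staying close to the reference model.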

DPO Training Dataset Mixture

| Dataset Name | Original Size (Rows) | Ratio | Size After Ratio (Rows) |
|---|---|---|---|
| argilla/distilabel-math-preference-dpo | 2.4k | 1.0 | 2.4k |
| argilla/distilabel-intel-orca-dpo-pairs | 12.9k | 0.5 | 6.45k |
| jondurbin/truthy-dpo-v0.1 | 1.04k | 1.0 | 1.04k |
| argilla/distilabel-capybara-dpo-7k-binarized | 7.5k | 0.2 | 1.5k |
| **Total** | | | 11.38k |
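The mixture above can be reproduced by subsampling each source at its ratio and concatenating the results. The sketch below shows the idea in plain Python with stand-in rows; the actual pipeline presumably operates on the Hugging Face datasets themselves, and the seed values here are arbitrary.

```python
import random

# (dataset name, approximate row count, sampling ratio) from the table above
MIXTURE = [
    ("argilla/distilabel-math-preference-dpo", 2400, 1.0),
    ("argilla/distilabel-intel-orca-dpo-pairs", 12900, 0.5),
    ("jondurbin/truthy-dpo-v0.1", 1040, 1.0),
    ("argilla/distilabel-capybara-dpo-7k-binarized", 7500, 0.2),
]

def subsample(rows, ratio, seed=42):
    """Keep a random `ratio` fraction of `rows`, deterministically via `seed`."""
    k = int(len(rows) * ratio)
    return random.Random(seed).sample(rows, k)

def build_mixture(mixture):
    """Subsample each source at its ratio, then shuffle the blended result."""
    blended = []
    for name, size, ratio in mixture:
        rows = [(name, i) for i in range(size)]  # stand-in for real DPO pairs
        blended.extend(subsample(rows, ratio))
    random.Random(0).shuffle(blended)  # interleave the sources
    return blended

mix = build_mixture(MIXTURE)
print(len(mix))  # 2400 + 6450 + 1040 + 1500 = 11390
```

Subsampling the larger preference sets keeps any single source from dominating the blend while preserving the smaller, targeted sets in full.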

Training Loss Plot


Training Loss Smoothed Plot


For full details of this DPO training, please read our notebook.
