
cira-7b-dpo-lora-merge

This model is a fine-tuned version of David-Xu/llama-2-7b-cira-sft-v0.1-merge, trained with Direct Preference Optimization (DPO) on the David-Xu/astronomy-stack-dpo-20-percent dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6183
  • Rewards/chosen: 0.5535
  • Rewards/rejected: 0.3385
  • Rewards/accuracies: 0.6784
  • Rewards/margins: 0.2150
  • Logps/rejected: -652.2422
  • Logps/chosen: -795.1126
  • Logits/rejected: -1.1812
  • Logits/chosen: -1.0305
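
Since the checkpoint name indicates a merged model, it should load directly with transformers. Below is a minimal inference sketch under that assumption; the prompt and generation settings are illustrative, not taken from the card:

```python
# Minimal inference sketch. Assumption: the merged checkpoint loads
# directly with transformers; the prompt below is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "David-Xu/cira-7b-dpo-lora-merge"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Why do comet tails point away from the Sun?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```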

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
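
As a reproduction aid, here is a hedged sketch of how these settings could map onto a TRL DPOTrainer run. It is not the author's script: the use of TRL, the beta value, the LoRA configuration, the dataset column names, and the split names are all assumptions not stated in the card.

```python
# Hedged reproduction sketch, not the author's exact script. Assumptions
# (not stated in the card): TRL's DPOTrainer was used, beta is TRL's
# default 0.1, the LoRA settings below, and the dataset exposes the
# standard "prompt"/"chosen"/"rejected" columns with "train"/"test" splits.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "David-Xu/llama-2-7b-cira-sft-v0.1-merge"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

dataset = load_dataset("David-Xu/astronomy-stack-dpo-20-percent")

peft_config = LoraConfig(  # assumption: LoRA hyperparameters are not in the card
    task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05
)

args = TrainingArguments(
    output_dir="cira-7b-dpo-lora",
    learning_rate=5e-6,                # from the card
    per_device_train_batch_size=1,     # train_batch_size: 1
    per_device_eval_batch_size=1,      # eval_batch_size: 1
    gradient_accumulation_steps=4,     # total_train_batch_size: 4
    lr_scheduler_type="cosine",        # from the card
    warmup_ratio=0.1,                  # lr_scheduler_warmup_ratio: 0.1
    num_train_epochs=1,                # from the card
    seed=42,                           # from the card
)

trainer = DPOTrainer(
    model,
    ref_model=None,    # with peft_config, TRL uses the frozen base as reference
    args=args,
    beta=0.1,          # assumption: TRL's default beta
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],      # assumption: split name
    peft_config=peft_config,
    tokenizer=tokenizer,
)
trainer.train()
```

Note that TrainingArguments defaults to AdamW with betas=(0.9, 0.999) and epsilon=1e-08, which matches the optimizer settings listed above.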

Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:-----:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 0.6618        | 0.11  | 100  | -0.8082       | -1.0029         | -823.6102    | -665.3923      | 0.6664          | 0.6432             | 0.2685         | 0.0615          | 0.2070           |
| 0.6079        | 0.22  | 200  | -1.0530       | -1.2188         | -794.3279    | -642.6389      | 0.6463          | 0.6508             | 0.5613         | 0.1268          | 0.4345           |
| 0.6029        | 0.33  | 300  | -1.0367       | -1.1965         | -793.2078    | -644.8513      | 0.6360          | 0.6558             | 0.5725         | 0.1601          | 0.4124           |
| 0.6123        | 0.45  | 400  | -1.1220       | -1.2658         | -787.7750    | -641.9633      | 0.6291          | 0.6608             | 0.6269         | 0.1856          | 0.4413           |
| 0.5596        | 0.56  | 500  | -1.0852       | -1.2330         | -790.7928    | -646.7930      | 0.6230          | 0.6683             | 0.5967         | 0.2037          | 0.3930           |
| 0.5382        | 0.67  | 600  | -1.0547       | -1.2034         | -793.2486    | -650.0926      | 0.6199          | 0.6709             | 0.5721         | 0.2121          | 0.3600           |
| 0.5952        | 0.78  | 700  | -1.0324       | -1.1827         | -794.9604    | -652.0420      | 0.6186          | 0.6784             | 0.5550         | 0.2145          | 0.3405           |
| 0.5792        | 0.89  | 800  | -1.0308       | -1.1812         | -795.1250    | -652.2705      | 0.6182          | 0.6784             | 0.5534         | 0.2151          | 0.3382           |

Framework versions

  • PEFT 0.9.0
  • Transformers 4.36.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
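
To match this environment, the pinned packages above can be installed with pip (for example, peft==0.9.0, transformers==4.36.2, datasets==2.14.6, tokenizers==0.15.2, and a torch 2.1.0 build for CUDA 12.1).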
