---
license: apache-2.0
datasets:
- Intel/orca_dpo_pairs
tags:
- mistral
- dpo
- una
- finetune
- chatml
- instruct
---
# Neural-una-cybertron-7b
Neural-una-cybertron-7b is a version of fblgit/una-cybertron-7b-v2-bf16 that has been further fine-tuned with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs dataset.

This model was created by following the procedure used for the mlabonne/NeuralHermes-2.5-Mistral-7B model. Special thanks to @mlabonne.
## Additional Information
This model was fine-tuned on an NVIDIA A100-SXM4-40GB GPU. The total training time was 1 hour and 10 minutes.
## Prompt Template(s)
### ChatML

```
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}<|im_end|>
```
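A minimal inference sketch using this template via `transformers` (assuming the model's tokenizer ships with the ChatML chat template; the repo id below is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Neural-una-cybertron-7b"  # illustrative repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does DPO fine-tuning do?"},
]
# Renders the messages with the ChatML template shown above
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```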
## Training hyperparameters
LoRA (see the configuration sketch after this list):
- r=16
- lora_alpha=16
- lora_dropout=0.05
- bias="none"
- task_type="CAUSAL_LM"
- target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
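These settings map directly onto a `peft` `LoraConfig`; a minimal sketch (only the variable name `peft_config` is illustrative):

```python
from peft import LoraConfig

# LoRA adapter configuration matching the hyperparameters listed above
peft_config = LoraConfig(
    r=16,                   # rank of the low-rank update matrices
    lora_alpha=16,          # scaling factor for the LoRA updates
    lora_dropout=0.05,      # dropout applied to the LoRA layers
    bias="none",            # bias terms are not trained
    task_type="CAUSAL_LM",  # causal language modeling task
    target_modules=[
        "k_proj", "gate_proj", "v_proj", "up_proj",
        "q_proj", "o_proj", "down_proj",
    ],
)
```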
Training arguments (see the sketch after this list):
- per_device_train_batch_size=4
- gradient_accumulation_steps=4
- gradient_checkpointing=True
- learning_rate=5e-5
- lr_scheduler_type="cosine"
- max_steps=200
- optim="paged_adamw_32bit"
- warmup_steps=100
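These correspond one-to-one to `transformers.TrainingArguments`; a sketch (the `output_dir` value is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size of 16
    gradient_checkpointing=True,    # trade compute for memory
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",      # paged AdamW from bitsandbytes
    warmup_steps=100,
    output_dir="./results",         # placeholder output directory
)
```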
DPOTrainer (see the sketch after this list):
- beta=0.1
- max_prompt_length=1024
- max_length=1536
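Wiring everything together with `trl`'s `DPOTrainer`, as a sketch assuming a trl 0.7.x-era API in which `beta`, `max_prompt_length`, and `max_length` are passed to the trainer directly; `model`, `tokenizer`, and the preference `dataset` are assumed to be loaded beforehand:

```python
from trl import DPOTrainer

# DPO trainer tying together the base model, training arguments, and
# LoRA config; `model`, `tokenizer`, and `dataset` are assumed loaded.
dpo_trainer = DPOTrainer(
    model,                    # fblgit/una-cybertron-7b-v2-bf16 base model
    args=training_args,
    train_dataset=dataset,    # Intel/orca_dpo_pairs, formatted for DPO
    tokenizer=tokenizer,
    peft_config=peft_config,  # LoRA config from above
    beta=0.1,                 # KL penalty strength in the DPO loss
    max_prompt_length=1024,
    max_length=1536,
)
dpo_trainer.train()
```

In this API, passing `peft_config` lets the trainer apply the LoRA adapter itself and use the frozen base weights as the implicit reference model, so no separate `ref_model` needs to be loaded.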