jbjeong91
/

llama3.1-cpo_j-full-0912

Text Generation

alignment-handbook

Generated from Trainer

text-generation-inference

Inference Endpoints

Model card Files Files and versions Metrics Training metrics Community

llama3.1-cpo_j-full-0912 / README.md

jbjeong91's picture

End of training

9d983d9 verified 5 months ago

|

history blame contribute delete

3 kB

	---
	library_name: transformers
	license: llama3.1
	base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
	tags:
	- alignment-handbook
	- trl
	- cpo
	- generated_from_trainer
	- trl
	- cpo
	- generated_from_trainer
	datasets:
	- princeton-nlp/llama3-ultrafeedback
	model-index:
	- name: llama3.1-cpo_j-full-0912
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# llama3.1-cpo_j-full-0912

	This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the princeton-nlp/llama3-ultrafeedback dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.4395
	- Rewards/chosen: -16.1609
	- Rewards/rejected: -16.9344
	- Rewards/accuracies: 0.6326
	- Rewards/margins: 0.7735
	- Logps/rejected: -169.3439
	- Logps/chosen: -161.6093
	- Logits/rejected: -0.3578
	- Logits/chosen: -0.3883
	- Nll Loss: 0.2841

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-06
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 128
	- total_eval_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \| Rewards/chosen \| Rewards/rejected \| Rewards/accuracies \| Rewards/margins \| Logps/rejected \| Logps/chosen \| Logits/rejected \| Logits/chosen \| Nll Loss \|
	\|:-------------:\|:------:\|:----:\|:---------------:\|:--------------:\|:----------------:\|:------------------:\|:---------------:\|:--------------:\|:------------:\|:---------------:\|:-------------:\|:--------:\|
	\| 1.7848 \| 0.2311 \| 100 \| 1.6452 \| -15.3752 \| -15.7662 \| 0.5804 \| 0.3910 \| -157.6625 \| -153.7521 \| -0.3516 \| -0.3794 \| 0.2719 \|
	\| 1.5276 \| 0.4623 \| 200 \| 1.5229 \| -15.8100 \| -16.4430 \| 0.6043 \| 0.6331 \| -164.4303 \| -158.0997 \| -0.3983 \| -0.4237 \| 0.2748 \|
	\| 1.4811 \| 0.6934 \| 300 \| 1.4640 \| -16.0706 \| -16.8001 \| 0.6130 \| 0.7296 \| -168.0013 \| -160.7057 \| -0.4069 \| -0.4339 \| 0.2804 \|
	\| 1.4642 \| 0.9246 \| 400 \| 1.4429 \| -16.1577 \| -16.9120 \| 0.6304 \| 0.7544 \| -169.1204 \| -161.5765 \| -0.3509 \| -0.3812 \| 0.2845 \|


	### Framework versions

	- Transformers 4.44.2
	- Pytorch 2.3.1
	- Datasets 2.21.0
	- Tokenizers 0.19.1