	---
	library_name: transformers
	license: llama3
	base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
	tags:
	- trl
	- dpo
	- generated_from_trainer
	model-index:
	- name: IE_L3_1000steps_1e8rate_03beta_cSFTDPO
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# IE_L3_1000steps_1e8rate_03beta_cSFTDPO

	This model is a fine-tuned version of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.6864
	- Rewards/chosen: -0.0017
	- Rewards/rejected: -0.0201
	- Rewards/accuracies: 0.4050
	- Rewards/margins: 0.0184
	- Logps/rejected: -75.6942
	- Logps/chosen: -82.8034
	- Logits/rejected: -0.7975
	- Logits/chosen: -0.7402
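
The relationship between these metrics can be checked with a few lines of arithmetic. The sketch below assumes the default sigmoid DPO loss, which this card does not state explicitly; under that assumption a pair with reward margin `m` contributes `-log(sigmoid(m))`, so a model identical to its reference scores `ln 2 ≈ 0.6931`. The helper name `dpo_loss` is illustrative only. The reported loss is an average over evaluation pairs, so it need not equal the loss evaluated at the mean margin.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -0.0017
rewards_rejected = -0.0201
reported_margin = 0.0184
reported_loss = 0.6864

# Rewards/margins is the mean of (chosen reward - rejected reward).
margin = rewards_chosen - rewards_rejected


def dpo_loss(m: float) -> float:
    """Sigmoid DPO loss for one pair with reward margin m: -log(sigmoid(m)).

    Assumption: the run used the sigmoid loss variant; the card does not say.
    """
    return math.log1p(math.exp(-m))


baseline_loss = dpo_loss(0.0)            # ln 2: policy indistinguishable from the reference
loss_at_mean_margin = dpo_loss(margin)   # loss of a single pair sitting at the mean margin

print(f"margin              = {margin:.4f}")
print(f"loss at zero margin = {baseline_loss:.4f}")
print(f"loss at mean margin = {loss_at_mean_margin:.4f}")
print(f"reported eval loss  = {reported_loss:.4f}")
```

Read this way, the final loss sits only slightly below the zero-margin baseline, which is consistent with the very small learning rate listed under the training hyperparameters.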

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-08
	- train_batch_size: 2
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 100
	- training_steps: 1000
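
As a rough illustration of how these settings combine, the sketch below recomputes the effective batch size and traces a cosine schedule with linear warmup. It is a plain-Python approximation written for this card, not the Trainer's own code: `num_devices = 1` is an assumption inferred from `total_train_batch_size: 4`, and `cosine_lr` is a hypothetical helper that assumes a single half-cosine decay to zero after warmup.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 2
gradient_accumulation_steps = 2
peak_lr = 1e-08
warmup_steps = 100
training_steps = 1000
num_devices = 1  # assumption: implied by total_train_batch_size == 4

# Number of examples that contribute to each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def cosine_lr(step: int) -> float:
    """Linear warmup to peak_lr, then a half-cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (training_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step))
```

Under these assumptions the learning rate never exceeds 1e-08 and is below that peak for all but one step of the run.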

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.6912 | 0.4 | 50 | 0.6940 | -0.0075 | -0.0104 | 0.4000 | 0.0029 | -75.6618 | -82.8226 | -0.7964 | -0.7393 |
	| 0.6947 | 0.8 | 100 | 0.6925 | 0.0014 | -0.0057 | 0.3850 | 0.0070 | -75.6461 | -82.7931 | -0.7963 | -0.7394 |
	| 0.6873 | 1.2 | 150 | 0.6982 | -0.0140 | -0.0096 | 0.3950 | -0.0044 | -75.6592 | -82.8444 | -0.7963 | -0.7393 |
	| 0.6777 | 1.6 | 200 | 0.6892 | -0.0038 | -0.0171 | 0.4100 | 0.0134 | -75.6844 | -82.8103 | -0.7963 | -0.7393 |
	| 0.6879 | 2.0 | 250 | 0.6890 | -0.0049 | -0.0185 | 0.3800 | 0.0136 | -75.6890 | -82.8142 | -0.7980 | -0.7411 |
	| 0.6991 | 2.4 | 300 | 0.6849 | -0.0170 | -0.0393 | 0.4300 | 0.0223 | -75.7583 | -82.8544 | -0.7974 | -0.7404 |
	| 0.678 | 2.8 | 350 | 0.6716 | -0.0122 | -0.0614 | 0.4900 | 0.0492 | -75.8319 | -82.8383 | -0.7967 | -0.7398 |
	| 0.7072 | 3.2 | 400 | 0.6885 | -0.0120 | -0.0278 | 0.4350 | 0.0158 | -75.7200 | -82.8378 | -0.7974 | -0.7404 |
	| 0.6858 | 3.6 | 450 | 0.6943 | -0.0160 | -0.0191 | 0.3450 | 0.0031 | -75.6910 | -82.8512 | -0.7974 | -0.7404 |
	| 0.6815 | 4.0 | 500 | 0.6821 | -0.0089 | -0.0364 | 0.4300 | 0.0275 | -75.7484 | -82.8273 | -0.7972 | -0.7401 |
	| 0.6857 | 4.4 | 550 | 0.6879 | -0.0086 | -0.0255 | 0.4000 | 0.0169 | -75.7121 | -82.8263 | -0.7972 | -0.7403 |
	| 0.6825 | 4.8 | 600 | 0.6854 | -0.0203 | -0.0417 | 0.4150 | 0.0214 | -75.7663 | -82.8655 | -0.7968 | -0.7398 |
	| 0.698 | 5.2 | 650 | 0.6921 | -0.0186 | -0.0277 | 0.4200 | 0.0091 | -75.7196 | -82.8597 | -0.7973 | -0.7401 |
	| 0.6795 | 5.6 | 700 | 0.6885 | -0.0063 | -0.0217 | 0.3700 | 0.0154 | -75.6996 | -82.8189 | -0.7973 | -0.7402 |
	| 0.6931 | 6.0 | 750 | 0.6875 | -0.0110 | -0.0282 | 0.4150 | 0.0172 | -75.7213 | -82.8344 | -0.7974 | -0.7404 |
	| 0.6804 | 6.4 | 800 | 0.6888 | -0.0053 | -0.0191 | 0.3800 | 0.0137 | -75.6909 | -82.8156 | -0.7975 | -0.7402 |
	| 0.6958 | 6.8 | 850 | 0.6864 | -0.0017 | -0.0201 | 0.4050 | 0.0184 | -75.6942 | -82.8034 | -0.7975 | -0.7402 |
	| 0.6932 | 7.2 | 900 | 0.6864 | -0.0017 | -0.0201 | 0.4050 | 0.0184 | -75.6942 | -82.8034 | -0.7975 | -0.7402 |
	| 0.6785 | 7.6 | 950 | 0.6864 | -0.0017 | -0.0201 | 0.4050 | 0.0184 | -75.6942 | -82.8034 | -0.7975 | -0.7402 |
	| 0.6947 | 8.0 | 1000 | 0.6864 | -0.0017 | -0.0201 | 0.4050 | 0.0184 | -75.6942 | -82.8034 | -0.7975 | -0.7402 |
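
The table can be summarized programmatically. The sketch below copies the step, validation loss, and reward accuracy columns from the rows above and looks up the evaluation step with the lowest validation loss. It also derives steps per epoch from the final row; the examples-per-epoch figure is only an estimate that assumes one device, an effective batch size of 4, and no dropped batches. Variable names are illustrative.

```python
# (step, validation loss, rewards/accuracies) copied from the table above.
rows = [
    (50, 0.6940, 0.4000), (100, 0.6925, 0.3850), (150, 0.6982, 0.3950),
    (200, 0.6892, 0.4100), (250, 0.6890, 0.3800), (300, 0.6849, 0.4300),
    (350, 0.6716, 0.4900), (400, 0.6885, 0.4350), (450, 0.6943, 0.3450),
    (500, 0.6821, 0.4300), (550, 0.6879, 0.4000), (600, 0.6854, 0.4150),
    (650, 0.6921, 0.4200), (700, 0.6885, 0.3700), (750, 0.6875, 0.4150),
    (800, 0.6888, 0.3800), (850, 0.6864, 0.4050), (900, 0.6864, 0.4050),
    (950, 0.6864, 0.4050), (1000, 0.6864, 0.4050),
]

# Evaluation step with the lowest validation loss.
best_step, best_loss, best_accuracy = min(rows, key=lambda r: r[1])

# The last four evaluations report identical metrics; collect them to confirm.
final_evals = {(loss, acc) for step, loss, acc in rows if step >= 850}

# The final row pairs step 1000 with epoch 8.0.
steps_per_epoch = round(1000 / 8.0)
effective_batch_size = 4  # total_train_batch_size from the hyperparameters
examples_per_epoch = steps_per_epoch * effective_batch_size  # estimate only

print("best step:", best_step, "loss:", best_loss, "accuracy:", best_accuracy)
print("distinct metric sets from step 850 on:", len(final_evals))
print("steps per epoch:", steps_per_epoch)
print("approx. training examples per epoch:", examples_per_epoch)
```

The lowest validation loss in the table occurs well before the final step, and the evaluation metrics do not change at all from step 850 onward.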


	### Framework versions

	- Transformers 4.44.2
	- PyTorch 2.0.0+cu117
	- Datasets 3.0.0
	- Tokenizers 0.19.1