|
--- |
|
license: llama3 |
|
base_model: tsavage68/Summary_L3_1000steps_1e7rate_SFT2 |
|
tags: |
|
- trl |
|
- dpo |
|
- generated_from_trainer |
|
model-index: |
|
- name: Summary_L3_1000steps_1e6rate_01beta_CSFTDPO |
|
results: [] |
|
--- |
|
|
|
|
|
|
# Summary_L3_1000steps_1e6rate_01beta_CSFTDPO |
|
|
|
This model is a DPO fine-tune of [tsavage68/Summary_L3_1000steps_1e7rate_SFT2](https://huggingface.co/tsavage68/Summary_L3_1000steps_1e7rate_SFT2); the preference dataset it was trained on is not documented.
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.5961 |
|
- Rewards/chosen: -0.0885 |
|
- Rewards/rejected: -2.0984 |
|
- Rewards/accuracies: 0.1400 |
|
- Rewards/margins: 2.0099 |
|
- Logps/rejected: -36.2478 |
|
- Logps/chosen: -10.2675 |
|
- Logits/rejected: -1.2445 |
|
- Logits/chosen: -1.2412 |
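
The reward columns follow TRL's DPO convention: the implicit reward for a completion is the β-scaled log-probability ratio between the policy and the frozen reference (SFT) model, and the margin is the chosen-minus-rejected gap. As a sketch, assuming the standard DPO definitions with β = 0.1 (the "01beta" in the model name):

```latex
% Implicit DPO reward for completion y given prompt x (beta = 0.1, inferred from the model name)
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)

% The reported margin is the chosen-vs-rejected reward gap:
% -0.0885 - (-2.0984) = 2.0099, matching Rewards/margins above.
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left( r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \right)
```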
|
|
|
## Model description |
|
|
|
This checkpoint continues the SFT model [tsavage68/Summary_L3_1000steps_1e7rate_SFT2](https://huggingface.co/tsavage68/Summary_L3_1000steps_1e7rate_SFT2) with Direct Preference Optimization (DPO) via TRL, trained for 1000 steps at a 1e-06 learning rate with β = 0.1 (per the model name). The preference data and task details are not documented.
|
|
|
## Intended uses & limitations |
|
|
|
More information needed |
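
No usage guidance is published for this checkpoint. Below is a minimal, untested inference sketch, assuming the repository id matches the model name and the model loads as a standard causal language model; the summarization prompt template is a guess, since the expected input format is not documented.

```python
# Minimal inference sketch; the prompt template is an assumption,
# as the card does not document the expected input format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Summary_L3_1000steps_1e6rate_01beta_CSFTDPO"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Summarize the following text:\n\n<your document here>\n\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```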
|
|
|
## Training and evaluation data |
|
|
|
More information needed |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 1e-06 |
|
- train_batch_size: 1 |
|
- eval_batch_size: 1 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 4 |
|
- total_train_batch_size: 4 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_steps: 100 |
|
- training_steps: 1000 |
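
The configuration above maps onto TRL's `DPOTrainer` roughly as follows. This is a hypothetical reconstruction, not the author's actual script: the dataset variables are placeholders (the preference data is not documented), and β = 0.1 is inferred from the model name.

```python
# Hypothetical reconstruction of the DPO training setup from the
# hyperparameters above; dataset variables are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/Summary_L3_1000steps_1e7rate_SFT2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

args = TrainingArguments(
    output_dir="Summary_L3_1000steps_1e6rate_01beta_CSFTDPO",
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=4,   # total_train_batch_size: 4
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # TRL falls back to a frozen copy of the policy
    args=args,
    beta=0.1,                        # the "01beta" in the model name
    train_dataset=train_dataset,     # placeholder: prompt/chosen/rejected pairs
    eval_dataset=eval_dataset,       # placeholder
    tokenizer=tokenizer,
)
trainer.train()
```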
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.571         | 0.2004 | 50   | 0.5986          | 0.0271         | -0.6059          | 0.1400             | 0.6329          | -21.3224       | -9.1122      | -1.1153         | -1.1163       |
| 0.6585        | 0.4008 | 100  | 0.5962          | 0.0177         | -1.2883          | 0.1400             | 1.3060          | -28.1472       | -9.2058      | -1.1739         | -1.1725       |
| 0.6238        | 0.6012 | 150  | 0.5961          | -0.0262        | -1.7529          | 0.1400             | 1.7267          | -32.7924       | -9.6448      | -1.2119         | -1.2094       |
| 0.6065        | 0.8016 | 200  | 0.5961          | -0.0848        | -2.0675          | 0.1400             | 1.9828          | -35.9388       | -10.2303     | -1.2396         | -1.2364       |
| 0.6238        | 1.0020 | 250  | 0.5961          | -0.0864        | -2.0702          | 0.1400             | 1.9839          | -35.9662       | -10.2464     | -1.2401         | -1.2369       |
| 0.6238        | 1.2024 | 300  | 0.5961          | -0.0864        | -2.0688          | 0.1400             | 1.9824          | -35.9522       | -10.2471     | -1.2396         | -1.2364       |
| 0.6238        | 1.4028 | 350  | 0.5961          | -0.0866        | -2.0730          | 0.1400             | 1.9864          | -35.9935       | -10.2485     | -1.2409         | -1.2378       |
| 0.5718        | 1.6032 | 400  | 0.5961          | -0.0880        | -2.0816          | 0.1400             | 1.9937          | -36.0800       | -10.2625     | -1.2420         | -1.2388       |
| 0.5892        | 1.8036 | 450  | 0.5961          | -0.0869        | -2.0872          | 0.1400             | 2.0004          | -36.1360       | -10.2514     | -1.2428         | -1.2396       |
| 0.5718        | 2.0040 | 500  | 0.5961          | -0.0873        | -2.0879          | 0.1400             | 2.0006          | -36.1431       | -10.2557     | -1.2431         | -1.2399       |
| 0.5718        | 2.2044 | 550  | 0.5961          | -0.0872        | -2.0916          | 0.1400             | 2.0044          | -36.1798       | -10.2553     | -1.2434         | -1.2402       |
| 0.5545        | 2.4048 | 600  | 0.5961          | -0.0893        | -2.0984          | 0.1400             | 2.0091          | -36.2481       | -10.2761     | -1.2448         | -1.2416       |
| 0.5199        | 2.6052 | 650  | 0.5961          | -0.0881        | -2.0960          | 0.1400             | 2.0078          | -36.2235       | -10.2642     | -1.2437         | -1.2405       |
| 0.6238        | 2.8056 | 700  | 0.5961          | -0.0891        | -2.1004          | 0.1400             | 2.0113          | -36.2677       | -10.2740     | -1.2450         | -1.2417       |
| 0.6065        | 3.0060 | 750  | 0.5961          | -0.0879        | -2.0983          | 0.1400             | 2.0104          | -36.2469       | -10.2615     | -1.2456         | -1.2423       |
| 0.6412        | 3.2064 | 800  | 0.5961          | -0.0900        | -2.1003          | 0.1400             | 2.0103          | -36.2667       | -10.2828     | -1.2448         | -1.2416       |
| 0.6585        | 3.4068 | 850  | 0.5961          | -0.0875        | -2.0997          | 0.1400             | 2.0122          | -36.2604       | -10.2578     | -1.2456         | -1.2424       |
| 0.6238        | 3.6072 | 900  | 0.5961          | -0.0879        | -2.0992          | 0.1400             | 2.0114          | -36.2559       | -10.2613     | -1.2445         | -1.2413       |
| 0.5372        | 3.8076 | 950  | 0.5961          | -0.0884        | -2.0981          | 0.1400             | 2.0097          | -36.2444       | -10.2669     | -1.2444         | -1.2412       |
| 0.6238        | 4.0080 | 1000 | 0.5961          | -0.0885        | -2.0984          | 0.1400             | 2.0099          | -36.2478       | -10.2675     | -1.2445         | -1.2412       |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.41.2 |
|
- Pytorch 2.0.0+cu117 |
|
- Datasets 2.20.0 |
|
- Tokenizers 0.19.1 |
|
|