Update README.md

5c122f1 verified about 1 month ago

3.93 kB

	---
	base_model: Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1
	library_name: transformers
	tags:
	- generated_from_trainer
	- axolotl
	- trl
	- cpo
	license:
	- llama3.1
	- gemma
	---

	# Llama-Gemma-2-27b-CPO_SimPO-iter2

	## 概要

	[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b)を教師あり学習と[CPO_SimPO](https://github.com/fe1ixxu/CPO_SIMPO)によりInstruction Tuningしたモデルである[Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1](https://huggingface.co/Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1)に対して、
	2回目のCPO_SimPOを適用したモデルです。

	[松尾研大規模言語モデル講座2024](https://weblab.t.u-tokyo.ac.jp/lecture/course-list/large-language-model/)のコンペ用の提出モデル作成の一環として作成・公開しています。

	This model is built with Llama and Qwen.

	## 使用データセット

	- [Aratako/iterative-dpo-data-for-SimPO-iter2](https://huggingface.co/datasets/Aratako/iterative-dpo-data-for-SimPO-iter2)

	## ライセンス

	本モデルは学習に利用したデータの関係で以下のライセンスの影響を受けます。

	- [META LLAMA 3.1 COMMUNITY LICENSE](https://www.llama.com/llama3_1/license/)を継承します。
	- [Gemma Terms of Use](https://ai.google.dev/gemma/terms)を継承します。
	- [Qwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE)の影響を受けます。ライセンスは継承しませんが、「Built with Qwen」のような文言を記載する必要があります。

	## 学習に関する詳細

	本モデルの学習には[axolotl](https://github.com/axolotl-ai-cloud/axolotl)を使いました。パラメータ等の学習の設定は下記の設定ファイルをご確認ください。

	[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
	<details><summary>See axolotl config</summary>

	axolotl version: `0.5.2`
	```yaml
	base_model: Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1
	model_type: AutoModelForCausalLM
	tokenizer_type: AutoTokenizer

	hub_model_id: Aratako/fft-simpo3-iterative-iter2
	hub_strategy: "end"
	push_dataset_to_hub:
	hf_use_auth_token: true

	plugins:
	- axolotl.integrations.liger.LigerPlugin
	liger_cross_entropy: false
	liger_rope: true
	liger_rms_norm: true
	liger_swiglu: true
	liger_fused_linear_cross_entropy: true

	load_in_8bit: false
	load_in_4bit: false
	strict: false

	chat_template: tokenizer_default
	rl: simpo
	rl_beta: 10.0
	cpo_alpha: 0.05
	simpo_gamma: 5.0
	max_prompt_length: 512
	max_length: 2048


	datasets:
	- path: Aratako/iterative-dpo-data-for-SimPO-iter2
	type: gemma.custom
	train_on_split: train


	shuffle_merged_datasets: true
	dataset_prepared_path: /workspace/data/fft-simpo3-iterative-iter2-data
	output_dir: /workspace/data/27b-fft-simpo3-iterative-iter2

	sequence_len: 2048
	sample_packing: false
	eval_sample_packing: false
	pad_to_sequence_len: true

	adapter:
	lora_model_dir:
	lora_r:
	lora_alpha:
	lora_dropout:
	lora_target_linear:
	lora_fan_in_fan_out:

	wandb_project: 27b-fft
	wandb_entity: aratako-lm
	wandb_watch:
	wandb_name: simpo3-iter2
	wandb_log_model:

	gradient_accumulation_steps: 8
	micro_batch_size: 2
	num_epochs: 1
	optimizer: paged_adamw_8bit
	lr_scheduler: cosine
	cosine_min_lr_ratio: 0.1
	learning_rate: 3e-7

	train_on_inputs: false
	group_by_length: false
	bf16: auto
	fp16:
	tf32: false

	gradient_checkpointing: true
	early_stopping_patience:
	auto_resume_from_checkpoints: true
	local_rank:
	logging_steps: 1
	xformers_attention:
	flash_attention: true

	save_strategy: steps
	save_steps: 100
	save_total_limit: 1

	warmup_steps: 20
	eval_steps:
	eval_batch_size:
	eval_table_size:
	eval_max_new_tokens:
	debug:
	deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
	weight_decay: 0.01
	fsdp:
	fsdp_config:
	special_tokens:
	pad_token: <pad>
	```

	</details><br>