Aratako committed on
Commit 6e93f91 · verified · 1 Parent(s): 2dd488c

Update README.md

Files changed (1): README.md (+120 −43)
README.md CHANGED
@@ -1,67 +1,144 @@
  ---
- base_model: Aratako/Llama-Gemma-2-27b-Simpo-trial3-iter1
  library_name: transformers
- model_name: fft-orpo-iterative-iter3
  tags:
  - generated_from_trainer
  - axolotl
  - trl
  - orpo
  licence: license
  ---

- # Model Card for fft-orpo-iterative-iter3

- This model is a fine-tuned version of [Aratako/Llama-Gemma-2-27b-Simpo-trial3-iter1](https://huggingface.co/Aratako/Llama-Gemma-2-27b-Simpo-trial3-iter1).
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline

- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="Aratako/fft-orpo-iterative-iter3", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```

- ## Training procedure

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/aratako-lm/27b-fft/runs/5o3squgn)

- This model was trained with ORPO, a method introduced in [ORPO: Monolithic Preference Optimization without Reference Model](https://huggingface.co/papers/2403.07691).

- ### Framework versions

- - TRL: 0.12.0
- - Transformers: 4.46.3
- - Pytorch: 2.3.1+cu121
- - Datasets: 3.1.0
- - Tokenizers: 0.20.3

- ## Citations

- Cite ORPO as:

- ```bibtex
- @article{hong2024orpo,
-     title  = {{ORPO: Monolithic Preference Optimization without Reference Model}},
-     author = {Jiwoo Hong and Noah Lee and James Thorne},
-     year   = 2024,
-     eprint = {arXiv:2403.07691}
- }
  ```

- Cite TRL as:
-
- ```bibtex
- @misc{vonwerra2022trl,
-     title        = {{TRL: Transformer Reinforcement Learning}},
-     author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
-     year         = 2020,
-     journal      = {GitHub repository},
-     publisher    = {GitHub},
-     howpublished = {\url{https://github.com/huggingface/trl}}
- }
- ```

  ---
+ base_model: Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2
  library_name: transformers
  tags:
  - generated_from_trainer
  - axolotl
  - trl
  - orpo
  licence: license
+ license:
+ - llama3.1
+ - gemma
  ---

+ # Llama-Gemma-2-27b-ORPO-iter3

+ ## Overview

+ This model applies [ORPO](https://arxiv.org/abs/2403.07691) to [Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2](https://huggingface.co/Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2),
+ an instruction-tuned model built from [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) through supervised fine-tuning followed by two rounds of [CPO_SimPO](https://github.com/fe1ixxu/CPO_SIMPO).

+ It was created and released as part of building a submission model for the competition of the [Matsuo Lab Large Language Model Course 2024](https://weblab.t.u-tokyo.ac.jp/lecture/course-list/large-language-model/).

+ This model is built with Llama and Qwen.
+
+ ## Datasets Used
+
+ - [Aratako/iterative-dpo-data-for-ORPO-iter3](https://huggingface.co/datasets/Aratako/iterative-dpo-data-for-ORPO-iter3)
+
+ ## License
+
+ Because of the data used for training, this model is subject to the following licenses:
+
+ - It inherits the [META LLAMA 3.1 COMMUNITY LICENSE](https://www.llama.com/llama3_1/license/).
+ - It inherits the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
+ - It is affected by the [Qwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE). That license is not inherited, but a notice such as "Built with Qwen" must be included.
+
+ ## Training Details
+
+ This model was trained with [axolotl](https://github.com/axolotl-ai-cloud/axolotl). See the configuration file below for the training settings, including hyperparameters.
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.5.2`
+ ```yaml
+ base_model: Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+
+ hub_model_id: Aratako/fft-orpo-iterative-iter3
+ hub_strategy: "end"
+ push_dataset_to_hub:
+ hf_use_auth_token: true
+
+ plugins:
+   - axolotl.integrations.liger.LigerPlugin
+ liger_cross_entropy: false
+ liger_rope: true
+ liger_rms_norm: true
+ liger_swiglu: true
+ liger_fused_linear_cross_entropy: true
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ chat_template: tokenizer_default
+ rl: orpo
+ orpo_alpha: 0.1
+ max_prompt_len: 512
+ max_length: 2560
+
+
+ datasets:
+   - path: Aratako/iterative-dpo-data-for-ORPO-iter3
+     type: chat_template.argilla
+     train_on_split: train
+
+
+ shuffle_merged_datasets: true
+ dataset_prepared_path: /workspace/data/fft-orpo-iterative-iter3-data
+ output_dir: /workspace/data/27b-fft-orpo-iterative-iter3

+ sequence_len: 2560
+ sample_packing: false
+ eval_sample_packing: false
+ pad_to_sequence_len: true

+ adapter:
+ lora_model_dir:
+ lora_r:
+ lora_alpha:
+ lora_dropout:
+ lora_target_linear:
+ lora_fan_in_fan_out:

+ wandb_project: 27b-fft
+ wandb_entity: aratako-lm
+ wandb_watch:
+ wandb_name: orpo-iter3
+ wandb_log_model:

+ gradient_accumulation_steps: 16
+ micro_batch_size: 1
+ num_epochs: 1
+ optimizer: paged_adamw_8bit
+ lr_scheduler: cosine
+ cosine_min_lr_ratio: 0.1
+ learning_rate: 8e-7

+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false

+ gradient_checkpointing: true
+ early_stopping_patience:
+ auto_resume_from_checkpoints: true
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true

+ save_strategy: steps
+ save_steps: 100
+ save_total_limit: 1

+ warmup_steps: 20
+ eval_steps:
+ eval_batch_size:
+ eval_table_size:
+ eval_max_new_tokens:
+ debug:
+ deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
+ weight_decay: 0.01
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   pad_token: <pad>
  ```

+ </details><br>
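
Both versions of the card reference [ORPO: Monolithic Preference Optimization without Reference Model](https://arxiv.org/abs/2403.07691), which the configuration above enables via `rl: orpo`. For context, the paper optimizes a single reference-model-free objective: the standard SFT loss plus an odds-ratio preference term over chosen and rejected responses. A sketch of that objective follows; treating its weight λ as the `orpo_alpha: 0.1` value in the config is an assumption about axolotl's parameter naming, not something stated in the card.

```latex
% ORPO objective from Hong et al. (2024): SFT loss plus a weighted
% odds-ratio term over chosen (y_w) and rejected (y_l) responses.
% lambda is assumed here to correspond to orpo_alpha in the config above.
\mathcal{L}_{\mathrm{ORPO}}
  = \mathbb{E}_{(x,\, y_w,\, y_l)}\bigl[\, \mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}} \,\bigr],
\qquad
\mathcal{L}_{\mathrm{OR}}
  = -\log \sigma\!\left( \log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)} \right),
\qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}.
```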