Aratako committed on
Commit 0b12719 · verified · 1 Parent(s): 19ab866

Update README.md

Files changed (1):
  1. README.md +120 -46
README.md CHANGED
@@ -1,67 +1,141 @@
  ---
- base_model: Aratako/Llama-Gemma-2-27b-Simpo-trial3-iter1
  library_name: transformers
- model_name: fft-orpo-iterative-iter3
  tags:
- - generated_from_trainer
  - axolotl
  - trl
  - orpo
- licence: license
  ---

- # Model Card for fft-orpo-iterative-iter3

- This model is a fine-tuned version of [Aratako/Llama-Gemma-2-27b-Simpo-trial3-iter1](https://huggingface.co/Aratako/Llama-Gemma-2-27b-Simpo-trial3-iter1).
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="Aratako/fft-orpo-iterative-iter3", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```

- ## Training procedure

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/aratako-lm/27b-fft/runs/5o3squgn)

- This model was trained with ORPO, a method introduced in [ORPO: Monolithic Preference Optimization without Reference Model](https://huggingface.co/papers/2403.07691).

- ### Framework versions

- - TRL: 0.12.0
- - Transformers: 4.46.3
- - Pytorch: 2.3.1+cu121
- - Datasets: 3.1.0
- - Tokenizers: 0.20.3

- ## Citations

- Cite ORPO as:

- ```bibtex
- @article{hong2024orpo,
-     title        = {{ORPO: Monolithic Preference Optimization without Reference Model}},
-     author       = {Jiwoo Hong and Noah Lee and James Thorne},
-     year         = 2024,
-     eprint       = {arXiv:2403.07691}
- }
- ```

- Cite TRL as:

- ```bibtex
- @misc{vonwerra2022trl,
-     title        = {{TRL: Transformer Reinforcement Learning}},
-     author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
-     year         = 2020,
-     journal      = {GitHub repository},
-     publisher    = {GitHub},
-     howpublished = {\url{https://github.com/huggingface/trl}}
- }
- ```

  ---
+ base_model: Aratako/Llama-Gemma-2-27b-ORPO-iter3
  library_name: transformers
  tags:
  - axolotl
  - trl
  - orpo
+ - exl2
+ license:
+ - llama3.1
+ - gemma
  ---

+ # Llama-Gemma-2-27b-ORPO-iter3-5.8bpw
+
+ ## Overview
+
+ This is [Aratako/Llama-Gemma-2-27b-ORPO-iter3](https://huggingface.co/Aratako/Llama-Gemma-2-27b-ORPO-iter3) quantized to 5.8 bpw with [ExLlamaV2](https://github.com/turboderp/exllamav2).
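For reference, a quantization like this is typically produced with the `convert.py` script in the ExLlamaV2 repository. A minimal sketch with placeholder paths (not the exact command used for this model):

```bash
# Hypothetical example: quantize the FP16 source model to 5.8 bpw.
# -i: input model directory, -o: working directory for measurement files,
# -cf: output directory for the compiled quantized model, -b: target bits per weight.
python convert.py \
    -i ./Llama-Gemma-2-27b-ORPO-iter3 \
    -o ./quant_work_dir \
    -cf ./Llama-Gemma-2-27b-ORPO-iter3-5.8bpw \
    -b 5.8
```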
+
+ This model was created and published as part of preparing a submission model for the competition of the [松尾研大規模言語モデル講座2024](https://weblab.t.u-tokyo.ac.jp/lecture/course-list/large-language-model/) (Matsuo Lab LLM Course 2024).
+
+ This model is built with Llama and Qwen.
+
+ For details such as the training data, see the overview of the original model.
 
+ ## Inference
+
+ The inference procedure for the competition task of the [松尾研大規模言語モデル講座2024](https://weblab.t.u-tokyo.ac.jp/lecture/course-list/large-language-model/) is described below.
+
+ 1. Prepare the inference environment as follows.
+ ```bash
+ git clone https://github.com/turboderp/exllamav2
+ cd exllamav2
+ pip install -r requirements.txt
+
+ # Install an ExLlamaV2 build that matches your PyTorch, CUDA, and Python versions.
+ # Here we assume the initial environment has CUDA 12.2, PyTorch 2.5.1, and Python 3.10.
+ pip install https://github.com/turboderp/exllamav2/releases/download/v0.2.5/exllamav2-0.2.5+cu121.torch2.5.0-cp310-cp310-linux_x86_64.whl
+ pip install torch==2.5.0
+ pip install -U --no-build-isolation flash-attn
+
+ # Download the model
+ huggingface-cli download Aratako/Llama-Gemma-2-27b-ORPO-iter3-5.8bpw --local-dir ./Llama-Gemma-2-27b-ORPO-iter3-5.8bpw
+ ```
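As a quick sanity check of the setup above, the imports below should succeed and report matching versions (a sketch; it assumes the installed exllamav2 build exposes `__version__`):

```bash
# Expect torch 2.5.0 with a cu121 build and exllamav2 0.2.5 if the
# installation above succeeded.
python -c "import torch, exllamav2; print(torch.__version__, torch.version.cuda, exllamav2.__version__)"
```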
+
+ 2. Create a Python file like the following as elyza_tasks_100_tv_exllamav2.py. Also place elyza-tasks-100-TV_0.jsonl in the same directory.
+
+ <details><summary>elyza_tasks_100_tv_exllamav2.py</summary>
+
+ ```python
+ import argparse
+ import json
+
+ from datasets import load_dataset
+ from exllamav2 import ExLlamaV2, ExLlamaV2Cache_Q8, ExLlamaV2Config, ExLlamaV2Tokenizer
+ from exllamav2.generator import ExLlamaV2DynamicGenerator, ExLlamaV2Sampler
+ from transformers import AutoTokenizer
+
+ parser = argparse.ArgumentParser()
+ parser.add_argument("-m", "--model", help="Model to evaluate", required=True)
+ parser.add_argument("-t", "--tokenizer", help="Tokenizer to use")
+ parser.add_argument("-o", "--output", help="Name of the output jsonl file")
+ args = parser.parse_args()
+
+ if args.tokenizer is None:
+     args.tokenizer = args.model
+
+ if args.output is None:
+     args.output = f"answers-{args.model.split('/')[-1]}.jsonl"
+
+ ds = load_dataset("json", data_files="./elyza-tasks-100-TV_0.jsonl", split="train")
+ hf_tokenizer = AutoTokenizer.from_pretrained(args.tokenizer)
+
+ # ExLlamaV2 setup
+ config = ExLlamaV2Config(args.model)
+ config.arch_compat_overrides()
+ model = ExLlamaV2(config)
+ cache = ExLlamaV2Cache_Q8(model, max_seq_len=2304, lazy=True)
+ model.load_autosplit(cache, progress=True)
+ tokenizer = ExLlamaV2Tokenizer(config)
+
+ generator = ExLlamaV2DynamicGenerator(
+     model=model,
+     cache=cache,
+     tokenizer=tokenizer,
+ )
+
+ # Inference parameters (greedy decoding)
+ gen_settings = ExLlamaV2Sampler.Settings.greedy()
+
+ # With max_seq_len=2304, an input longer than 768 tokens should raise an error;
+ # if that happens, reduce this value in steps of 256.
+ max_tokens = 1536
+
+ def apply_chat_template(item):
+     # Wrap each task input in the chat template, leaving the assistant turn open
+     messages = [
+         {"role": "user", "content": item["input"]}
+     ]
+     item["prompt"] = hf_tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+     return item
+
+ ds = ds.map(apply_chat_template, batched=False)
+
+ def generate_answer(batch):
+     outputs = generator.generate(
+         prompt=batch["prompt"],
+         max_new_tokens=max_tokens,
+         stop_conditions=[tokenizer.eos_token_id],
+         gen_settings=gen_settings,
+         encode_special_tokens=True,
+     )
+     # The generated text includes the prompt, so keep only the model's part
+     outputs = [text.split("<start_of_turn>model\n", 1)[-1] for text in outputs]
+     print(outputs)
+     batch["output"] = outputs
+     return batch
+
+ ds = ds.map(generate_answer, batched=True, batch_size=10)
+ ds = ds.remove_columns("prompt")
+
+ with open(args.output, "w", encoding="utf-8") as f:
+     for row in ds:
+         json.dump(row, f, ensure_ascii=False)
+         f.write("\n")
+ ```
+
+ </details><br>
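The script assumes each line of elyza-tasks-100-TV_0.jsonl is a JSON object with an `input` field (it reads `item["input"]` and adds an `output` column). A quick way to verify the file before running, assuming `jq` is installed:

```bash
# Print the keys of the first record; "input" should be among them.
head -n 1 elyza-tasks-100-TV_0.jsonl | jq 'keys'
```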
+
+ 3. Run inference as follows. When inference completes, the answers are saved to answers-Llama-Gemma-2-27b-ORPO-iter3-5.8bpw.jsonl by default.
+
+ ```bash
+ python elyza_tasks_100_tv_exllamav2.py -m Llama-Gemma-2-27b-ORPO-iter3-5.8bpw
+ ```
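To spot-check the result (assuming the task file contains 100 tasks; one JSON object is written per line):

```bash
# The line count should match the number of tasks, and the first record
# should contain both "input" and "output" fields.
wc -l answers-Llama-Gemma-2-27b-ORPO-iter3-5.8bpw.jsonl
head -n 1 answers-Llama-Gemma-2-27b-ORPO-iter3-5.8bpw.jsonl
```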
+
+ ## License
+
+ Because of the data used for training, this model is subject to the following licenses:
+
+ - It inherits the [META LLAMA 3.1 COMMUNITY LICENSE](https://www.llama.com/llama3_1/license/).
+ - It inherits the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
+ - It is affected by the [Qwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE). The license itself is not inherited, but a notice such as "Built with Qwen" must be included.