Upload folder using huggingface_hub

Files changed (18)

.gitignore +8 -0
README.md +63 -0
assets/img/alpaca_blog.png +0 -0
assets/img/mtbench_hf.png +0 -0
main.py +203 -0
outputs/alpacaeval/Mistral-ORPO-alpha.json +0 -0
outputs/alpacaeval/Mistral-ORPO-beta.json +0 -0
outputs/mtbench/Mistral-ORPO-alpha.jsonl +0 -0
outputs/mtbench/Mistral-ORPO-beta.jsonl +0 -0
requirements.txt +114 -0
runpod.sh +24 -0
scripts/run_mistral_orpo_beta.sh +20 -0
scripts/run_mistral_orpo_capybara.sh +22 -0
src/accelerate/ds2.yaml +21 -0
src/args.py +34 -0
src/orpo_trainer.py +83 -0
src/utils.py +20 -0
trl/test_orpo_trainer_demo.py +95 -0

.gitignore ADDED

	@@ -0,0 +1,8 @@

+wandb
+src/__pycache__
+scripts/run_orpo.sh
+src/accelerate/fsdp.yaml
+scripts/run_orpo.sh
+src/__pycache__/args.cpython-311.pyc
+src/__pycache__/utils.cpython-311.pyc
+src/accelerate/fsdp.yaml

README.md ADDED

	@@ -0,0 +1,63 @@

+# **ORPO**
+### **`Updates (24.03.25)`**
+- [X] Sample script for ORPOTrainer in 🤗<a class="link" href="https://github.com/huggingface/trl">TRL</a> is added to `trl/test_orpo_trainer_demo.py`
+- [X] New model, 🤗<a class="link" href="https://huggingface.co/kaist-ai/mistral-orpo-capybara-7k">kaist-ai/mistral-orpo-capybara-7k</a>, is added to 🤗<a class="link" href="https://huggingface.co/collections/kaist-ai/orpo-65efef87544ba100aef30013">ORPO Collection</a>
+- [X] Now you can try ORPO in 🤗<a class="link" href="https://github.com/huggingface/trl">TRL</a> and <a class="link" href="https://github.com/OpenAccess-AI-Collective/axolotl">Axolotl</a>🔥
+- [X] We are preparing a general guideline for training LLMs with ORPO. Stay tuned🔥
+- [X] **Mistral-ORPO-β** achieved a 14.7% length-controlled (LC) win rate on the <a class="link" href="https://tatsu-lab.github.io/alpaca_eval/">official AlpacaEval Leaderboard</a>🔥
+&nbsp;
+This is the official repository for <a class="link" href="https://arxiv.org/abs/2403.07691">**ORPO: Monolithic Preference Optimization without Reference Model**</a>. The detailed results in the paper can be found in:
+- [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kaist-ai%2Fmistral-orpo-beta)
+- [AlpacaEval](#alpacaeval)
+- [MT-Bench](#mt-bench)
+- [IFEval](#ifeval)
+### **`Model Checkpoints`**
+Our models trained with ORPO can be found in:
+- [X] **Mistral-ORPO-Capybara-7k**: 🤗 <a class="link" href="https://huggingface.co/kaist-ai/mistral-orpo-capybara-7k">kaist-ai/mistral-orpo-capybara-7k</a>
+- [X] **Mistral-ORPO-⍺**: 🤗 <a class="link" href="https://huggingface.co/kaist-ai/mistral-orpo-alpha">kaist-ai/mistral-orpo-alpha</a>
+- [X] **Mistral-ORPO-β**: 🤗 <a class="link" href="https://huggingface.co/kaist-ai/mistral-orpo-beta">kaist-ai/mistral-orpo-beta</a>
+The corresponding logs for the average log probabilities of chosen/rejected responses during training are reported in:
+- [X] **Mistral-ORPO-Capybara-7k**: TBU
+- [X] **Mistral-ORPO-⍺**: <a class="link" href="https://wandb.ai/jiwooya1000/PREF/reports/Mistral-ORPO-7B-Training-Log--Vmlldzo3MTE1NzE0?accessToken=rms6o4mg5vo3feu1bvbpk632m4cspe19l0u1p4he3othx5bgean82chn9neiile6">Wandb Report for Mistral-ORPO-⍺</a>
+- [X] **Mistral-ORPO-β**: <a class="link" href="https://wandb.ai/jiwooya1000/PREF/reports/Mistral-ORPO-7B-Training-Log--Vmlldzo3MTE3MzMy?accessToken=dij4qbp6dcrofsanzbgobjsne9el8a2zkly2u5z82rxisd4wiwv1rhp0s2dub11e">Wandb Report for Mistral-ORPO-β</a>
+&nbsp;
+### **`AlpacaEval`**
+<figure>
+  <img class="png" src="/assets/img/alpaca_blog.png" alt="AlpacaEval 2.0 scores for models trained with different alignment methods">
+  <figcaption><b>Figure 1.</b> AlpacaEval 2.0 score for the models trained with different alignment methods.</figcaption>
+</figure>
+&nbsp;
+### **`MT-Bench`**
+<figure>
+  <img class="png" src="/assets/img/mtbench_hf.png" alt="MT-Bench results by category">
+  <figcaption><b>Figure 2.</b> MT-Bench result by category.</figcaption>
+</figure>
+&nbsp;
+### **`IFEval`**
+IFEval scores are measured with <a class="link" href="https://github.com/EleutherAI/lm-evaluation-harness">EleutherAI/lm-evaluation-harness</a> by applying the chat template. The scores for Llama-2-Chat (70B), Zephyr-β (7B), and Mixtral-8X7B-Instruct-v0.1 were originally reported in <a class="link" href="https://twitter.com/wiskojo/status/1739767758462877823">this tweet</a>.
+| **Model Type**     | **Prompt-Strict** | **Prompt-Loose** | **Inst-Strict** | **Inst-Loose** |
+|--------------------|:-----------------:|:----------------:|:---------------:|:--------------:|
+| **Llama-2-Chat (70B)** |       0.4436      |      0.5342      |      0.5468     |     0.6319     |
+| **Zephyr-β (7B)** |       0.4233      |      0.4547      |      0.5492     |     0.5767     |
+| **Mixtral-8X7B-Instruct-v0.1** |       0.5213      |      **0.5712**      |      0.6343     |     **0.6823**     |
+| **Mistral-ORPO-⍺ (7B)** |       0.5009      |      0.5083      |      0.5995     |     0.6163     |
+| **Mistral-ORPO-β (7B)** |       **0.5287**      |      0.5564      |      **0.6355**     |     0.6619     |

assets/img/alpaca_blog.png ADDED

assets/img/mtbench_hf.png ADDED

main.py ADDED

	@@ -0,0 +1,203 @@

+import os
+import time
+import wandb
+import torch
+import argparse
+from datasets import load_dataset
+from typing import List, Dict, Union
+from transformers import (
+    AutoTokenizer,
+    AutoModelForCausalLM,
+    TrainingArguments,
+    DataCollatorForLanguageModeling
+)
+from src.args import default_args
+from src.orpo_trainer import ORPOTrainer
+from src.utils import preprocess_logits_for_metrics, dataset_split_selector
+class ORPO(object):
+    def __init__(self, args) -> None:
+        self.start = time.gmtime()
+        self.args = args
+        # Load Tokenizer
+        print(">>> 1. Loading Tokenizer")
+        self.tokenizer = AutoTokenizer.from_pretrained(self.args.model_name, cache_dir=self.args.cache_dir)
+        if self.tokenizer.chat_template is None:
+            self.tokenizer.chat_template = "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"
+            print("     1-1. Chat Template Applied (<|user|> <|assistant|>)")
+        else:
+            pass
+        self.tokenizer.pad_token_id = self.tokenizer.eos_token_id
+        # Load Model
+        print(">>> 2. Loading Model")
+        if self.args.flash_attention_2:
+            self.model = AutoModelForCausalLM.from_pretrained(self.args.model_name,
+                                                              cache_dir=self.args.cache_dir,
+                                                              torch_dtype=torch.bfloat16,
+                                                              attn_implementation="flash_attention_2")
+        else:
+            self.model = AutoModelForCausalLM.from_pretrained(self.args.model_name,
+                                                              cache_dir=self.args.cache_dir,
+                                                              torch_dtype=torch.bfloat16)
+        # Load Dataset
+        print(">>> 3. Loading Dataset")
+        self.data = load_dataset(self.args.data_name, cache_dir=self.args.cache_dir)
+        # Preprocess Dataset
+        print(">>> 4. Filtering and Preprocessing Dataset")
+        data_split = dataset_split_selector(self.data)
+        if len(data_split) == 1:
+            self.is_test = False
+            train_split = data_split[0]
+            print(f"   >>> Test Set = {self.is_test}")
+        else:
+            self.is_test = True
+            train_split = data_split[0]
+            test_split = data_split[1]
+            test = self.data[test_split].filter(self.filter_dataset)
+            self.test = test.map(self.preprocess_dataset, batched=True, num_proc=self.args.num_proc, remove_columns=self.data[test_split].column_names)
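+        train = self.data[train_split].filter(self.filter_dataset)
+        # --max_samples is optional: cap the subset at the number of rows that survive filtering
+        max_samples = getattr(self.args, 'max_samples', None)
+        if max_samples is not None:
+            train = train.select(range(min(max_samples, len(train))))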
+        print(f"\n\n>>> {len(train)} / {len(self.data[train_split])} rows left after filtering by prompt length.")
+        self.train = train.map(self.preprocess_dataset, batched=True, num_proc=self.args.num_proc, remove_columns=self.data[train_split].column_names)
+        # Set WANDB & Logging Configurations
+        self.run_name = f"{self.args.model_name.split('/')[-1]}-{self.args.data_name.split('/')[-1]}-lambda{self.args.alpha}-ORPO-{self.start.tm_mday}-{self.start.tm_hour}-{self.start.tm_min}"
+        self.save_dir = os.path.join('./checkpoints/', f"{self.args.data_name.split('/')[-1]}/{self.run_name}")
+        self.log_dir = os.path.join('./checkpoints/', f"{self.args.data_name.split('/')[-1]}/{self.run_name}/logs")
+        os.makedirs(self.save_dir, exist_ok=True)
+        os.makedirs(self.log_dir, exist_ok=True)
+    def preprocess_dataset(self, examples: Union[List, Dict]):
+        if ('instruction' in examples.keys()) or ('question' in examples.keys()):
+            prompt_key = 'instruction' if 'instruction' in examples.keys() else 'question'
+            prompt = [self.tokenizer.apply_chat_template([{'role': 'user', 'content': item}], tokenize=False, add_generation_prompt=True) for item in examples[prompt_key]]
+            chosen = [self.tokenizer.apply_chat_template([{'role': 'user', 'content': item_prompt}, {'role': 'assistant', 'content': item_chosen}], tokenize=False) for item_prompt, item_chosen in zip(examples[prompt_key], examples['chosen'])]
+            rejected = [self.tokenizer.apply_chat_template([{'role': 'user', 'content': item_prompt}, {'role': 'assistant', 'content': item_rejected}], tokenize=False) for item_prompt, item_rejected in zip(examples[prompt_key], examples['rejected'])]
+        else:
+            prompt = [self.tokenizer.apply_chat_template([item[0]], tokenize=False, add_generation_prompt=True) for item in examples['chosen']]
+            chosen = [self.tokenizer.apply_chat_template(item, tokenize=False) for item in examples['chosen']]
+            rejected = [self.tokenizer.apply_chat_template(item, tokenize=False) for item in examples['rejected']]
+        model_inputs = self.tokenizer(prompt,
+                                      max_length=self.args.response_max_length,
+                                      padding='max_length',
+                                      truncation=True,
+                                      return_tensors='pt')
+        pos_labels = self.tokenizer(chosen,
+                                    max_length=self.args.response_max_length,
+                                    padding='max_length',
+                                    truncation=True,
+                                    return_tensors='pt')
+        neg_labels = self.tokenizer(rejected,
+                                    max_length=self.args.response_max_length,
+                                    padding='max_length',
+                                    truncation=True,
+                                    return_tensors='pt')
+        model_inputs['positive_input_ids'] = pos_labels['input_ids']
+        model_inputs['positive_attention_mask'] = pos_labels['attention_mask']
+        model_inputs['negative_input_ids'] = neg_labels['input_ids']
+        model_inputs['negative_attention_mask'] = neg_labels['attention_mask']
+        return model_inputs
+    def filter_dataset(self, examples: Union[List, Dict]):
+        if 'instruction' in examples.keys():
+            query = examples['instruction']
+            prompt_length = self.tokenizer.apply_chat_template([{'content': query, 'role': 'user'}], tokenize=True, add_generation_prompt=True, return_tensors='pt').size(-1)
+        elif 'question' in examples.keys():
+            query = examples['question']
+            prompt_length = self.tokenizer.apply_chat_template([{'content': query, 'role': 'user'}], tokenize=True, add_generation_prompt=True, return_tensors='pt').size(-1)
+        else:
+            prompt_length = self.tokenizer.apply_chat_template([examples['chosen'][0]], tokenize=True, add_generation_prompt=True, return_tensors='pt').size(-1)
+        if prompt_length < self.args.prompt_max_length:
+            return True
+        else:
+            return False
+    def prepare_trainer(self):
+        wandb.init(name=self.run_name)
+        arguments = TrainingArguments(
+            output_dir=self.save_dir,  # The output directory
+            logging_dir=self.log_dir,
+            logging_steps=50,
+            learning_rate=self.args.lr,
+            overwrite_output_dir=True,  # overwrite the content of the output directory
+            num_train_epochs=self.args.num_train_epochs,  # number of training epochs
+            per_device_train_batch_size=self.args.per_device_train_batch_size,  # batch size for training
+            per_device_eval_batch_size=self.args.per_device_eval_batch_size,  # batch size for evaluation
+            evaluation_strategy=self.args.evaluation_strategy if self.is_test else 'no',  # evaluate only when a test split exists
+            save_strategy=self.args.evaluation_strategy,
+            optim=self.args.optim,
+            warmup_steps=self.args.warmup_steps,
+            gradient_accumulation_steps=self.args.gradient_accumulation_steps,
+            gradient_checkpointing=True, #if ('llama' in self.args.model_name.lower()) or ('mistral' in self.args.model_name.lower()) else False,
+            gradient_checkpointing_kwargs={'use_reentrant':True},
+            load_best_model_at_end=self.is_test,
+            do_train=True,
+            do_eval=self.is_test,
+            lr_scheduler_type=self.args.lr_scheduler_type,
+            remove_unused_columns=False,
+            report_to='wandb',
+            run_name=self.run_name,
+            bf16=True
+        )
+        data_collator = DataCollatorForLanguageModeling(tokenizer=self.tokenizer, mlm=False)
+        self.trainer = ORPOTrainer(
+            model=self.model,
+            alpha=self.args.alpha,
+            pad=self.tokenizer.pad_token_id,
+            args=arguments,
+            train_dataset=self.train,
+            eval_dataset=self.test if self.is_test else None,
+            data_collator=data_collator,
+            preprocess_logits_for_metrics=preprocess_logits_for_metrics
+        )
+    def run(self):
+        print(">>> 5. Preparing ORPOTrainer")
+        self.prepare_trainer()
+        self.trainer.train()
+        # Saving code for FSDP
+        if self.trainer.is_fsdp_enabled:
+            self.trainer.accelerator.state.fsdp_plugin.set_state_dict_type("FULL_STATE_DICT")
+        self.trainer.save_model()
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser("ORPO")
+    args = default_args(parser)
+    # Set WANDB configurations
+    if args.wandb_entity is not None and args.wandb_project_name is not None:
+        os.environ["WANDB_ENTITY"] = args.wandb_entity
+        os.environ["WANDB_PROJECT"] = args.wandb_project_name
+    else:
+        pass
+    os.environ["TOKENIZERS_PARALLELISM"] = 'false'
+    print("================================================================================================\n")
+    print(f">>> Fine-tuning {args.model_name} with ORPO on {args.data_name}\n")
+    print("================================================================================================")
+    print("\n\n>>> Summary:")
+    print(f"    - Lambda              : {args.alpha}")
+    print(f"    - Training Epochs     : {args.num_train_epochs}")
+    print(f"    - Prompt Max Length   : {args.prompt_max_length}")
+    print(f"    - Response Max Length : {args.response_max_length}")
+    item = ORPO(args=args)
+    item.run()

outputs/alpacaeval/Mistral-ORPO-alpha.json ADDED

The diff for this file is too large to render.

outputs/alpacaeval/Mistral-ORPO-beta.json ADDED

The diff for this file is too large to render.

outputs/mtbench/Mistral-ORPO-alpha.jsonl ADDED

The diff for this file is too large to render.

outputs/mtbench/Mistral-ORPO-beta.jsonl ADDED

The diff for this file is too large to render.

requirements.txt ADDED

	@@ -0,0 +1,114 @@

+accelerate @ file:///home/conda/feedstock_root/build_artifacts/accelerate_1710334587919/work
+aiohttp @ file:///croot/aiohttp_1707342283163/work
+aiosignal @ file:///tmp/build/80754af9/aiosignal_1637843061372/work
+appdirs==1.4.4
+asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
+attrs @ file:///croot/attrs_1695717823297/work
+bitsandbytes==0.43.0
+Bottleneck @ file:///croot/bottleneck_1707864210935/work
+Brotli @ file:///work/ci_py311/brotli-split_1676830125088/work
+cachetools==5.3.3
+certifi @ file:///home/conda/feedstock_root/build_artifacts/certifi_1707022139797/work/certifi
+cffi @ file:///croot/cffi_1700254295673/work
+charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
+click @ file:///croot/click_1698129812380/work
+comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work
+datasets @ file:///home/conda/feedstock_root/build_artifacts/datasets_1709395865330/work
+debugpy @ file:///croot/debugpy_1690905042057/work
+decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
+dill @ file:///croot/dill_1692271232022/work
+docker-pycreds @ file:///Users/ktietz/demo/mc3/conda-bld/docker-pycreds_1630654474270/work
+einops==0.7.0
+exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1704921103267/work
+executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1698579936712/work
+filelock @ file:///croot/filelock_1700591183607/work
+flash-attn==2.5.6
+frozenlist @ file:///croot/frozenlist_1698702560391/work
+fsspec==2023.4.0
+gitdb @ file:///tmp/build/80754af9/gitdb_1617117951232/work
+GitPython @ file:///croot/gitpython_1696936983078/work
+gmpy2 @ file:///work/ci_py311/gmpy2_1676839849213/work
+huggingface-hub @ file:///croot/huggingface_hub_1708634519519/work
+idna @ file:///work/ci_py311/idna_1676822698822/work
+importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1709821103657/work
+ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1708996548741/work
+ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1709559745751/work
+jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work
+Jinja2==3.1.2
+jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1710255804825/work
+jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1710257359434/work
+MarkupSafe @ file:///croot/markupsafe_1704205993651/work
+matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1660814786464/work
+mkl-fft @ file:///croot/mkl_fft_1695058164594/work
+mkl-random @ file:///croot/mkl_random_1695059800811/work
+mkl-service==2.4.0
+mpmath @ file:///croot/mpmath_1690848262763/work
+multidict @ file:///croot/multidict_1701096859099/work
+multiprocess @ file:///croot/multiprocess_1692294385131/work
+nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
+networkx==3.2.1
+ninja==1.11.1.1
+numexpr @ file:///croot/numexpr_1696515281613/work
+numpy @ file:///croot/numpy_and_numpy_base_1708638617955/work/dist/numpy-1.26.4-cp311-cp311-linux_x86_64.whl#sha256=5f96f274d410a1682519282ae769c877d32fdbf171aa8badec7bf5e1d3a1748a
+nvidia-cublas-cu11==11.11.3.6
+nvidia-cuda-cupti-cu11==11.8.87
+nvidia-cuda-nvrtc-cu11==11.8.89
+nvidia-cuda-runtime-cu11==11.8.89
+nvidia-cudnn-cu11==8.7.0.84
+nvidia-cufft-cu11==10.9.0.58
+nvidia-curand-cu11==10.3.0.86
+nvidia-cusolver-cu11==11.4.1.48
+nvidia-cusparse-cu11==11.7.5.86
+nvidia-ml-py==12.535.133
+nvidia-nccl-cu11==2.19.3
+nvidia-nvtx-cu11==11.8.86
+nvitop==1.3.2
+packaging @ file:///croot/packaging_1693575174725/work
+pandas @ file:///croot/pandas_1709590491089/work/dist/pandas-2.2.1-cp311-cp311-linux_x86_64.whl#sha256=0a2793a31a0135a35735e1431d453a06186a3a7c607d9b441d9bd5f0fe4ded31
+parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work
+pathtools @ file:///Users/ktietz/demo/mc3/conda-bld/pathtools_1629713893697/work
+pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work
+pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
+pillow==10.2.0
+platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1706713388748/work
+prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1702399386289/work
+protobuf==3.20.3
+psutil @ file:///work/ci_py311_2/psutil_1679337388738/work
+ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
+pure-eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1642875951954/work
+pyarrow @ file:///croot/pyarrow_1707330824290/work/python
+pyarrow-hotfix @ file:///home/conda/feedstock_root/build_artifacts/pyarrow-hotfix_1700596371886/work
+pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
+Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1700607939962/work
+PySocks @ file:///work/ci_py311/pysocks_1676822712504/work
+python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
+pytz @ file:///croot/pytz_1695131579487/work
+PyYAML @ file:///croot/pyyaml_1698096049011/work
+pyzmq @ file:///croot/pyzmq_1705605076900/work
+regex @ file:///croot/regex_1696515298636/work
+requests @ file:///croot/requests_1707355572290/work
+safetensors @ file:///croot/safetensors_1708633833937/work
+sentry-sdk @ file:///work/ci_py311/sentry-sdk_1676862120883/work
+setproctitle @ file:///work/ci_py311/setproctitle_1676838789127/work
+six @ file:///tmp/build/80754af9/six_1644875935023/work
+smmap @ file:///tmp/build/80754af9/smmap_1611694433573/work
+stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
+sympy @ file:///croot/sympy_1701397643339/work
+termcolor==2.4.0
+tokenizers @ file:///croot/tokenizers_1708633814160/work
+torch==2.2.1+cu118
+torchaudio==2.2.1+cu118
+torchvision==0.17.1+cu118
+tornado @ file:///croot/tornado_1696936946304/work
+tqdm @ file:///croot/tqdm_1679561862951/work
+traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1710254411456/work
+transformers @ file:///home/conda/feedstock_root/build_artifacts/transformers_1709308155748/work
+triton==2.2.0
+typing_extensions==4.8.0
+tzdata @ file:///croot/python-tzdata_1690578112552/work
+urllib3 @ file:///croot/urllib3_1707770551213/work
+wandb @ file:///home/conda/feedstock_root/build_artifacts/wandb_1707246480133/work
+wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
+xxhash @ file:///work/ci_py311/python-xxhash_1676842384694/work
+yarl @ file:///croot/yarl_1701105127787/work
+zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1695255097490/work

runpod.sh ADDED

	@@ -0,0 +1,24 @@

+pip install datasets accelerate wandb transformers bitsandbytes sentencepiece
+git clone https://github.com/burtenshaw/orpo.git
+cd orpo
+sed -i 's/num_processes: 2/num_processes: 1/' ./src/accelerate/fsdp.yaml
+sed -i 's/--num_proc", default=8/--num_proc", default=1/' ./src/args.py
+wandb login $WANDB_TOKEN
+wandb init -p $WANDB_PROJECT
+accelerate launch --config_file ./src/accelerate/fsdp.yaml main.py \
+    --lr $LEARNING_RATE \
+    --warmup_steps 100 \
+    --model_name $MODEL_ID \
+    --data_name $DATASET \
+    --num_train_epochs $EPOCH \
+    --max_samples $MAX_SAMPLES \
+    --prompt_max_length 128 \
+    --response_max_length 2048 \
+    --per_device_train_batch_size 4 \
+    --per_device_eval_batch_size 4 \
+    --gradient_accumulation_steps 1 \
+    --num_proc 1
+cd $OUTPUT
+cd */
+huggingface-cli login --token $TOKEN
+huggingface-cli upload $NEW_MODEL . .

scripts/run_mistral_orpo_beta.sh ADDED

	@@ -0,0 +1,20 @@

+#!/bin/bash
+# Mistral-ORPO series are trained on 4 * A100s
+accelerate launch --config_file ./src/accelerate/fsdp.yaml main.py \
+    --lr 5e-6 \
+    --lr_scheduler_type inverse_sqrt \
+    --alpha 0.1 \
+    --torch_compile False \
+    --warmup_steps 200 \
+    --model_name mistralai/Mistral-7B-v0.1 \
+    --data_name argilla/ultrafeedback-binarized-preferences-cleaned \
+    --num_train_epochs 5 \
+    --prompt_max_length 1792 \
+    --response_max_length 2048 \
+    --per_device_train_batch_size 8 \
+    --per_device_eval_batch_size 8 \
+    --gradient_accumulation_steps 1 \
+    --num_proc 8 \
+    --flash_attention_2
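
The global batch size implied by this script follows from simple arithmetic. The sketch below assumes one data-parallel process per GPU on the 4 A100s mentioned in the script comment; the `fsdp.yaml` that sets `num_processes` is not part of this commit, so the GPU count is an assumption, and `effective_batch_size` is a hypothetical helper name.

```python
def effective_batch_size(num_gpus: int, per_device_batch: int, grad_accum_steps: int) -> int:
    """Number of preference pairs consumed per optimizer step."""
    return num_gpus * per_device_batch * grad_accum_steps


# Values from run_mistral_orpo_beta.sh: per-device batch 8, no gradient accumulation.
# num_gpus=4 is taken from the script comment, not from a config file (assumption).
beta_global_batch = effective_batch_size(num_gpus=4, per_device_batch=8, grad_accum_steps=1)
print(beta_global_batch)  # 32
```

With fewer GPUs, raising `--gradient_accumulation_steps` by the same factor keeps this product unchanged.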

scripts/run_mistral_orpo_capybara.sh ADDED

	@@ -0,0 +1,22 @@

+#!/bin/bash
+# Mistral-ORPO series are trained on 4 * A100s
+accelerate launch --config_file ./src/accelerate/fsdp.yaml main.py \
+    --lr 5e-6 \
+    --torch_compile False \
+    --alpha 0.05 \
+    --lr_scheduler_type inverse_sqrt \
+    --cache_dir /projects/hf_cache/ \
+    --warmup_steps 100 \
+    --model_name mistralai/Mistral-7B-v0.1 \
+    --data_name argilla/distilabel-capybara-dpo-7k-binarized \
+    --num_train_epochs 3 \
+    --optim adamw_bnb_8bit \
+    --gradient_accumulation_steps 1 \
+    --prompt_max_length 1792 \
+    --response_max_length 2048 \
+    --per_device_train_batch_size 8 \
+    --per_device_eval_batch_size 8 \
+    --num_proc 8 \
+    --flash_attention_2

src/accelerate/ds2.yaml ADDED

	@@ -0,0 +1,21 @@

+compute_environment: LOCAL_MACHINE
+debug: false
+deepspeed_config:
+  gradient_accumulation_steps: 1
+  offload_optimizer_device: none
+  offload_param_device: none
+  zero3_init_flag: false
+  zero_stage: 2
+distributed_type: DEEPSPEED
+downcast_bf16: 'no'
+machine_rank: 0
+main_training_function: main
+mixed_precision: bf16
+num_machines: 1
+num_processes: 2
+rdzv_backend: static
+same_network: true
+tpu_env: []
+tpu_use_cluster: false
+tpu_use_sudo: false
+use_cpu: false

src/args.py ADDED

	@@ -0,0 +1,34 @@

+def default_args(parser):
+    parser.add_argument("--cache_dir", default=None, type=str)
+    parser.add_argument("--save_dir", default='./saved', type=str)
+    parser.add_argument("--data_name", default='HuggingfaceH4/UltraFeedback', type=str)
+    parser.add_argument("--model_name", default="gpt2", type=str)
+    # Training Arguments
+    # argparse applies bool() to the raw string, so type=bool would parse "False" as True
+    parser.add_argument("--torch_compile", default=False, type=lambda s: s.lower() in ('true', '1', 'yes'))
+    parser.add_argument("--flash_attention_2", action='store_true')
+    parser.add_argument("--lr_scheduler_type", default="cosine", type=str)
+    parser.add_argument("--optim", default="paged_adamw_32bit", type=str)
+    parser.add_argument("--overwrite_output_dir", default=True, type=lambda s: s.lower() in ('true', '1', 'yes'))
+    parser.add_argument("--lr", default=2e-5, type=float)
+    parser.add_argument("--num_proc", default=1, type=int)
+    parser.add_argument("--max_samples", default=None, type=int, help="Optional cap on the number of training rows")
+    parser.add_argument("--num_train_epochs", default=10, type=int)
+    parser.add_argument("--per_device_train_batch_size", default=2, type=int)
+    parser.add_argument("--per_device_eval_batch_size", default=2, type=int)
+    parser.add_argument("--warmup_steps", default=5000, type=int)
+    parser.add_argument("--evaluation_strategy", default='epoch', type=str)
+    parser.add_argument("--do_eval", action='store_true')
+    parser.add_argument("--gradient_accumulation_steps", default=1, type=int)
+    parser.add_argument("--save_strategy", default='epoch', type=str)
+    parser.add_argument("--prompt_max_length", default=256, type=int)
+    parser.add_argument("--response_max_length", default=1024, type=int)
+    parser.add_argument("--alpha", default=1.0, type=float, help="Hyperparameter for weighting L_OR")
+    # Wandb Configurations
+    parser.add_argument("--wandb_entity", default=None, type=str)
+    parser.add_argument("--wandb_project_name", default=None, type=str)
+    args = parser.parse_args()
+    return args
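
A note on boolean command-line options such as `--torch_compile`: argparse passes the raw string to the `type` callable, and `bool("False")` is `True` because any non-empty string is truthy. The sketch below contrasts that behavior with an explicit converter. `str2bool` is a hypothetical helper name, not something defined in this repository.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical helper: parse common true/false spellings explicitly."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# type=bool calls bool() on the string, so "False" becomes True
naive = argparse.ArgumentParser()
naive.add_argument("--torch_compile", default=False, type=bool)

# An explicit converter maps the string to the intended value
explicit = argparse.ArgumentParser()
explicit.add_argument("--torch_compile", default=False, type=str2bool)

naive_value = naive.parse_args(["--torch_compile", "False"]).torch_compile
explicit_value = explicit.parse_args(["--torch_compile", "False"]).torch_compile
print(naive_value, explicit_value)  # True False
```

An `action='store_true'` flag, as already used for `--flash_attention_2`, is another way to avoid the ambiguity.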

src/orpo_trainer.py ADDED

	@@ -0,0 +1,83 @@

+import torch
+import wandb
+from transformers import Trainer
+class ORPOTrainer(Trainer):
+    def __init__(self, alpha, pad, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.pad = pad
+        self.alpha = alpha
+        self.loss_fct = torch.nn.CrossEntropyLoss(reduction='none')
+        print("Pad Token ID: ", self.pad)
+    def compute_custom_loss(self, logits, labels):
+        logits = logits.contiguous()
+        if labels is not None:
+            # move labels to correct device to enable model parallelism
+            labels = labels.to(logits.device)
+            # Shift so that tokens < n predict n
+            shift_logits = logits[..., :-1, :].contiguous()
+            shift_labels = labels[..., 1:].contiguous()
+            # CrossEntropyLoss expects (batch, vocab, seq): transpose, then average the per-token loss over the sequence
+            loss = self.loss_fct(shift_logits.transpose(2, 1), shift_labels).mean(dim=-1)
+        return loss
+    def compute_logps(self, prompt_attention_mask, chosen_inputs, chosen_attention_mask, logits):
+        mask = chosen_attention_mask[:, :-1] - prompt_attention_mask[:, 1:]
+        per_token_logps = torch.gather(logits[:, :-1, :].log_softmax(-1), dim=2,
+                                       index=(mask * chosen_inputs[:, 1:]).unsqueeze(2)).squeeze(2)
+        return torch.mul(per_token_logps, mask.to(dtype=torch.bfloat16)).sum(dim=1).to(dtype=torch.float64) / mask.sum(dim=1).to(dtype=torch.float64)
+    def compute_loss(self, model, inputs, return_outputs=False):
+        if self.label_smoother is not None and "labels" in inputs:
+            labels = inputs.pop("labels")
+        else:
+            labels = None
+        # Generate the hidden states for 'chosen' and 'reject'
+        neg_labels = inputs['negative_input_ids'].clone()
+        pos_labels = inputs['positive_input_ids'].clone()
+        neg_labels[neg_labels == self.pad] = -100
+        pos_labels[pos_labels == self.pad] = -100
+        outputs_neg = model(**{'input_ids': inputs['negative_input_ids'],
+                               'attention_mask': inputs['negative_attention_mask'],
+                               'labels': neg_labels,}, output_hidden_states=True)
+        outputs_pos = model(**{'input_ids': inputs['positive_input_ids'],
+                               'attention_mask': inputs['positive_attention_mask'],
+                               'labels': pos_labels,}, output_hidden_states=True)
+        # Calculate NLL loss
+        pos_loss = self.compute_custom_loss(logits=outputs_pos.logits, labels=inputs['positive_input_ids'])
+        # Calculate Log Probability
+        pos_prob = self.compute_logps(prompt_attention_mask=inputs['attention_mask'],
+                                      chosen_inputs=inputs['positive_input_ids'],
+                                      chosen_attention_mask=inputs['positive_attention_mask'],
+                                      logits=outputs_pos.logits)
+        neg_prob = self.compute_logps(prompt_attention_mask=inputs['attention_mask'],
+                                      chosen_inputs=inputs['negative_input_ids'],
+                                      chosen_attention_mask=inputs['negative_attention_mask'],
+                                      logits=outputs_neg.logits)
+        # Calculate log odds
+        log_odds = (pos_prob - neg_prob) - (torch.log(1 - torch.exp(pos_prob)) - torch.log(1 - torch.exp(neg_prob)))
+        sig_ratio = torch.nn.functional.sigmoid(log_odds)
+        ratio = torch.log(sig_ratio)
+        # Calculate the Final Loss
+        loss = torch.mean(pos_loss - self.alpha * ratio).to(dtype=torch.bfloat16)
+        wandb.log({'Positive Geometric Mean': torch.mean(pos_prob).item(),
+                   'Negative Geometric Mean': torch.mean(neg_prob).item(),
+                   'Log Odds Ratio': torch.mean(ratio).item(),
+                   'Log Odds': torch.mean(log_odds).item()})
+        return (loss, outputs_pos) if return_outputs else loss
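
The log-odds term in `compute_loss` can be checked by hand on scalar values. The sketch below is a plain-Python illustration, not the trainer's code: the function names `log_odds_ratio_terms` and `orpo_loss` are hypothetical, and it assumes the inputs are (length-normalized) log-probabilities strictly below zero, which is what the `pos_prob` and `neg_prob` values appear to be.

```python
import math


def log_odds_ratio_terms(pos_logp: float, neg_logp: float):
    """Return (log_odds, log_sigmoid(log_odds)) for two log-probabilities < 0."""
    # log odds(p) = log p - log(1 - p); log1p(-exp(x)) gives log(1 - p) from x = log p
    log_odds = (pos_logp - neg_logp) - (
        math.log1p(-math.exp(pos_logp)) - math.log1p(-math.exp(neg_logp))
    )
    # Stable log(sigmoid(x)): pick the branch that never exponentiates a large positive number
    if log_odds >= 0:
        log_sig = -math.log1p(math.exp(-log_odds))
    else:
        log_sig = log_odds - math.log1p(math.exp(log_odds))
    return log_odds, log_sig


def orpo_loss(nll: float, pos_logp: float, neg_logp: float, alpha: float) -> float:
    """Combine the NLL on the chosen response with the weighted odds-ratio term."""
    _, log_sig = log_odds_ratio_terms(pos_logp, neg_logp)
    return nll - alpha * log_sig


# Chosen response with p = 0.5 (odds 1) against a rejected response with p = 0.2 (odds 0.25):
# the odds ratio is 4, so log_odds = log(4) and sigmoid(log(4)) = 0.8.
log_odds, log_sig = log_odds_ratio_terms(math.log(0.5), math.log(0.2))
print(log_odds, log_sig)
```

As the chosen response becomes more likely relative to the rejected one, `log_sig` approaches zero and the penalty `-alpha * log_sig` vanishes, leaving only the NLL term.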

src/utils.py ADDED Viewed

	@@ -0,0 +1,20 @@

+from typing import List
+def preprocess_logits_for_metrics(logits, labels):
+    if isinstance(logits, tuple):
+        logits = logits[0]
+    return logits.argmax(dim=-1)
+def dataset_split_selector(data) -> List:
+    """
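+    Select which dataset splits to use, based on the split names present in `data`.
+    Returns ['train'] for a single-split dataset, ['train_prefs', 'test_prefs'] when
+    a 'train_prefs' split exists, and ['train', 'test'] otherwise.
+    Will be further updated.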
+    """
+    if len(data.keys()) == 1:
+        return ['train']
+    else:
+        if 'train_prefs' in data.keys():
+            return ['train_prefs', 'test_prefs']
+        else:
+            return ['train', 'test']
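
A short usage sketch for `dataset_split_selector`, with plain dicts standing in for `DatasetDict` objects. The split contents and the `train_sft` key are made up for illustration; the function body repeats the logic from `src/utils.py` so the example runs on its own.

```python
from typing import List


def dataset_split_selector(data) -> List:
    # Same branching as src/utils.py, repeated here so the example is self-contained
    if len(data.keys()) == 1:
        return ['train']
    if 'train_prefs' in data.keys():
        return ['train_prefs', 'test_prefs']
    return ['train', 'test']


# Hypothetical split layouts; only the keys matter to the selector
single = {'train': []}
prefs = {'train_prefs': [], 'test_prefs': [], 'train_sft': []}
standard = {'train': [], 'test': []}

for name, data in [('single', single), ('prefs', prefs), ('standard', standard)]:
    print(name, dataset_split_selector(data))
```

Note that a single-split dataset always maps to `['train']`, whatever that one split is actually called.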

trl/test_orpo_trainer_demo.py ADDED Viewed

	@@ -0,0 +1,95 @@

+from dataclasses import dataclass, field
+from typing import Optional
+import torch
+from datasets import load_dataset
+from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser
+from trl import ORPOConfig, ORPOTrainer
+# This code is built on top of the example code from the Hugging Face TRL team
+@dataclass
+class ScriptArguments:
+    model_name: Optional[str] = field(default="microsoft/phi-2", metadata={"help": "the model name"})
+    optim: Optional[str] = field(default="adamw_torch", metadata={"help": "the optimizer name"})
+    data_name: Optional[str] = field(default="argilla/ultrafeedback-binarized-preferences-cleaned", metadata={"help": "the dataset name"})
+    cache_dir: Optional[str] = field(default="", metadata={"help": "the cache directory for the model, tokenizer, and dataset"})
+    log_with: Optional[str] = field(default='wandb', metadata={"help": "use 'wandb' to log with wandb"})
+    output_dir: Optional[str] = field(default='', metadata={"help": "the output directory"})
+    learning_rate: Optional[float] = field(default=1.41e-5, metadata={"help": "the learning rate"})
+    lr_scheduler_type: Optional[str] = field(default='cosine', metadata={"help": "the learning rate scheduler"})
+    per_device_train_batch_size: Optional[int] = field(default=4, metadata={"help": "the batch size"})
+    num_train_epochs: Optional[int] = field(default=5, metadata={"help": "the number of training epochs"})
+    beta: Optional[float] = field(default=0.25, metadata={"help": "weighting hyperparameter for L_OR"})
+    gradient_accumulation_steps: Optional[int] = field(
+        default=1, metadata={"help": "the number of gradient accumulation steps"}
+    )
+parser = HfArgumentParser(ScriptArguments)
+script_args = parser.parse_args_into_dataclasses()[0]
+config = ORPOConfig(
+    output_dir=script_args.output_dir,
+    max_prompt_length=1024,
+    max_length=2048,
+    logging_steps=100,
+    save_strategy='no',
+    max_completion_length=2048,
+    per_device_train_batch_size=script_args.per_device_train_batch_size,
+    remove_unused_columns=False,
+    gradient_accumulation_steps=script_args.gradient_accumulation_steps,
+    learning_rate=script_args.learning_rate,
+    optim=script_args.optim,
+    lr_scheduler_type=script_args.lr_scheduler_type,
+    gradient_checkpointing=True,
+    gradient_checkpointing_kwargs={'use_reentrant':True},
+    beta=script_args.beta,
+    report_to=script_args.log_with,
+    num_train_epochs=script_args.num_train_epochs,
+    bf16=True,
+    do_eval=False
+)
+model = AutoModelForCausalLM.from_pretrained(script_args.model_name,
+                                             cache_dir=script_args.cache_dir,
+                                             attn_implementation='flash_attention_2',
+                                             torch_dtype=torch.bfloat16)
+tokenizer = AutoTokenizer.from_pretrained(script_args.model_name,
+                                          cache_dir=script_args.cache_dir)
+tokenizer.pad_token_id = tokenizer.eos_token_id
+tokenizer.chat_template = "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"
+def build_dataset(tokenizer):
+    ds_train = load_dataset(script_args.data_name, split="train",
+                            cache_dir=script_args.cache_dir)
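+    def chat_template_to_text(sample):
+        # Each 'chosen'/'rejected' entry is a [user, assistant] message list; keep only the assistant reply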
+        sample["chosen"] = [item_chosen[1]['content'] for item_chosen in sample['chosen']]
+        sample["rejected"] = [item_rejected[1]['content'] for item_rejected in sample['rejected']]
+        sample['prompt'] = [tokenizer.apply_chat_template([{'role': 'user', 'content': item_prompt}], tokenize=False, add_generation_prompt=True) for item_prompt in sample['prompt']]
+        return sample
+    ds_train = ds_train.map(chat_template_to_text, batched=True, num_proc=8)
+    return ds_train
+train = build_dataset(tokenizer=tokenizer)
+trainer = ORPOTrainer(
+                model=model,
+                args=config,
+                tokenizer=tokenizer,
+                train_dataset=train
+            )
+trainer.train()
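
To make the effect of `chat_template_to_text` concrete, here is a rough sketch on a one-row toy batch. It does not load a tokenizer: `render_user_prompt` is a hypothetical, simplified stand-in for `tokenizer.apply_chat_template(..., add_generation_prompt=True)`, the `EOS` string is an assumed placeholder, and the exact whitespace produced by the real template may differ. The batch contents are invented.

```python
EOS = "</s>"  # assumed placeholder; the real EOS token comes from the tokenizer


def render_user_prompt(content: str) -> str:
    # Simplified stand-in for the chat template applied to a single user message
    return "<|user|>\n" + content + EOS + "\n<|assistant|>\n"


def chat_template_to_text(sample: dict) -> dict:
    # With batched=True, every column is a list holding one entry per row
    sample["chosen"] = [msgs[1]["content"] for msgs in sample["chosen"]]
    sample["rejected"] = [msgs[1]["content"] for msgs in sample["rejected"]]
    sample["prompt"] = [render_user_prompt(p) for p in sample["prompt"]]
    return sample


question = "What is the capital of France?"
batch = {
    "prompt": [question],
    "chosen": [[{"role": "user", "content": question},
                {"role": "assistant", "content": "Paris."}]],
    "rejected": [[{"role": "user", "content": question},
                  {"role": "assistant", "content": "Lyon."}]],
}

out = chat_template_to_text(batch)
print(out["prompt"][0])
print(out["chosen"], out["rejected"])
```

After the mapping, `prompt` holds the templated user turn while `chosen` and `rejected` hold only the raw assistant replies, which matches the three string columns the script passes on to `ORPOTrainer`.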