---
license: apache-2.0
datasets:
- pankajmathur/WizardLM_Orca
- teknium/trismegistus-project
- unalignment/toxic-dpo-v0.1
- Intel/orca_dpo_pairs
language:
- en
pipeline_tag: text-generation
---
## Mistral 7b Reverse Instruct
This model is fine-tuned (SFT with LoRA) to reverse engineer the original prompt of a given LLM output/response.
Use case: generating synthetic instruct datasets for chatbot development and domain-specific fine-tuning (e.g. "Summarization" & "Roleplay").
- base_model: mistralai/Mistral-7B-v0.1 (=checkpoint-v1)
- base_model: mistralai/Mistral-7B-v0.2 (>=checkpoint-v2)
For convenience the latest model export is provided under [/latest_model_export](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/tree/main/latest_model_export), as well as gguf quantized versions under [/latest_ggml_models](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/tree/main/latest_ggml_models).
## Response Format
"[INST]\n### System:\n{system}\n### Instruction:\n{instruction}\n[/INST]\n"
- Grammar File: [inst_format.gbnf](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/inst_format.gbnf)
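As an illustration, a minimal sketch (using only Python's standard `re` module; the function name is illustrative, not part of this repository) of splitting a model response in the format above into its system and instruction parts:

```python
import re

def parse_reverse_instruct(response: str):
    """Extract the system prompt and instruction from a response in the
    "[INST]\n### System:\n{system}\n### Instruction:\n{instruction}\n[/INST]\n" format."""
    pattern = (
        r"\[INST\]\s*### System:\s*(?P<system>.*?)"
        r"\s*### Instruction:\s*(?P<instruction>.*?)\s*\[/INST\]"
    )
    match = re.search(pattern, response, flags=re.DOTALL)
    if match is None:
        raise ValueError("response does not match the expected [INST] format")
    return match.group("system"), match.group("instruction")
```

Enforcing the grammar file during generation (e.g. with llama.cpp's `--grammar-file`) makes this parse reliable, since the output is then guaranteed to follow the format.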
## Prompt Template
"\n### System:\nYou craft instructions for generating the given output through reverse engineering.\n### Instruction:\nDecipher the steps used to produce the given output and articulate a refined set of instructions (System & Instruction).\n### OUTPUT:\n {output}"
(use the template without the surrounding quotation marks)
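A minimal helper (a sketch; the function name is illustrative) for filling the template with an LLM output before sending it to the model:

```python
PROMPT_TEMPLATE = (
    "\n### System:\nYou craft instructions for generating the given output "
    "through reverse engineering.\n### Instruction:\nDecipher the steps used "
    "to produce the given output and articulate a refined set of instructions "
    "(System & Instruction).\n### OUTPUT:\n {output}"
)

def build_prompt(output: str) -> str:
    """Insert an LLM output/response into the reverse-instruct prompt template."""
    return PROMPT_TEMPLATE.format(output=output)
```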
## Training Dataset
About 21k items from the following datasets were used (most coding-related tasks were removed).
- v1 & v2: [reverse-instruct_v1.json](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/reverse-instruct_v1.json)
- v3: [reverse-instruct_v2.json](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/reverse-instruct_v2.json)
The reverse instruct dataset has been compiled with entries from the following datasets:
- [alpaca_gpt4_data](https://raw.githubusercontent.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/main/data/alpaca_gpt4_data.json)
- [roleplay-instruct-v2.1](https://raw.githubusercontent.com/teknium1/GPTeacher/main/Roleplay%20Supplemental/roleplay-instruct-v2.1.json)
- [wizardlm_orca](https://huggingface.co/datasets/pankajmathur/WizardLM_Orca/resolve/main/wizardlm_orca.json)
- [toxic-dpo-v0.1](https://huggingface.co/datasets/unalignment/toxic-dpo-v0.1/resolve/main/toxic-dpo.parquet)
- [orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs/resolve/main/orca_rlhf.jsonl)
- [occultexpert](https://huggingface.co/datasets/teknium/trismegistus-project/resolve/main/occultexpert.json)
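The compilation step can be pictured as swapping the roles of prompt and response: each source item's output becomes the model input, and its original system prompt and instruction become the training target. A hypothetical sketch of that conversion (field names are assumptions, not the exact script used for this repository):

```python
def to_reverse_instruct(item: dict) -> dict:
    """Turn a conventional instruct example into a reverse-instruct pair:
    the original output becomes the input, the original prompt becomes the target."""
    prompt = (
        "\n### System:\nYou craft instructions for generating the given output "
        "through reverse engineering.\n### Instruction:\nDecipher the steps used "
        "to produce the given output and articulate a refined set of instructions "
        "(System & Instruction).\n### OUTPUT:\n " + item["output"]
    )
    target = (
        "[INST]\n### System:\n" + item["system"]
        + "\n### Instruction:\n" + item["instruction"] + "\n[/INST]\n"
    )
    return {"input": prompt, "target": target}
```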
## Training Procedure
```bash
!cd LLaMA-Factory && WANDB_DISABLED=True PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256 accelerate launch \
--multi_gpu \
--mixed_precision fp16 \
--num_processes 2 \
--num_machines 1 \
--rdzv_backend static \
--same_network \
--gpu_ids all \
--machine_rank 0 \
--main_training_function main \
-- src/train_bash.py \
--stage sft \
--model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
--adapter_name_or_path path_to_checkpoint \
--flash_attn \
--neftune_noise_alpha 5 \
--do_train \
--dataset default \
--template vanilla \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--output_dir path_to_sft_checkpoint \
--overwrite_cache \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 10 \
--save_total_limit 3 \
--learning_rate 5e-5 \
--num_train_epochs 9.0 \
--plot_loss \
--fp16 \
--overwrite_output_dir \
--cutoff_len 4096 \
--quantization_bit 4
```
## Training Time
- v1: ~12h on Kaggle's P100 GPU
- v2: >30h on Kaggle's T4 x2
- v3: coming soon
### Framework versions
- LLaMA-Factory