---
license: apache-2.0
datasets:
- pankajmathur/WizardLM_Orca
- teknium/trismegistus-project
- unalignment/toxic-dpo-v0.1
- Intel/orca_dpo_pairs
language:
- en
pipeline_tag: text-generation
---

## Mistral 7b Reverse Instruct

This model is SFT (LoRA) fine-tuned to reverse engineer the original prompt of a given LLM output/response.
Use case: generating synthetic instruct datasets for chatbot development and for domain-specific fine-tuning (e.g. summarization and roleplay).


- base_model: mistralai/Mistral-7B-v0.1 (=checkpoint-v1)
- base_model: mistralai/Mistral-7B-v0.2 (>=checkpoint-v2)

For convenience, the latest model export is provided under [/latest_model_export](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/tree/main/latest_model_export), and GGUF-quantized versions under [/latest_ggml_models](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/tree/main/latest_ggml_models).

## Response Format

"[INST]\n### System:\n{system}\n### Instruction:\n{instruction}\n[/INST]\n"

- Grammar File: [inst_format.gbnf](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/inst_format.gbnf)
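The response format above can be parsed back into its `system` and `instruction` components. A minimal sketch, assuming only the documented `[INST]` layout (the function name and regex are illustrative, not part of the model card):

```python
import re

# Pattern for the documented response format:
# "[INST]\n### System:\n{system}\n### Instruction:\n{instruction}\n[/INST]\n"
RESPONSE_PATTERN = re.compile(
    r"\[INST\]\n### System:\n(?P<system>.*?)"
    r"\n### Instruction:\n(?P<instruction>.*?)\n\[/INST\]",
    re.DOTALL,
)

def parse_reverse_instruct(response: str) -> dict:
    """Extract the generated system prompt and instruction from a response."""
    match = RESPONSE_PATTERN.search(response)
    if match is None:
        raise ValueError("response does not match the expected [INST] format")
    return {
        "system": match.group("system"),
        "instruction": match.group("instruction"),
    }

example = ("[INST]\n### System:\nYou are a helpful assistant.\n"
           "### Instruction:\nSummarize the article below.\n[/INST]\n")
parsed = parse_reverse_instruct(example)
```

Using the grammar file during generation (e.g. with llama.cpp) constrains the output to this format, so a parse failure should be rare in practice.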


## Prompt Template

"\n### System:\nYou craft instructions for generating the given output through reverse engineering.\n### Instruction:\nDecipher the steps used to produce the given output and articulate a refined set of instructions (System & Instruction).\n### OUTPUT:\n {output}"

(use the template without the surrounding quotation marks)
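Assembling the prompt programmatically can be sketched as follows; the helper name and the example output string are illustrative, while the template string is taken verbatim from above:

```python
# Prompt template from the model card, with {output} as the fill-in slot.
PROMPT_TEMPLATE = (
    "\n### System:\nYou craft instructions for generating the given output "
    "through reverse engineering.\n### Instruction:\nDecipher the steps used "
    "to produce the given output and articulate a refined set of instructions "
    "(System & Instruction).\n### OUTPUT:\n {output}"
)

def build_prompt(output: str) -> str:
    """Wrap an LLM output/response in the reverse-instruct prompt."""
    # str.replace avoids str.format choking on braces inside the output text.
    return PROMPT_TEMPLATE.replace("{output}", output)

prompt = build_prompt("The Eiffel Tower is 330 metres tall.")
```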

## Training Dataset

About 21k items from the datasets listed below were used; most coding-like tasks were removed.

- v1 & v2: [reverse-instruct_v1.json](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/reverse-instruct_v1.json)
- v3: [reverse-instruct_v2.json](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/reverse-instruct_v2.json)

The reverse-instruct dataset was compiled from entries of the following datasets:

- [alpaca_gpt4_data](https://raw.githubusercontent.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/main/data/alpaca_gpt4_data.json)
- [roleplay-instruct-v2.1](https://raw.githubusercontent.com/teknium1/GPTeacher/main/Roleplay%20Supplemental/roleplay-instruct-v2.1.json)
- [wizardlm_orca](https://huggingface.co/datasets/pankajmathur/WizardLM_Orca/resolve/main/wizardlm_orca.json)
- [toxic-dpo-v0.1](https://huggingface.co/datasets/unalignment/toxic-dpo-v0.1/resolve/main/toxic-dpo.parquet)
- [orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs/resolve/main/orca_rlhf.jsonl)
- [occultexpert](https://huggingface.co/datasets/teknium/trismegistus-project/resolve/main/occultexpert.json)

## Training Procedure

```bash
!cd LLaMA-Factory && WANDB_DISABLED=True PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256 accelerate launch \
    --multi_gpu \
    --mixed_precision fp16 \
    --num_processes 2 \
    --num_machines 1 \
    --rdzv_backend static \
    --same_network \
    --gpu_ids all \
    --machine_rank 0 \
    --main_training_function main \
    -- src/train_bash.py \
    --stage sft \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
    --adapter_name_or_path path_to_checkpoint \
    --flash_attn \
    --neftune_noise_alpha 5 \
    --do_train \
    --dataset default \
    --template vanilla \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir path_to_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 10 \
    --save_total_limit 3 \
    --learning_rate 5e-5 \
    --num_train_epochs 9.0 \
    --plot_loss \
    --fp16 \
    --overwrite_output_dir \
    --cutoff_len 4096 \
    --quantization_bit 4
```

## Training Time

- v1: ~12h on Kaggle's P100 GPU
- v2: >30h on Kaggle's T4 x2
- v3: coming soon

### Framework versions

- LLaMA-Factory