license: apache-2.0
datasets:
- pankajmathur/WizardLM_Orca
- teknium/trismegistus-project
- unalignment/toxic-dpo-v0.1
- Intel/orca_dpo_pairs
language:
- en
pipeline_tag: text-generation
Mistral 7b Reverse Instruct
This model is supervised fine-tuned (SFT with LoRA) to reverse-engineer the original prompt of a given LLM output/response.
Use case: generating synthetic instruct datasets for developing chatbots and for domain-specific fine-tuning (e.g. "Summarization" and "Roleplay").
- base_model: mistralai/Mistral-7B-v0.1 (checkpoint-v1)
- base_model: mistralai/Mistral-7B-v0.2 (checkpoint-v2 and later)
For convenience, the latest model export is provided under /latest_model_export, and GGUF-quantized versions are provided under /latest_ggml_models.
Response Format
"[INST]\n### System:\n{system}\n### Instruction:\n{instruction}\n[/INST]\n"
- Grammar File: inst_format.gbnf
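A minimal Python sketch for splitting a generated response into its system and instruction parts. It assumes the model follows the format above exactly, with the `\n` sequences being real newlines. The grammar file is presumably meant to keep generations in this shape; the hypothetical `parse_response` helper below does not rely on it and simply returns `None` when the text does not match.

```python
import re
from typing import Optional, Tuple

# Mirrors the response format: [INST] / ### System: / ### Instruction: / [/INST]
RESPONSE_RE = re.compile(
    r"\[INST\]\n### System:\n(?P<system>.*?)\n### Instruction:\n"
    r"(?P<instruction>.*?)\n\[/INST\]",
    re.DOTALL,  # let the captured fields span multiple lines
)


def parse_response(text: str) -> Optional[Tuple[str, str]]:
    """Return (system, instruction) from a model response, or None if it does not match."""
    match = RESPONSE_RE.search(text)
    if match is None:
        return None
    return match.group("system").strip(), match.group("instruction").strip()


if __name__ == "__main__":
    sample = (
        "[INST]\n### System:\nYou are a concise assistant.\n"
        "### Instruction:\nSummarize the article.\n[/INST]\n"
    )
    print(parse_response(sample))
```

Using a non-greedy match with `re.DOTALL` keeps multi-line instructions intact while stopping at the first `[/INST]` marker.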
Prompt Template
"\n### System:\nYou craft instructions for generating the given output through reverse engineering.\n### Instruction:\nDecipher the steps used to produce the given output and articulate a refined set of instructions (System & Instruction).\n### OUTPUT:\n {output}"
(use the template without the surrounding quotation marks)
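A minimal sketch of filling the template in Python, assuming the `\n` sequences stand for real newlines and that `{output}` is the only placeholder. `build_prompt` is a hypothetical helper name; it keeps the single space before `{output}` exactly as written in the template.

```python
# Prompt template from the model card; "\n" are real newlines here.
PROMPT_TEMPLATE = (
    "\n### System:\nYou craft instructions for generating the given output "
    "through reverse engineering.\n### Instruction:\nDecipher the steps used "
    "to produce the given output and articulate a refined set of instructions "
    "(System & Instruction).\n### OUTPUT:\n {output}"
)


def build_prompt(output: str) -> str:
    """Insert an existing LLM output into the reverse-instruct prompt template."""
    # str.format does not re-parse the substituted value, so braces in `output` are safe.
    return PROMPT_TEMPLATE.format(output=output)


if __name__ == "__main__":
    print(build_prompt("Paris is the capital of France."))
```

The returned string is what gets sent to the model; its completion should then follow the response format described above.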
Training Dataset
About 21k items from the following dataset files were used (most coding-like tasks were filtered out):
- v1 & v2: reverse-instruct_v1.json
- v3: reverse-instruct_v2.json
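The layout of the reverse-instruct JSON files is not documented here, so the following is only a sketch of how an ordinary instruct record could be flipped into a reverse-instruct training pair, based on the prompt template and response format above. The original output becomes the model input, and the original system prompt and instruction become the target. The helper `to_reverse_instruct` and the Alpaca-style `instruction`/`output` keys are assumptions, not the actual dataset schema.

```python
from typing import Dict

# Templates copied from the "Prompt Template" and "Response Format" sections.
PROMPT_TEMPLATE = (
    "\n### System:\nYou craft instructions for generating the given output "
    "through reverse engineering.\n### Instruction:\nDecipher the steps used "
    "to produce the given output and articulate a refined set of instructions "
    "(System & Instruction).\n### OUTPUT:\n {output}"
)
RESPONSE_TEMPLATE = "[INST]\n### System:\n{system}\n### Instruction:\n{instruction}\n[/INST]\n"


def to_reverse_instruct(system: str, instruction: str, output: str) -> Dict[str, str]:
    """Hypothetical conversion: the model sees the output and must predict the prompt."""
    return {
        # Assumed field name for the model input.
        "instruction": PROMPT_TEMPLATE.format(output=output),
        # Assumed field name for the training target.
        "output": RESPONSE_TEMPLATE.format(system=system, instruction=instruction),
    }


if __name__ == "__main__":
    import json

    record = to_reverse_instruct("You are terse.", "Name a color.", "Blue.")
    print(json.dumps(record, indent=2))
```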
The reverse instruct dataset has been compiled with entries from the following datasets:
Training Procedure
!cd LLaMA-Factory && WANDB_DISABLED=True PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256 accelerate launch \
--multi_gpu \
--mixed_precision fp16 \
--num_processes 2 \
--num_machines 1 \
--rdzv_backend static \
--same_network \
--gpu_ids all \
--machine_rank 0 \
--main_training_function main \
-- src/train_bash.py \
--stage sft \
--model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
--adapter_name_or_path path_to_checkpoint \
--flash_attn \
--neftune_noise_alpha 5 \
--do_train \
--dataset default \
--template vanilla \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--output_dir path_to_sft_checkpoint \
--overwrite_cache \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 10 \
--save_total_limit 3 \
--learning_rate 5e-5 \
--num_train_epochs 9.0 \
--plot_loss \
--fp16 \
--overwrite_output_dir \
--cutoff_len 4096 \
--quantization_bit 4
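For a rough sense of scale, the command above can be turned into a few back-of-envelope numbers. This sketch assumes the ~21k items mentioned in the dataset section, LLaMA-Factory's default LoRA rank of 8 (the command does not set `--lora_rank`), and the published Mistral-7B shapes (hidden size 4096, 32 layers, 8 key/value heads with head dimension 128). Real step counts also depend on how the framework shards, truncates, and batches the data.

```python
import math

# Values taken from the training command above.
NUM_PROCESSES = 2          # --num_processes
PER_DEVICE_BATCH = 1       # --per_device_train_batch_size
GRAD_ACCUM_STEPS = 1       # --gradient_accumulation_steps
NUM_EPOCHS = 9             # --num_train_epochs

# Assumptions, not set in the command.
NUM_ITEMS = 21_000         # "about 21k items" from the dataset section
LORA_RANK = 8              # assumed LLaMA-Factory default (no --lora_rank given)
HIDDEN_SIZE = 4096         # Mistral-7B hidden size
KV_DIM = 8 * 128           # 8 key/value heads x head_dim 128 (grouped-query attention)
NUM_LAYERS = 32            # Mistral-7B decoder layers


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) beside a frozen d_in -> d_out projection."""
    return rank * (d_in + d_out)


effective_batch = NUM_PROCESSES * PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(NUM_ITEMS / effective_batch)
total_steps = steps_per_epoch * NUM_EPOCHS

# --lora_target q_proj,v_proj: q_proj maps 4096 -> 4096, v_proj maps 4096 -> 1024.
per_layer = (lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
             + lora_param_count(HIDDEN_SIZE, KV_DIM, LORA_RANK))
trainable_params = per_layer * NUM_LAYERS

print(f"effective batch size: {effective_batch}")          # 2
print(f"optimizer steps per epoch: ~{steps_per_epoch}")     # ~10500
print(f"total optimizer steps: ~{total_steps}")             # ~94500
print(f"trainable LoRA parameters: {trainable_params:,}")   # 3,407,872
```

Under these assumptions, `--save_steps 10` writes a checkpoint every 10 optimizer steps, and `--save_total_limit 3` keeps only the three most recent ones on disk.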
Training Time
- v1: ~12h on Kaggle's P100 GPU
- v2: >30h on Kaggle's T4 x2
- v3: coming soon
Framework versions
- LLaMA-Factory