metadata

language:
  - en
library_name: peft
pipeline_tag: text-generation
tags:
  - Mistral
license: llama2
model-index:
  - name: SpeechlessCoder
    results:
      - task:
          type: text-generation
        dataset:
          type: openai_humaneval
          name: HumanEval
        metrics:
          - name: pass@1
            type: pass@1
            value: 0
            verified: false

Mistral-7b-OpenOrca-lora

This is a test.

This LoRA model was extracted from the parameter-efficient fine-tuned model Mistral-7B-OpenOrca. It still needs to be verified whether this LoRA model can match the performance of the original model.

The final goal is to create a toolkit that can simultaneously load multiple LoRA modules, and automatically switch to the appropriate combination of LoRA modules based on user queries to generate the best answer.

The LoRA-merged model is here

The source code is here

Mistral-7B-OpenOrca

Extract the LoRA model Mistral-7B-OpenOrca-lora from Mistral-7B-OpenOrca.
Merge the base model Mistral-7B-v0.1 with the LoRA model to produce Mistral-7B-OpenOrca-lora-merged.
LLM Evaluation ...
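
The extract and merge steps above can be sketched on a single weight matrix. This card does not say how the extraction was done, so the snippet assumes one common approach: a truncated SVD of the difference between the fine-tuned and base weights. The function names `extract_lora` and `merge_lora` and the random matrices are hypothetical and for illustration only; they are not taken from the linked source code.

```python
import numpy as np


def extract_lora(w_base, w_tuned, rank):
    """Approximate (w_tuned - w_base) by a rank-`rank` product lora_b @ lora_a."""
    delta = w_tuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    lora_b = u[:, :rank] * sqrt_s          # shape (out_features, rank)
    lora_a = sqrt_s[:, None] * vt[:rank]   # shape (rank, in_features)
    return lora_a, lora_b


def merge_lora(w_base, lora_a, lora_b):
    """Fold the low-rank update back into the base weight."""
    return w_base + lora_b @ lora_a


# Toy stand-ins for one layer of the base and fine-tuned models (assumption).
rng = np.random.default_rng(0)
w_base = rng.normal(size=(32, 48))
true_delta = rng.normal(size=(32, 4)) @ rng.normal(size=(4, 48))  # exactly rank 4
w_tuned = w_base + true_delta

lora_a, lora_b = extract_lora(w_base, w_tuned, rank=4)
w_merged = merge_lora(w_base, lora_a, lora_b)
recon_error = float(np.abs(w_merged - w_tuned).max())

# A smaller rank cannot capture the whole update, so the error grows.
small_a, small_b = extract_lora(w_base, w_tuned, rank=2)
small_error = float(np.abs(merge_lora(w_base, small_a, small_b) - w_tuned).max())
```

In this toy case the update has exact rank 4, so rank-4 extraction recovers the fine-tuned weight up to floating-point error. For real models the difference is generally not low-rank, which is why the card compares several ranks.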

Local Test

Model	ARC_acc_norm (25-shot)	HellaSwag_acc_norm (10-shot)	MMLU_acc (5-shot)	TruthfulQA_mc2 (0-shot)	GSM8K_acc (8-shot)	Open LLM Score
Mistral-7B-OpenOrca	71	83	61.42	45	40	65.11
r=256	68	84	64.28	46.953	41	65.81
r=64	67	84	64.26	47.32	41	65.65
r=16	65	83	62.84	46.95	38	64.45
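
The Open LLM Score column in the table above appears to be the unweighted mean of the ARC, HellaSwag, MMLU, and TruthfulQA columns, with GSM8K left out. This is inferred from the numbers rather than stated in the card. A minimal sketch that recomputes it (the dictionary keys and the function name `open_llm_score` are made up for this example):

```python
# Rows copied from the Local Test table above.
local_results = {
    "Mistral-7B-OpenOrca": {"arc": 71, "hellaswag": 83, "mmlu": 61.42, "truthfulqa": 45, "gsm8k": 40},
    "r=256": {"arc": 68, "hellaswag": 84, "mmlu": 64.28, "truthfulqa": 46.953, "gsm8k": 41},
    "r=64": {"arc": 67, "hellaswag": 84, "mmlu": 64.26, "truthfulqa": 47.32, "gsm8k": 41},
    "r=16": {"arc": 65, "hellaswag": 83, "mmlu": 62.84, "truthfulqa": 46.95, "gsm8k": 38},
}

# Assumption: only these four benchmarks enter the score; GSM8K is excluded.
CORE_BENCHMARKS = ("arc", "hellaswag", "mmlu", "truthfulqa")


def open_llm_score(row):
    """Unweighted mean over the four core benchmarks."""
    return sum(row[name] for name in CORE_BENCHMARKS) / len(CORE_BENCHMARKS)


scores = {model: open_llm_score(row) for model, row in local_results.items()}
for model, score in scores.items():
    print(f"{model}: {score:.3f}")
```

Rounded to two decimals, the recomputed values agree with the Open LLM Score column.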

Open LLM Leaderboard

Model	ARC_acc_norm (25-shot)	HellaSwag_acc_norm (10-shot)	MMLU_acc (5-shot)	TruthfulQA_mc2 (0-shot)	Open LLM Score
Mistral-7B-SlimOrca	62.54	83.86	62.77	54.23	65.85
Mistral-7B-OpenOrca	64.08	83.99	62.24	53.05	65.84

lm-evaluation-harness

Open LLM Leaderboard

Metric	Mistral-7B-OpenOrca	Mistral-7B-OpenOrca-lora	Mistral-7B-OpenOrca-lora-merged
ARC	64.08
HellaSwag	83.99
MMLU	62.24
TruthfulQA	53.05
Average	65.84

HumanEval

Metric	Mistral-7B-OpenOrca	Mistral-7B-OpenOrca-lora	Mistral-7B-OpenOrca-lora-merged
humaneval-python	35.976

Training procedure

The following bitsandbytes quantization config was used during training:

quant_method: bitsandbytes
load_in_8bit: False
load_in_4bit: True
llm_int8_threshold: 6.0
llm_int8_skip_modules: None
llm_int8_enable_fp32_cpu_offload: False
llm_int8_has_fp16_weight: False
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: True
bnb_4bit_compute_dtype: bfloat16
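
As a rough sketch, the 4-bit fields listed above map onto keyword arguments of `transformers.BitsAndBytesConfig`. The 8-bit fields are left at their defaults here, and the helper name `build_quant_config` is hypothetical. This only shows how the settings could be reproduced; it is not the training script used for this model.

```python
# Keyword arguments mirroring the 4-bit fields listed above.
bnb_kwargs = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",  # given as a dtype name string
}


def build_quant_config(kwargs):
    """Build a BitsAndBytesConfig; needs transformers, torch, and bitsandbytes installed."""
    from transformers import BitsAndBytesConfig  # imported lazily on purpose

    return BitsAndBytesConfig(**kwargs)
```

The resulting object would typically be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained` when loading the base model.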

Framework versions

PEFT 0.5.0

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	50.72
ARC (25-shot)	61.95
HellaSwag (10-shot)	83.62
MMLU (5-shot)	64.16
TruthfulQA (0-shot)	42.74
Winogrande (5-shot)	79.08
GSM8K (5-shot)	17.29
DROP (3-shot)	6.19