language:
- en
library_name: peft
pipeline_tag: text-generation
tags:
- Mistral
license: llama2
model-index:
- name: SpeechlessCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 0
      verified: false
Mistral-7b-OpenOrca-lora
This is a test.
This LoRA model was extracted from the parameter-efficient fine-tuned model Mistral-7B-OpenOrca. It still needs to be verified whether the LoRA model achieves performance comparable to the original model.
The final goal is to create a toolkit that can load multiple LoRA modules simultaneously and automatically switch to the appropriate combination of modules, based on the user's query, to generate the best answer.
The lora merged model is here
The source code is here
Mistral-7B-OpenOrca
Extract the LoRA model Mistral-7B-OpenOrca-lora from Mistral-7B-OpenOrca.
Merge the base model Mistral-7B-v0.1 with the LoRA model to produce Mistral-7B-OpenOrca-lora-merged.
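The two steps above can be illustrated on toy matrices. This is only a sketch: the card does not say how the extraction is done, so the truncated SVD of the weight difference used here is an assumption, and `extract_lora`, `merge_lora`, and `relative_error` are hypothetical helper names. The LoRA scaling factor (`lora_alpha / r`) is folded into the factors for simplicity.

```python
import numpy as np


def extract_lora(w_base, w_tuned, rank):
    """Return (B, A) so that B @ A is the best rank-`rank` approximation
    of the weight difference w_tuned - w_base (assumed SVD-based method)."""
    delta = w_tuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Split the singular values evenly between the two factors.
    sqrt_s = np.sqrt(s[:rank])
    b = u[:, :rank] * sqrt_s            # shape (out_features, rank)
    a = sqrt_s[:, None] * vt[:rank, :]  # shape (rank, in_features)
    return b, a


def merge_lora(w_base, b, a):
    """Fold the low-rank update back into the base weights."""
    return w_base + b @ a


# Toy stand-ins for one weight matrix of the base and fine-tuned models.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 64))
w_tuned = w_base + 0.1 * rng.normal(size=(64, 64))


def relative_error(rank):
    """Error of the merged weights, relative to the size of the full update."""
    b, a = extract_lora(w_base, w_tuned, rank)
    merged = merge_lora(w_base, b, a)
    return np.linalg.norm(merged - w_tuned) / np.linalg.norm(w_tuned - w_base)


for rank in (4, 16, 64):
    print(rank, relative_error(rank))
```

A lower rank keeps less of the update, which is the trade-off the r=16, r=64, and r=256 rows in the evaluation tables explore. For real checkpoints, merging an adapter is usually done with PEFT via `PeftModel.from_pretrained(base_model, adapter_path).merge_and_unload()`.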
LLM Evaluation ...
Local Test
Model | ARC_acc_norm (25-shot) | HellaSwag_acc_norm (10-shot) | MMLU_acc (5-shot) | TruthfulQA_mc2 (0-shot) | GSM8K_acc (8-shot) | Open LLM Score |
---|---|---|---|---|---|---|
Mistral-7B-OpenOrca | 71 | 83 | 61.42 | 45 | 40 | 65.11 |
r=256 | 68 | 84 | 64.28 | 46.953 | 41 | 65.81 |
r=64 | 67 | 84 | 64.26 | 47.32 | 41 | 65.65 |
r=16 | 65 | 83 | 62.84 | 46.95 | 38 | 64.45 |
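The Open LLM Score column appears to be the plain mean of the ARC, HellaSwag, MMLU, and TruthfulQA columns, with GSM8K left out. That is an inference from the numbers in the table, not something the card states. A minimal check, where `open_llm_score` is a hypothetical helper name:

```python
from statistics import mean

# Rows copied from the Local Test table:
# (ARC, HellaSwag, MMLU, TruthfulQA, GSM8K)
local_results = {
    "Mistral-7B-OpenOrca": (71, 83, 61.42, 45, 40),
    "r=256": (68, 84, 64.28, 46.953, 41),
    "r=64": (67, 84, 64.26, 47.32, 41),
    "r=16": (65, 83, 62.84, 46.95, 38),
}


def open_llm_score(row):
    """Average the first four benchmarks; GSM8K (the fifth value) is excluded."""
    return mean(row[:4])


for name, row in local_results.items():
    print(f"{name}: {open_llm_score(row):.3f}")
```

Each computed mean agrees with the reported score to within rounding.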
Open LLM Leaderboard
Model | ARC_acc_norm (25-shot) | HellaSwag_acc_norm (10-shot) | MMLU_acc (5-shot) | TruthfulQA_mc2 (0-shot) | Open LLM Score |
---|---|---|---|---|---|
Mistral-7B-SlimOrca | 62.54 | 83.86 | 62.77 | 54.23 | 65.85 |
Mistral-7B-OpenOrca | 64.08 | 83.99 | 62.24 | 53.05 | 65.84 |
lm-evaluation-harness
Metric | Mistral-7B-OpenOrca | Mistral-7B-OpenOrca-lora | Mistral-7B-OpenOrca-lora-merged |
---|---|---|---|
ARC | 64.08 | | |
HellaSwag | 83.99 | | |
MMLU | 62.24 | | |
TruthfulQA | 53.05 | | |
Average | 65.84 | | |
HumanEval
Metric | Mistral-7B-OpenOrca | Mistral-7B-OpenOrca-lora | Mistral-7B-OpenOrca-lora-merged |
---|---|---|---|
humaneval-python | 35.976 | | |
Training procedure
The following bitsandbytes quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
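As a sketch, the settings listed above could be turned into a `BitsAndBytesConfig` from `transformers` like this. `BNB_4BIT_KWARGS` and `build_quantization_config` are hypothetical names for illustration, and the actual training script may have built the config differently.

```python
# Values copied from the quantization config listed above.
BNB_4BIT_KWARGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
}


def build_quantization_config():
    """Build the config object; needs torch and transformers installed."""
    # Imported lazily so the settings above can be inspected without them.
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = dict(BNB_4BIT_KWARGS)
    # Replace the dtype name with the corresponding torch dtype.
    kwargs["bnb_4bit_compute_dtype"] = torch.bfloat16
    return BitsAndBytesConfig(**kwargs)
```

The returned object would typically be passed as `quantization_config=` to `AutoModelForCausalLM.from_pretrained` when loading the base model in 4-bit.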
Framework versions
- PEFT 0.5.0
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 50.72 |
ARC (25-shot) | 61.95 |
HellaSwag (10-shot) | 83.62 |
MMLU (5-shot) | 64.16 |
TruthfulQA (0-shot) | 42.74 |
Winogrande (5-shot) | 79.08 |
GSM8K (5-shot) | 17.29 |
DROP (3-shot) | 6.19 |