---
language:
- en
library_name: peft
pipeline_tag: text-generation
tags:
- Mistral
license: llama2
model-index:
- name: SpeechlessCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.0
      verified: false
---
|
|
|
# Mistral-7B-OpenOrca-lora

**This is a test.**
|
|
|
|
|
This LoRA model was extracted from the parameter-efficient fine-tuned model [Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca); it remains to be verified whether the LoRA model alone can match the performance of the original model.
|
|
|
The final goal is to build a toolkit that can load multiple LoRA modules simultaneously and automatically switch to the most appropriate combination of LoRA modules for each user query to generate the best answer.
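
As a rough sketch of that idea, `peft` already supports registering several adapters under names and switching between them; the second adapter ID and the manual switch below are hypothetical placeholders for the query-based routing the toolkit would do:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Register multiple LoRA adapters under distinct names
model = PeftModel.from_pretrained(
    base, "uukuguy/Mistral-7B-OpenOrca-lora", adapter_name="openorca"
)
model.load_adapter("path/to/another-lora", adapter_name="coder")  # hypothetical second adapter

# A real toolkit would pick the adapter from the user query; here we switch by hand
model.set_adapter("coder")
```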
|
|
|
The LoRA-merged model is available [here](https://huggingface.co/uukuguy/Mistral-7B-OpenOrca-lora-merged).

The source code is [here](https://github.com/uukuguy/multi_loras).
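
A minimal usage sketch, assuming the standard `transformers`/`peft` loading path (the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the Mistral base model, then attach the extracted LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "uukuguy/Mistral-7B-OpenOrca-lora")

inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```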
|
|
|
## Mistral-7B-OpenOrca
|
|
|
- Extract the LoRA model [Mistral-7B-OpenOrca-lora](https://huggingface.co/uukuguy/Mistral-7B-OpenOrca-lora) from [Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca);
|
|
|
- Merge the base model [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) with the LoRA model into [Mistral-7B-OpenOrca-lora-merged](https://huggingface.co/uukuguy/Mistral-7B-OpenOrca-lora-merged) (see the merge sketch after this list);
|
|
|
- LLM Evaluation ...
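
A minimal sketch of the merge step above, assuming `peft`'s standard `merge_and_unload` API (the output directory is illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, "uukuguy/Mistral-7B-OpenOrca-lora")

# Fold the LoRA weights back into the base weights, producing a standalone model
merged = model.merge_and_unload()
merged.save_pretrained("./Mistral-7B-OpenOrca-lora-merged")
```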
|
|
|
### Local Test
|
|
|
| Model | ARC_acc_norm (25-shot) | HellaSwag_acc_norm (10-shot) | MMLU_acc (5-shot) | TruthfulQA_mc2 (0-shot) | GSM8K_acc (8-shot) | Open LLM Score |
|
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | |
|
| Mistral-7B-OpenOrca | **71** | 83 | 61.42 | 45 | 40 | 65.11 | |
|
| **r=256** | 68 | **84** | **64.28** | 46.95 | **41** | **65.81** |
|
| r=64 | 67 | 84 | 64.26 | **47.32** | **41** | 65.65 | |
|
| *r=16* | *65* | *83* | *62.84* | *46.95* | *38* | *64.45* | |
|
|
|
### Open LLM Leaderboard

| Model | ARC_acc_norm (25-shot) | HellaSwag_acc_norm (10-shot) | MMLU_acc (5-shot) | TruthfulQA_mc2 (0-shot) | Open LLM Score |
|
| ------ | ------ | ------ | ------ | ------ | ------ | |
|
| Mistral-7B-SlimOrca | 62.54 | 83.86 | **62.77** | **54.23** | **65.85** | |
|
| Mistral-7B-OpenOrca | **64.08** | **83.99** | 62.24 | 53.05 | 65.84 | |
|
|
|
|
|
## lm-evaluation-harness

[Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
|
|
|
| Metric | Mistral-7B-OpenOrca | Mistral-7B-OpenOrca-lora | Mistral-7B-OpenOrca-lora-merged |
|
| --- | --- |--- | --- | |
|
| ARC | 64.08 | | | |
|
| HellaSwag | 83.99 | | | |
|
| MMLU | 62.24 | | | |
|
| TruthfulQA | 53.05 | | | |
|
| Average | 65.84 | | | |
|
|
|
## HumanEval
|
|
|
| Metric | Mistral-7B-OpenOrca | Mistral-7B-OpenOrca-lora | Mistral-7B-OpenOrca-lora-merged |
|
| --- | --- | --- | --- | |
|
| humaneval-python | 35.976 | | | |
|
|
|
|
|
## Training procedure

The following `bitsandbytes` quantization config was used during training (a loading sketch follows the list):

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
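
For reference, a sketch of how these values map onto `transformers.BitsAndBytesConfig` when loading the base model for QLoRA-style training (the model ID is the base model linked above):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization config matching the values listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
```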
|
### Framework versions
|
|
|
|
|
- PEFT 0.5.0
|
|
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_uukuguy__Mistral-7B-OpenOrca-lora).
|
|
|
| Metric | Value | |
|
|-----------------------|---------------------------| |
|
| Avg. | 50.72 | |
|
| ARC (25-shot) | 61.95 | |
|
| HellaSwag (10-shot) | 83.62 | |
|
| MMLU (5-shot) | 64.16 | |
|
| TruthfulQA (0-shot) | 42.74 | |
|
| Winogrande (5-shot) | 79.08 | |
|
| GSM8K (5-shot) | 17.29 | |
|
| DROP (3-shot) | 6.19 | |
|
|