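This model is based on llama-13b and was instruction-tuned on the UltraChat dataset, roughly 1.4 million multi-turn conversations. Training requires only a single GPU.

firefly-llama-13b was evaluated objectively on the 🤗 Hugging Face Open LLM leaderboard.

On the leaderboard, firefly-llama-13b performs well: it scores 0.2 points higher than vicuna-13b-1.1, 0.5 points lower than llama-2-13b-chat, and 0.6 points lower than vicuna-13b-v1.3. Judging by these scores, firefly-llama-13b is very close to vicuna-13b and llama-2-13b-chat 😎.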

| Model               | Average | ARC  | HellaSwag | MMLU | TruthfulQA (MC) |
|---------------------|---------|------|-----------|------|-----------------|
| Llama-2-70b-chat-hf | 66.8    | 64.6 | 85.9      | 63.9 | 52.8            |
| vicuna-13b-v1.3     | 60      | 54.6 | 80.4      | 52.9 | 52.1            |
| Llama-2-13b-chat-hf | 59.9    | 59   | 81.9      | 54.6 | 44.1            |
| firefly-llama-13b   | 59.4    | 59   | 79.7      | 49.1 | 49.6            |
| vicuna-13b-1.1      | 59.2    | 52.7 | 80.1      | 51.9 | 52.1            |
| guanaco-13B-HF      | 59.1    | 57.8 | 83.8      | 48.3 | 46.7            |
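
The score gaps quoted for firefly-llama-13b can be re-derived from the Average column. The snippet below is a small illustrative check, not part of the Firefly project; the dictionary simply copies the averages from the table, and `gap_to` is a hypothetical helper name.

```python
# Average scores copied from the leaderboard table in this model card.
averages = {
    "vicuna-13b-v1.3": 60.0,
    "Llama-2-13b-chat-hf": 59.9,
    "firefly-llama-13b": 59.4,
    "vicuna-13b-1.1": 59.2,
    "guanaco-13B-HF": 59.1,
}


def gap_to(model, baseline="firefly-llama-13b"):
    """Return the baseline average minus the given model's average (1 decimal)."""
    return round(averages[baseline] - averages[model], 1)


# Positive values mean firefly-llama-13b scores higher than the other model.
gaps = {name: gap_to(name) for name in averages if name != "firefly-llama-13b"}
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f}")
```

A positive gap means firefly-llama-13b is ahead; a negative gap means it trails.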

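Notably, the vicuna-13b models use full-parameter fine-tuning, which demands substantial training resources. firefly-llama-13b instead uses QLoRA fine-tuning, which can fine-tune a 13B model with as little as 16 GB of GPU memory.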
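
A back-of-the-envelope calculation helps explain why a quantized base model fits on a single card. The sketch below assumes exactly 13 billion parameters and counts only the base weights; it ignores the LoRA adapter, quantization constants, activations, and optimizer state, so it is a lower bound rather than a measurement. `weight_memory_gib` is a hypothetical helper, and the 16 GB figure itself comes from this model card, not from this calculation.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: a "13B" model has exactly 13e9 parameters.
params = 13e9

fp16_gib = weight_memory_gib(params, 16)  # 16-bit weights, as in full fine-tuning
nf4_gib = weight_memory_gib(params, 4)    # 4-bit quantized weights, as in QLoRA

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {nf4_gib:.1f} GiB")
```

Under these assumptions the 16-bit weights alone take roughly 24 GiB, while the 4-bit weights take roughly 6 GiB, leaving headroom within 16 GB for the adapter and activations.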

For a detailed introduction, see the article: [Firefly reproduces Vicuna-13B on a single GPU, scoring 0.2 points higher on the Open LLM leaderboard 🤗](https://mp.weixin.qq.com/s/QG2YMo_QxaxS_Rr2yJrIeA)

For more details, see the [Firefly project](https://github.com/yangjianxin1/Firefly).

[Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_YeungNLP__firefly-llama-13b)

| Metric              | Value |
|---------------------|-------|
| Avg.                | 49.12 |
| ARC (25-shot)       | 58.96 |
| HellaSwag (10-shot) | 79.71 |
| MMLU (5-shot)       | 49.1  |
| TruthfulQA (0-shot) | 49.59 |
| Winogrande (5-shot) | 75.61 |
| GSM8K (5-shot)      | 8.19  |
| DROP (3-shot)       | 22.69 |
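
The `Avg.` row is consistent with a plain arithmetic mean of the seven benchmark scores listed here. The snippet below is a quick illustrative check using the values copied from the table. This average is lower than the 59.4 reported earlier in the card, which appears to cover only four benchmarks (ARC, HellaSwag, MMLU, TruthfulQA).

```python
# Benchmark scores copied from the evaluation results table.
scores = {
    "ARC (25-shot)": 58.96,
    "HellaSwag (10-shot)": 79.71,
    "MMLU (5-shot)": 49.1,
    "TruthfulQA (0-shot)": 49.59,
    "Winogrande (5-shot)": 75.61,
    "GSM8K (5-shot)": 8.19,
    "DROP (3-shot)": 22.69,
}

# Unweighted mean across all seven benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))
```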