---
license: gpl-3.0
datasets:
- JosephusCheung/GuanacoDataset
- yahma/alpaca-cleaned
language:
- en
- zh
- ja
tags:
- llama
- guanaco
- alpaca
- lora
- finetune
---

# Guanaco-leh-V2: A Multilingual Instruction-Following Language Model Based on LLaMA 7B
This model is trained with [guanaco-lora](https://github.com/KohakuBlueleaf/guanaco-lora), with LoRA plus the embed_tokens and lm_head layers being trained.

The dataset comes from [alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) and [guanaco](https://huggingface.co/datasets/JosephusCheung/GuanacoDataset).
With the trained embedding and head, the model performs better in Chinese and Japanese than the original LLaMA, and with instruction-based prompting you can use this model more easily.

Since this model is trained on the guanaco dataset, you can also use it as a chatbot. Just use this format:
```
### Instruction:
User: <Message history>
Assistant: <Message history>

### Input:
System: <System response for next message, optional>
User: <Next message>

### Response:
```

**Tip: I removed the first line of the original prompt to reduce token consumption, so please consider removing it as well when you use this model.**
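
Below is a minimal Python sketch (not from the original repo) showing one way to assemble this prompt format; the `build_prompt` helper and its arguments are hypothetical.

```python
# Hypothetical helper (not part of guanaco-lora) that assembles the prompt
# format described above from a message history and the next user message.
def build_prompt(history, user_message, system=None):
    # history: list of (user_text, assistant_text) pairs from earlier turns
    lines = ["### Instruction:"]
    for user_text, assistant_text in history:
        lines.append(f"User: {user_text}")
        lines.append(f"Assistant: {assistant_text}")
    lines += ["", "### Input:"]
    if system is not None:
        lines.append(f"System: {system}")  # optional system hint for the next reply
    lines += [f"User: {user_message}", "", "### Response:"]
    return "\n".join(lines)

print(build_prompt([("Hi!", "Hello! How can I help you?")], "Translate 'cat' into Japanese."))
```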

## Differences from the previous model
The main differences are:
* the model is trained in bf16, not 8-bit
* the context cutoff length is increased to 1024
* a larger dataset is used (latest guanaco + alpaca-cleaned = 540k entries)
* a larger batch size is used (64 -> 128)

And since the training data contains more chat-based data, this model is better suited for chatbot usage.


## Try this model:
You can try this model with this [colab](https://colab.research.google.com/drive/1nn6TCAKyFrgDEgA6X3o3YbxfbMm8Skp4).
Or use generate.py in the [guanaco-lora](https://github.com/KohakuBlueleaf/guanaco-lora) repo; all the examples below are generated with guanaco-lora.

If you want to use the LoRA model from guanaco-7b-leh-v2-adapter/, remember to turn off load_in_8bit, or manually merge it into the 7B model (see the loading sketch below)!
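
A hedged sketch of what that could look like with PEFT; the base-model path and dtype here are assumptions, only the adapter folder name comes from this repo.

```python
# Sketch: load the base LLaMA-7B in fp16 (not 8-bit) and attach the LoRA adapter.
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

BASE = "path/to/llama-7b-hf"  # assumption: any local/HF LLaMA-7B checkpoint

tokenizer = LlamaTokenizer.from_pretrained(BASE)
base_model = LlamaForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.float16,  # load_in_8bit stays off, as noted above
    device_map="auto",
)

model = PeftModel.from_pretrained(base_model, "guanaco-7b-leh-v2-adapter")
# Optionally fold the LoRA weights into the base model for plain inference:
model = model.merge_and_unload()
```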

### Recommended generation parameters:
* temperature: 0.5~0.7
* top p: 0.65~1.0
* top k: 30~50
* repetition penalty: 1.03~1.17
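
As a usage illustration, here is a hedged `transformers` generation sketch using values from those ranges; the model path is a placeholder and `build_prompt` is the hypothetical helper sketched earlier.

```python
# Sketch: sample from the model with the recommended decoding parameters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "path/to/guanaco-7b-leh-v2"  # placeholder: point this at the (merged) model

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16, device_map="auto")

prompt = build_prompt([], "Introduce yourself in one sentence.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,         # 0.5~0.7
    top_p=0.9,               # 0.65~1.0
    top_k=40,                # 30~50
    repetition_penalty=1.1,  # 1.03~1.17
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```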


## Training Setup
* 2x3090 with model parallel
* batch size = bsz 8 * grad acc 16 = 128
* context cutoff length = 1024
* only train on the output (with a loss mask); see the sketch after this list
* group-by-length enabled
* 538k entries, 2 epochs (about 8,400 steps)
* lr 2e-4
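
For clarity, "only train on the output (with a loss mask)" usually means the prompt tokens get label `-100` so they do not contribute to the loss; the sketch below only illustrates the idea and is not the repo's actual collator.

```python
# Illustration of output-only training: prompt tokens are masked with -100
# so the cross-entropy loss is computed only on the response tokens.
def build_labels(tokenizer, prompt, response, cutoff_len=1024):
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    full_ids = tokenizer(prompt + response, add_special_tokens=False)["input_ids"][:cutoff_len]
    labels = [-100] * min(len(prompt_ids), len(full_ids)) + full_ids[len(prompt_ids):]
    return {"input_ids": full_ids, "labels": labels}
```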


## Some Examples
(As you can see, although guanaco replies fluently, the content can be quite confusing, so you may want to add something in the system part.)
![](https://i.imgur.com/Hxyf3tR.png)
![](https://i.imgur.com/Mu06jxn.png)

I used guanaco with an instruction to translate a Chinese article into Japanese, German, and English,
and then used GPT-4 to score the results:
![](https://i.imgur.com/NfFQbZ2.png)

## Some more information

### Why use lora+embed+head
First, I think it is obvious that when an LLM isn't good at some language and you want to fine-tune it for that language, you should train the embedding and head parts.<br>
But the question is: "Why not just do a native (full) finetune?"<br>
If you have looked at other alpaca models or training setups, you may have noticed that a lot of them share one problem: "memorization".<br>
The loss drops at the beginning of every epoch, which looks like a kind of overfitting.<br>
In my opinion, this is because the number of parameters in LLaMA is so large that it simply memorizes all the training data.

But if I use LoRA only on the attention part (ignoring the MLP part), the parameter count is not large enough to memorize the training data, so the model is much less likely to memorize everything.
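
A sketch of a PEFT config that matches this idea: LoRA on the attention projections only, with embed_tokens and lm_head trained in full. The module names, rank, and alpha below are assumptions, not the exact guanaco-lora settings.

```python
# Sketch: LoRA limited to attention projections, full training for embed/head.
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,            # assumed rank
    lora_alpha=16,  # assumed scaling
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention only, MLP ignored
    modules_to_save=["embed_tokens", "lm_head"],              # trained fully alongside LoRA
    task_type="CAUSAL_LM",
)
```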