Update README.md
README.md (changed)
@@ -44,6 +44,7 @@ pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory
 ```
 You can load the model directly from the Hugging Face model hub using
 ```python
+import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
 tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Llama-2-7B-32K-Instruct")
@@ -51,7 +52,7 @@ model = AutoModelForCausalLM.from_pretrained("togethercomputer/Llama-2-7B-32K-In
     trust_remote_code=True, torch_dtype=torch.float16)
 input_ids = tokenizer.encode("[INST]\nWrite a poem about cats\n[/INST]\n\n", return_tensors="pt")
 output = model.generate(input_ids, max_length=128,
-    temperature=0.7,
+    temperature=0.7, repetition_penalty=1.1, top_p=0.7, top_k=50)
 output_text = tokenizer.decode(output[0], skip_special_tokens=True)
 ```
 
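For convenience, the loading and generation example as it reads after this change is assembled into a single runnable block below. It is a minimal sketch built from the lines in the hunks above; the only additions are `do_sample=True` (an assumption on my part, since `temperature`/`top_p`/`top_k` only take effect when sampling is enabled) and the final `print`.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Llama-2-7B-32K-Instruct")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/Llama-2-7B-32K-Instruct",
                                             trust_remote_code=True, torch_dtype=torch.float16)

# The model expects the [INST] ... [/INST] instruction format.
input_ids = tokenizer.encode("[INST]\nWrite a poem about cats\n[/INST]\n\n", return_tensors="pt")
output = model.generate(input_ids, max_length=128,
                        do_sample=True,  # assumption: sampling must be enabled for the knobs below to apply
                        temperature=0.7, repetition_penalty=1.1, top_p=0.7, top_k=50)
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)
```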
@@ -103,7 +104,9 @@ This poem captures the essence of cats, highlighting their beauty, independence,
 We evaluate the model from three aspects: 1) [Alpaca Eval](https://tatsu-lab.github.io/alpaca_eval/);
 2) [Rouge score over BookSum](https://together.ai/blog/Llama-2-7B-32K); and
 3) [Accuracy over Multi-document Question Answering (MQA)](https://together.ai/blog/Llama-2-7B-32K).
-We compare with models including
+We compare with models including
+[GPT-3.5-Turbo-16K](https://platform.openai.com/docs/models/gpt-3-5),
+[Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf),
 [Longchat-7b-16k](https://huggingface.co/lmsys/longchat-7b-16k)
 and [Longchat-7b-v1.5-32k](https://huggingface.co/lmsys/longchat-7b-v1.5-32k).
 We summarize the results below:
@@ -126,6 +129,7 @@ We summarize the results below:
 | Llama-2-7B-Chat-hf | 0.055 | 0.008 | 0.046 |
 | Longchat-7b-16k | 0.303 | 0.055 | 0.160 |
 | Longchat-7b-v1.5-32k | 0.308 | 0.057 | 0.163 |
+| GPT-3.5-Turbo-16K | 0.324 | 0.066 | 0.178 |
 | Llama-2-7B-32K-Instruct (ours) | 0.336 | 0.076 | 0.184 |
 
 * Accuracy over MQA
@@ -134,10 +138,9 @@ We summarize the results below:
 | Llama-2-7B-Chat-hf | 0.384 | 0.375 | 0.313 |
 | Longchat-7b-16k | 0.510 | 0.473 | 0.428 |
 | Longchat-7b-v1.5-32k | 0.534 | 0.516 | 0.479 |
+| GPT-3.5-Turbo-16K | 0.622 | 0.609 | 0.577 |
 | Llama-2-7B-32K-Instruct (ours) | 0.622 | 0.604 | 0.589 |
 
-We observe that our finetuned Llama-2-7B-32K-Instruct consistently outperforms other baseline models including Llama-2-7b-chat, Longchat-7b-16k and Longchat-7b-v1.5-32k.
-
 ## Limitations and Bias
 
 As with all language models, Llama-2-7B-32K-Instruct may generate incorrect or biased content. It's important to keep this in mind when using the model.
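As a side note on the BookSum rows above: those columns are ROUGE scores, and the sketch below shows one minimal way to compute numbers on that scale. It assumes the Hugging Face `evaluate` package (with `rouge_score` installed) and uses placeholder summaries; it is not the exact evaluation harness behind the tables.

```python
# Hypothetical ROUGE computation with the `evaluate` package (pip install evaluate rouge_score).
# The prediction/reference pairs below are placeholders, not BookSum data.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["The cat sat quietly by the window, watching birds."]
references = ["A cat watches birds from the window while sitting still."]

# Returns rouge1 / rouge2 / rougeL / rougeLsum as floats in [0, 1],
# the same scale as the values reported in the tables above.
scores = rouge.compute(predictions=predictions, references=references)
print(scores)
```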