bofenghuang committed on
Commit 3142af5 · 2 Parent(s): 336a168 e6b5ed2

Merge branch 'main' into v0.0

Files changed (2)
  1. README.md +123 -24
  2. tokenizer_config.json +1 -0
README.md CHANGED
@@ -1,11 +1,10 @@
 ---
-language:
-- fr
+language: fr
 pipeline_tag: text-generation
-library_name: transformers
 inference: false
 tags:
 - LLM
+- finetuned
 - llama
 - llama-2
 ---
@@ -14,11 +13,11 @@ tags:
 <img src="https://huggingface.co/bofenghuang/vigogne-2-7b-chat/resolve/v2.0/logo_v2.jpg" alt="Vigogne" style="width: 30%; min-width: 300px; display: block; margin: auto;">
 </p>
 
-# Vigogne-2-7B-Chat-V2.0: A Llama-2 based French chat LLM
+# Vigogne-2-7B-Chat-V2.0: A Llama-2-based French Chat LLM
 
-Vigogne-2-7B-Chat-V2.0 is a French chat LLM, based on [LLaMA-2-7B](https://ai.meta.com/llama), optimized to generate helpful and coherent responses in user conversations.
+Vigogne-2-7B-Chat-V2.0 is a French chat LLM, based on [LLaMA-2-7B](https://ai.meta.com/llama), optimized to generate helpful and coherent responses in conversations with users.
 
-Check out our [blog](https://github.com/bofenghuang/vigogne/blob/main/blogs/2023-08-17-vigogne-chat-v2_0.md) and [GitHub repository](https://github.com/bofenghuang/vigogne) for more information.
+Check out our [release blog](https://github.com/bofenghuang/vigogne/blob/main/blogs/2023-08-17-vigogne-chat-v2_0.md) and [GitHub repository](https://github.com/bofenghuang/vigogne) for more information.
 
 **Usage and License Notices**: Vigogne-2-7B-Chat-V2.0 follows Llama-2's [usage policy](https://ai.meta.com/llama/use-policy). A significant portion of the training data is distilled from GPT-3.5-Turbo and GPT-4; kindly use it cautiously to avoid any violations of OpenAI's [terms of use](https://openai.com/policies/terms-of-use).
 
@@ -27,14 +26,60 @@ Check out our [blog](https://github.com/bofenghuang/vigogne/blob/main/blogs/2023
 All previous versions are accessible through branches.
 
 - **V1.0**: Trained on 420K chat data.
-- **V2.0**: Trained on 520K data. Check out our [blog](https://github.com/bofenghuang/vigogne/blob/main/blogs/2023-08-17-vigogne-chat-v2_0.md) for more details.
+- **V2.0**: Trained on 520K data. Check out our [release blog](https://github.com/bofenghuang/vigogne/blob/main/blogs/2023-08-17-vigogne-chat-v2_0.md) for more details.
+
+## Prompt Template
+
+We used the prefix tokens `<|user|>:` and `<|assistant|>:` to distinguish between user and assistant utterances.
+
+You can apply this formatting using the [chat template](https://huggingface.co/docs/transformers/main/chat_templating) through the `apply_chat_template()` method.
+
+```python
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("bofenghuang/vigogne-2-7b-chat")
+
+conversation = [
+    {"role": "user", "content": "Bonjour ! Comment ça va aujourd'hui ?"},
+    {"role": "assistant", "content": "Bonjour ! Je suis une IA, donc je n'ai pas de sentiments, mais je suis prêt à vous aider. Comment puis-je vous assister aujourd'hui ?"},
+    {"role": "user", "content": "Quelle est la hauteur de la Tour Eiffel ?"},
+    {"role": "assistant", "content": "La Tour Eiffel mesure environ 330 mètres de hauteur."},
+    {"role": "user", "content": "Comment monter en haut ?"},
+]
+
+print(tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True))
+```
+
+You will get:
+
+```
+<s><|system|>: Vous êtes l'assistant IA nommé Vigogne, créé par Zaion Lab (https://zaion.ai). Vous suivez extrêmement bien les instructions. Aidez autant que vous le pouvez.
+<|user|>: Bonjour ! Comment ça va aujourd'hui ?
+<|assistant|>: Bonjour ! Je suis une IA, donc je n'ai pas de sentiments, mais je suis prêt à vous aider. Comment puis-je vous assister aujourd'hui ?</s>
+<|user|>: Quelle est la hauteur de la Tour Eiffel ?
+<|assistant|>: La Tour Eiffel mesure environ 330 mètres de hauteur.</s>
+<|user|>: Comment monter en haut ?
+<|assistant|>:
+```
 
 ## Usage
 
+### Inference using the quantized versions
+
+The quantized versions of this model are generously provided by [TheBloke](https://huggingface.co/TheBloke)!
+
+- AWQ for GPU inference: [TheBloke/Vigogne-2-7B-Chat-AWQ](https://huggingface.co/TheBloke/Vigogne-2-7B-Chat-AWQ)
+- GPTQ for GPU inference: [TheBloke/Vigogne-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Vigogne-2-7B-Chat-GPTQ)
+- GGUF for CPU+GPU inference: [TheBloke/Vigogne-2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Vigogne-2-7B-Chat-GGUF)
+
+These versions facilitate testing and development with various popular frameworks, including [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [vLLM](https://github.com/vllm-project/vllm), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), and more.
+
+### Inference using the unquantized model with 🤗 Transformers
+
 ```python
+from typing import Dict, List, Optional
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, TextStreamer
-from vigogne.preprocess import generate_inference_chat_prompt
 
 model_name_or_path = "bofenghuang/vigogne-2-7b-chat"
 revision = "v2.0"
@@ -45,18 +90,22 @@ model = AutoModelForCausalLM.from_pretrained(model_name_or_path, revision=revisi
 streamer = TextStreamer(tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True)
 
 
-def infer(
-    utterances,
-    system_message=None,
-    temperature=0.1,
-    top_p=1.0,
-    top_k=0,
-    repetition_penalty=1.1,
-    max_new_tokens=1024,
+def chat(
+    query: str,
+    history: Optional[List[Dict]] = None,
+    temperature: float = 0.7,
+    top_p: float = 1.0,
+    top_k: int = 0,
+    repetition_penalty: float = 1.1,
+    max_new_tokens: int = 1024,
     **kwargs,
 ):
-    prompt = generate_inference_chat_prompt(utterances, tokenizer, system_message=system_message)
-    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(model.device)
+    if history is None:
+        history = []
+
+    history.append({"role": "user", "content": query})
+
+    input_ids = tokenizer.apply_chat_template(history, add_generation_prompt=True, return_tensors="pt").to(model.device)
     input_length = input_ids.shape[1]
 
     generated_outputs = model.generate(
@@ -68,26 +117,76 @@ def infer(
             top_k=top_k,
             repetition_penalty=repetition_penalty,
             max_new_tokens=max_new_tokens,
-            eos_token_id=tokenizer.eos_token_id,
-            pad_token_id=tokenizer.pad_token_id,
+            pad_token_id=tokenizer.eos_token_id,
             **kwargs,
         ),
         streamer=streamer,
         return_dict_in_generate=True,
     )
+
     generated_tokens = generated_outputs.sequences[0, input_length:]
     generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
-    return generated_text
+
+    history.append({"role": "assistant", "content": generated_text})
+
+    return generated_text, history
 
 
-user_query = "Expliquez la différence entre DoS et phishing."
-infer([[user_query, ""]])
+# 1st round
+response, history = chat("Un escargot parcourt 100 mètres en 5 heures. Quelle est sa vitesse ?", history=None)
+
+# 2nd round
+response, history = chat("Quand il peut dépasser le lapin ?", history=history)
+
+# 3rd round
+response, history = chat("Écris une histoire imaginative qui met en scène une compétition de course entre un escargot et un lapin.", history=history)
 ```
 
-You can utilize the Google Colab Notebook below for inferring with the Vigogne chat models.
+You can also use the Google Colab Notebook provided below.
 
 <a href="https://colab.research.google.com/github/bofenghuang/vigogne/blob/main/notebooks/infer_chat.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
 
+### Inference using the unquantized model with vLLM
+
+Set up an OpenAI-compatible server with the following command:
+
+```bash
+# Install vLLM
+# This may take 5-10 minutes.
+# pip install vllm
+
+# Start server for Vigogne-Chat models
+python -m vllm.entrypoints.openai.api_server --model bofenghuang/vigogne-2-7b-chat
+
+# List models
+# curl http://localhost:8000/v1/models
+```
+
+Query the model using the `openai` Python package.
+
+```python
+import openai
+
+# Modify OpenAI's API key and API base to use vLLM's API server.
+openai.api_key = "EMPTY"
+openai.api_base = "http://localhost:8000/v1"
+
+# First model
+models = openai.Model.list()
+model = models["data"][0]["id"]
+
+# Chat completion API
+chat_completion = openai.ChatCompletion.create(
+    model=model,
+    messages=[
+        {"role": "user", "content": "Parle-moi de toi-même."},
+    ],
+    max_tokens=1024,
+    temperature=0.7,
+)
+print("Chat completion results:", chat_completion)
+```
+
 ## Limitations
 
 Vigogne is still under development, and there are many limitations that have to be addressed. Please note that it is possible that the model generates harmful or biased content, incorrect information or generally unhelpful answers.
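This change replaces `infer(utterances, system_message=...)` with `chat(query, history)`. The old function took a list of `[user, assistant]` pairs; the removed example calls `infer([[user_query, ""]])`. The new function takes the latest query plus a list of role/content dicts. For existing call sites, a small conversion helper may ease migration.

The sketch below is hypothetical: `utterances_to_history` is not part of Vigogne. It assumes, based only on that removed example, that the final pair carries the pending user turn with an empty assistant reply. A custom system message is carried over as a leading `{"role": "system", ...}` entry, since the new `chat_template` treats a leading system message as an override of its default.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def utterances_to_history(
    utterances: Sequence[Sequence[str]],
    system_message: Optional[str] = None,
) -> Tuple[str, List[Dict[str, str]]]:
    """Hypothetical helper (not part of Vigogne): convert old-style
    [[user, assistant], ...] pairs into (query, history) for chat().

    Assumes the last pair holds the pending user turn and an empty
    assistant reply, as in the removed example infer([[user_query, ""]]).
    """
    if not utterances:
        raise ValueError("expected at least one [user, assistant] pair")
    # Split off the final pair: it holds the query the model should answer.
    *previous_pairs, (query, pending_reply) = utterances
    if pending_reply:
        raise ValueError("the last assistant reply must be empty; the model generates it")

    history: List[Dict[str, str]] = []
    # The new chat_template treats a leading "system" message as an override of its default.
    if system_message is not None:
        history.append({"role": "system", "content": system_message})
    # Earlier completed turns become alternating user/assistant messages.
    for user_text, assistant_text in previous_pairs:
        history.append({"role": "user", "content": user_text})
        history.append({"role": "assistant", "content": assistant_text})
    return query, history


# Usage with the chat() function from the README (requires the model to be loaded):
# query, history = utterances_to_history([["Expliquez la différence entre DoS et phishing.", ""]])
# response, history = chat(query, history=history)
```

An empty `history` list is passed through as is, since `chat()` only replaces `None` with a new list before appending the user turn.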
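The vLLM client snippet in the README relies on `openai.Model.list()` and `openai.ChatCompletion.create()`. These exist only in `openai` package releases before 1.0. As a dependency-free alternative, the minimal sketch below sends the same chat request to the vLLM OpenAI-compatible endpoint using only the standard library.

`build_chat_request` and `send_chat_request` are hypothetical helpers. The host, port and model name are taken from the server command above. The response is parsed assuming the OpenAI chat completion schema (`choices[0].message.content`), which the vLLM server mirrors.

```python
import json
import urllib.request
from typing import Dict, List

# Assumes the vLLM server started above is listening on localhost:8000.
API_BASE = "http://localhost:8000/v1"


def build_chat_request(
    model: str,
    messages: List[Dict[str, str]],
    max_tokens: int = 1024,
    temperature: float = 0.7,
    api_base: str = API_BASE,
) -> urllib.request.Request:
    """Hypothetical helper: build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{api_base}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Hypothetical helper: send the request and return the first choice's text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# request = build_chat_request(
#     "bofenghuang/vigogne-2-7b-chat",
#     [{"role": "user", "content": "Parle-moi de toi-même."}],
# )
# print(send_chat_request(request))
```

Building the request separately from sending it keeps the payload easy to inspect, without a running server.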
tokenizer_config.json CHANGED
@@ -19,6 +19,7 @@
     "single_word": false
   },
   "legacy": false,
+  "chat_template": "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% elif true == true %}{% set loop_messages = messages %}{% set system_message = 'Vous êtes l\\'assistant IA nommé Vigogne, créé par Zaion Lab (https://zaion.ai). Vous suivez extrêmement bien les instructions. Aidez autant que vous le pouvez.' %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% if system_message != false %}{{ '<|system|>: ' + system_message + '\\n' }}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '<|user|>: ' + message['content'].strip() + '\\n' }}{% elif message['role'] == 'assistant' %}{{ '<|assistant|>: ' + message['content'].strip() + eos_token + '\\n' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>:' }}{% endif %}",
   "model_max_length": 1000000000000000019884624838656,
   "pad_token": null,
   "padding_side": "right",
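The `chat_template` added to tokenizer_config.json is a Jinja template that `apply_chat_template()` renders. To see what it produces without downloading a tokenizer, the minimal sketch below mirrors its logic in plain Python. `format_vigogne_prompt` is a hypothetical helper, not part of transformers. It assumes the Llama-2 BOS/EOS strings `<s>` and `</s>`, which match the sample output in the README's Prompt Template section.

```python
from typing import Dict, List

# Assumed special tokens (Llama-2 tokenizer): BOS "<s>", EOS "</s>".
BOS_TOKEN = "<s>"
EOS_TOKEN = "</s>"

# Default system message hard-coded in the chat_template.
DEFAULT_SYSTEM_MESSAGE = (
    "Vous êtes l'assistant IA nommé Vigogne, créé par Zaion Lab (https://zaion.ai). "
    "Vous suivez extrêmement bien les instructions. Aidez autant que vous le pouvez."
)


def format_vigogne_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hypothetical pure-Python mirror of the Jinja chat_template above."""
    # A leading system message overrides the default one, as in the template.
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"]
        loop_messages = messages[1:]
    else:
        system_message = DEFAULT_SYSTEM_MESSAGE
        loop_messages = messages

    prompt = BOS_TOKEN + "<|system|>: " + system_message + "\n"
    for index, message in enumerate(loop_messages):
        # Roles must alternate user/assistant, starting with user (raise_exception in the template).
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/user/assistant/...")
        if message["role"] == "user":
            prompt += "<|user|>: " + message["content"].strip() + "\n"
        elif message["role"] == "assistant":
            # Completed assistant turns are closed with the EOS token.
            prompt += "<|assistant|>: " + message["content"].strip() + EOS_TOKEN + "\n"
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        prompt += "<|assistant|>:"
    return prompt


conversation = [{"role": "user", "content": "Quelle est la hauteur de la Tour Eiffel ?"}]
print(format_vigogne_prompt(conversation))
```

In practice `tokenizer.apply_chat_template(...)` should be preferred. This mirror is only meant to make the template's branching (system override, role alternation check, EOS after assistant turns) easy to read.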