---
license: apache-2.0
language: fr
pipeline_tag: text-generation
inference:
  parameters:
    temperature: 0.7
tags:
- LLM
- finetuned
---

# Vigogne-Stablelm-3B-4E1T-Chat

An attempt to fine-tune the [stablelm-3b-4e1t](https://huggingface.co/stabilityai/stablelm-3b-4e1t) model, exploring the feasibility of adapting a "smaller-scale" language model, pretrained primarily on English data, to French chat.

**License**: A significant portion of the training data is distilled from GPT-3.5-Turbo and GPT-4. Please use this model cautiously to avoid any violation of OpenAI's [terms of use](https://openai.com/policies/terms-of-use).

## Usage

```python
from typing import Dict, List, Optional

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, TextStreamer

model_name_or_path = "bofenghuang/vigogne-stablelm-3b-4e1t-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, padding_side="right", use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

# Stream generated tokens to stdout as they are produced
# (note: the `timeout` argument belongs to TextIteratorStreamer, not TextStreamer)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)


def chat(
    query: str,
    history: Optional[List[Dict]] = None,
    temperature: float = 0.7,
    top_p: float = 1.0,
    top_k: int = 0,
    repetition_penalty: float = 1.1,
    max_new_tokens: int = 1024,
    **kwargs,
):
    if history is None:
        history = []

    history.append({"role": "user", "content": query})

    # Render the conversation with the model's chat template and tokenize it
    input_ids = tokenizer.apply_chat_template(history, return_tensors="pt").to(model.device)
    input_length = input_ids.shape[1]

    generated_outputs = model.generate(
        input_ids=input_ids,
        generation_config=GenerationConfig(
            temperature=temperature,
            do_sample=temperature > 0.0,  # fall back to greedy decoding when temperature is 0
            top_p=top_p,
            top_k=top_k,
            repetition_penalty=repetition_penalty,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
            **kwargs,
        ),
        streamer=streamer,
        return_dict_in_generate=True,
    )

    # Keep only the newly generated tokens, i.e. drop the prompt
    generated_tokens = generated_outputs.sequences[0, input_length:]
    generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)

    history.append({"role": "assistant", "content": generated_text})

    return generated_text, history


# 1st round ("A snail travels 100 meters in 5 hours. What is its speed?")
response, history = chat("Un escargot parcourt 100 mètres en 5 heures. Quelle est sa vitesse ?", history=None)
```
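
The `chat` helper returns both the generated response and the updated conversation history, so continuing the conversation only requires passing that history back in. A minimal second-round sketch (the follow-up question here is an illustrative example, not part of the original card):

```python
# 2nd round: reuse the returned history so the model sees the full conversation
# ("How long would it take it to travel 300 meters?")
response, history = chat("Combien de temps mettrait-il pour parcourir 300 mètres ?", history=history)
```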
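
If you want to check the exact prompt that `tokenizer.apply_chat_template` builds before generation, you can render it as plain text rather than token IDs; a quick illustrative check:

```python
# Render the chat template as text to inspect the prompt format
prompt_text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Bonjour !"}],
    tokenize=False,
    add_generation_prompt=True,  # append the assistant prefix the model completes
)
print(prompt_text)
```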