nunonmg committed
Commit c1778eb
1 Parent(s): 407df6d

Update README.md

Files changed (1)
  1. README.md +47 -33
README.md CHANGED
@@ -61,8 +61,7 @@ Here's how you can run the model using the `pipeline()` function from 🤗 Trans
 import torch
 from transformers import pipeline
 
-pipe = pipeline("text-generation", model="Unbabel/TowerInstruct-v0.1", torch_dtype=torch.bfloat16, device_map="cuda:3")
-
+pipe = pipeline("text-generation", model="Unbabel/TowerInstruct-v0.1", torch_dtype=torch.bfloat16, device_map="auto")
 # We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
 messages = [
     {"role": "user", "content": "Translate the following text from Portuguese into English.\nPortuguese: Um grupo de investigadores lançou um novo modelo para tarefas relacionadas com tradução.\nEnglish:"},
@@ -78,37 +77,48 @@ print(outputs[0]["generated_text"])
 # A group of researchers has launched a new model for translation-related tasks.
 ```
 
-
-### Direct Use
-
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
-### Downstream Use [optional]
-
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
 ### Out-of-Scope Use
 
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+The model is not guaranteed to perform for languages other than the 10 languages it supports. Even though we trained the model on conversational data and code instructions, it is not intended to be used as a conversational chatbot or code assistant.
 
 ## Bias, Risks, and Limitations
 
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
-### Recommendations
-
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+TowerInstruct-v0.1 has not been aligned to human preferences, so the model may generate problematic outputs (e.g., hallucinations, harmful content, or false statements).
 
 ## Prompt Format
 
-Mention mlchat here (no system prompt)
+TowerInstruct-v0.1 was trained using the ChatML prompt templates without any system prompts. An example follows below:
+```
+<|im_start|>user
+{USER PROMPT}<|im_end|>
+<|im_start|>assistant
+{MODEL RESPONSE}<|im_end|>
+<|im_start|>user
+[...]
+```
 
 ### Supervised tasks
 
-Prompts for different tasks.
-
+- Translation
+```
+Translate the following text from $SRC_LANG into $TGT_LANG.
+$SRC_LANG: $SRC_TEXT
+$TGT_LANG: # make sure to add a white space after the target placeholder "$TGT_LANG:" for best results
+```
+- Automatic Post Edition
+```
+Translate the following text from $SRC_LANG into $TGT_LANG.
+$SRC_LANG: $SRC_TEXT
+$TGT_LANG:
+```
+- Machine Translation Evaluation
+- Context-aware Translation
+- Terminology-aware Translation
+- Multi-reference Translation
+- Named-entity Recognition
+- Paraphrase Generation
+- Synthetic Chat data
+- Code instructions
 [More Information Needed]
 
 ## Training Details
@@ -125,17 +135,21 @@ Write sth about Axolotl.
 
 The following hyperparameters were used during training:
 
-learning_rate: 7e-06
-seed: 42
-distributed_type: multi-GPU
-num_devices: 4
-total_train_batch_size: 256
-optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-lr_scheduler_type: cosine
-lr_scheduler_warmup_steps: 500
-weight_decay: 0.01
-num_epochs: 4
-max_seq_length: 2048
+- total_train_batch_size: 256
+
+- learning_rate: 7e-06
+
+- lr_scheduler_type: cosine
+
+- lr_scheduler_warmup_steps: 500
+
+- weight_decay: 0.01
+
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+
+- num_epochs: 4
+
+- max_seq_length: 2048
 
 ## Citation
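As a quick sanity check of the formats this commit adds, the ChatML wrapping and the zero-shot translation prompt can be sketched in plain Python. This is an illustrative sketch only; the helper names `to_chatml` and `TRANSLATION_PROMPT` are ours, not part of the model card or the Transformers API.

```python
from string import Template

def to_chatml(messages):
    """Render a list of {role, content} dicts as ChatML with no system
    prompt, leaving the assistant turn open so the model continues it."""
    rendered = ""
    for m in messages:
        rendered += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return rendered + "<|im_start|>assistant\n"

# Translation prompt from the "Supervised tasks" section; note the trailing
# white space after the target placeholder, as the card recommends.
TRANSLATION_PROMPT = Template(
    "Translate the following text from $src into $tgt.\n"
    "$src: $text\n"
    "$tgt: "
)

prompt = TRANSLATION_PROMPT.substitute(
    src="Portuguese",
    tgt="English",
    text="Um grupo de investigadores lançou um novo modelo.",
)
chatml = to_chatml([{"role": "user", "content": prompt}])
print(chatml)
```

In practice this wrapping is done for you: the `pipeline` example in the diff relies on the tokenizer's chat template, which emits the same ChatML markers.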