mjbuehler committed
Commit 721b899
1 Parent(s): 71065d4

Update README.md

Files changed (1)
  1. README.md +39 -0
README.md CHANGED
@@ -146,6 +146,45 @@ moe_model = Phi3VForCausalLMMoE(moe_config, base_model, expert_models, layer_dt
  count_parameters(expert_models[0]),count_parameters(moe_model)
  ```
 
+ ### Training the gating networks
+
+ To train the gating networks, you need to provide sample prompts for each of the experts. A sample prompt consists of text and image data, and the number of prompt lists must match the number of experts, k, defined above.
+
+ To construct the text portion of a prompt, you can use the processor's chat template:
+
+ ```python
+ messages = [{"role": "user", "content": "<|image_1|>\nWhat is shown in this image, and what is the relevance for materials design?"}]
+ prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ prompt
+ ```
+
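+ The resulting string follows the same `<|user|>\n...<|end|>\n<|assistant|>\n` format as the `{"text": ..., "image": [...]}` samples used in the training set below. If convenient, you can wrap this in a small helper to build such samples (a sketch; `make_expert_sample` is a hypothetical convenience function, not part of the library):
+
+ ```python
+ def make_expert_sample(question, image):
+     # Build one gating-training sample in the format expected by prompts_per_expert below
+     messages = [{"role": "user", "content": f"<|image_1|>\n{question}"}]
+     text = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+     return {"text": text, "image": [image]}
+ ```
+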
+ The following example shows how this is done. The training set consists of images and prompts: the first item in the list holds the prompts for expert 1, the second item the prompts for expert 2, and so on.
+
+ Sample training set and training procedure (for simplicity, we use only three images, one characteristic of each expert):
+ ```python
+ import requests
+ from PIL import Image
+
+ # Load one representative image per expert
+ image_1 = Image.open(requests.get("https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg", stream=True).raw)
+ image_2 = Image.open(requests.get("https://media.wired.com/photos/5aa32b912ba43111d1213e0c/master/w_2240,c_limit/akhacouple.jpg", stream=True).raw)
+ image_3 = Image.open(requests.get("https://upload.wikimedia.org/wikipedia/commons/a/a0/Euplectella_aspergillum_Okeanos.jpg", stream=True).raw)
+
+ # One list of samples per expert; the order must match the order of expert_models
+ prompts_per_expert = [
+     [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 1<|end|>\n<|assistant|>\n", "image": [image_1]},
+      {"text": "<|user|>\n<|image_1|>\nPrompt 2 for expert 1<|end|>\n<|assistant|>\n", "image": [image_1]}],
+
+     [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 2<|end|>\n<|assistant|>\n", "image": [image_2]},
+      {"text": "<|user|>\n<|image_1|>\nPrompt 2 for expert 2<|end|>\n<|assistant|>\n", "image": [image_2]}],
+
+     [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 3<|end|>\n<|assistant|>\n", "image": [image_3]},
+      {"text": "<|user|>\n<|image_1|>\nPrompt 2 for expert 3<|end|>\n<|assistant|>\n", "image": [image_3]}],
+ ]
+
+ # Train the gating layers using the provided prompts
+ gating_layer_params = moe_model.preselect_gating_layer_params(processor, prompts_per_expert)
+
+ # Assign the trained parameters to the gating layers of the MoE model
+ moe_model.set_gating_layer_params(gating_layer_params)
+ ```
+
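+ Once the gating layers are set, the MoE model can be used for inference like the base Phi-3-Vision model. A minimal sketch, assuming the standard Hugging Face `generate()` API and that the model and inputs share a device:
+
+ ```python
+ # Minimal inference sketch (assumes the standard generate() API of the base model)
+ messages = [{"role": "user", "content": "<|image_1|>\nWhat is shown in this image, and what is the relevance for materials design?"}]
+ prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = processor(prompt, [image_1], return_tensors="pt").to(moe_model.device)
+
+ generate_ids = moe_model.generate(**inputs, max_new_tokens=256)
+ # Strip the prompt tokens and decode only the newly generated text
+ response = processor.batch_decode(generate_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
+ print(response)
+ ```
+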
  ####################
 