Update README.md
README.md (CHANGED)
@@ -146,6 +146,45 @@ moe_model = Phi3VForCausalLMMoE(moe_config, base_model, expert_models, layer_dt
```python
count_parameters(expert_models[0]), count_parameters(moe_model)
```

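`count_parameters` is assumed to be a small parameter-counting helper introduced earlier in the README. If you need a standalone version, a minimal sketch for PyTorch-style modules could look like the following (the helper name and behavior here are assumptions, not the library's definition):

```python
def count_parameters(model):
    # Sum of trainable parameter counts for a PyTorch-style module
    # (hypothetical helper; the README may define its own version earlier).
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Tiny stand-in "module" to demonstrate the helper without loading a real model
from types import SimpleNamespace
fake_params = [
    SimpleNamespace(numel=lambda: 10, requires_grad=True),
    SimpleNamespace(numel=lambda: 7, requires_grad=False),  # frozen, not counted
]
fake_model = SimpleNamespace(parameters=lambda: fake_params)
print(count_parameters(fake_model))  # → 10
```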
### Training the gating networks
To train the gating networks, you need to provide sample prompts for each of the experts. A sample prompt consists of text and image data, and the number of prompt lists must match the number of experts, k, defined above.
To generate the text data, you can use the processor's chat template:
```python
messages = [
    {"role": "user", "content": "<|image_1|>\nWhat is shown in this image, and what is the relevance for materials design?"},
]
prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt
```
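The exact string returned depends on the tokenizer's chat template, so inspect `prompt` rather than hard-coding it. As a rough illustration only (the helper below is hypothetical, not part of the library), Phi-3-Vision-style templates produce strings of roughly this shape:

```python
def format_phi3v_prompt(messages, add_generation_prompt=True):
    # Hypothetical stand-in that mimics the general shape of a Phi-3-Vision
    # chat template; the real output comes from apply_chat_template.
    out = ""
    for m in messages:
        out += f"<|{m['role']}|>\n{m['content']}<|end|>\n"
    if add_generation_prompt:
        out += "<|assistant|>\n"
    return out

messages = [
    {"role": "user", "content": "<|image_1|>\nWhat is shown in this image, and what is the relevance for materials design?"},
]
prompt = format_phi3v_prompt(messages)
print(prompt)  # starts with "<|user|>\n<|image_1|>\n", ends with "<|assistant|>\n"
```

This `<|user|>...<|end|>\n<|assistant|>\n` shape matches the hand-written prompt strings used in the training-set example below.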
The following example shows how this is done. The training set consists of images and prompts: the first item in the list contains the prompts for expert 1, the second item the prompts for expert 2, and so on.
Sample training set and training procedure (for simplicity, we use only three images, one characteristic of each expert):
```python
from PIL import Image
import requests

image_1 = Image.open(requests.get("https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg", stream=True).raw)
image_2 = Image.open(requests.get("https://media.wired.com/photos/5aa32b912ba43111d1213e0c/master/w_2240,c_limit/akhacouple.jpg", stream=True).raw)
image_3 = Image.open(requests.get("https://upload.wikimedia.org/wikipedia/commons/a/a0/Euplectella_aspergillum_Okeanos.jpg", stream=True).raw)

prompts_per_expert = [
    [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 1<|end|>\n<|assistant|>\n", "image": [image_1]},
     {"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 1<|end|>\n<|assistant|>\n", "image": [image_1]}],

    [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 2<|end|>\n<|assistant|>\n", "image": [image_2]},
     {"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 2<|end|>\n<|assistant|>\n", "image": [image_2]}],

    [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 3<|end|>\n<|assistant|>\n", "image": [image_3]},
     {"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 3<|end|>\n<|assistant|>\n", "image": [image_3]}],
]

# Train the gating layers using the provided prompts
gating_layer_params = moe_model.preselect_gating_layer_params(processor, prompts_per_expert)

# Apply the trained parameters to the model's gating layers
moe_model.set_gating_layer_params(gating_layer_params)
```
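The heavy lifting happens inside `preselect_gating_layer_params`. As a rough mental model only (an assumption for illustration, not the library's actual implementation), preselecting a gating layer can be thought of as storing one centroid of hidden states per expert and routing new inputs to the expert with the most similar centroid:

```python
def mean(vectors):
    # Element-wise mean of a list of equal-length vectors
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy hidden states observed for each expert's sample prompts (made-up numbers)
hidden_states_per_expert = [
    [[1.0, 0.0, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0]],  # samples for expert 1
    [[0.0, 1.0, 0.0, 0.0], [0.0, 0.9, 0.1, 0.0]],  # samples for expert 2
    [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.9, 0.1]],  # samples for expert 3
]

# One centroid row per expert: the preselected gating "weights"
gating_weight = [mean(h) for h in hidden_states_per_expert]

x = [0.0, 0.05, 0.95, 0.0]                   # a new input resembling expert 3
scores = [dot(w, x) for w in gating_weight]  # routing logits, one per expert
print(scores.index(max(scores)))             # → 2 (routes to expert 3)
```

In the real model the gating layers operate on per-token hidden states inside each decoder layer; this sketch only conveys why a handful of characteristic prompts per expert is enough to initialize them.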
####################