Update README.md
README.md (CHANGED)
@@ -146,6 +146,45 @@ moe_model = Phi3VForCausalLMMoE(moe_config, base_model, expert_models, layer_dt
```python
count_parameters(expert_models[0]), count_parameters(moe_model)
```

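`count_parameters` is assumed to be a small parameter-counting helper introduced earlier in the README. If you need a standalone version, a minimal sketch for PyTorch-style modules could look like the following (the helper name and behavior here are assumptions, not the library's definition):

```python
def count_parameters(model):
    # Sum of trainable parameter counts for a PyTorch-style module
    # (hypothetical helper; the README may define its own version earlier).
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Tiny stand-in "module" to demonstrate the helper without loading a real model
from types import SimpleNamespace
fake_params = [
    SimpleNamespace(numel=lambda: 10, requires_grad=True),
    SimpleNamespace(numel=lambda: 7, requires_grad=False),  # frozen, not counted
]
fake_model = SimpleNamespace(parameters=lambda: fake_params)
print(count_parameters(fake_model))  # → 10
```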
### Training the gating networks
To train the gating networks, you need to provide sample prompts for each of the experts. A sample prompt consists of text and image data, and the number of prompt lists must match the number of experts, k, defined above.
To generate the text data, you can use the processor's chat template:
```python
messages = [
    {"role": "user", "content": "<|image_1|>\nWhat is shown in this image, and what is the relevance for materials design?"},
]
prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt
```
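The exact string returned depends on the tokenizer's chat template, so inspect `prompt` rather than hard-coding it. As a rough illustration only (the helper below is hypothetical, not part of the library), Phi-3-Vision-style templates produce strings of roughly this shape:

```python
def format_phi3v_prompt(messages, add_generation_prompt=True):
    # Hypothetical stand-in that mimics the general shape of a Phi-3-Vision
    # chat template; the real output comes from apply_chat_template.
    out = ""
    for m in messages:
        out += f"<|{m['role']}|>\n{m['content']}<|end|>\n"
    if add_generation_prompt:
        out += "<|assistant|>\n"
    return out

messages = [
    {"role": "user", "content": "<|image_1|>\nWhat is shown in this image, and what is the relevance for materials design?"},
]
prompt = format_phi3v_prompt(messages)
print(prompt)  # starts with "<|user|>\n<|image_1|>\n", ends with "<|assistant|>\n"
```

This `<|user|>...<|end|>\n<|assistant|>\n` shape matches the hand-written prompt strings used in the training-set example below.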
The following example shows how this is done. The training set consists of images and prompts: the first item in the list contains the prompts for expert 1, the second item the prompts for expert 2, and so on.
Sample training set and training procedure (for simplicity, we use only three images, one characteristic of each expert):
```python
from PIL import Image
import requests

image_1 = Image.open(requests.get("https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg", stream=True).raw)
image_2 = Image.open(requests.get("https://media.wired.com/photos/5aa32b912ba43111d1213e0c/master/w_2240,c_limit/akhacouple.jpg", stream=True).raw)
image_3 = Image.open(requests.get("https://upload.wikimedia.org/wikipedia/commons/a/a0/Euplectella_aspergillum_Okeanos.jpg", stream=True).raw)

prompts_per_expert = [
    [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 1<|end|>\n<|assistant|>\n", "image": [image_1]},
     {"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 1<|end|>\n<|assistant|>\n", "image": [image_1]}],

    [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 2<|end|>\n<|assistant|>\n", "image": [image_2]},
     {"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 2<|end|>\n<|assistant|>\n", "image": [image_2]}],

    [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 3<|end|>\n<|assistant|>\n", "image": [image_3]},
     {"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 3<|end|>\n<|assistant|>\n", "image": [image_3]}],
]

# Train the gating layers using the provided prompts
gating_layer_params = moe_model.preselect_gating_layer_params(processor, prompts_per_expert)

# Apply the trained parameters to the model's gating layers
moe_model.set_gating_layer_params(gating_layer_params)
```
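The heavy lifting happens inside `preselect_gating_layer_params`. As a rough mental model only (an assumption for illustration, not the library's actual implementation), preselecting a gating layer can be thought of as storing one centroid of hidden states per expert and routing new inputs to the expert with the most similar centroid:

```python
def mean(vectors):
    # Element-wise mean of a list of equal-length vectors
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy hidden states observed for each expert's sample prompts (made-up numbers)
hidden_states_per_expert = [
    [[1.0, 0.0, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0]],  # samples for expert 1
    [[0.0, 1.0, 0.0, 0.0], [0.0, 0.9, 0.1, 0.0]],  # samples for expert 2
    [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.9, 0.1]],  # samples for expert 3
]

# One centroid row per expert: the preselected gating "weights"
gating_weight = [mean(h) for h in hidden_states_per_expert]

x = [0.0, 0.05, 0.95, 0.0]                   # a new input resembling expert 3
scores = [dot(w, x) for w in gating_weight]  # routing logits, one per expert
print(scores.index(max(scores)))             # → 2 (routes to expert 3)
```

In the real model the gating layers operate on per-token hidden states inside each decoder layer; this sketch only conveys why a handful of characteristic prompts per expert is enough to initialize them.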
####################