Update README.md
README.md CHANGED
@@ -39,9 +39,15 @@ The model is developed to process diverse inputs, including images and text, fac

Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.

-This version of Cephalo, lamm-mit/Cephalo-Phi-3-MoE-vision-128k-3x4b-beta, is a Mixture-of-Expert model based on the Phi-3-Vision-128K-Instruct model.

-

```python
import torch
@@ -67,7 +73,12 @@ count_parameters(moe_model)

## Make a Phi-3-V-MoE model from several pre-trained models

-Download .py files
```python
from huggingface_hub import HfApi, hf_hub_download
from tqdm.notebook import tqdm
@@ -162,6 +173,8 @@ In the following example we show how it is done. The training set consists of im

Sample training set and process to train (for simplicity we use only three images, one characteristic of each expert):
```python

image_1 = Image.open(requests.get("https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg", stream=True).raw)
image_2 = Image.open(requests.get("https://media.wired.com/photos/5aa32b912ba43111d1213e0c/master/w_2240,c_limit/akhacouple.jpg", stream=True).raw)
@@ -169,13 +182,13 @@ image_3 = Image.open(requests.get("https://upload.wikimedia.org/wikipedia/common

prompts_per_expert = [
    [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 1<|end|>\n<|assistant|>\n", "image": [image_1]},
-    {"text": "<|user|>\n<|image_1|>\nPrompt

    [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 2<|end|>\n<|assistant|>\n", "image": [image_2]},
-    {"text": "<|user|>\n<|image_1|>\nPrompt

    [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 3<|end|>\n<|assistant|>\n", "image": [image_3]},
-    {"text": "<|user|>\n<|image_1|>\nPrompt
]

# Train gating layers using the provided prompts
@@ -185,8 +198,20 @@ gating_layer_params = moe_model.preselect_gating_layer_params(processor, prompts
moe_model.set_gating_layer_params(gating_layer_params)
```

-


### Chat Format

@@ -233,7 +258,7 @@ prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_g
inputs = processor(prompt, [image], return_tensors="pt").to("cuda:0")

generation_args = {
-    "max_new_tokens":
    "temperature": 0.1,
    "do_sample": True,
    "stop_strings": ['<|end|>',

Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.

+This version of Cephalo, lamm-mit/Cephalo-Phi-3-MoE-vision-128k-3x4b-beta, is a Mixture-of-Expert model based on the Phi-3-Vision-128K-Instruct model. The model architecture is as follows:

+
+
+### Download MoE Model
+
+```markdown
+pip install transformers -U
+```
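The Python snippet that follows is cut off by the hunk boundary after its first import. As orientation only, here is a minimal sketch of how a remote-code vision model like this is commonly loaded with transformers; the repo ID comes from the text above, while the dtype, device, and the use of AutoModelForCausalLM/AutoProcessor are assumptions rather than part of this commit:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

# Illustrative sketch, not part of the commit; adjust dtype and device to your setup.
model_id = "lamm-mit/Cephalo-Phi-3-MoE-vision-128k-3x4b-beta"

# trust_remote_code=True is needed because the MoE wrapper ships as custom .py files in the repo.
moe_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda:0")

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```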

```python
import torch

## Make a Phi-3-V-MoE model from several pre-trained models

+Download .py files that implement the Phi-3-V and the Mixture-of-Expert Vision model
+
+```markdown
+pip install huggingface_hub
+```
+
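The fenced block below is likewise truncated after its imports. As a hedged illustration of the hf_hub_download call it presumably builds on, the filename used here is a placeholder and not taken from the repo:

```python
from huggingface_hub import hf_hub_download

# Hypothetical filename, for illustration only; the real .py files are listed in the full README.
local_path = hf_hub_download(
    repo_id="lamm-mit/Cephalo-Phi-3-MoE-vision-128k-3x4b-beta",
    filename="moe_phi3_v.py",
    local_dir="./",
)
print(local_path)
```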

```python
from huggingface_hub import HfApi, hf_hub_download
from tqdm.notebook import tqdm

Sample training set and process to train (for simplicity we use only three images, one characteristic of each expert):
```python
+from PIL import Image
+import requests

image_1 = Image.open(requests.get("https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg", stream=True).raw)
image_2 = Image.open(requests.get("https://media.wired.com/photos/5aa32b912ba43111d1213e0c/master/w_2240,c_limit/akhacouple.jpg", stream=True).raw)

prompts_per_expert = [
    [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 1<|end|>\n<|assistant|>\n", "image": [image_1]},
+    {"text": "<|user|>\n<|image_1|>\nPrompt 2 for expert 1<|end|>\n<|assistant|>\n", "image": [image_1]}],

    [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 2<|end|>\n<|assistant|>\n", "image": [image_2]},
+    {"text": "<|user|>\n<|image_1|>\nPrompt 2 for expert 2<|end|>\n<|assistant|>\n", "image": [image_2]}],

    [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 3<|end|>\n<|assistant|>\n", "image": [image_3]},
+    {"text": "<|user|>\n<|image_1|>\nPrompt 2 for expert 3<|end|>\n<|assistant|>\n", "image": [image_3]}],
]

# Train gating layers using the provided prompts

moe_model.set_gating_layer_params(gating_layer_params)
```

+### Preparing gating network for training
+
+To freeze all parameters in the model except for the gating neural networks, you can use:
+
+```python
+freeze_except_gating_layers(moe_model)
+count_parameters(moe_model)
+```
+You can unfreeze:
+```python
+un_freeze_all(moe_model)
+```
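Building on the freeze/unfreeze helpers added above, a short sketch of how one might confirm that only the gating networks remain trainable and set up an optimizer over them; the helper names come from the README, while the parameter scan and optimizer choice are assumptions:

```python
import torch

# Assumes moe_model and the helper functions from the snippet above are already defined.
freeze_except_gating_layers(moe_model)

# Only the gating networks should still require gradients after freezing.
trainable = sum(p.numel() for p in moe_model.parameters() if p.requires_grad)
total = sum(p.numel() for p in moe_model.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")

# Optimize only the parameters that remain trainable (the gating layers).
optimizer = torch.optim.AdamW(
    (p for p in moe_model.parameters() if p.requires_grad), lr=1e-4
)
```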

+## Inference

### Chat Format

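The body of the Chat Format section lies outside this hunk. For orientation, a sketch of the Phi-3-style message list and chat-template call that the later context lines (apply_chat_template, then processor(prompt, [image], ...)) appear to assume; the question text and the image are placeholders:

```python
# Placeholder user turn; <|image_1|> refers to the image passed to the processor below.
messages = [
    {"role": "user", "content": "<|image_1|>\nWhat does this microstructure suggest for materials design?"}
]

# Render the conversation into a single prompt string, appending the assistant generation prompt.
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```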
|
|
inputs = processor(prompt, [image], return_tensors="pt").to("cuda:0")

generation_args = {
+    "max_new_tokens": 256,
    "temperature": 0.1,
    "do_sample": True,
    "stop_strings": ['<|end|>',
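The hunk ends inside generation_args. As a hedged sketch, not part of the commit, arguments of this shape are typically passed to generate and the newly produced tokens decoded along these lines:

```python
# Assumes moe_model, processor, inputs, and the completed generation_args from above.
generate_ids = moe_model.generate(
    **inputs,
    eos_token_id=processor.tokenizer.eos_token_id,
    tokenizer=processor.tokenizer,  # needed so stop_strings can be honored
    **generation_args,
)

# Drop the prompt tokens and decode only the generated continuation.
new_tokens = generate_ids[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(response)
```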