mjbuehler committed
Commit 7ef950e · verified · 1 Parent(s): 721b899

Update README.md

Files changed (1): README.md (+33 −8)

README.md CHANGED
@@ -39,9 +39,15 @@ The model is developed to process diverse inputs, including images and text, fac

 Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.

- This version of Cephalo, lamm-mit/Cephalo-Phi-3-MoE-vision-128k-3x4b-beta, is a Mixture-of-Experts model based on the Phi-3-Vision-128K-Instruct model.
+ This version of Cephalo, lamm-mit/Cephalo-Phi-3-MoE-vision-128k-3x4b-beta, is a Mixture-of-Experts model based on the Phi-3-Vision-128K-Instruct model. The model architecture is as follows:

- ###
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/b7BK8ZtDzTMsyFDi0wP3w.png)
+
+ ### Download MoE Model
+
+ ```bash
+ pip install transformers -U
+ ```

 ```python
 import torch
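
For context on the new "Download MoE Model" step: a minimal loading sketch is shown below. It assumes the checkpoint loads through the standard transformers remote-code path (AutoModelForCausalLM plus AutoProcessor with trust_remote_code=True); the authoritative entry points are defined by the repository's own .py files.

```python
# Minimal sketch (assumption): load the MoE checkpoint through the standard
# transformers remote-code path; the repo's custom .py files define the
# authoritative classes.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "lamm-mit/Cephalo-Phi-3-MoE-vision-128k-3x4b-beta"

moe_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is appropriate for the 3x4b experts
    trust_remote_code=True,
).to("cuda:0")

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```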
@@ -67,7 +73,12 @@ count_parameters(moe_model)

 ## Make a Phi-3-V-MoE model from several pre-trained models

- Download .py files
+ Download the .py files that implement the Phi-3-V and the Mixture-of-Experts vision model:
+
+ ```bash
+ pip install huggingface_hub
+ ```
+
 ```python
 from huggingface_hub import HfApi, hf_hub_download
 from tqdm.notebook import tqdm
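
As a sketch of what the download step above amounts to: hf_hub_download fetches individual files from the Hub into a local directory, and HfApi().list_repo_files can enumerate the repository's .py modules. The loop below is illustrative; the repository defines the actual file names.

```python
# Illustrative download loop: enumerate the repository's .py files and fetch
# each one into the working directory. No file names are hard-coded, since
# the repo defines them.
from huggingface_hub import HfApi, hf_hub_download
from tqdm.notebook import tqdm

repo_id = "lamm-mit/Cephalo-Phi-3-MoE-vision-128k-3x4b-beta"

py_files = [f for f in HfApi().list_repo_files(repo_id) if f.endswith(".py")]
for fname in tqdm(py_files):
    local_path = hf_hub_download(repo_id=repo_id, filename=fname, local_dir=".")
    print(f"downloaded {fname} -> {local_path}")
```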
@@ -162,6 +173,8 @@ In the following example we show how it is done. The training set consists of im

 Sample training set and process to train (for simplicity we use only three images, one characteristic of each expert):
 ```python
+ from PIL import Image
+ import requests

 image_1 = Image.open(requests.get("https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg", stream=True).raw)
 image_2 = Image.open(requests.get("https://media.wired.com/photos/5aa32b912ba43111d1213e0c/master/w_2240,c_limit/akhacouple.jpg", stream=True).raw)
@@ -169,13 +182,13 @@ image_3 = Image.open(requests.get("https://upload.wikimedia.org/wikipedia/common

 prompts_per_expert = [
     [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 1<|end|>\n<|assistant|>\n", "image": [image_1]},
-     {"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 1<|end|>\n<|assistant|>\n", "image": [image_1]}],
+     {"text": "<|user|>\n<|image_1|>\nPrompt 2 for expert 1<|end|>\n<|assistant|>\n", "image": [image_1]}],

     [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 2<|end|>\n<|assistant|>\n", "image": [image_2]},
-     {"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 2<|end|>\n<|assistant|>\n", "image": [image_2]}],
+     {"text": "<|user|>\n<|image_1|>\nPrompt 2 for expert 2<|end|>\n<|assistant|>\n", "image": [image_2]}],

     [{"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 3<|end|>\n<|assistant|>\n", "image": [image_3]},
-     {"text": "<|user|>\n<|image_1|>\nPrompt 1 for expert 3<|end|>\n<|assistant|>\n", "image": [image_3]}],
+     {"text": "<|user|>\n<|image_1|>\nPrompt 2 for expert 3<|end|>\n<|assistant|>\n", "image": [image_3]}],
 ]

 # Train gating layers using the provided prompts
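
For intuition about the gating layers that preselect_gating_layer_params initializes from these prompts, a generic soft Mixture-of-Experts router is sketched below. This is not Cephalo's exact implementation (that lives in the downloaded .py files); it only illustrates how a linear gate scores experts per token and mixes their outputs.

```python
# Generic soft-routing MoE layer (illustrative only; not Cephalo's exact code).
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    def __init__(self, experts, hidden_dim):
        super().__init__()
        self.experts = nn.ModuleList(experts)            # e.g. one FFN per expert
        self.gate = nn.Linear(hidden_dim, len(experts))  # the "gating layer"

    def forward(self, x):                                # x: (batch, seq, hidden)
        weights = torch.softmax(self.gate(x), dim=-1)             # (b, s, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (b, s, hidden, n)
        return (outs * weights.unsqueeze(2)).sum(dim=-1)          # (b, s, hidden)

# e.g.: SoftMoE([nn.Linear(64, 64) for _ in range(3)], 64)(torch.randn(2, 10, 64))
```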
@@ -185,8 +198,20 @@ gating_layer_params = moe_model.preselect_gating_layer_params(processor, prompts
 moe_model.set_gating_layer_params(gating_layer_params)
 ```

- ####################
+ ### Preparing the gating network for training
+
+ To freeze all parameters in the model except for the gating neural networks, you can use:
+
+ ```python
+ freeze_except_gating_layers(moe_model)
+ count_parameters(moe_model)
+ ```
+ You can unfreeze all parameters with:
+ ```python
+ un_freeze_all(moe_model)
+ ```

+ ## Inference

 ### Chat Format
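
The helpers freeze_except_gating_layers and un_freeze_all ship with the downloaded Cephalo code. A plausible minimal re-implementation, shown only to make their effect concrete, toggles requires_grad by parameter name; matching on "gate" is an assumption about the naming scheme.

```python
# Plausible minimal versions of the shipped helpers (assumption: gating
# parameters contain "gate" in their names; the real filter may differ).
def freeze_except_gating_layers(model):
    for name, param in model.named_parameters():
        param.requires_grad = "gate" in name  # train only the router weights

def un_freeze_all(model):
    for param in model.parameters():
        param.requires_grad = True
```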
 
@@ -233,7 +258,7 @@ prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_g
 inputs = processor(prompt, [image], return_tensors="pt").to("cuda:0")

 generation_args = {
-     "max_new_tokens": 512,
+     "max_new_tokens": 256,
     "temperature": 0.1,
     "do_sample": True,
     "stop_strings": ['<|end|>',
 