Update README.md
### Preparing the gating network for full training

You can freeze all parameters in the model except for the gating neural networks before training.
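A minimal sketch of what this selective freezing amounts to, assuming the gating networks' parameters can be identified by a "gate" substring in their names (an assumed naming convention, not necessarily the one used here):

```python
# Freeze everything, then leave gradients enabled only for gating layers.
# The "gate" name filter is an assumption about the parameter naming.
for name, param in moe_model.named_parameters():
    param.requires_grad = "gate" in name

# Confirm what will actually be trained:
trainable = sum(p.numel() for p in moe_model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```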
You can unfreeze:

```python
un_freeze_all(moe_model)
```
Define `FT_repo_id` to push the model to the Hugging Face Hub and/or save it:
```python
FT_repo_id = 'xxxxx/'  # <repo_ID>
```
Load the training dataset:

```python
from datasets import load_dataset

train_dataset = load_dataset("lamm-mit/Cephalo-Wikipedia-Materials", split="train")
```
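Each row provides the fields the data collator below expects. A quick sanity check (a sketch; the field names are taken from the collator code, and the image column is typically decoded as a PIL image):

```python
sample = train_dataset[0]
print(type(sample["image"]))  # typically PIL.Image.Image
print(sample["query"][:80])   # the question posed about the image
print(sample["answer"][:80])  # the target answer
```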
Define a data collator that turns dataset rows into training batches:

```python
class MyDataCollator:
    def __init__(self, processor):
        self.processor = processor

    def __call__(self, examples):
        texts = []
        images = []
        for example in examples:
            image = example["image"]
            question = example["query"]
            answer = example["answer"]
            messages = [
                {"role": "user", "content": '<|image_1|>\n' + question},
                {"role": "assistant", "content": f"{answer}"},
            ]
            text = self.processor.tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=False
            )
            texts.append(text)
            images.append(image)

        # The processor is given a single text-image pair, which matches
        # per_device_train_batch_size=1 in the training arguments below.
        batch = self.processor(text=texts[0], images=images, return_tensors="pt", padding=True)

        # Mask the image placeholder tokens (negative ids) out of the loss.
        labels = batch["input_ids"].clone()
        labels[labels < 0] = -100
        batch["labels"] = labels

        return batch

data_collator = MyDataCollator(processor)
```
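To verify the collator produces what the trainer needs, you can run it on a single example (a sketch; the exact sequence length depends on the processor's image settings):

```python
batch = data_collator([train_dataset[0]])
print(batch["input_ids"].shape)  # (1, sequence_length)
print(batch["labels"].shape)     # same shape as input_ids
```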
Then set up the trainer and train:
```python
from transformers import TrainingArguments, Trainer

optim = "paged_adamw_8bit"

training_args = TrainingArguments(
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    warmup_steps=250,
    learning_rate=1e-5,
    weight_decay=0.01,
    logging_steps=25,
    output_dir="output_training",
    optim=optim,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=16,
    # fp16=True,
    bf16=True,
    hub_model_id=FT_repo_id,
    remove_unused_columns=False,
    report_to="none",
)

trainer = Trainer(
    model=moe_model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)

trainer.train()
```
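After training you can save the checkpoint locally and, since `hub_model_id` is set, push it to the Hub. A sketch, assuming you are authenticated (e.g. via `huggingface-cli login`):

```python
trainer.save_model("output_training/final")  # write final weights locally
trainer.push_to_hub()                        # upload to FT_repo_id on the Hub
```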
## Inference

### Chat Format