LLaMA-MoE-v1-3.5B (2/8) SFT

[πŸ’» Code] | [πŸ“œ Technical Report]

This is the supervised fine-tuned (SFT) version of LLaMA-MoE-v1-3_5B-2_8, trained on Deita-6K for 2 epochs.

| Model | #Activated Experts | #Experts | #Activated Params | Foundation Model | SFT Model |
|:------|:------------------:|:--------:|:-----------------:|:----------------:|:---------:|
| LLaMA-MoE-3.0B | 2 | 16 | 3.0B | πŸ€— base | πŸ€— SFT |
| LLaMA-MoE-3.5B (4/16) | 4 | 16 | 3.5B | πŸ€— base | πŸ€— SFT |
| LLaMA-MoE-3.5B (2/8) | 2 | 8 | 3.5B | πŸ€— base | πŸ€— SFT |
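
Before downloading the full weights, you can inspect the expert configuration of a checkpoint from its config alone. This is a minimal sketch; the repo id comes from this card, but the exact names of the expert-related config fields depend on the custom LLaMA-MoE code, so the snippet just prints the whole config rather than assuming them.

```python
from transformers import AutoConfig

# The repo ships custom configuration/modeling code, hence trust_remote_code=True.
config = AutoConfig.from_pretrained(
    "llama-moe/LLaMA-MoE-v1-3_5B-2_8-sft",
    trust_remote_code=True,
)
print(config)  # look for the expert-count and top-k routing fields in the dump
```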

πŸš€ QuickStart

```python
# python>=3.10

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_dir = "llama-moe/LLaMA-MoE-v1-3_5B-2_8-sft"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16, trust_remote_code=True)
model.eval()
model.cuda()

input_text = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. human: Give me a three-day plan in Suzhou. gpt:"
inputs = tokenizer(input_text, return_tensors="pt")
input_ids = inputs["input_ids"].cuda()

pred = model.generate(input_ids, max_length=100, temperature=1.0, do_sample=True, use_cache=True)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
"""
Sure, I can provide you with a three-day itinerary in Suzhou. Here's what we can do:

Day 1:

* Visit Suzhou Industrial Park, a major commercial and manufacturing district ...
"""
```

πŸ“Š Performance

| Model | MMLU | ARC-c | HellaSwag | TruthfulQA | MT-Bench |
|:------|:----:|:-----:|:---------:|:----------:|:--------:|
| Sheared LLaMA-2.7B ShareGPT | 28.41 | 41.04 | 71.21 | 47.65 | 3.79 |
| Sheared LLaMA-2.7B Deita6K (Our Impl.) | 25.24 | 43.69 | 71.70 | 49.00 | 4.06 |
| LLaMA-MoE-v1-3.0B (2/16) | 23.61 | 43.43 | 72.28 | 44.24 | 4.15 |
| LLaMA-MoE-v1-3.5B (4/16) | 26.49 | 48.29 | 75.10 | 45.91 | 4.60 |
| LLaMA-MoE-v1-3.5B (2/8) | 25.53 | 45.99 | 74.95 | 44.39 | 4.72 |
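
The card does not pin these numbers to a specific evaluation harness. As a rough starting point only, the multiple-choice benchmarks could be run with EleutherAI's lm-evaluation-harness (β‰₯ 0.4); the task names, few-shot settings, and batch size below are assumptions and may not match the protocol behind the table.

```python
# Hedged sketch: run similar benchmarks with lm-evaluation-harness (pip install lm-eval).
# Task names and settings are assumptions, not the card's exact evaluation protocol.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=llama-moe/LLaMA-MoE-v1-3_5B-2_8-sft,"
        "dtype=bfloat16,trust_remote_code=True"
    ),
    tasks=["mmlu", "arc_challenge", "hellaswag", "truthfulqa_mc2"],
    batch_size=8,
)
print(results["results"])  # per-task metric dictionary
```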

πŸ“ƒ Citation

```bibtex
@article{llama-moe,
  title={LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training},
  author={Tong Zhu and Xiaoye Qu and Daize Dong and Jiacheng Ruan and Jingqi Tong and Conghui He and Yu Cheng},
  journal={arXiv preprint arXiv:2406.16554},
  year={2024},
  url={https://arxiv.org/abs/2406.16554},
}
```