LibMoE: A Library for Comprehensive Benchmarking of Mixture of Experts in Large Language Models
Introduction
Mixture-of-Experts (MoE) models play an essential role in the development of more efficient and effective large language models (LLMs). Because of their enormous resource requirements, large-scale MoE algorithms remain inaccessible to many researchers. This work introduces LibMoE, a comprehensive and modular framework designed to streamline the research, training, and evaluation of MoE algorithms. Built upon three core principles, (i) modular design, (ii) efficient training, and (iii) comprehensive evaluation, LibMoE makes MoEs in LLMs accessible to a wider range of researchers by standardizing the training and evaluation pipelines. Using LibMoE, we extensively benchmarked five state-of-the-art MoE algorithms across three different LLMs and 11 datasets under a zero-shot setting. The results show that, despite their unique characteristics, all MoE algorithms perform similarly when averaged across a broad range of tasks. With its modular design and extensive evaluation capabilities, we believe LibMoE will be invaluable for researchers striving to make meaningful progress toward the next generation of MoEs and LLMs.
Model and Evaluation Benchmarks
We have released five MoE algorithms built on microsoft/Phi-3-mini-4k-instruct as the LLM and CLIP as the vision encoder. These models were trained on the LLaVA-665K dataset. We evaluated them on 11 benchmarks, each examining a different aspect of MoE algorithm performance.
Model | MoE Method | AI2D | Text VQA | GQA | Hallusion Benchmark | MathVista Validation | MMBenchEN / dev | MMMU Validation | MMStar | POPE | SQA IMG Full | MME | AVG |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CLIP + Phi3 | SMoE-R | 64.25 | 46.57 | 62.12 | 40.48 | 31.00 | 68.12 | 39.89 | 37.13 | 87.50 | 77.74 | 1,700.61 | 55.48 |
CLIP + Phi3 | Cosine-R | 64.51 | 49.79 | 61.38 | 40.80 | 31.30 | 67.01 | 40.67 | 39.36 | 87.52 | 77.48 | 1,687.37 | 55.98 |
CLIP + Phi3 | Sigmoid-R | 64.38 | 47.12 | 61.65 | 40.80 | 31.90 | 67.87 | 40.11 | 39.20 | 86.93 | 77.17 | 1,710.42 | 55.71 |
CLIP + Phi3 | Hyper-R | 64.37 | 47.59 | 59.70 | 40.38 | 31.30 | 68.30 | 40.78 | 38.33 | 85.70 | 80.33 | 1,726.87 | 55.68 |
CLIP + Phi3 | Perturbed Cosine-R | 64.70 | 47.16 | 61.90 | 39.43 | 32.80 | 69.50 | 39.89 | 40.33 | 87.42 | 77.64 | 1,672.70 | 56.08 |
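The card does not say how the AVG column is computed. Judging from the numbers alone, it appears to be the unweighted mean of the ten benchmarks other than MME, which is reported on a different scale. The sketch below checks that reading against the table; the variable and function names are ours for illustration and are not part of LibMoE.

```python
# Scores copied from the results table, in column order:
# AI2D, Text VQA, GQA, Hallusion Benchmark, MathVista, MMBench EN dev,
# MMMU, MMStar, POPE, SQA IMG Full. MME is deliberately left out
# (assumption: it is excluded from AVG because of its different scale).
scores = {
    "SMoE-R": [64.25, 46.57, 62.12, 40.48, 31.00, 68.12, 39.89, 37.13, 87.50, 77.74],
    "Cosine-R": [64.51, 49.79, 61.38, 40.80, 31.30, 67.01, 40.67, 39.36, 87.52, 77.48],
    "Sigmoid-R": [64.38, 47.12, 61.65, 40.80, 31.90, 67.87, 40.11, 39.20, 86.93, 77.17],
    "Hyper-R": [64.37, 47.59, 59.70, 40.38, 31.30, 68.30, 40.78, 38.33, 85.70, 80.33],
    "Perturbed Cosine-R": [64.70, 47.16, 61.90, 39.43, 32.80, 69.50, 39.89, 40.33, 87.42, 77.64],
}

# AVG column as reported in the table.
reported_avg = {
    "SMoE-R": 55.48,
    "Cosine-R": 55.98,
    "Sigmoid-R": 55.71,
    "Hyper-R": 55.68,
    "Perturbed Cosine-R": 56.08,
}


def mean_score(values):
    """Unweighted mean, rounded to two decimals like the table."""
    return round(sum(values) / len(values), 2)


computed_avg = {method: mean_score(vals) for method, vals in scores.items()}

for method, avg in computed_avg.items():
    print(f"{method:20s} computed={avg:.2f} reported={reported_avg[method]:.2f}")
```

Under this reading, the five methods span only 0.6 points of average score (55.48 to 56.08), which is consistent with the observation in the introduction that the algorithms perform similarly when averaged across tasks.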
Run LibMoE
We provide detailed instructions for setting up and running experiments in this repository: https://github.com/Fsoft-AIC/LibMoE
Hardware Resources
Stage | MoE Method | Hardware |
---|---|---|
Pre-Training | | 4xA100 |
Pre-FineTuning | | 4xA100 |
VIT | SMoE-R | 6xA100 |
VIT | Cosine-R | 6xA100 |
VIT | Sigmoid-R | 6xA100 |
VIT | Hyper-R | 6xA100 |
VIT | Perturbed Cosine-R | 6xA100 |
Citation Information
More details can be found in our paper.
If you use LibMoE, please cite it using this BibTeX:
@misc{nguyen2024libmoelibrarycomprehensivebenchmarking,
title={LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models},
author={Nam V. Nguyen and Thong T. Doan and Luong Tran and Van Nguyen and Quang Pham},
year={2024},
eprint={2411.00918},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2411.00918},
}