DavidNguyen committed
Commit 5f9460e
1 Parent(s): 0d4e8c4

Update README.md

Files changed (1)
  1. README.md +23 -1
README.md CHANGED
@@ -16,8 +16,11 @@ Here is the revised text with grammatical improvements:
  Mixture of Experts (MoEs) plays an essential role in the development of more efficient and effective large language models (LLMs). Due to the enormous resource requirements, studying large-scale MoE algorithms remains inaccessible to many researchers. This work introduces LibMoE, a comprehensive and modular framework designed to streamline the research, training, and evaluation of MoE algorithms. Built upon three core principles: (i) modular design, (ii) efficient training, and (iii) comprehensive evaluation, LibMoE makes MoEs in LLMs more accessible to a wider range of researchers by standardizing the training and evaluation pipelines. Using LibMoE, we extensively benchmarked five state-of-the-art MoE algorithms across three different LLMs and 11 datasets under a zero-shot setting. The results show that, despite unique characteristics, all MoE algorithms perform similarly when averaged across a broad range of tasks. With its modular design and extensive evaluation capabilities, we believe LibMoE will be invaluable for researchers striving to make meaningful progress toward the next generation of MoE and LLMs.
 
  ### Model and Evaluation Benchmarks
- We have released five MoE algorithms trained based on `microsoft/Phi-3-mini-4k-instruct` for LLMs and `SigLIP` for vision encoding. We evaluated these state-of-the-art algorithms on 11 benchmarks, examining various aspects of MoE algorithm performance.
 
  | Model | MoE Method | AI2D | Text VQA | GQA | Hallusion<br>Benchmark | MathVista<br>Validation | MMBenchEN<br>/ dev | MMMU<br>Validation | MMStar | POPE | SQA IMG<br>Full | MME | AVG |
  |---------------------|---------------------|-------|----------|-------|-------------------------|-------------------------|---------------------|---------------------|--------|--------|------------------|-----------|-------|
  | SigLIP 224 + Phi3 | SMoE-R | 64.35 | 40.35 | 60.03 | **41.75** | 28.7 | 67.96 | 40.22 | 39.47 | 84.31 | 80.71 | 1,655.81 | 54.78 |
@@ -26,6 +29,25 @@ We have released five MoE algorithms trained based on `microsoft/Phi-3-mini-4k-i
  | | Hyper-R | **65.12** | 41.67 | 59.88 | 41.32 | 30.3 | 69.33 | 41.44 | 39.86 | 85.4 | 79.03 | 1,752.39 | 55.34 |
  | | Perturbed Cosine-R | 64.8 | 41.89 | **61.0** | 40.9 | **31.8** | **70.7** | **42.0** | **39.6** | **86.43** | **81.44** | **1,776.54** | **56.06** |
 
  ### Citation Information
  More details can be found in our paper.
 
 
  Mixture of Experts (MoEs) plays an essential role in the development of more efficient and effective large language models (LLMs). Due to the enormous resource requirements, studying large-scale MoE algorithms remains inaccessible to many researchers. This work introduces LibMoE, a comprehensive and modular framework designed to streamline the research, training, and evaluation of MoE algorithms. Built upon three core principles: (i) modular design, (ii) efficient training, and (iii) comprehensive evaluation, LibMoE makes MoEs in LLMs more accessible to a wider range of researchers by standardizing the training and evaluation pipelines. Using LibMoE, we extensively benchmarked five state-of-the-art MoE algorithms across three different LLMs and 11 datasets under a zero-shot setting. The results show that, despite unique characteristics, all MoE algorithms perform similarly when averaged across a broad range of tasks. With its modular design and extensive evaluation capabilities, we believe LibMoE will be invaluable for researchers striving to make meaningful progress toward the next generation of MoE and LLMs.
 
  ### Model and Evaluation Benchmarks
 
+ We have released five MoE algorithms trained based on `microsoft/Phi-3-mini-4k-instruct` for LLMs and `SigLIP` for vision encoding. These models were trained on the [LLAVA-665K dataset](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K). We evaluated these state-of-the-art algorithms on 11 benchmarks, examining various aspects of MoE algorithm performance.
+
+
  | Model | MoE Method | AI2D | Text VQA | GQA | Hallusion<br>Benchmark | MathVista<br>Validation | MMBenchEN<br>/ dev | MMMU<br>Validation | MMStar | POPE | SQA IMG<br>Full | MME | AVG |
  |---------------------|---------------------|-------|----------|-------|-------------------------|-------------------------|---------------------|---------------------|--------|--------|------------------|-----------|-------|
  | SigLIP 224 + Phi3 | SMoE-R | 64.35 | 40.35 | 60.03 | **41.75** | 28.7 | 67.96 | 40.22 | 39.47 | 84.31 | 80.71 | 1,655.81 | 54.78 |
 
  | | Hyper-R | **65.12** | 41.67 | 59.88 | 41.32 | 30.3 | 69.33 | 41.44 | 39.86 | 85.4 | 79.03 | 1,752.39 | 55.34 |
  | | Perturbed Cosine-R | 64.8 | 41.89 | **61.0** | 40.9 | **31.8** | **70.7** | **42.0** | **39.6** | **86.43** | **81.44** | **1,776.54** | **56.06** |
 
+ ### Run LibMoE
+
+ We provide detailed instructions for setting up and running experiments in this repository: [https://github.com/Fsoft-AIC/LibMoE](https://github.com/Fsoft-AIC/LibMoE)
+
+ ### Hardware Resources
+
+ | Stage | MoE Method | Hardware |
+ |-------------------|----------------------|-----------|
+ | Pre-Training | | 4xA100 |
+ | Pre-FineTuning | | 4xA100 |
+ | VIT | SMoE-R | 6xA100 |
+ | | Cosine-R | 6xA100 |
+ | | Sigmoid-R | 6xA100 |
+ | | Hyper-R | 6xA100 |
+ | | Perturbed Cosine-R | 6xA100 |
+
+ ---
+
+
  ### Citation Information
  More details can be found in our paper.
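The AVG column in the results table is not defined anywhere in this diff. Recomputing it from the three rows visible here suggests it is the unweighted mean of the ten percentage-scale benchmarks, with MME (reported on a much larger score scale) left out. That reading is an inference from the numbers, not a definition stated by the authors. The small Python check below copies the scores from the table; the names `BENCHMARKS`, `ROWS`, `REPORTED_AVG`, and `average_excluding` are illustrative only and are not part of LibMoE.

```python
# Illustrative check only: names below are hypothetical, not part of LibMoE.
# Scores are copied from the README results table (the three rows visible in this diff).
# MME is reported on its own, much larger scale and is listed last.
BENCHMARKS = [
    "AI2D", "TextVQA", "GQA", "HallusionBench", "MathVista",
    "MMBench-EN", "MMMU", "MMStar", "POPE", "SQA-IMG", "MME",
]

ROWS = {
    "SMoE-R": [64.35, 40.35, 60.03, 41.75, 28.7, 67.96, 40.22, 39.47, 84.31, 80.71, 1655.81],
    "Hyper-R": [65.12, 41.67, 59.88, 41.32, 30.3, 69.33, 41.44, 39.86, 85.4, 79.03, 1752.39],
    "Perturbed Cosine-R": [64.8, 41.89, 61.0, 40.9, 31.8, 70.7, 42.0, 39.6, 86.43, 81.44, 1776.54],
}

# AVG values as printed in the table (rounded to two decimals).
REPORTED_AVG = {"SMoE-R": 54.78, "Hyper-R": 55.34, "Perturbed Cosine-R": 56.06}


def average_excluding(scores, names, excluded=("MME",)):
    """Unweighted mean over the benchmarks whose names are not in `excluded`."""
    kept = [score for score, name in zip(scores, names) if name not in excluded]
    return sum(kept) / len(kept)


for method, scores in ROWS.items():
    without_mme = average_excluding(scores, BENCHMARKS)
    # Passing an empty `excluded` tuple folds MME into the same unweighted mean.
    with_mme = average_excluding(scores, BENCHMARKS, excluded=())
    print(f"{method}: without MME {without_mme:.3f}, with MME {with_mme:.2f}, "
          f"reported {REPORTED_AVG[method]}")
```

Each mean computed without MME lies within 0.01 of the reported AVG (for SMoE-R, 547.85 / 10 = 54.785), whereas folding MME into the same unweighted mean pushes every row above 200. The Cosine-R and Sigmoid-R rows fall between the two hunks and are not shown in this diff, so they are not checked here.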