
Model Summary

PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning-rate scheduler. It sparsely activates 800M parameters per token and was trained on a mix of open-source and proprietary datasets. PowerMoE-3B shows promising results compared to dense models with twice the active parameters across various benchmarks, including natural-language multiple-choice tasks, code generation, and math reasoning.

Paper: https://arxiv.org/abs/2408.13359

This repository provides a GGUF-quantized version of PowerMoE-3B.
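
If the file is not already present locally, it can be fetched with the Hugging Face CLI. A minimal sketch, assuming the q3km file name used in the generation example below:

# Download the GGUF file from this repository into the current directory
huggingface-cli download TobDeBer/PowerMoe-3b-GGUF PowerMoE4x800M_q3km.gguf --local-dir .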

Usage

Running the GGUF file requires a recent build of llama.cpp.
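
A minimal sketch of one way to obtain and build llama.cpp from source, assuming a standard CMake toolchain; the llama-cli and llama-server binaries end up under build/bin:

# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release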

Generation

A simple example of running the PowerMoE GGUF with llama-cli:

./llama-cli -m PowerMoE4x800M_q3km.gguf -p "How about a snack?"
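
Beyond a one-shot prompt, the same file can be served over HTTP with llama-server and queried through its completion endpoint. A hedged sketch; the port and n_predict value are illustrative, not taken from this repository:

# Start the llama.cpp HTTP server on port 8080
./llama-server -m PowerMoE4x800M_q3km.gguf --port 8080

# From another shell, request a completion
curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{"prompt": "How about a snack?", "n_predict": 64}'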

GGUF

Model size: 3.51B params
Architecture: granite

Model tree for TobDeBer/PowerMoe-3b-GGUF

Base model: ibm/PowerMoE-3b (quantized to GGUF in this repository)
