
Quantization made by Richard Erkhov.

Github

Discord

Request more models

PowerMoE-3b - bnb 4bits

Original model description:

pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers

Evaluation results for ibm/PowerMoE-3b (lm-eval-harness unless noted otherwise; all values unverified):

ARC: 58.1 (accuracy-norm)
BoolQ: 65.0 (accuracy)
Hellaswag: 71.5 (accuracy-norm)
OpenBookQA: 41.0 (accuracy-norm)
PIQA: 79.1 (accuracy-norm)
Winogrande: 65.0 (accuracy-norm)
MMLU (5 shot): 42.8 (accuracy)
GSM8k (5 shot): 25.9 (accuracy)
math (4 shot): 14.8 (accuracy)
humaneval: 20.1 (pass@1, bigcode-eval)
MBPP: 32.4 (pass@1, bigcode-eval)

Model Summary

PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning-rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and proprietary datasets. PowerMoE-3B has shown promising results compared to dense models with 2x the active parameters across various benchmarks, including natural-language multiple-choice tasks, code generation, and math reasoning. Paper: https://arxiv.org/abs/2408.13359
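
To inspect the sparse-activation setup (how many experts each layer holds and how many are routed per token), the relevant fields can be read from the model config without downloading the weights. This is a minimal sketch; the attribute names below follow Mixtral/GraniteMoe-style MoE configs and may differ across transformers versions, so they are looked up defensively.

from transformers import AutoConfig

# Load only the configuration, not the weights.
config = AutoConfig.from_pretrained("ibm/PowerMoE-3b")

# Attribute names are assumptions based on MoE-style configs in transformers;
# getattr with a default avoids errors if a field is named differently.
print("experts per MoE layer:", getattr(config, "num_local_experts", None))
print("experts routed per token:", getattr(config, "num_experts_per_tok", None))
print("hidden size:", config.hidden_size)
print("hidden layers:", config.num_hidden_layers)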

Usage

Note: Requires installing HF transformers from source (e.g., pip install git+https://github.com/huggingface/transformers.git).

Generation

This is a simple example of how to use the PowerMoE-3b model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # or "cpu"
model_path = "ibm/PowerMoE-3b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."
# tokenize the text
input_tokens = tokenizer(prompt, return_tensors="pt")
# transfer tokenized inputs to the device
for i in input_tokens:
    input_tokens[i] = input_tokens[i].to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print, in this example the batch size is 1
for i in output:
    print(i)
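
The weights in this repository are already quantized with bitsandbytes to 4 bits, so they can be loaded with a plain from_pretrained call on this repo id. Alternatively, the original ibm/PowerMoE-3b checkpoint can be quantized to 4 bits on the fly at load time. Below is a minimal sketch of the on-the-fly route, assuming bitsandbytes is installed and a CUDA device is available; the quantization settings are illustrative and not necessarily the ones used to produce this repo.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "ibm/PowerMoE-3b"
# illustrative 4-bit NF4 quantization settings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# quantize to 4 bits while loading; device_map places layers on the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
)
model.eval()
prompt = "Write a code to find the maximum value in a list of numbers."
input_tokens = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output)[0])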