Instructions to use thomasgauthier/Unmixtraled-22B-v0.1-expert-2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use thomasgauthier/Unmixtraled-22B-v0.1-expert-2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="thomasgauthier/Unmixtraled-22B-v0.1-expert-2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("thomasgauthier/Unmixtraled-22B-v0.1-expert-2")
model = AutoModelForCausalLM.from_pretrained("thomasgauthier/Unmixtraled-22B-v0.1-expert-2")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use thomasgauthier/Unmixtraled-22B-v0.1-expert-2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "thomasgauthier/Unmixtraled-22B-v0.1-expert-2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "thomasgauthier/Unmixtraled-22B-v0.1-expert-2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/thomasgauthier/Unmixtraled-22B-v0.1-expert-2

SGLang

How to use thomasgauthier/Unmixtraled-22B-v0.1-expert-2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "thomasgauthier/Unmixtraled-22B-v0.1-expert-2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "thomasgauthier/Unmixtraled-22B-v0.1-expert-2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "thomasgauthier/Unmixtraled-22B-v0.1-expert-2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "thomasgauthier/Unmixtraled-22B-v0.1-expert-2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use thomasgauthier/Unmixtraled-22B-v0.1-expert-2 with Docker Model Runner:
```
docker model run hf.co/thomasgauthier/Unmixtraled-22B-v0.1-expert-2
```

Unmixtraled 22B expert 2

This model outputs gibberish as it was not trained under the dense configuration. Finetuning or merging is needed to make this model useful.

This is a 22B Mistral model recycling weights from mistral-community/Mixtral-8x22B-v0.1. The model was adapted from a Mixtral architecture to a dense Mistral architecture with the same number of layers, attention heads and hidden dimensions.
Embeddings, attention, layer norms and LM head weights were taken directly from the 8x22B model, all MLP weights were taken from expert 2.

The following named weight correspondance was used:

Mistral weight	Mixtral weight
`gate_proj`	`experts.2.w1`
`down_proj`	`experts.2.w2`
`up_proj`	`experts.2.w3`

Unmixtraled models

Expert	Source	Wikitext perplexity
Unmixtraled-22B-v0.1-expert-0	Mixtral 8x22B embed, attn, layernorm, lm_head + expert 0 MLPs	696.6932983398438
Unmixtraled-22B-v0.1-expert-1	Mixtral 8x22B embed, attn, layernorm, lm_head + expert 1 MLPs	6853.04248046875
Unmixtraled-22B-v0.1-expert-2	Mixtral 8x22B embed, attn, layernorm, lm_head + expert 2 MLPs	4689.181640625
Unmixtraled-22B-v0.1-expert-3	Mixtral 8x22B embed, attn, layernorm, lm_head + expert 3 MLPs	782.3755493164062
Unmixtraled-22B-v0.1-expert-4	Mixtral 8x22B embed, attn, layernorm, lm_head + expert 4 MLPs	2844.943603515625
Unmixtraled-22B-v0.1-expert-5	Mixtral 8x22B embed, attn, layernorm, lm_head + expert 5 MLPs	1099.32373046875
Unmixtraled-22B-v0.1-expert-6	Mixtral 8x22B embed, attn, layernorm, lm_head + expert 6 MLPs	341.5309753417969
Unmixtraled-22B-v0.1-expert-7	Mixtral 8x22B embed, attn, layernorm, lm_head + expert 7 MLPs	2099.63818359375
Unmixtraled-22B-v0.1-lerp	Mixtral 8x22B embed, attn, layernorm, lm_head + linear merge of expert 0-7 MLPs	1873.9874267578125

Code

The following code was used to extract the experts and construct the dense models:

# pip install -U transformers huggingface_hub "git+https://github.com/arcee-ai/mergekit@7467108c05d56ef2bb4b8f33936d437dc448f7dd"

import fnmatch
import json
import os
import re
import shutil

import torch
from huggingface_hub import snapshot_download
from mergekit.architecture import get_architecture_info
from mergekit.common import ModelReference
from mergekit.io import LazyTensorLoader, TensorWriter
from tqdm import tqdm

MIXTRAL_MODEL_ID = "mistral-community/Mixtral-8x22B-v0.1"
MIXTRAL_PATH = snapshot_download(repo_id=MIXTRAL_MODEL_ID)
print(f"Mixtral downloaded to: {MIXTRAL_PATH}")

MISTRAL_PATH = snapshot_download(
    repo_id="mistralai/Mistral-7B-v0.1", allow_patterns=["config.json"]
)
print(f"Mistral config downloaded to: {MISTRAL_PATH}")

with open(os.path.join(MISTRAL_PATH, "config.json"), "r") as f:
    mistral_config = json.load(f)

with open(os.path.join(MIXTRAL_PATH, "config.json"), "r") as f:
    mixtral_config = json.load(f)

combined_config = {
    key: mixtral_config[key] for key in mistral_config if key in mixtral_config
}
combined_config["architectures"] = ["MistralForCausalLM"]
combined_config["model_type"] = "mistral"

mixtral_model_ref = ModelReference.parse(MIXTRAL_PATH)
mixtral_architecture_info = get_architecture_info(mixtral_model_ref.config())
mixtral_loader = LazyTensorLoader(mixtral_model_ref.tensor_index(), lazy_unpickle=True)

ALLOW_LIST = ["generation_config.json", "tokenizer.model", "tokenizer_config.json"]

def copy_directory(src, dest, allowed_patterns):
    os.makedirs(dest, exist_ok=True)
    for root, dirs, files in os.walk(src):
        # Only keep directories that match at least one of the allowed patterns
        dirs[:] = [d for d in dirs if any(fnmatch.fnmatch(d, pattern) for pattern in allowed_patterns)]
        for file in files:
            # Only copy files that match at least one of the allowed patterns
            if any(fnmatch.fnmatch(file, pattern) for pattern in allowed_patterns):
                src_path = os.path.join(root, file)
                dest_path = os.path.join(dest, os.path.relpath(src_path, src))
                os.makedirs(os.path.dirname(dest_path), exist_ok=True)
                shutil.copy2(src_path, dest_path)

def get_tensor(layer_num, expert_num, tensor_type):
    weight_name = f"model.layers.{layer_num}.block_sparse_moe.experts.{expert_num}.{tensor_type}.weight"
    return mixtral_loader.get_tensor(weight_name)


def extract_layer_number(string):
    match = re.search(r"layers\.(\d+)\.", string)
    return int(match.group(1)) if match else None


def save_expert_as_dense(output_path, expert_num):
    dense_model_ref = ModelReference.parse(output_path)
    dense_architecture_info = get_architecture_info(dense_model_ref.config())

    writer = TensorWriter(output_path, safe_serialization=True)

    for weight_info in tqdm(dense_architecture_info.all_weights(dense_model_ref.config())):
        if weight_info.name.endswith(".up_proj.weight"):
            layer_num = extract_layer_number(weight_info.name)
            writer.save_tensor(weight_info.name, get_tensor(layer_num, expert_num, "w3"))
        elif weight_info.name.endswith(".down_proj.weight"):
            layer_num = extract_layer_number(weight_info.name)
            writer.save_tensor(weight_info.name, get_tensor(layer_num, expert_num, "w2"))
        elif weight_info.name.endswith(".gate_proj.weight"):
            layer_num = extract_layer_number(weight_info.name)
            writer.save_tensor(weight_info.name, get_tensor(layer_num, expert_num, "w1"))
        else:
            writer.save_tensor(weight_info.name, mixtral_loader.get_tensor(weight_info.name))

    writer.finalize()


num_experts = mixtral_config["num_local_experts"]

for expert_num in range(num_experts):
    dense_path = f"./dense_expert_{expert_num}"
    copy_directory(MIXTRAL_PATH, dense_path, ALLOW_LIST)

    with open(os.path.join(dense_path, "config.json"), "w") as f:
        json.dump(combined_config, f, indent=2)

    save_expert_as_dense(dense_path, expert_num)
    print(f"Dense model #{expert_num} saved to {os.path.abspath(dense_path)}")

Downloads last month: 8

Safetensors

Model size

22B params

Tensor type

BF16

Collection including thomasgauthier/Unmixtraled-22B-v0.1-expert-2

Unmixtraled experts

Collection

This collections contains all 8 experts of Mixtral 8x22B converted to single dense 22B models. The models are intended as basis for merges or finetune • 9 items • Updated Apr 11, 2024 • 1